
Complete Kohya LoRA conversion for Qwen and Z-Image #14080

Open
dxqb wants to merge 2 commits into huggingface:main from dxqb:fix-zimage-kohya-lora-conversion

Conversation

dxqb commented Jun 27, 2026

Contributor

What does this PR do?

This PR adds some missing layers to the already implemented Kohya-to-diffusers conversion code.
Most of these layers live outside the transformer blocks, but one is inside a transformer block.

I guess these weren't initially included because kohya-ss/musubi-tuner doesn't train them by default, but they can be trained with kohya-ss/musubi-tuner and with other trainers that output the Kohya format.

Details:

Qwen — top-level (non-block) modules
convert_key assumes every key lives under transformer_blocks and strips/re-prepends
that prefix. As a result, the six top-level modules (img_in, txt_in, proj_out,
norm_out.linear, time_text_embed.timestep_embedder.linear_1/2) collapse onto each
other and trip the "state_dict should be empty" check. They are now resolved via an
explicit flattened→dotted map before the block logic runs, preserving the
.lora_down/.lora_up/.alpha suffix.
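
A minimal sketch of the explicit-map idea, not the code from this PR. The names TOP_LEVEL_MODULES and resolve_top_level are hypothetical, and the key layout (a lora_unet_ prefix, a flattened module name, then a dotted suffix such as .lora_down.weight) is assumed from the usual Kohya format:

```python
from typing import Optional

# Hypothetical flattened -> dotted map for the six top-level Qwen modules
# listed above. Illustrative only, not the PR's actual table.
TOP_LEVEL_MODULES = {
    "img_in": "img_in",
    "txt_in": "txt_in",
    "proj_out": "proj_out",
    "norm_out_linear": "norm_out.linear",
    "time_text_embed_timestep_embedder_linear_1": "time_text_embed.timestep_embedder.linear_1",
    "time_text_embed_timestep_embedder_linear_2": "time_text_embed.timestep_embedder.linear_2",
}


def resolve_top_level(key: str, prefix: str = "lora_unet_") -> Optional[str]:
    """Return the dotted key for a top-level module, or None for any other key.

    Assumes the flattened module name contains no dots, so everything after
    the first dot is the suffix (.lora_down.weight, .lora_up.weight, .alpha).
    """
    if not key.startswith(prefix):
        return None
    module, sep, suffix = key[len(prefix):].partition(".")
    dotted = TOP_LEVEL_MODULES.get(module)
    if dotted is None or not sep:
        return None  # not a top-level module: leave it to the block logic
    return f"{dotted}.{suffix}"  # suffix is preserved unchanged
```

Keys that return None would fall through to the existing transformer_blocks handling, so the block path stays untouched.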

Z-Image — module names that contain underscores
The blanket `_`→`.` split over-splits modules whose own names contain underscores
(all_final_layer, all_x_embedder, adaLN_modulation, cap_embedder, t_embedder),
so they arrive as all.final.layer, adaLN.modulation, … and fail with "unexpected
keys". The existing dotted→underscore post-normalization is extended to re-merge these
names (it runs on the full key, so .lora_A/B and .alpha are handled alike).
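
The re-merge step can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: UNDERSCORE_NAMES and remerge_names are hypothetical names, and the sample keys in the usage note are made up for the example.

```python
import re

# Module names from the description above whose own names contain underscores.
UNDERSCORE_NAMES = (
    "all_final_layer",
    "all_x_embedder",
    "adaLN_modulation",
    "cap_embedder",
    "t_embedder",
)


def remerge_names(key: str) -> str:
    """Undo the over-split: turn e.g. 'adaLN.modulation' back into 'adaLN_modulation'.

    Runs on the full key, so any trailing suffix is left untouched. The
    lookarounds restrict matches to whole dot-delimited segments, so
    'cap.embedder' is never mistaken for 't.embedder'.
    """
    for name in UNDERSCORE_NAMES:
        dotted = name.replace("_", ".")
        pattern = r"(?<![^.])" + re.escape(dotted) + r"(?![^.])"
        key = re.sub(pattern, name, key)
    return key
```

For instance, remerge_names("cap.embedder.1.alpha") gives "cap_embedder.1.alpha", while a key such as "layers.0.attention.to_q.lora_A.weight" passes through unchanged.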

Who can review?

dxqb and others added 2 commits June 27, 2026 10:01
…erscores

_convert_non_diffusers_z_image_lora_to_diffusers reverses Kohya's `.`->`_`
flattening with a blanket `_`->`.` split, guarded only by a small
protected-n-gram list (attention to_q/k/v/out, feed_forward) plus post-hoc
fixes for context_refiner/noise_refiner. Z-Image's other modules whose names
contain underscores were over-split: all_final_layer, all_x_embedder,
adaLN_modulation, cap_embedder and t_embedder came out as all.final.layer,
adaLN.modulation, ... and failed to load with "unexpected keys".

Extend the existing dot->underscore post-normalization to re-merge these
names, so Kohya (lora_unet_) Z-Image LoRAs load.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
_convert_non_diffusers_qwen_lora_to_diffusers's convert_key hardcodes the
transformer_blocks prefix and assumes every lora_unet_ key lives under a block:
it strips a transformer_blocks_ prefix and re-prepends transformer_blocks.,
which collapses the top-level modules (img_in, txt_in, proj_out, norm_out.linear,
time_text_embed.timestep_embedder.linear_1/2) onto each other. They end up as
transformer_blocks..weight / ...a.down.weight and trip the 'state_dict should be
empty' guard.

Resolve these six modules via an explicit flattened->dotted map before the block
logic runs, preserving the .lora_down/.lora_up/.alpha suffix, so Kohya (lora_unet_)
Qwen LoRAs load.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Labels

lora size/S PR with diff < 50 LOC
