Complete Kohya LoRA conversion for Qwen and Z-Image by dxqb · Pull Request #14080 · huggingface/diffusers

dxqb · 2026-06-27T10:30:34Z

What does this PR do?

This PR adds some missing layers to the already implemented kohya-to-diffusers conversion code.
Most of these layers live outside the transformer blocks, but one lives inside the transformer block.

I guess these weren't initially included because kohya-ss/musubi-tuner doesn't train them by default, but they can be trained, both with kohya-ss/musubi-tuner and with other trainers that output the kohya format.

Details:

Qwen — top-level (non-block) modules
convert_key assumes every key lives under transformer_blocks: it strips that prefix
and re-prepends it. As a result, the six top-level modules (img_in, txt_in, proj_out,
norm_out.linear, time_text_embed.timestep_embedder.linear_1/2) collapse onto each
other and trip the "state_dict should be empty" check. They're now resolved via an
explicit flattened→dotted map before the block logic runs, preserving the
.lora_down/.lora_up/.alpha suffix.
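
To illustrate the idea, here is a minimal sketch of such a flattened→dotted lookup. It is not the code from this PR: the names `TOP_LEVEL_MODULES` and `resolve_top_level_key` are hypothetical, and the flattened spellings are assumed to be the dotted module paths with `.` replaced by `_`. It only rewrites the module path and leaves the suffix untouched.

```python
from typing import Optional

# Hypothetical map (assumption): flattened Kohya module name -> dotted module path.
TOP_LEVEL_MODULES = {
    "img_in": "img_in",
    "txt_in": "txt_in",
    "proj_out": "proj_out",
    "norm_out_linear": "norm_out.linear",
    "time_text_embed_timestep_embedder_linear_1": "time_text_embed.timestep_embedder.linear_1",
    "time_text_embed_timestep_embedder_linear_2": "time_text_embed.timestep_embedder.linear_2",
}


def resolve_top_level_key(key: str, prefix: str = "lora_unet_") -> Optional[str]:
    """Return the dotted key for a top-level module, or None if the key
    does not name one of them (so the regular block logic can handle it)."""
    if not key.startswith(prefix):
        return None
    # Split "norm_out_linear.lora_down.weight" into module name and suffix.
    module, sep, suffix = key[len(prefix):].partition(".")
    dotted = TOP_LEVEL_MODULES.get(module)
    if dotted is None:
        return None
    # Keep the .lora_down/.lora_up/.alpha suffix exactly as it was.
    return f"{dotted}{sep}{suffix}"
```

Keys that return None here (for example anything under transformer_blocks) would fall through to the existing block handling, which is why the lookup has to run first.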

Z-Image — module names that contain underscores
The blanket _→. split over-splits modules whose own names contain underscores
(all_final_layer, all_x_embedder, adaLN_modulation, cap_embedder, t_embedder),
so they arrive as all.final.layer, adaLN.modulation, … and fail with "unexpected
keys". The existing dotted→underscore post-normalization is extended to re-merge these
names (it runs on the full key, so .lora_A/B and .alpha are handled alike).
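
The over-split and the re-merge can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `naive_unflatten` and `remerge_module_names` are hypothetical helpers, and the example keys are made up to show the pattern. The regex only matches whole dot-separated components, so a name like t_embedder is not re-merged inside a longer component.

```python
import re

# Module names from the description above that contain underscores themselves.
UNDERSCORE_MODULES = (
    "all_final_layer",
    "all_x_embedder",
    "adaLN_modulation",
    "cap_embedder",
    "t_embedder",
)


def naive_unflatten(flat_module: str) -> str:
    """Blanket `_` -> `.` split; this is the step that over-splits."""
    return flat_module.replace("_", ".")


def remerge_module_names(key: str) -> str:
    """Re-merge over-split names, matching only whole dotted components."""
    for name in UNDERSCORE_MODULES:
        dotted = re.escape(name.replace("_", "."))
        # (?<![^.]) / (?![^.]): neighbor must be a dot or the string boundary.
        key = re.sub(rf"(?<![^.]){dotted}(?![^.])", name, key)
    return key


# Illustrative key only; real Z-Image keys may be shaped differently.
split = naive_unflatten("layers_0_adaLN_modulation_0")
fixed = remerge_module_names(split + ".lora_A.weight")
```

Because the re-merge runs on the full key, the same call covers keys ending in .lora_A, .lora_B, or .alpha.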

Who can review?

PEFT: @sayakpaul @BenjaminBossan

…erscores

_convert_non_diffusers_z_image_lora_to_diffusers reverses Kohya's `.`->`_` flattening with a blanket `_`->`.` split, guarded only by a small protected-n-gram list (attention to_q/k/v/out, feed_forward) plus post-hoc fixes for context_refiner/noise_refiner.

Z-Image's other modules whose names contain underscores were over-split: all_final_layer, all_x_embedder, adaLN_modulation, cap_embedder and t_embedder came out as all.final.layer, adaLN.modulation, ... and failed to load with "unexpected keys".

Extend the existing dot->underscore post-normalization to re-merge these names, so Kohya (lora_unet_) Z-Image LoRAs load.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

_convert_non_diffusers_qwen_lora_to_diffusers's convert_key hardcodes the transformer_blocks prefix and assumes every lora_unet_ key lives under a block: it strips a transformer_blocks_ prefix and re-prepends transformer_blocks., which collapses the top-level modules (img_in, txt_in, proj_out, norm_out.linear, time_text_embed.timestep_embedder.linear_1/2) onto each other. They end up as transformer_blocks..weight / ...a.down.weight and trip the 'state_dict should be empty' guard.

Resolve these six modules via an explicit flattened->dotted map before the block logic runs, preserving the .lora_down/.lora_up/.alpha suffix, so Kohya (lora_unet_) Qwen LoRAs load.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

dxqb and others added 2 commits June 27, 2026 10:01

github-actions bot added the size/S (PR with diff < 50 LOC) and lora labels Jun 27, 2026

dxqb mentioned this pull request Jun 27, 2026

Fix Kohya UNet LoRA key conversion for conv_in/conv_out/time_embedding #14006

Open

Complete Kohya LoRA conversion for Qwen and Z-Image #14080
dxqb wants to merge 2 commits into huggingface:main from dxqb:fix-zimage-kohya-lora-conversion
