Commit 67e69cb
fix: compute image_seq_len from spatial dims, not channel dim in Lumina2 pipeline
Fixes #12913
`image_seq_len` was computed as `latents.shape[1]`, which gives the
channel dimension (e.g. 16) since Lumina2 latents have shape
`(batch, channels, height, width)` and are NOT packed/reshaped before
this point. The Lumina2 transformer internally patchifies the latents
with `patch_size=2`, so the correct spatial sequence length is
`(H // patch_size) * (W // patch_size)`.
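To make the difference concrete, here is a minimal sketch of the two computations on a latents shape. The helper name `image_seq_len` and the example shape are hypothetical, chosen only for illustration; the pipeline itself works on tensors, not tuples.

```python
def image_seq_len(latents_shape, patch_size=2):
    """Token count for unpacked latents shaped (batch, channels, height, width)."""
    _, _, height, width = latents_shape
    return (height // patch_size) * (width // patch_size)


# Hypothetical example: 16 channels on a 128x128 latent grid.
latents_shape = (1, 16, 128, 128)

buggy = latents_shape[1]              # channel dimension, independent of H and W
fixed = image_seq_len(latents_shape)  # (128 // 2) * (128 // 2)

print(buggy, fixed)
```

With these assumed values, the old indexing yields 16 while the spatial formula yields 4096.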
This incorrect value was passed to `calculate_shift()`, which computes
the `mu` parameter for the flow-matching scheduler. Because the channel
count is fixed, `mu` no longer varied with image resolution and did not
reflect the actual token count, degrading generation quality.
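The effect on `mu` can be sketched as follows, assuming `calculate_shift()` is a linear interpolation between a base and a maximum sequence length. The default values below are assumptions for illustration; the pipeline may read different values from the scheduler config.

```python
def calculate_shift(
    image_seq_len,
    base_seq_len=256,   # assumed default
    max_seq_len=4096,   # assumed default
    base_shift=0.5,     # assumed default
    max_shift=1.15,     # assumed default
):
    """Linearly interpolate mu between base_shift and max_shift."""
    slope = (max_shift - base_shift) / (max_seq_len - base_seq_len)
    intercept = base_shift - slope * base_seq_len
    return image_seq_len * slope + intercept


mu_from_channels = calculate_shift(16)    # what the old indexing produced
mu_from_tokens = calculate_shift(4096)    # token count for a 128x128 latent grid

print(mu_from_channels, mu_from_tokens)
```

Under these assumptions the channel-based `mu` sits below `base_shift` for every resolution, while the token-based `mu` grows with image size.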
The fix reads `patch_size` from `self.transformer.config.patch_size` and
computes `image_seq_len` from the last two (spatial) dimensions of the
latents tensor, matching how the transformer itself computes its input
sequence length.
For reference, the Flux pipeline correctly uses `latents.shape[1]`
because Flux latents are pre-packed into `(batch, seq_len, channels)`
before this computation. Lumina2 does not pre-pack, so the same indexing
does not apply.
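The contrast with Flux can be sketched at the shape level. This assumes Flux-style packing folds each 2x2 spatial patch into the channel axis; the helper `packed_shape` is hypothetical and only tracks shapes, not tensor data.

```python
def packed_shape(latents_shape, patch_size=2):
    """Shape after folding patch_size x patch_size patches into channels (assumed layout)."""
    batch, channels, height, width = latents_shape
    seq_len = (height // patch_size) * (width // patch_size)
    return (batch, seq_len, channels * patch_size * patch_size)


unpacked = (1, 16, 128, 128)      # Lumina2-style: (batch, channels, height, width)
packed = packed_shape(unpacked)   # Flux-style: (batch, seq_len, channels)

# Index 1 means different things in the two layouts.
print(unpacked[1], packed[1])
```

After packing, index 1 is already the token count, which is why the same expression is correct in one pipeline and wrong in the other.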
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

1 parent e5aa719 commit 67e69cb
2 files changed, +64 -1 lines changed
- src/diffusers/pipelines/lumina2
- tests/pipelines/lumina2

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 696-698 | 696-698 | (unchanged context) |
| 699 | | - |
| | 699-700 | + |
| 700-702 | 701-703 | (unchanged context) |

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 99-101 | 99-101 | (unchanged context) |
| | 102-163 | + |
| 102-104 | 164-166 | (unchanged context) |