
Commit 661d2b1

Merge branch 'main' into device-map-direct
2 parents 59ac2f3 + b0dc51d commit 661d2b1

File tree: 89 files changed, +4990 / -2432 lines


docs/source/en/modular_diffusers/guiders.md (1 addition, 15 deletions)

@@ -89,29 +89,15 @@ t2i_pipeline.guider
 
 ## Changing guider parameters
 
-The guider parameters can be adjusted with either the [`~ComponentSpec.create`] method or with [`~ModularPipeline.update_components`]. The example below changes the `guidance_scale` value.
+The guider parameters can be adjusted with the [`~ComponentSpec.create`] method and [`~ModularPipeline.update_components`]. The example below changes the `guidance_scale` value.
 
-<hfoptions id="switch">
-<hfoption id="create">
 
 ```py
 guider_spec = t2i_pipeline.get_component_spec("guider")
 guider = guider_spec.create(guidance_scale=10)
 t2i_pipeline.update_components(guider=guider)
 ```
 
-</hfoption>
-<hfoption id="update_components">
-
-```py
-guider_spec = t2i_pipeline.get_component_spec("guider")
-guider_spec.config["guidance_scale"] = 10
-t2i_pipeline.update_components(guider=guider_spec)
-```
-
-</hfoption>
-</hfoptions>
-
 ## Uploading custom guiders
 
 Call the [`~utils.PushToHubMixin.push_to_hub`] method on a custom guider to share it to the Hub.
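The spec-based pattern in the diff above (a spec holds default config, `create()` builds a component with per-call overrides) can be sketched in plain Python. This is a hypothetical simplification for illustration, not the diffusers `ComponentSpec` API; the `Guider`/`GuiderSpec` names and the 7.5 default are invented here.

```python
from dataclasses import dataclass, field

@dataclass
class Guider:
    guidance_scale: float

@dataclass
class GuiderSpec:
    # default config; create() merges per-call overrides on top of it
    config: dict = field(default_factory=lambda: {"guidance_scale": 7.5})

    def create(self, **overrides):
        merged = {**self.config, **overrides}
        return Guider(**merged)

spec = GuiderSpec()
guider = spec.create(guidance_scale=10)
print(guider.guidance_scale)  # 10
```

The spec's stored defaults are untouched by `create()`, which is why a fresh `create()` call without overrides still returns the default value.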

docs/source/en/modular_diffusers/modular_diffusers_states.md (1 addition, 3 deletions)

@@ -25,9 +25,7 @@ This guide explains how states work and how they connect blocks.
 
 The [`~modular_pipelines.PipelineState`] is a global state container for all blocks. It maintains the complete runtime state of the pipeline and provides a structured way for blocks to read from and write to shared data.
 
-There are two dict's in [`~modular_pipelines.PipelineState`] for structuring data.
-
-- The `values` dict is a **mutable** state containing a copy of user provided input values and intermediate output values generated by blocks. If a block modifies an `input`, it will be reflected in the `values` dict after calling `set_block_state`.
+[`~modular_pipelines.PipelineState`] stores all data in a `values` dict, which is a **mutable** state containing user provided input values and intermediate output values generated by blocks. If a block modifies an `input`, it will be reflected in the `values` dict after calling `set_block_state`.
 
 ```py
 PipelineState(
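The single-`values`-dict model described in the diff above can be sketched in plain Python. This is a hypothetical, minimal analogy, not the diffusers implementation: all data lives in one mutable dict, a block takes a local snapshot, mutates it, and writes its changes back.

```python
from types import SimpleNamespace

class PipelineState:
    def __init__(self, **values):
        # single mutable store for user inputs and intermediate outputs
        self.values = dict(values)

    def get_block_state(self, names):
        # local view restricted to the names a block declared as inputs
        return SimpleNamespace(**{n: self.values[n] for n in names if n in self.values})

    def set_block_state(self, block_state):
        # push the local changes back into the shared values dict
        self.values.update(vars(block_state))

state = PipelineState(image="raw image")
block_state = state.get_block_state(["image"])
block_state.image = "resized image"            # modifying an input...
block_state.image_latents = "encoded latents"  # ...and adding a new output
state.set_block_state(block_state)
print(state.values)
# {'image': 'resized image', 'image_latents': 'encoded latents'}
```

Note how the modified `image` replaces the original in `values` after `set_block_state`, which is exactly the mutability the new doc text calls out.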

docs/source/en/modular_diffusers/modular_pipeline.md (242 additions, 173 deletions)

Large diffs are not rendered by default.

docs/source/en/modular_diffusers/pipeline_block.md (115 additions, 45 deletions)

@@ -25,81 +25,151 @@ This guide will show you how to create a [`~modular_pipelines.ModularPipelineBlocks`]
 
 A [`~modular_pipelines.ModularPipelineBlocks`] requires `inputs`, and `intermediate_outputs`.
 
-- `inputs` are values provided by a user and retrieved from the [`~modular_pipelines.PipelineState`]. This is useful because some workflows resize an image, but the original image is still required. The [`~modular_pipelines.PipelineState`] maintains the original image.
+- `inputs` are values a block reads from the [`~modular_pipelines.PipelineState`] to perform its computation. These can be values provided by a user (like a prompt or image) or values produced by a previous block (like encoded `image_latents`).
 
   Use `InputParam` to define `inputs`.
 
-  ```py
-  from diffusers.modular_pipelines import InputParam
-
-  user_inputs = [
-      InputParam(name="image", type_hint="PIL.Image", description="raw input image to process")
-  ]
-  ```
+  ```py
+  class ImageEncodeStep(ModularPipelineBlocks):
+      ...
+
+      @property
+      def inputs(self):
+          return [
+              InputParam(name="image", type_hint="PIL.Image", required=True, description="raw input image to process"),
+          ]
+      ...
+  ```
 
 - `intermediate_outputs` are new values created by a block and added to the [`~modular_pipelines.PipelineState`]. The `intermediate_outputs` are available as `inputs` for subsequent blocks or available as the final output from running the pipeline.
 
   Use `OutputParam` to define `intermediate_outputs`.
 
-  ```py
-  from diffusers.modular_pipelines import OutputParam
+  ```py
+  class ImageEncodeStep(ModularPipelineBlocks):
+      ...
 
-  user_intermediate_outputs = [
-      OutputParam(name="image_latents", description="latents representing the image")
-  ]
-  ```
+      @property
+      def intermediate_outputs(self):
+          return [
+              OutputParam(name="image_latents", description="latents representing the image"),
+          ]
+
+      ...
+  ```
 
 The intermediate inputs and outputs share data to connect blocks. They are accessible at any point, allowing you to track the workflow's progress.
 
+## Components and configs
+
+The components and pipeline-level configs a block needs are specified in [`ComponentSpec`] and [`~modular_pipelines.ConfigSpec`].
+
+- [`ComponentSpec`] contains the expected components used by a block. You need the `name` of the component and ideally a `type_hint` that specifies exactly what the component is.
+- [`~modular_pipelines.ConfigSpec`] contains pipeline-level settings that control behavior across all blocks.
+
+```py
+class ImageEncodeStep(ModularPipelineBlocks):
+    ...
+
+    @property
+    def expected_components(self):
+        return [
+            ComponentSpec(name="vae", type_hint=AutoencoderKL),
+        ]
+
+    @property
+    def expected_configs(self):
+        return [
+            ConfigSpec("force_zeros_for_empty_prompt", True),
+        ]
+
+    ...
+```
+
+When the blocks are converted into a pipeline, the components become available to the block as the first argument in `__call__`.
+
 ## Computation logic
 
 The computation a block performs is defined in the `__call__` method and it follows a specific structure.
 
-1. Retrieve the [`~modular_pipelines.BlockState`] to get a local view of the `inputs`
+1. Retrieve the [`~modular_pipelines.BlockState`] to get a local view of the `inputs`.
 2. Implement the computation logic on the `inputs`.
 3. Update [`~modular_pipelines.PipelineState`] to push changes from the local [`~modular_pipelines.BlockState`] back to the global [`~modular_pipelines.PipelineState`].
 4. Return the components and state which becomes available to the next block.
 
 ```py
-def __call__(self, components, state):
-    # Get a local view of the state variables this block needs
-    block_state = self.get_block_state(state)
+class ImageEncodeStep(ModularPipelineBlocks):
+
+    def __call__(self, components, state):
+        # Get a local view of the state variables this block needs
+        block_state = self.get_block_state(state)
 
-    # Your computation logic here
-    # block_state contains all your inputs
-    # Access them like: block_state.image, block_state.processed_image
+        # Your computation logic here
+        # block_state contains all your inputs
+        # Access them like: block_state.image, block_state.processed_image
 
-    # Update the pipeline state with your updated block_states
-    self.set_block_state(state, block_state)
-    return components, state
+        # Update the pipeline state with your updated block_state
+        self.set_block_state(state, block_state)
+        return components, state
 ```
 
-### Components and configs
+## Putting it all together
 
-The components and pipeline-level configs a block needs are specified in [`ComponentSpec`] and [`~modular_pipelines.ConfigSpec`].
+Here is the complete block with all the pieces connected.
 
-- [`ComponentSpec`] contains the expected components used by a block. You need the `name` of the component and ideally a `type_hint` that specifies exactly what the component is.
-- [`~modular_pipelines.ConfigSpec`] contains pipeline-level settings that control behavior across all blocks.
+```py
+from diffusers import ComponentSpec, AutoencoderKL
+from diffusers.modular_pipelines import InputParam, ModularPipelineBlocks, OutputParam
+
+
+class ImageEncodeStep(ModularPipelineBlocks):
+
+    @property
+    def description(self):
+        return "Encode an image into latent space."
+
+    @property
+    def expected_components(self):
+        return [
+            ComponentSpec(name="vae", type_hint=AutoencoderKL),
+        ]
+
+    @property
+    def inputs(self):
+        return [
+            InputParam(name="image", type_hint="PIL.Image", required=True, description="raw input image to process"),
+        ]
+
+    @property
+    def intermediate_outputs(self):
+        return [
+            OutputParam(name="image_latents", type_hint="torch.Tensor", description="latents representing the image"),
+        ]
+
+    def __call__(self, components, state):
+        block_state = self.get_block_state(state)
+        block_state.image_latents = components.vae.encode(block_state.image)
+        self.set_block_state(state, block_state)
+        return components, state
+```
+
+Every block has a `doc` property that is automatically generated from the properties you defined above. It provides a summary of the block's description, components, inputs, and outputs.
 
 ```py
-from diffusers import ComponentSpec, ConfigSpec
+block = ImageEncodeStep()
+print(block.doc)
+class ImageEncodeStep
 
-expected_components = [
-    ComponentSpec(name="unet", type_hint=UNet2DConditionModel),
-    ComponentSpec(name="scheduler", type_hint=EulerDiscreteScheduler)
-]
+  Encode an image into latent space.
 
-expected_config = [
-    ConfigSpec("force_zeros_for_empty_prompt", True)
-]
-```
+  Components:
+      vae (`AutoencoderKL`)
 
-When the blocks are converted into a pipeline, the components become available to the block as the first argument in `__call__`.
+  Inputs:
+      image (`PIL.Image`):
+          raw input image to process
 
-```py
-def __call__(self, components, state):
-    # Access components using dot notation
-    unet = components.unet
-    vae = components.vae
-    scheduler = components.scheduler
-```
+  Outputs:
+      image_latents (`torch.Tensor`):
+          latents representing the image
+```
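The four-step `__call__` structure the new doc describes (local view, compute, write back, return) can be illustrated without diffusers. This is a hypothetical, self-contained sketch with a plain dict standing in for `PipelineState` and a lambda standing in for the VAE component, not the real API.

```python
from types import SimpleNamespace

class EncodeStep:
    inputs = ["image"]
    intermediate_outputs = ["image_latents"]

    def get_block_state(self, state):
        # 1. local view of the declared inputs
        return SimpleNamespace(**{k: state[k] for k in self.inputs})

    def set_block_state(self, state, block_state):
        # 3. push local changes back to the shared state
        state.update(vars(block_state))

    def __call__(self, components, state):
        block_state = self.get_block_state(state)
        # 2. computation logic: "encode" the image with the vae component
        block_state.image_latents = components["vae"](block_state.image)
        self.set_block_state(state, block_state)
        # 4. return components and state for the next block
        return components, state

components = {"vae": lambda img: f"latents({img})"}
state = {"image": "cat.png"}
EncodeStep()(components, state)
print(state["image_latents"])  # latents(cat.png)
```

After the call, `image_latents` lives in the shared state, so a later block could declare it as one of its `inputs`, which is the connection mechanism the doc describes.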

docs/source/en/modular_diffusers/quickstart.md (52 additions, 14 deletions)

@@ -39,17 +39,44 @@ image
 [`~ModularPipeline.from_pretrained`] uses lazy loading - it reads the configuration to learn where to load each component from, but doesn't actually load the model weights until you call [`~ModularPipeline.load_components`]. This gives you control over when and how components are loaded.
 
 > [!TIP]
-> [`ComponentsManager`] with `enable_auto_cpu_offload` automatically moves models between CPU and GPU as needed, reducing memory usage for large models like Qwen-Image. Learn more in the [ComponentsManager](./components_manager) guide.
+> `ComponentsManager` with `enable_auto_cpu_offload` automatically moves models between CPU and GPU as needed, reducing memory usage for large models like Qwen-Image. Learn more in the [ComponentsManager](./components_manager) guide.
+>
+> If you don't need offloading, remove the `components_manager` argument and move the pipeline to your device manually with `to("cuda")`.
 
 Learn more about creating and loading pipelines in the [Creating a pipeline](https://huggingface.co/docs/diffusers/modular_diffusers/modular_pipeline#creating-a-pipeline) and [Loading components](https://huggingface.co/docs/diffusers/modular_diffusers/modular_pipeline#loading-components) guides.
 
 ## Understand the structure
 
-A [`ModularPipeline`] has two parts:
-- **State**: the loaded components (models, schedulers, processors) and configuration
-- **Definition**: the [`ModularPipelineBlocks`] that specify inputs, outputs, expected components and computation logic
+A [`ModularPipeline`] has two parts: a **definition** (the blocks) and a **state** (the loaded components and configs).
 
-The blocks define *what* the pipeline does. Access them through `pipe.blocks`.
+Print the pipeline to see its state — the components and their loading status and configuration.
+```py
+print(pipe)
+```
+```
+QwenImageModularPipeline {
+  "_blocks_class_name": "QwenImageAutoBlocks",
+  "_class_name": "QwenImageModularPipeline",
+  "_diffusers_version": "0.37.0.dev0",
+  "transformer": [
+    "diffusers",
+    "QwenImageTransformer2DModel",
+    {
+      "pretrained_model_name_or_path": "Qwen/Qwen-Image",
+      "revision": null,
+      "subfolder": "transformer",
+      "type_hint": [
+        "diffusers",
+        "QwenImageTransformer2DModel"
+      ],
+      "variant": null
+    }
+  ],
+  ...
+}
+```
+
+Access the definition through `pipe.blocks` — this is the [`~modular_pipelines.ModularPipelineBlocks`] that defines the pipeline's workflows, inputs, outputs, and computation logic.
 ```py
 print(pipe.blocks)
 ```
@@ -87,7 +114,8 @@ The output returns:
 
 ### Workflows
 
-`QwenImageAutoBlocks` is a [`ConditionalPipelineBlocks`], so this pipeline supports multiple workflows and adapts its behavior based on the inputs you provide. For example, if you pass `image` to the pipeline, it runs an image-to-image workflow instead of text-to-image. Let's see this in action with an example.
+This pipeline supports multiple workflows and adapts its behavior based on the inputs you provide. For example, if you pass `image` to the pipeline, it runs an image-to-image workflow instead of text-to-image. Learn more about how this works under the hood in the [AutoPipelineBlocks](https://huggingface.co/docs/diffusers/modular_diffusers/auto_pipeline_blocks) guide.
+
 ```py
 from diffusers.utils import load_image
 
@@ -99,20 +127,21 @@ image = pipe(
 ).images[0]
 ```
 
-Use `get_workflow()` to extract the blocks for a specific workflow. Pass the workflow name (e.g., `"image2image"`, `"inpainting"`, `"controlnet_text2image"`) to get only the blocks relevant to that workflow.
+Use `get_workflow()` to extract the blocks for a specific workflow. Pass the workflow name (e.g., `"image2image"`, `"inpainting"`, `"controlnet_text2image"`) to get only the blocks relevant to that workflow. This is useful when you want to customize or debug a specific workflow. You can check `pipe.blocks.available_workflows` to see all available workflows.
 ```py
 img2img_blocks = pipe.blocks.get_workflow("image2image")
 ```
 
-Conditional blocks are convenient for users, but their conditional logic adds complexity when customizing or debugging. Extracting a workflow gives you the specific blocks relevant to your workflow, making it easier to work with. Learn more in the [AutoPipelineBlocks](https://huggingface.co/docs/diffusers/modular_diffusers/auto_pipeline_blocks) guide.
 
 ### Sub-blocks
 
 Blocks can contain other blocks. `pipe.blocks` gives you the top-level block definition (here, `QwenImageAutoBlocks`), while `sub_blocks` lets you access the smaller blocks inside it.
 
-`QwenImageAutoBlocks` is composed of: `text_encoder`, `vae_encoder`, `controlnet_vae_encoder`, `denoise`, and `decode`. Access them through the `sub_blocks` property.
+`QwenImageAutoBlocks` is composed of: `text_encoder`, `vae_encoder`, `controlnet_vae_encoder`, `denoise`, and `decode`.
 
-The `doc` property is useful for seeing the full documentation of any block, including its inputs, outputs, and components.
+These sub-blocks run one after another and data flows linearly from one block to the next — each block's `intermediate_outputs` become available as `inputs` to the next block. This is how [`SequentialPipelineBlocks`](./sequential_pipeline_blocks) work.
+
+You can access them through the `sub_blocks` property. The `doc` property is useful for seeing the full documentation of any block, including its inputs, outputs, and components.
```py
 vae_encoder_block = pipe.blocks.sub_blocks["vae_encoder"]
 print(vae_encoder_block.doc)
@@ -165,7 +194,7 @@ class CannyBlock
         Canny map for input image
 ```
 
-UUse `get_workflow` to extract the ControlNet workflow from [`QwenImageAutoBlocks`].
+Use `get_workflow` to extract the ControlNet workflow from [`QwenImageAutoBlocks`].
 ```py
 # Get the controlnet workflow that we want to work with
 blocks = pipe.blocks.get_workflow("controlnet_text2image")
@@ -182,9 +211,8 @@ class SequentialPipelineBlocks
   ...
 ```
 
-The extracted workflow is a [`SequentialPipelineBlocks`](./sequential_pipeline_blocks) - a multi-block type where blocks run one after another and data flows linearly from one block to the next. Each block's `intermediate_outputs` become available as `inputs` to subsequent blocks.
 
-Currently this workflow requires `control_image` as input. Let's insert the canny block at the beginning so the pipeline accepts a regular image instead.
+The extracted workflow is a [`SequentialPipelineBlocks`](./sequential_pipeline_blocks) and it currently requires `control_image` as input. Insert the canny block at the beginning so the pipeline accepts a regular image instead.
 ```py
 # Insert canny at the beginning
 blocks.sub_blocks.insert("canny", canny_block, 0)
@@ -211,7 +239,7 @@ class SequentialPipelineBlocks
 
 Now the pipeline takes `image` as input instead of `control_image`. Because blocks in a sequence share data automatically, the canny block's output (`control_image`) flows to the denoise block that needs it, and the canny block's input (`image`) becomes a pipeline input since no earlier block provides it.
 
-Create a pipeline from the modified blocks and load a ControlNet model.
+Create a pipeline from the modified blocks and load a ControlNet model. The ControlNet isn't part of the original model repository, so load it separately and add it with [`~ModularPipeline.update_components`].
 ```py
 pipeline = blocks.init_pipeline("Qwen/Qwen-Image", components_manager=manager)
 
@@ -241,6 +269,16 @@ output
 ## Next steps
 
 <hfoptions id="next">
+<hfoption id="Learn the basics">
+
+Understand the core building blocks of Modular Diffusers:
+
+- [ModularPipelineBlocks](./pipeline_block): The basic unit for defining a step in a pipeline.
+- [SequentialPipelineBlocks](./sequential_pipeline_blocks): Chain blocks to run in sequence.
+- [AutoPipelineBlocks](./auto_pipeline_blocks): Create pipelines that support multiple workflows.
+- [States](./modular_diffusers_states): How data is shared between blocks.
+
+</hfoption>
 <hfoption id="Build custom blocks">
 
 Learn how to create your own blocks with custom logic in the [Building Custom Blocks](./custom_blocks) guide.
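The sequential data flow and block insertion described in the quickstart diff above can be illustrated with plain functions and a dict, a hypothetical sketch rather than the diffusers API. Blocks run in order, each block's outputs land in shared state and become inputs for later blocks, and inserting a canny step first means the pipeline needs `image` instead of `control_image`.

```python
def canny(state):
    # produces control_image from image, like the canny block in the guide
    state["control_image"] = f"canny({state['image']})"

def denoise(state):
    # consumes control_image produced by the previous block
    state["output"] = f"denoised({state['control_image']})"

blocks = [denoise]
blocks.insert(0, canny)  # analogous to blocks.sub_blocks.insert("canny", canny_block, 0)

state = {"image": "cat.png"}
for block in blocks:
    block(state)
print(state["output"])  # denoised(canny(cat.png))
```

Because `canny` runs first and writes `control_image` into the shared state, no user has to supply it; the only input no block provides, `image`, is what the caller must pass in.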

0 commit comments
