23 commits
c09bbcb
Add Boogu-Image generation and editing pipeline
Boogu-Team Jun 18, 2026
4da1be8
Boogu: remove TaylorSeer cache, keep official scheduler + TeaCache
Boogu-Team Jun 22, 2026
f765d13
Use official FlowMatchEulerDiscreteScheduler for Boogu via time-shift…
Boogu-Team Jun 22, 2026
614f3a0
Slim Boogu pipeline: inline scheduler adapter, drop LoRA mixin and re…
Boogu-Team Jun 22, 2026
74a0fad
examples/boogu: add negative_instruction to edit examples
Boogu-Team Jun 22, 2026
1741e80
Boogu: mechanical cleanup for upstream PR (dead code, logging, conven…
Boogu-Team Jun 22, 2026
059451f
Boogu: remove prompt-tuning subsystem
Boogu-Team Jun 22, 2026
21cf345
Boogu: drop except-Exception fallback in instruction encoding
Boogu-Team Jun 22, 2026
ecc389b
Boogu: route attention through dispatch_attention_fn
Boogu-Team Jun 22, 2026
ee3738f
Boogu: make BooguImageTurboPipeline a standalone pipeline
Boogu-Team Jun 22, 2026
aa1252f
Boogu: second-pass cleanup for upstream PR
Boogu-Team Jun 22, 2026
8952e6d
Boogu: apply ruff format
Boogu-Team Jun 22, 2026
150ad3b
Boogu: drop stale ruff ignore for removed static-skills file
Boogu-Team Jun 22, 2026
9e672c2
Boogu: drop triton fused RMSNorm, use torch.nn.RMSNorm
Boogu-Team Jun 23, 2026
d202a23
Boogu: consolidate model into a single file (single-file convention)
Boogu-Team Jun 23, 2026
5cef903
Boogu: use base-class device + offload management
Boogu-Team Jun 23, 2026
05cc899
Boogu: drop weight-init, follow lazy __init__ convention
Boogu-Team Jun 25, 2026
a597eed
Boogu: rename Lumina-derived layers to BooguImage*
Boogu-Team Jun 25, 2026
36298d3
Boogu: collapse RoPE namespace class into a get_freqs_cis function
Boogu-Team Jun 25, 2026
c9a6b5d
Boogu: remove TeaCache, defer to CacheMixin
Boogu-Team Jun 25, 2026
662e6ca
Boogu: compute time-shift mu via calculate_shift
Boogu-Team Jun 25, 2026
2977b2d
Boogu: introduce BooguImageAttention module, dispatch via processors
Boogu-Team Jun 25, 2026
3b56e27
Boogu: remove boosted orthogonal guidance, keep standard CFG
Boogu-Team Jun 25, 2026
2 changes: 2 additions & 0 deletions docs/source/en/_toctree.yml
@@ -491,6 +491,8 @@
    title: AnimateDiff
  - local: api/pipelines/aura_flow
    title: AuraFlow
  - local: api/pipelines/boogu
    title: Boogu-Image
  - local: api/pipelines/bria_3_2
    title: Bria 3.2
  - local: api/pipelines/bria_fibo
153 changes: 153 additions & 0 deletions docs/source/en/api/pipelines/boogu.md
@@ -0,0 +1,153 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
-->

# Boogu-Image

## Overview

Boogu-Image is an instruction-driven image generation and editing model. Rather than a
plain text prompt, it is conditioned on a natural-language *instruction* that is encoded
by a Qwen3-VL multimodal LLM, which can also attend to optional reference images. A
single/double-stream transformer denoiser then predicts the latent updates, and a
flow-matching scheduler with training-aligned time shifting controls the denoising
trajectory. The VAE maps between image and latent space.
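
To make the time shifting concrete, here is a minimal sketch of the exponential shift commonly used with flow-matching Euler schedulers. It assumes that the shift parameter `mu` is interpolated linearly from the latent sequence length; the constants and the 4096-token sequence length are illustrative placeholders and are not confirmed values for Boogu-Image.

```python
import math


def calculate_mu(seq_len, base_seq_len=256, max_seq_len=4096, base_shift=0.5, max_shift=1.15):
    """Linearly interpolate the shift parameter mu from the latent sequence length.

    All default constants are placeholders for illustration only.
    """
    slope = (max_shift - base_shift) / (max_seq_len - base_seq_len)
    intercept = base_shift - slope * base_seq_len
    return slope * seq_len + intercept


def time_shift(mu, sigma):
    """Exponential time shift: pushes a noise level sigma in (0, 1] toward 1 as mu grows."""
    return math.exp(mu) / (math.exp(mu) + (1.0 / sigma - 1.0))


# Unshifted, linearly spaced noise levels from 1.0 down to about 0.1.
sigmas = [1.0 - i / 10 for i in range(10)]

# Hypothetical token count for a 1024x1024 image.
mu = calculate_mu(seq_len=4096)

# Shifted schedule: more of the steps are spent at high noise levels.
shifted = [time_shift(mu, s) for s in sigmas]
```

With `mu = 0` the shift is the identity, so a larger `mu` (longer sequences, higher resolutions) is what moves the schedule toward the noisy end.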

The model is released in several variants:

- **Base** (`Boogu/Boogu-Image-0.1-Base`) — text-to-image, full sampling schedule.
- **Turbo** (`Boogu/Boogu-Image-0.1-Turbo`) — DMD student model for few-step
text-to-image generation.
- **Edit** (`Boogu/Boogu-Image-0.1-Edit`) — instruction-based image editing conditioned
on one or more reference images.

FP8-quantized checkpoints are also available for each variant (the `-fp8` suffix).

There are two pipeline classes:

- [`BooguImagePipeline`]: text-to-image and instruction-based editing.
- [`BooguImageTurboPipeline`]: a standalone pipeline for the DMD few-step inference path. It
  defaults the guidance scales to the values DMD requires (`text_guidance_scale=1.0`,
  `image_guidance_scale=1.0`, `empty_instruction_guidance_scale=0.0`).
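
The following sketch shows why a guidance scale of `1.0` amounts to no guidance. It uses the standard two-branch classifier-free guidance formula on toy numbers; how the pipeline combines its three scales internally is not shown here, so treat the function name and values as hypothetical.

```python
def classifier_free_guidance(cond, uncond, scale):
    """Standard CFG: extrapolate from the unconditional toward the conditional prediction."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


# Toy model outputs (hypothetical values, for illustration only).
cond = [0.8, -0.2, 0.5]   # prediction given the instruction
uncond = [0.5, 0.0, 0.1]  # prediction given the negative or empty instruction

# scale > 1 amplifies the difference between the two branches.
guided = classifier_free_guidance(cond, uncond, scale=4.0)

# scale == 1 returns the conditional prediction unchanged.
no_guidance = classifier_free_guidance(cond, uncond, scale=1.0)
```

Because the unconditional branch has no effect at scale `1.0`, a distilled few-step model that expects unguided sampling can be run with the conditional prediction alone.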

## Usage examples

### Text-to-image

```python
import torch
from diffusers.pipelines.boogu import BooguImagePipeline

pipe = BooguImagePipeline.from_pretrained("Boogu/Boogu-Image-0.1-Base", torch_dtype=torch.bfloat16)
pipe = pipe.to("cuda")

image = pipe(
    instruction="A serene Chinese ink-wash landscape of the Guilin mountains bathed in golden light, layered peaks, mirror-like river, glowing golden contours.",
    height=1024,
    width=1024,
    num_inference_steps=50,
    text_guidance_scale=4.0,
).images[0]

image.save("base.png")
```

### Few-step generation (Turbo)

```python
import torch
from diffusers.pipelines.boogu import BooguImageTurboPipeline

pipe = BooguImageTurboPipeline.from_pretrained("Boogu/Boogu-Image-0.1-Turbo", torch_dtype=torch.bfloat16)
pipe = pipe.to("cuda")

image = pipe(
    instruction="A serene Chinese ink-wash landscape of the Guilin mountains bathed in golden light.",
    height=1024,
    width=1024,
    num_inference_steps=4,
).images[0]

image.save("turbo.png")
```

### Instruction-based editing

Pass one or more reference images through `input_images`:

```python
import torch
from PIL import Image
from diffusers.pipelines.boogu import BooguImagePipeline

pipe = BooguImagePipeline.from_pretrained("Boogu/Boogu-Image-0.1-Edit", torch_dtype=torch.bfloat16)
pipe = pipe.to("cuda")

image = pipe(
    instruction="Turn the image into a colored-pencil illustration.",
    input_images=[Image.open("base.png").convert("RGB")],
    height=1024,
    width=1024,
    num_inference_steps=50,
    text_guidance_scale=4.0,
    image_guidance_scale=1.0,
).images[0]

image.save("edit.png")
```

### FP8 checkpoints

FP8 weights are stored in a non-safetensors format, so load the transformer separately
with `use_safetensors=False` and pass it to the pipeline:

```python
import torch
from diffusers import BooguImageTransformer2DModel
from diffusers.pipelines.boogu import BooguImagePipeline

transformer = BooguImageTransformer2DModel.from_pretrained(
    "Boogu/Boogu-Image-0.1-Base-fp8",
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
    use_safetensors=False,
)
pipe = BooguImagePipeline.from_pretrained(
    "Boogu/Boogu-Image-0.1-Base-fp8", torch_dtype=torch.bfloat16, transformer=transformer
)
pipe = pipe.to("cuda")
```

Runnable scripts for every variant are available in
[`examples/boogu`](https://github.com/huggingface/diffusers/tree/main/examples/boogu).

> [!TIP]
> The transformer uses fused `triton` (RMSNorm) and `flash_attn` (SwiGLU, variable-length
> attention) kernels when they are installed, and falls back to pure PyTorch otherwise.

## BooguImagePipeline

[[autodoc]] pipelines.boogu.pipeline_boogu.BooguImagePipeline
  - all
  - __call__

## BooguImageTurboPipeline

[[autodoc]] pipelines.boogu.pipeline_boogu_turbo.BooguImageTurboPipeline
  - all
  - __call__

## FMPipelineOutput

[[autodoc]] pipelines.boogu.pipeline_boogu.FMPipelineOutput
81 changes: 81 additions & 0 deletions examples/boogu/README.md
@@ -0,0 +1,81 @@
# Boogu-Image

[Boogu-Image](https://huggingface.co/Boogu) is an instruction-driven image generation and editing model. It pairs a Qwen3-VL multimodal LLM (instruction encoder) with a single/double-stream transformer denoiser and a flow-matching scheduler with training-aligned time shifting.

This directory contains minimal inference scripts for the released checkpoints.

## Environment installation
See the [Boogu-Image quick-start script](https://github.com/boogu-project/Boogu-Image/blob/main/quick_start.sh) to set up the environment.

## Pipelines

| Pipeline | Class | Use case |
|---|---|---|
| Base | `BooguImagePipeline` | Text-to-image (50 steps) |
| Turbo | `BooguImageTurboPipeline` | Few-step DMD text-to-image (4 steps) |
| Edit | `BooguImagePipeline` | Instruction-based image editing (pass `input_images`) |

## Scripts

| Script | Checkpoint |
|---|---|
| `inference_base.py` | `Boogu/Boogu-Image-0.1-Base` |
| `inference_turbo.py` | `Boogu/Boogu-Image-0.1-Turbo` |
| `inference_edit.py` | `Boogu/Boogu-Image-0.1-Edit` |
| `inference_base_fp8.py` | `Boogu/Boogu-Image-0.1-Base-fp8` |
| `inference_turbo_fp8.py` | `Boogu/Boogu-Image-0.1-Turbo-fp8` |
| `inference_edit_fp8.py` | `Boogu/Boogu-Image-0.1-Edit-fp8` |

## Usage

Text-to-image:

```bash
python inference_base.py
```

Few-step (Turbo):

```bash
python inference_turbo.py
```

Image editing (reads `base.png` as the reference image, so run `inference_base.py` first):

```bash
python inference_edit.py
```

## FP8 checkpoints

FP8 weights are stored in a non-safetensors format, so the transformer is loaded
separately with `use_safetensors=False` and passed to the pipeline:

```python
import torch
from diffusers import BooguImageTransformer2DModel
from diffusers.pipelines.boogu import BooguImagePipeline

transformer = BooguImageTransformer2DModel.from_pretrained(
    "Boogu/Boogu-Image-0.1-Base-fp8",
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
    use_safetensors=False,
)
pipe = BooguImagePipeline.from_pretrained(
    "Boogu/Boogu-Image-0.1-Base-fp8", torch_dtype=torch.bfloat16, transformer=transformer
)
pipe = pipe.to("cuda")
```

The FP8 scripts also disable the DeepGEMM kernel for the FP8 VLM (forcing a Triton
finegrained-fp8 fallback) for broader hardware compatibility — see
`_disable_deepgemm_for_fp8_vlm()` in each FP8 script.

## Optional performance dependencies

The transformer can use fused kernels when available; without them it falls back to
pure PyTorch and prints a one-time warning:

- `triton` — fused RMSNorm
- `flash_attn` — fused SwiGLU and variable-length flash attention
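
To see ahead of time which path a given environment would take, the optional packages can be probed without importing them. This is a small sketch that uses only the standard library; the helper name `available_optional_kernels` is hypothetical and is not part of diffusers.

```python
import importlib.util


def available_optional_kernels(packages=("triton", "flash_attn")):
    """Return a dict mapping each optional package name to whether it is importable."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


# Report which optional packages are present in the current environment.
for name, found in available_optional_kernels().items():
    status = "installed" if found else "missing (pure PyTorch fallback)"
    print(f"{name}: {status}")
```

`importlib.util.find_spec` only locates the package, so the check stays cheap and does not trigger any kernel compilation or GPU initialization.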
20 changes: 20 additions & 0 deletions examples/boogu/inference_base.py
@@ -0,0 +1,20 @@
import torch

from diffusers.pipelines.boogu import BooguImagePipeline


MODEL_PATH = "Boogu/Boogu-Image-0.1-Base"

pipe = BooguImagePipeline.from_pretrained(MODEL_PATH, torch_dtype=torch.bfloat16)
pipe = pipe.to("cuda")

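images = pipe(
    instruction="A landscape painting in the Chinese gilded style, showing the magnificent scenery of Guilin's mountains and rivers bathed in golden light. Distant mountains rise in layers, the river is as smooth as a mirror, and the edges of the peaks are traced with glowing golden lines. The picture combines azurite and malachite mineral pigments with a gilded texture, with thick impasto oil-paint brushstrokes in places and golden particles floating in the air, creating a dreamy, hazy, yet grand and majestic atmosphere.",
    height=1024,
    width=1024,
    num_inference_steps=50,
    text_guidance_scale=4.0,
).images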

images[0].save("base.png")
print("Inference OK, saved base.png")
52 changes: 52 additions & 0 deletions examples/boogu/inference_base_fp8.py
@@ -0,0 +1,52 @@
import os

import torch

from diffusers import BooguImageTransformer2DModel
from diffusers.pipelines.boogu import BooguImagePipeline


def _disable_deepgemm_for_fp8_vlm() -> None:
    """Disable DeepGEMM for the FP8 VLM so the Triton finegrained-fp8 fallback is used."""
    # For transformers >= 5.11.0
    os.environ["TRANSFORMERS_DISABLE_DEEPGEMM_LINEAR"] = "1"

    try:
        import transformers.integrations.finegrained_fp8 as fg_fp8
    except Exception:
        return

    def _raise_import_error(*args, **kwargs):
        raise ImportError("DeepGEMM disabled; forcing Triton finegrained-fp8 fallback.")

    if hasattr(fg_fp8, "deepgemm_fp8_fp4_linear"):
        # For 5.10.1 <= transformers < 5.11.0
        fg_fp8.deepgemm_fp8_fp4_linear = _raise_import_error
    elif hasattr(fg_fp8, "_load_deepgemm_kernel"):
        # For 5.5.0 <= transformers < 5.10.1
        fg_fp8._load_deepgemm_kernel = _raise_import_error


_disable_deepgemm_for_fp8_vlm()

MODEL_PATH = "Boogu/Boogu-Image-0.1-Base-fp8"

transformer = BooguImageTransformer2DModel.from_pretrained(
    MODEL_PATH,
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
    use_safetensors=False,
)
pipe = BooguImagePipeline.from_pretrained(MODEL_PATH, torch_dtype=torch.bfloat16, transformer=transformer)
pipe = pipe.to("cuda")

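images = pipe(
    instruction="A landscape painting in the Chinese gilded style, showing the magnificent scenery of Guilin's mountains and rivers bathed in golden light. Distant mountains rise in layers, the river is as smooth as a mirror, and the edges of the peaks are traced with glowing golden lines. The picture combines azurite and malachite mineral pigments with a gilded texture, with thick impasto oil-paint brushstrokes in places and golden particles floating in the air, creating a dreamy, hazy, yet grand and majestic atmosphere.",
    height=1024,
    width=1024,
    num_inference_steps=50,
    text_guidance_scale=4.0,
).images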

assert len(images) == 1
images[0].save("base_fp8.png")
print("Inference OK, saved base_fp8.png")
38 changes: 38 additions & 0 deletions examples/boogu/inference_edit.py
@@ -0,0 +1,38 @@
import os

import torch
from PIL import Image

from diffusers.pipelines.boogu import BooguImagePipeline


MODEL_PATH = "Boogu/Boogu-Image-0.1-Edit"

# Negative instruction listing common artifacts. With text_guidance_scale > 1, guidance
# steers the output away from it, which tends to improve style adherence.
NEGATIVE_INSTRUCTION = (
    "(((deformed))), blurry, over saturation, bad anatomy, disfigured, poorly drawn face, "
    "mutation, mutated, (extra_limb), (ugly), (poorly drawn hands), fused fingers, messy drawing, "
    "broken legs, censor, censored, censor_bar"
)

if not os.path.exists("base.png"):
    raise FileNotFoundError("base.png not found; run inference_base.py first to generate the reference image.")

pipe = BooguImagePipeline.from_pretrained(MODEL_PATH, torch_dtype=torch.bfloat16)
pipe = pipe.to("cuda")

images = pipe(
    instruction="Turn the image into a colored-pencil illustration.",
    negative_instruction=NEGATIVE_INSTRUCTION,
    input_images=[Image.open("base.png").convert("RGB")],
    height=1024,
    width=1024,
    num_inference_steps=50,
    text_guidance_scale=4.0,
    image_guidance_scale=1.0,
).images

assert len(images) == 1
images[0].save("edit.png")
print("Inference OK, saved edit.png")