PCPO: Proportionate Credit Policy Optimization for Aligning Image Generation Models

Official implementation of PCPO, a novel reinforcement learning approach for aligning diffusion/flow models with human preferences.

📖 Overview

PCPO (Proportionate Credit Policy Optimization) improves upon GRPO with two changes: (1) a log-hinge loss and (2) proportionate credit assignment.
This repository contains implementations for Stable Diffusion (SD 1.5, SD 3.5) and FLUX.

PCPO builds upon and extends several foundational works: DDPO, DanceGRPO, and Flow-GRPO.
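
For readers new to GRPO, the baseline that PCPO modifies, the snippet below is a minimal sketch of group-relative advantage estimation: rewards for several images sampled from the same prompt are normalized within that group. It illustrates the GRPO baseline only, not PCPO's log-hinge loss or credit reweighting, and it is not taken from this repository. The function name, the `eps` constant, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one prompt's group: (r - mean) / (std + eps).

    `eps` (an assumed constant) avoids division by zero when every
    sample in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four images sampled for the same prompt, scored by a hypothetical reward model.
advantages = group_relative_advantages([0.2, 0.4, 0.6, 0.8])
```

Samples scoring above the group mean receive a positive advantage and those below receive a negative one, so no separate value network is needed.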

🎯 Key Features

✅ Various Backbones: Train SD1.5 (ddpo, dancegrpo), FLUX (dancegrpo), and SD3.5-M (flowgrpo)
✅ Various Reward Models: Aesthetic Score and BERTScore (ddpo); HPSv2.1 and CLIPScore (dancegrpo); PickScore and OCR (flowgrpo)
✅ Efficient Training: SD3.5-M embeddings are preprocessed ahead of time so that training can run on GPUs with 24GB of VRAM
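
The embedding preprocessing mentioned above can be pictured as a one-time caching pass: each prompt is encoded once, the result is written to disk, and training later reads the cached result instead of running the encoder, presumably so the text encoders need not occupy GPU memory during training. The sketch below is a generic illustration under that assumption, not the repository's actual script (see flowgrpo/README.md for that). `fake_text_encoder`, `precompute_embeddings`, and `load_embedding` are hypothetical names, and the encoder is a trivial stand-in.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


def fake_text_encoder(prompt):
    """Hypothetical stand-in for the real (large) text encoders."""
    return [float(len(word)) for word in prompt.split()]


def _cache_path(prompt, cache_dir):
    # Key each cache file by a hash of the prompt text.
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    return Path(cache_dir) / f"{key}.pkl"


def precompute_embeddings(prompts, cache_dir, encoder=fake_text_encoder):
    """Encode each prompt once and store the result on disk."""
    Path(cache_dir).mkdir(parents=True, exist_ok=True)
    for prompt in prompts:
        path = _cache_path(prompt, cache_dir)
        if not path.exists():  # skip prompts that are already cached
            with path.open("wb") as f:
                pickle.dump(encoder(prompt), f)


def load_embedding(prompt, cache_dir):
    """Read a cached embedding; no encoder is needed at this point."""
    with _cache_path(prompt, cache_dir).open("rb") as f:
        return pickle.load(f)


cache = tempfile.mkdtemp()
precompute_embeddings(["a photo of a cat"], cache)
embedding = load_embedding("a photo of a cat", cache)
```

In a real pipeline the stand-in encoder would be replaced by the model's text encoders and the cached objects would be tensors rather than Python lists.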

🚀 Quick Start

Prerequisites

Python 3.10 (recommended)
CUDA 12.6+ (recommended)
GPUs with 40GB+ VRAM (full fine-tuning) or 24GB+ VRAM (LoRA fine-tuning)
- dancegrpo requires 8 x 40GB GPUs for full fine-tuning (SD1.x, FLUX), or 8 x 24GB GPUs for LoRA fine-tuning (FLUX).
- ddpo, flowgrpo can run on 1 x 24GB GPU.

Installation

Clone the repository:

git clone https://github.com/jaylee2000/pcpo.git
cd pcpo

📊 Training

Each implementation has its own configuration and training procedures. Please refer to the respective README files for detailed instructions:

SD1.5 / FLUX (DanceGRPO): See dancegrpo/README.md for configuration, checkpoint downloads, and training scripts
SD3.5-M (Flow-GRPO): See flowgrpo/README.md for embedding preprocessing and training with PCPO/GRPO
DDPO Baseline: See ddpo/ddpo-main/README.md for DDPO training and configuration

📂 Repository Structure

pcpo/
├── dancegrpo/                      # DanceGRPO-based (SD1.x & FLUX) implementations
│   ├── fastvideo/                  # Core training and inference code
│   ├── scripts/                    # Training and preprocessing scripts
│   └── assets/                     # Prompts and datasets
├── flowgrpo/                       # FlowGRPO-based (SD3.5) implementation
│   ├── flow_grpo/                  # Core implementation
│   ├── config/                     # Training configurations
│   ├── scripts/                    # Training and preprocessing scripts
│   └── dataset/                    # Dataset utilities
└── ddpo/
    ├── ddpo-main/                  # DDPO-based (SD1.x) implementation
    │   ├── diffusion/              # Core implementation
    │   ├── diffusion_doublerg/     # Implicit Reward Guidance (IRG) implementation (Appendix F)
    │   ├── configs/                # Training and inference configurations
    │   └── utils/                  # Helper functions, prompts & rewards
    └── qwen-server/                # Server-based BERTScore reward model

📝 Citation

If you find this work useful for your research, please cite:

@article{pcpo2025,
  title={PCPO: Proportionate Credit Policy Optimization for Aligning Image Generation Models},
  author={Lee, Jeongjae and Ye, Jong Chul},
  journal={arXiv preprint arXiv:2509.25774},
  year={2025}
}

Related Works

@article{black2023ddpo,
  title={Training Diffusion Models with Reinforcement Learning},
  author={Black, Kevin and Janner, Michael and Du, Yilun and Kostrikov, Ilya and Levine, Sergey},
  journal={arXiv preprint arXiv:2305.13301},
  year={2023}
}

@article{xue2025dancegrpo,
  title={DanceGRPO: Unleashing GRPO on Visual Generation},
  author={Xue, Zeyue and Wu, Jie and Gao, Yu and Kong, Fangyuan and Zhu, Lingting and Chen, Mengzhao and Liu, Zhiheng and Liu, Wei and Guo, Qiushan and Huang, Weilin and others},
  journal={arXiv preprint arXiv:2505.07818},
  year={2025}
}

@article{liu2025flow,
  title={Flow-GRPO: Training Flow Matching Models via Online RL},
  author={Liu, Jie and Liu, Gongye and Liang, Jiajun and Li, Yangguang and Liu, Jiaheng and Wang, Xintao and Wan, Pengfei and Zhang, Di and Ouyang, Wanli},
  journal={arXiv preprint arXiv:2505.05470},
  year={2025}
}

📜 License

This project is licensed under the Apache License 2.0.

🙏 Acknowledgments

This work builds upon several excellent open-source projects: DDPO, DanceGRPO, and Flow-GRPO.

📧 Contact

For questions and discussions, please open an issue or contact jaysquirrel2000@gmail.com.

🔖 Updates

[2025.12.06]: 🔥 Code made public!
[2025.11.24]: 🔥 Code released on GitHub!
[2025.09.30]: 🔥 Paper released on arXiv!
