PCPO: Proportionate Credit Policy Optimization for Aligning Image Generation Models

Official implementation of PCPO, a novel reinforcement learning approach for aligning diffusion/flow models with human preferences.


📖 Overview

PCPO (Proportionate Credit Policy Optimization) improves upon GRPO with (1) a log-hinge loss and (2) proportionate credit assignment.
This repository contains implementations for Stable Diffusion (SD 1.5, SD 3.5) and FLUX.

PCPO builds upon and extends several foundational works: DDPO, DanceGRPO, and Flow-GRPO.
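
For readers new to GRPO, the sketch below shows only the group-relative advantage step that GRPO-style methods share: rewards for images sampled from the same prompt are standardized within that group. It is a minimal illustration, not code from this repository, and it does not implement PCPO's log-hinge loss or proportionate credit assignment. The function name, the `eps` value, and the use of the population standard deviation are assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one group of samples from the same prompt.

    Hypothetical helper for illustration; `eps` guards against division by
    zero when every sample in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four images generated from one prompt, scored by some reward model.
rewards = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(rewards)
```

Samples that score above the group mean receive positive advantages and are reinforced; samples below the mean receive negative advantages.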

🎯 Key Features

  • Various Backbones: Train SD1.5 (ddpo, dancegrpo), FLUX (dancegrpo), and SD3.5-M (flowgrpo)
  • Various Reward Models: Support for Aesthetic Score & BERTScore (ddpo), HPSv2.1, CLIPScore (dancegrpo), PickScore, OCR (flowgrpo)
  • Efficient Training: Preprocess SD3.5-M embeddings beforehand, so that training can run on GPUs with 24GB VRAM
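
The embedding preprocessing mentioned above follows a general pattern: encode every prompt once, store the result on disk, and read it back during training, presumably so the text encoders do not need to stay in GPU memory. The sketch below illustrates that pattern only. It is not the repository's preprocessing script, and `cache_path`, `precompute_embeddings`, `load_embedding`, and `dummy_encoder` are hypothetical names, with a toy encoder standing in for the real text encoders.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


def cache_path(cache_dir, prompt):
    """Map a prompt to a stable file name derived from its SHA-256 hash."""
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    return Path(cache_dir) / f"{digest}.pkl"


def precompute_embeddings(prompts, encode_fn, cache_dir):
    """Encode each prompt once and store the result on disk."""
    Path(cache_dir).mkdir(parents=True, exist_ok=True)
    for prompt in prompts:
        path = cache_path(cache_dir, prompt)
        if not path.exists():  # skip prompts that are already cached
            with open(path, "wb") as f:
                pickle.dump(encode_fn(prompt), f)


def load_embedding(cache_dir, prompt):
    """Read a cached embedding back; no encoder is needed at this point."""
    with open(cache_path(cache_dir, prompt), "rb") as f:
        return pickle.load(f)


def dummy_encoder(prompt):
    # Hypothetical stand-in for the real text encoders: one number per word.
    return [float(len(word)) for word in prompt.split()]


cache_dir = tempfile.mkdtemp()
prompts = ["a photo of a cat", "a red bicycle"]
precompute_embeddings(prompts, dummy_encoder, cache_dir)
embedding = load_embedding(cache_dir, "a red bicycle")
```

With a cache like this, the training loop only calls `load_embedding`, so the encoder can be unloaded after the preprocessing pass.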

🚀 Quick Start

Prerequisites

  • Python 3.10 (recommended)
  • CUDA 12.6+ (recommended)
  • GPUs with 40GB+ VRAM (full fine-tuning) or 24GB+ VRAM (LoRA fine-tuning)
    • dancegrpo requires 8 x 40GB GPUs for full fine-tuning (SD1.x, FLUX), or 8 x 24GB GPUs for LoRA fine-tuning (FLUX).
    • ddpo, flowgrpo can run on 1 x 24GB GPU.

Installation

Clone the repository:

git clone https://github.com/jaylee2000/pcpo.git
cd pcpo

📊 Training

Each implementation has its own configuration and training procedure. Please refer to the README in the respective directory (dancegrpo/, flowgrpo/, ddpo/) for detailed instructions.

📂 Repository Structure

pcpo/
├── dancegrpo/                      # DanceGRPO-based (SD1.x & FLUX) implementations
│   ├── fastvideo/                  # Core training and inference code
│   ├── scripts/                    # Training and preprocessing scripts
│   └── assets/                     # Prompts and datasets
├── flowgrpo/                       # FlowGRPO-based (SD3.5) implementation
│   ├── flow_grpo/                  # Core implementation
│   ├── config/                     # Training configurations
│   ├── scripts/                    # Training and preprocessing scripts
│   └── dataset/                    # Dataset utilities
└── ddpo/
    ├── ddpo-main/                  # DDPO-based (SD1.x) implementation
    │   ├── diffusion/              # Core implementation
    │   ├── diffusion_doublerg/     # Implicit Reward Guidance (IRG) implementation (Appendix F)
    │   ├── configs/                # Training and inference configurations
    │   └── utils/                  # Helper functions, prompts & rewards
    └── qwen-server/                # Server-based BERTScore reward model

📝 Citation

If you find this work useful for your research, please cite:

@article{pcpo2025,
  title={PCPO: Proportionate Credit Policy Optimization for Aligning Image Generation Models},
  author={Lee, Jeongjae and Ye, Jong Chul},
  journal={arXiv preprint arXiv:2509.25774},
  year={2025}
}

Related Works

@article{black2023ddpo,
  title={Training Diffusion Models with Reinforcement Learning},
  author={Black, Kevin and Janner, Michael and Du, Yilun and Kostrikov, Ilya and Levine, Sergey},
  journal={arXiv preprint arXiv:2305.13301},
  year={2023}
}

@article{xue2025dancegrpo,
  title={DanceGRPO: Unleashing GRPO on Visual Generation},
  author={Xue, Zeyue and Wu, Jie and Gao, Yu and Kong, Fangyuan and Zhu, Lingting and Chen, Mengzhao and Liu, Zhiheng and Liu, Wei and Guo, Qiushan and Huang, Weilin and others},
  journal={arXiv preprint arXiv:2505.07818},
  year={2025}
}

@article{liu2025flow,
  title={Flow-grpo: Training flow matching models via online rl},
  author={Liu, Jie and Liu, Gongye and Liang, Jiajun and Li, Yangguang and Liu, Jiaheng and Wang, Xintao and Wan, Pengfei and Zhang, Di and Ouyang, Wanli},
  journal={arXiv preprint arXiv:2505.05470},
  year={2025}
}

📜 License

This project is licensed under the Apache License 2.0.

🙏 Acknowledgments

This work builds upon several excellent open-source projects, including DDPO, DanceGRPO, and Flow-GRPO.

📧 Contact

For questions and discussions, please open an issue or contact jaysquirrel2000@gmail.com.

🔖 Updates

  • [2025.12.06]: 🔥 Code made public!
  • [2025.11.24]: 🔥 Code released on GitHub!
  • [2025.09.30]: 🔥 Paper released on arXiv!

About

[ICLR 2026] An official implementation of PCPO: Proportionate Credit Policy Optimization for Aligning Image Generation Models
