Request: Open-source the DPO (Direct Preference Optimization) related code #1

@ansWhite

Description

Dear maintainers,
First of all, thank you very much for open-sourcing this project and sharing the codebase—it has been extremely helpful for our research on task-oriented planning with LLMs!
While reviewing the code, we noticed that the current repository only includes logic for:

- Generating synthetic conversations
- Rewriting conversations (basic/advanced summarization)
- Generating task plans
- Pairwise preference evaluation of plans (LLM-based annotation)
- Sensitivity analysis of plan graphs
However, we could not find any code related to Direct Preference Optimization (DPO): no DPO loss definition, no model fine-tuning on preference data, no DPO training loop, and no DPO-related dependencies such as trl or torch. This appears inconsistent with the associated paper, which, as we understand it, uses DPO.
We are particularly interested in how the preference data (from evaluate_plan.py) is used to optimize the model via DPO, and would greatly appreciate it if you could:

1. Confirm whether the paper's DPO-related content corresponds to this codebase;
2. Open-source the DPO implementation (e.g., the training pipeline, model fine-tuning logic, and data processing for DPO);
3. If the DPO code cannot be open-sourced, provide brief documentation on how to integrate DPO with the existing preference annotation data in this project.
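For reference, here is a minimal, framework-free sketch of what we mean: the per-pair DPO objective, plus a conversion from pairwise judgments into the {prompt, chosen, rejected} records that preference-training libraries such as trl expect. The field names `task`, `plan_a`, `plan_b`, and `winner` are only our guess at the output schema of evaluate_plan.py, not something we found in the repository:

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for a single preference pair:
    -log sigmoid(beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))).
    """
    logits = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(x)) == log(1 + exp(-x)); log1p keeps it numerically stable
    return math.log1p(math.exp(-logits))

def to_dpo_records(pairwise_annotations: list[dict]) -> list[dict]:
    """Convert pairwise plan judgments into DPO-format triples.

    The input schema (task / plan_a / plan_b / winner) is hypothetical;
    it would need to match whatever evaluate_plan.py actually emits.
    """
    records = []
    for ann in pairwise_annotations:
        if ann["winner"] == "a":
            chosen, rejected = ann["plan_a"], ann["plan_b"]
        else:
            chosen, rejected = ann["plan_b"], ann["plan_a"]
        records.append({"prompt": ann["task"],
                        "chosen": chosen,
                        "rejected": rejected})
    return records
```

With zero margin between policy and reference, `dpo_loss` returns log 2 (the sigmoid's midpoint), and it decreases as the policy assigns more relative probability to the chosen plan; this is just the loss shape, of course, not the full training pipeline we are asking about.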
This will help us reproduce the full pipeline described in the paper and advance related research. Thank you again for your time and efforts!
Best regards,
