This repository integrates the Sotopia Werewolf environment with the Verl Reinforcement Learning framework to train a Qwen agent using PPO.
We use Verl's Async Agent Loop architecture. This differs from standard RLHF (which typically does single-turn Q&A) by moving the entire game interaction inside the inference worker.
- Orchestrator (`train_werewolf.py`):
  - Starts the Ray cluster.
  - Loads the PPO config (`werewolf_ppo.yaml`).
  - Distributes work to Ray workers.
- Make Work (`data/werewolf_train.parquet`):
  - Verl is data-driven: it asks "What prompts should I train on?".
  - Since Werewolf is an environment we just "start", we feed Verl a Dummy Dataset (created by `create_dummy_data.py`).
  - Each row in the `.parquet` file triggers one game episode.
- Rollout Worker (vLLM + AgentLoop):
  - Verl sends a "prompt" (e.g., "Game 1") to the vLLM worker.
  - Instead of just generating text, the worker triggers our Custom Agent Loop (`werewolf_agent_loop.py`).
- The Game Loop (`WerewolfAgentLoop`):
  - Initializes: Calls `SotopiaWerewolfWrapper` to create a fresh Werewolf game.
  - Interacts:
    - Action: Uses vLLM `generate()` to get the Trainee's move.
    - Environment: Calls `wrapper.step()` to process that move and simulate the other agents (using GPT-4 or other policies defined in Sotopia).
    - Loop: Repeats until the game ends.
  - Returns: A full trajectory of (Prompt, Response, Reward) tokens back to the PPO trainer (sketched below).
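In pseudocode, the three phases of the Game Loop look roughly like this. This is a simplified sketch: `setup_game()` and `step()` come from this repo's wrapper, while `generate` and the return shape are illustrative stand-ins for Verl's actual agent-loop API.

```python
async def run_episode(wrapper, generate):
    """One Werewolf episode as the agent loop drives it (simplified sketch)."""
    obs = wrapper.setup_game()                  # Initialize: a fresh game
    prompts, responses = [], []
    episode_reward, done = 0.0, False
    while not done:                             # Interact
        prompts.append(obs)                     # what the Trainee saw this turn
        move = await generate(obs)              # Action: vLLM produces the move
        obs, reward, done = wrapper.step(move)  # Environment: other agents finish the round
        responses.append(move)
        episode_reward += reward
    # Returns: the PPO trainer receives the (Prompt, Response, Reward) trajectory
    return prompts, responses, episode_reward
```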
`examples/train_werewolf.py`: The entry point. It registers our custom werewolf agent-loop type so Verl knows it exists, then launches the standard Verl PPO main function.
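In outline, that is only a few lines. This is a sketch, not the literal file: the PPO entry point (`run_ppo` here) and the registration-by-import side effect are assumptions that depend on your Verl version.

```python
# examples/train_werewolf.py -- outline only; check your Verl version for the
# exact PPO entry point (run_ppo here is an assumption).
import hydra

# Importing the module runs its registration side effect, so the name
# "werewolf" is known to the AgentLoopManager before training starts.
import werewolf_agent_loop  # noqa: F401


@hydra.main(config_path="../config", config_name="werewolf_ppo", version_base=None)
def main(config):
    from verl.trainer.main_ppo import run_ppo  # assumed entry point
    run_ppo(config)


if __name__ == "__main__":
    main()
```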
`config/werewolf_ppo.yaml`: The main configuration file (Hydra format). Key settings:
- `rollout.name: vllm`: We use vLLM for fast inference.
- `rollout.mode: async`: CRITICAL. Tells Verl we are using the Agent Loop system.
- `rollout.agent.default_agent_loop: werewolf`: Points to our custom class.
A small config file that maps the name "werewolf" to the Python class `WerewolfAgentLoop`. It is loaded dynamically by the AgentLoopManager.
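Conceptually, the manager turns that mapping into a class object with a dynamic import, roughly like this (an illustrative sketch of the mechanism, not Verl's actual loader code):

```python
import importlib


def resolve_agent_loop(target: str):
    """Resolve a dotted path such as 'werewolf_agent_loop.WerewolfAgentLoop'
    into the class it names, the way Hydra-style _target_ fields are handled."""
    module_path, class_name = target.rsplit(".", 1)
    return getattr(importlib.import_module(module_path), class_name)
```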
`werewolf_agent_loop.py`: The Core Logic. The `WerewolfAgentLoop` class inherits from `AgentLoopBase`.
- It manages the conversation history (User/Assistant turns).
- It handles the masking (we only train on the Agent's output, not the Environment's observations), as sketched below.
- It calculates the final reward.
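The masking is the subtle part: environment text must appear in the context but contribute no gradient. A minimal sketch of the bookkeeping, assuming a Hugging Face tokenizer and the convention that mask 0 = environment tokens and mask 1 = agent tokens; the helper itself is illustrative, not Verl code:

```python
def append_turn(tokenizer, ids, mask, text, from_agent):
    """Append one turn's tokens; only agent tokens (mask 1) receive PPO loss."""
    new_ids = tokenizer.encode(text, add_special_tokens=False)
    ids.extend(new_ids)
    mask.extend([1 if from_agent else 0] * len(new_ids))


# Usage: observations get mask 0, the Trainee's replies get mask 1, so the
# PPO loss only covers tokens the policy actually produced.
# append_turn(tok, ids, mask, observation_text, from_agent=False)
# append_turn(tok, ids, mask, agent_reply_text, from_agent=True)
```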
The Bridge: `SotopiaWerewolfWrapper`. Sotopia is a complex multi-agent system; this wrapper makes it look like a simple environment:
- `setup_game()`: Creates a scenario.
- `step()`: Handles the Trainee's action and automatically runs all other agents (Werewolves, Villagers) to finish the round.
- `_parse_action()`: Ensures the LLM's text output ("I vote for X") becomes a valid game action.
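A skeleton of that interface. The method names come from this README; the signatures, return shapes, and the parsing regex are illustrative assumptions:

```python
import re


class SotopiaWerewolfWrapper:
    """Makes a multi-agent Sotopia Werewolf game look like a single-agent env."""

    def setup_game(self) -> str:
        """Create a scenario and return the Trainee's first observation."""
        ...

    def step(self, raw_text: str):
        """Apply the Trainee's action, run every other agent to finish the
        round, and return (observation, reward, done)."""
        action = self._parse_action(raw_text)
        ...  # hand `action` to Sotopia and advance the round

    def _parse_action(self, raw_text: str) -> str:
        """Map free-form LLM text ('I vote for X') onto a legal game action."""
        match = re.search(r"vote for (\w+)", raw_text, re.IGNORECASE)
        return f"vote:{match.group(1)}" if match else "no_op"
```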
`examples/create_dummy_data.py`: Generates `data/werewolf_train.parquet`.
- Why? Verl requires an input dataset to define the "epoch".
- We generate 100 "dummy" rows, which effectively means "run 100 parallel game episodes per epoch".
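A sketch of what that script can look like. The column schema below (`data_source`, a chat-style `prompt`, `agent_name`) is an assumption about what your Verl dataset class expects; adjust it to your setup.

```python
# examples/create_dummy_data.py -- sketch; column names are assumptions about
# the schema your Verl dataset class expects.
import os

import pandas as pd

NUM_EPISODES = 100  # one parquet row = one game episode per epoch

rows = [
    {
        "data_source": "werewolf",
        # Chat-format prompt; the content is a placeholder, since the real
        # observations come from the environment, not the dataset.
        "prompt": [{"role": "user", "content": f"Game {i}"}],
        "agent_name": "werewolf",  # routes the row to our agent loop
    }
    for i in range(NUM_EPISODES)
]

os.makedirs("data", exist_ok=True)
pd.DataFrame(rows).to_parquet("data/werewolf_train.parquet")
```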
- Generate Data:

  ```bash
  python examples/create_dummy_data.py
  ```

- Start Training:

  ```bash
  python examples/train_werewolf.py --config-path ../config --config-name werewolf_ppo
  ```