Add Sotopia Arena (auth, Werewolf, ELO, EC2 deployment)#346

Open
nikitachaudharicodes wants to merge 87 commits into sotopia-lab:main from nikitachaudharicodes:frontend-sotopia

Conversation

@nikitachaudharicodes

Closes #

📑 Description

Adds the Sotopia Arena: a production-style web platform where humans play social games (starting with Werewolf) against LLM agents, to support research on human–AI social interaction (deception, coordination, trust in adversarial settings).

Scope of this PR:

  • Auth & users: JWT auth (register/login), OAuth (Google, GitHub, Discord), user profiles, session handling
  • Backend: FastAPI server, ELO rating system, leaderboard, game history, storage abstraction (Redis / JSON / PostgreSQL)
  • Werewolf game: Fixes for critical bugs (AgentAction validation, night-phase deadlock), plus pack visibility and a 2-round majority vote for the werewolf kill
  • Frontend: Next.js app with pluggable game modules, Werewolf lobby/board, profile and history pages, WebSocket client for live play
  • Deployment: EC2 scripts under scripts/ec2/ (deploy, redeploy, Nginx, cache clear, etc.), Nginx reverse-proxy config
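
For reviewers unfamiliar with the kill rule, here is a minimal sketch of how a 2-round majority vote could be resolved. It is not the code in this branch: the function names are hypothetical, and it assumes a strict majority is required and that no kill happens when the second round is also split.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_target(votes: Sequence[str]) -> Optional[str]:
    """Return the target named by a strict majority of votes, else None."""
    if not votes:
        return None
    target, count = Counter(votes).most_common(1)[0]
    return target if count * 2 > len(votes) else None


def resolve_werewolf_kill(
    round_one: Sequence[str], round_two: Sequence[str]
) -> Optional[str]:
    """Use round one if it produces a majority; otherwise try round two."""
    first = majority_target(round_one)
    if first is not None:
        return first
    # Assumption: a split second round means nobody is killed tonight.
    return majority_target(round_two)
```

With two werewolves voting `["Ann", "Bob"]` and then `["Bob", "Bob"]`, the first round has no majority and the second round selects Bob.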

The branch is large by design (single feature set). Commits are organized so reviewers can follow the story (auth → backend → frontend → EC2). Happy to split into smaller follow-up PRs if the team prefers.

✅ Checks

  • My pull request adheres to the code style of this project
  • My code requires changes to the documentation
  • I have updated the documentation as required
  • All the tests have passed
  • Branch name follows type/description (e.g. feature/add-llm-agents)
  • Ready for code review

ℹ Additional Information

  • Live demo: Arena is currently deployed at http://54.89.222.156 (HTTP only; no HTTPS yet).
  • Follow-ups: Matchmaking queue, HTTPS/SSL, additional games, post-game surveys, OAuth credentials for production, data collection schema.

Keyu-He and others added 30 commits January 16, 2026 23:46
with minor bugs, will fix in future iterations
contain minor bugs, will fix in future iterations
Fixes several bugs preventing custom models (via custom/model@url format) from working:

  - Fix parameter name in generate.py: api_base → base_url (line 257)
  - Fix hardcoded "gpt-4" evaluator models in server.py (lines 309, 401)
    Now uses model_dict.get("evaluator", model_dict["env"])
  - Add markdown code block stripping in PydanticOutputParser
    Many local LLMs wrap JSON in ```json...```, parser now handles this
  - Fix format_bad_output to support custom models
    Passes base_url/api_key through error recovery path
    Conditionally uses response_format (custom servers may not support it)
…ility issues in the game

Refactor SocialDeductionGame for real-time history and cleaner prompts

- ParallelSotopiaEnv: Added `include_turn_marker` flag to control environment turn messages.
- SocialDeductionGame:
    - Disabled environment turn markers to avoid duplication.
    - Implemented real-time history appending via `recv_message` override and `agent_message_buffer`.
    - Populated `action_instruction` in `Observation` for dynamic prompt instructions.
- Observation: Added `action_instruction` field.
- generate.py: Added `fill_template` helper for partial string formatting.
- LLMAgent: Updated `aact` to use `fill_template` to inject `action_instructions` into `custom_template`.
- Werewolves: Updated config description to populate `{agent_names}` dynamically.
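
Partial string formatting, as described for `fill_template`, can be sketched with `str.format_map` and a dict that leaves unknown placeholders untouched. The helper below is a hypothetical stand-in named `partial_format`; the real `fill_template` in generate.py may work differently.

```python
class _KeepMissing(dict):
    """Dict that renders an unknown key back as its original placeholder."""

    def __missing__(self, key: str) -> str:
        return "{" + key + "}"


def partial_format(template: str, **values: str) -> str:
    """Fill only the placeholders given in `values`; leave the rest as-is."""
    return template.format_map(_KeepMissing(values))


prompt = partial_format(
    "{agent}, it is your turn. {action_instructions}",
    action_instructions="Vote for one player.",
)
```

Here `prompt` still contains `{agent}`, so a later formatting pass can fill it. This simple approach only handles plain `{name}` fields; placeholders with attribute access or format specs on a missing key would need extra handling.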
Next step: change script_like to false and fix the remaining errors that may cause
Find and fix the evaluation and generation errors in the negotiation arena examples.

- **Termination Fix**: Updated `ParallelSotopiaEnv` to pass the `env` instance to evaluators. Modified `RuleBasedTerminatedEvaluator` to count active agents using `env.agents` instead of relying solely on message history; the history-only count caused early termination on the first turn.
- **LiteLLM Support**: Updated `generate.py` to handle OpenAI schema limitations. Added `_fix_schema` to convert `prefixItems` (tuples) to `items` (arrays) and set `strict=False` to support dynamic dictionary outputs (Evaluator maps) while preventing `BadRequestError`.
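
The `prefixItems` conversion can be pictured as a recursive walk over the JSON schema. The sketch below is not the `_fix_schema` from this commit: `convert_prefix_items` is a hypothetical name, and relaxing mixed tuple positions to `anyOf` is an assumption about how the conversion might be done.

```python
from typing import Any


def convert_prefix_items(schema: Any) -> Any:
    """Recursively replace `prefixItems` (tuple form) with one `items` schema."""
    if isinstance(schema, list):
        return [convert_prefix_items(item) for item in schema]
    if not isinstance(schema, dict):
        return schema

    fixed = {
        key: convert_prefix_items(value)
        for key, value in schema.items()
        if key != "prefixItems"
    }
    if schema.get("prefixItems"):
        unique = []
        for option in (convert_prefix_items(s) for s in schema["prefixItems"]):
            if option not in unique:
                unique.append(option)
        # Assumption: positions with different types are relaxed to anyOf.
        fixed["items"] = unique[0] if len(unique) == 1 else {"anyOf": unique}
    return fixed


# Shape Pydantic emits for a Tuple[str, int] field.
tuple_schema = {
    "type": "array",
    "prefixItems": [{"type": "string"}, {"type": "integer"}],
    "minItems": 2,
    "maxItems": 2,
}
array_schema = convert_prefix_items(tuple_schema)
```

The result keeps `minItems` and `maxItems` but loses per-position typing, which is the trade-off of expressing a tuple as a plain array.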
…r arena games

- generate.py: add action_instruction param to agenerate_action, inject as action_instruction_text in prompts
- output_parsers.py: normalize non-canonical action_type (e.g. vote, kill, lynch) to 'action' with verb in argument
Enables Werewolf phase-specific prompts and robust parsing; extensible for other arena games.
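
A rough sketch of the action_type normalization described above follows. The function name and the set of canonical types are assumptions made for illustration; the actual logic lives in output_parsers.py.

```python
from typing import Tuple

# Assumption: the canonical action types accepted by AgentAction.
CANONICAL_TYPES = {"none", "speak", "non-verbal communication", "action", "leave"}


def normalize_action(action_type: str, argument: str) -> Tuple[str, str]:
    """Map a non-canonical type such as 'vote' to 'action', keeping the verb."""
    kind = action_type.strip().lower()
    if kind in CANONICAL_TYPES:
        return kind, argument
    arg = argument.strip()
    # Prepend the verb unless the model already included it in the argument.
    if not arg.lower().startswith(kind):
        arg = f"{kind} {arg}".strip()
    return "action", arg
```

For example, `normalize_action("vote", "Alice")` yields `("action", "vote Alice")`, while `normalize_action("speak", "hi")` is returned unchanged.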
- Register/login with bcrypt + JWT, /auth/me and PUT /auth/me
- /auth/me/games and /auth/me/stats for game history and stats
- Legacy /auth/identity for backward compatibility with existing frontend
- GET /oauth/providers, /oauth/login/{provider}, /oauth/callback/{provider}
- Create/link UserProfile and OAuthAccount, issue JWT for frontend
- POST /oauth/link, GET /oauth/accounts, DELETE /oauth/unlink
- elo.py: ELO calculation, AI base ratings, update_user_elo_after_game,
  process_game_elo, rank tiers (Bronze–Master), get_rank_tier/get_rank_progress
- leaderboard.py: GET /leaderboard, /ranks, /user/{pk}, /top/{count}
- GET /profile/me, /profile/me/elo-history, /profile/me/matches (optional game_type)
- PUT /profile/me; GET /profile/{user_pk}, /profile/{user_pk}/matches
- _build_profile_stats shared by get_my_profile and get_user_profile; role_stats, win streaks, rank_tier
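
For context on the ELO calculation listed for elo.py, the standard Elo update is sketched below. The K-factor of 32 and the function names are assumptions; the values and tier thresholds actually used in elo.py are not shown here.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Probability that `rating` beats `opponent` under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def updated_rating(
    rating: float, opponent: float, score: float, k: float = 32.0
) -> float:
    """Return the new rating; `score` is 1.0 for a win, 0.5 draw, 0.0 loss."""
    return rating + k * (score - expected_score(rating, opponent))


# A 1200-rated human beating a 1200-rated AI gains half of K.
after_win = updated_rating(1200.0, 1200.0, score=1.0)
```

With equal ratings the expected score is 0.5, so `after_win` is 1216.0 under the assumed K-factor.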
…game routing

- Include auth_router, oauth_router, leaderboard_router, profile_router
- SimulationState: _human_players, _pending_actions, _pack_chats, _game_types; human action queue
- WebSocket: parse game_type, auth_token, human_agent; route to werewolf or Sotopia sim
- /games/queue, /games/leaderboard, /games/history, /memory stubs; /games/werewolf/config, POST /games/werewolf/sessions/create
- WebSocketHumanAgent: human-controlled agent consuming actions from queue; normalize vote/kill/lynch to action type
- get_env_agents: explicit environment_messages fallback
- WebSocketSotopiaSimulator: game_type, human_agent_index, action_queue_getter params
- WerewolfGameEndEvaluator, WerewolfActionHandler, WerewolfEnv
- WebSocketWerewolfRunner: config load, human+LLM agents, run loop, pack chat
- Persist GameResult, process ELO; run_werewolf_simulation entrypoint
- Fix witch_have_poison typo in runner
- config: API_BASE_URL, API_ROUTES for backend and game endpoints
- auth-api: JWT storage, register/login/me/logout, OAuth login URL (/oauth/login/{provider})
- auth-api: getProfile, getEloHistory, getMatchHistory; getLeaderboard with limit/offset, backend-aligned types
- auth-api: handleOAuthTokenFromUrl, authRequest
- SimulationWebSocket: connect, startSimulation, sendClientMessage, on/off, disconnect
- WSMessageType, WSMessage, SimulationStartPayload
- generateSessionToken, getOrCreateSessionToken

2 participants