Add Sotopia Arena (auth, Werewolf, ELO, EC2 deployment) by nikitachaudharicodes · Pull Request #346 · sotopia-lab/sotopia

nikitachaudharicodes · 2026-03-14T03:48:11Z

Closes #

📑 Description

Adds the Sotopia Arena: a production-style web platform where humans play social games (starting with Werewolf) against LLM agents, to support research on human–AI social interaction (deception, coordination, trust in adversarial settings).

Scope of this PR:

Auth & users: JWT auth (register/login), OAuth (Google, GitHub, Discord), user profiles, session handling
Backend: FastAPI server, ELO rating system, leaderboard, game history, storage abstraction (Redis / JSON / PostgreSQL)
Werewolf game: Fixed critical bugs (AgentAction validation, night-phase deadlock), pack visibility, 2-round majority vote for werewolf kill
Frontend: Next.js app with pluggable game modules, Werewolf lobby/board, profile and history pages, WebSocket client for live play
Deployment: EC2 scripts under scripts/ec2/ (deploy, redeploy, Nginx, cache clear, etc.), Nginx reverse-proxy config
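
The 2-round majority vote for the werewolf kill could look roughly like the sketch below. It is not the code in this PR: `majority_target`, `resolve_kill`, the strict-majority threshold, and the "no kill if both rounds fail" fallback are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional


def majority_target(votes: Dict[str, str]) -> Optional[str]:
    """Return the target backed by a strict majority of voters, else None."""
    if not votes:
        return None
    target, count = Counter(votes.values()).most_common(1)[0]
    return target if count * 2 > len(votes) else None


def resolve_kill(rounds: List[Dict[str, str]]) -> Optional[str]:
    """Check at most two voting rounds; the first strict majority wins.

    Assumption: if neither round produces a majority, nobody is killed.
    """
    for votes in rounds[:2]:
        target = majority_target(votes)
        if target is not None:
            return target
    return None


# Round 1 is split, round 2 converges on one target.
victim = resolve_kill([
    {"wolf_1": "Alice", "wolf_2": "Bob"},
    {"wolf_1": "Alice", "wolf_2": "Alice"},
])
```

Using a strict majority (more than half) rather than a plurality means a tied first round always falls through to the second round.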

The branch is large by design (single feature set). Commits are organized so reviewers can follow the story (auth → backend → frontend → EC2). Happy to split into smaller follow-up PRs if the team prefers.

✅ Checks

My pull request adheres to the code style of this project
My code requires changes to the documentation
I have updated the documentation as required
All the tests have passed
Branch name follows type/description (e.g. feature/add-llm-agents)
Ready for code review

ℹ Additional Information

Live demo: Arena is currently deployed at http://54.89.222.156 (HTTP only; no HTTPS yet).
Follow-ups: Matchmaking queue, HTTPS/SSL, additional games, post-game surveys, OAuth credentials for production, data collection schema.

Fixes several bugs preventing custom models (via custom/model@url format) from working:
- Fix parameter name in generate.py: api_base → base_url (line 257)
- Fix hardcoded "gpt-4" evaluator models in server.py (lines 309, 401)
  Now uses model_dict.get("evaluator", model_dict["env"])
- Add markdown code block stripping in PydanticOutputParser
  Many local LLMs wrap JSON in ```json...```, parser now handles this
- Fix format_bad_output to support custom models
  Passes base_url/api_key through error recovery path
  Conditionally uses response_format (custom servers may not support it)
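
To illustrate the code-block stripping described in this commit, here is a minimal sketch of removing a Markdown fence before JSON parsing. The helper name `strip_code_fence` and the regular expression are assumptions for illustration; the change inside PydanticOutputParser may be implemented differently.

```python
import json
import re

FENCE = "`" * 3  # the three-backtick Markdown fence marker

# Opening fence, an optional language-tag line (e.g. "json"), the payload,
# then the closing fence.
_FENCED_RE = re.compile(
    "^" + FENCE + r"(?:[\w-]*\n)?(.*?)" + FENCE + "$",
    re.DOTALL,
)


def strip_code_fence(text: str) -> str:
    """Return the payload of a fenced block, or the trimmed text if unfenced."""
    stripped = text.strip()
    match = _FENCED_RE.match(stripped)
    return match.group(1).strip() if match else stripped


# A typical local-LLM reply: valid JSON wrapped in a fenced block.
raw = FENCE + 'json\n{"action_type": "speak", "argument": "hello"}\n' + FENCE
parsed = json.loads(strip_code_fence(raw))
```

Unfenced output passes through unchanged apart from whitespace trimming, so the same helper can sit in front of the parser for both hosted and local models.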

…ility issues in the game

Refactor SocialDeductionGame for real-time history and cleaner prompts
- ParallelSotopiaEnv: Added `include_turn_marker` flag to control environment turn messages.
- SocialDeductionGame:
  - Disabled environment turn markers to avoid duplication.
  - Implemented real-time history appending via `recv_message` override and `agent_message_buffer`.
  - Populated `action_instruction` in `Observation` for dynamic prompt instructions.
- Observation: Added `action_instruction` field.
- generate.py: Added `fill_template` helper for partial string formatting.
- LLMAgent: Updated `aact` to use `fill_template` to inject `action_instructions` into `custom_template`.
- Werewolves: Updated config description to populate `{agent_names}` dynamically.
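
For readers unfamiliar with partial string formatting, the idea behind a helper like `fill_template` can be sketched as follows. The signature and regex here are assumptions; the actual helper in generate.py may differ. Plain `str.format` raises `KeyError` on a missing key, whereas this version leaves unknown placeholders in place for a later formatting pass.

```python
import re

# Matches simple placeholders such as {agent} or {action_instructions}.
_PLACEHOLDER_RE = re.compile(r"\{(\w+)\}")


def fill_template(template: str, **values: str) -> str:
    """Substitute only the placeholders named in `values`; keep the rest."""

    def _substitute(match: "re.Match") -> str:
        key = match.group(1)
        return str(values[key]) if key in values else match.group(0)

    return _PLACEHOLDER_RE.sub(_substitute, template)


template = "You are {agent}. {action_instructions} History: {history}"
partial = fill_template(template, action_instructions="Vote for one player.")
```

Here `partial` still contains `{agent}` and `{history}`, so a later call can fill them in once those values are known.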

Found and fixed the evaluation and generation errors in the negotiation arena examples.
- **Termination Fix**: Updated `ParallelSotopiaEnv` to pass the `env` instance to evaluators. Modified `RuleBasedTerminatedEvaluator` to count active agents using `env.agents` instead of relying solely on message history; the history-only count caused early termination in the first turn.
- **LiteLLM Support**: Updated `generate.py` to handle OpenAI schema limitations. Added `_fix_schema` to convert `prefixItems` (tuples) to `items` (arrays) and set `strict=False` to support dynamic dictionary outputs (Evaluator maps) while preventing `BadRequestError`.
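
The `prefixItems` to `items` conversion can be sketched as a recursive walk over the JSON schema. This is an illustration under assumptions, not the `_fix_schema` from generate.py: the function name `fix_schema` and the choice to merge distinct tuple positions with `anyOf` are hypothetical.

```python
from typing import Any


def fix_schema(node: Any) -> Any:
    """Return a copy of a JSON schema with tuple-style `prefixItems`
    rewritten as a plain array `items` schema."""
    if isinstance(node, list):
        return [fix_schema(item) for item in node]
    if not isinstance(node, dict):
        return node

    fixed = {key: fix_schema(value) for key, value in node.items()}
    if "prefixItems" in fixed:
        # Deduplicate the per-position schemas while keeping their order.
        variants = []
        for variant in fixed.pop("prefixItems"):
            if variant not in variants:
                variants.append(variant)
        if len(variants) == 1:
            fixed["items"] = variants[0]
        elif variants:
            fixed["items"] = {"anyOf": variants}
    return fixed


# Shape of the schema a (str, int) tuple field typically produces.
schema = {
    "type": "object",
    "properties": {
        "pair": {
            "type": "array",
            "prefixItems": [{"type": "string"}, {"type": "integer"}],
            "minItems": 2,
            "maxItems": 2,
        }
    },
}
fixed = fix_schema(schema)
```

The rewrite loses positional typing (any element may match any variant), which is the trade-off for a schema that avoids `prefixItems`.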

…r arena games
- generate.py: add action_instruction param to agenerate_action, inject as action_instruction_text in prompts
- output_parsers.py: normalize non-canonical action_type (e.g. vote, kill, lynch) to 'action' with verb in argument

Enables Werewolf phase-specific prompts and robust parsing; extensible for other arena games.
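
A rough sketch of the action_type normalization described above. The set of canonical types and the helper name `normalize_action` are assumptions for illustration; the logic in output_parsers.py may differ.

```python
from typing import Tuple

# Assumed canonical action types; anything else is treated as a game verb.
CANONICAL_ACTION_TYPES = {
    "none",
    "speak",
    "non-verbal communication",
    "action",
    "leave",
}


def normalize_action(action_type: str, argument: str) -> Tuple[str, str]:
    """Map verbs such as 'vote' or 'kill' to ('action', '<verb> <argument>')."""
    kind = action_type.strip().lower()
    if kind in CANONICAL_ACTION_TYPES:
        return kind, argument
    text = argument.strip()
    # Avoid doubling the verb when the model already included it.
    if not text.lower().startswith(kind):
        text = f"{kind} {text}".strip()
    return "action", text


normalized = normalize_action("vote", "Alice")
```

Keeping the verb in the argument means the game engine can still tell a vote from a kill after the type has been collapsed to 'action'.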

- GET /profile/me, /profile/me/elo-history, /profile/me/matches (optional game_type)
- PUT /profile/me; GET /profile/{user_pk}, /profile/{user_pk}/matches
- _build_profile_stats shared by get_my_profile and get_user_profile; role_stats, win streaks, rank_tier

…game routing
- Include auth_router, oauth_router, leaderboard_router, profile_router
- SimulationState: _human_players, _pending_actions, _pack_chats, _game_types; human action queue
- WebSocket: parse game_type, auth_token, human_agent; route to werewolf or Sotopia sim
- /games/queue, /games/leaderboard, /games/history, /memory stubs; /games/werewolf/config, POST /games/werewolf/sessions/create

- WebSocketHumanAgent: human-controlled agent consuming actions from queue; normalize vote/kill/lynch to action type
- get_env_agents: explicit environment_messages fallback
- WebSocketSotopiaSimulator: game_type, human_agent_index, action_queue_getter params
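
The queue-driven human agent can be pictured with a small asyncio sketch. It is not the WebSocketHumanAgent implementation: `next_human_action`, the dict-shaped action, and the timeout fallback to a "none" action are assumptions for illustration.

```python
import asyncio
from typing import Dict


async def next_human_action(
    queue: "asyncio.Queue[Dict[str, str]]", timeout: float
) -> Dict[str, str]:
    """Wait for the next action pushed by the WebSocket handler.

    Assumption: if the human does not act in time, the agent passes.
    """
    try:
        return await asyncio.wait_for(queue.get(), timeout=timeout)
    except asyncio.TimeoutError:
        return {"action_type": "none", "argument": ""}


async def demo():
    queue: "asyncio.Queue[Dict[str, str]]" = asyncio.Queue()
    # The WebSocket receive loop would enqueue client messages like this one.
    await queue.put({"action_type": "speak", "argument": "I am a villager."})
    first = await next_human_action(queue, timeout=0.1)
    second = await next_human_action(queue, timeout=0.1)  # queue is now empty
    return first, second


first, second = asyncio.run(demo())
```

Decoupling the socket from the agent through a queue lets the game loop treat human and LLM agents uniformly: both are awaited for one action per turn.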

- WerewolfGameEndEvaluator, WerewolfActionHandler, WerewolfEnv
- WebSocketWerewolfRunner: config load, human+LLM agents, run loop, pack chat
- Persist GameResult, process ELO; run_werewolf_simulation entrypoint
- Fix witch_have_poison typo in runner

- config: API_BASE_URL, API_ROUTES for backend and game endpoints
- auth-api: JWT storage, register/login/me/logout, OAuth login URL (/oauth/login/{provider})
- auth-api: getProfile, getEloHistory, getMatchHistory; getLeaderboard with limit/offset, backend-aligned types
- auth-api: handleOAuthTokenFromUrl, authRequest

Keyu-He and others added 30 commits January 16, 2026 23:46

werewolf game in progress

b4b9f61

with minor bugs, will fix in future iterations

werewolf game in progress

155b794

contain minor bugs, will fix in future iterations

updated prompt

70a7f33

current progress

c640a1a

fix mypy errors

f8841e9

Design Social Game class, werewolf demo working in progress

80954e7

update on the SocialGame class / SocialDeductionGame class

7d1fefe

fix mypy errors

e7aedd3

debugging on the prompts

0fd87ff

werewolf game debug

17651b4

Next step: change script_like to false and fix any remaining errors that change may cause

Refactor social_game.py and update werewolves example

0afc26d

Add Social Game Engine documentation

ac09af2

Delete examples/experimental/negotiation_arena/NegotiationArena_1_Buy_Sell_custom_models.py

e7e6a37

Revert unnecessary changes in the uniform_sample and server.py

e431c6a

Minor update on werewolf prompt, Compatibility on uniform sampler and server

8dc91c8

update uniform_sampler and server.py to the correct versions

f1616ce

previous commit reverted too much..

move visibility prompt inside werewolf game's config

99d059f

feat: add werewolf Redis state store

a60177d

feat: add werewolf async game loop

e2df06a

feat: scaffold matchmaking + werewolf models

38dadbc

feat: expand fastapi server with werewolf + matchmaking APIs

e4e9df8

feat: add arena game contract dataclasses

7d8fea6

chore: add arena frontend tooling and scaffold

3f0105d

feat: add arena app routes, layouts, and styling

4dde649

feat: add shared arena shell, hooks, and UI components

deb457d

feat: add werewolf frontend module (api, hooks, UI)

665b421

feat: add sotopia chatbot application

32f01ce

nikitachaudharicodes added 30 commits March 12, 2026 23:55

feat(database): storage backend and exports for user/game models

7a40cfa

feat(database): add UserProfile, GameResult, EloHistory, OAuthAccount

6650f91

refactor(core): pass action_instruction to agenerate_action for Arena

3493721

feat(api): JWT auth, register, login, /me, legacy identity

222e8f8

- Register/login with bcrypt + JWT, /auth/me and PUT /auth/me
- /auth/me/games and /auth/me/stats for game history and stats
- Legacy /auth/identity for backward compatibility with existing frontend

feat(api): OAuth Google, GitHub, Discord

b6b0464

- GET /oauth/providers, /oauth/login/{provider}, /oauth/callback/{provider}
- Create/link UserProfile and OAuthAccount, issue JWT for frontend
- POST /oauth/link, GET /oauth/accounts, DELETE /oauth/unlink

feat(api): ELO rating and leaderboard endpoints

f4850bb

- elo.py: ELO calculation, AI base ratings, update_user_elo_after_game, process_game_elo, rank tiers (Bronze–Master), get_rank_tier/get_rank_progress
- leaderboard.py: GET /leaderboard, /ranks, /user/{pk}, /top/{count}
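
For reference, the standard Elo update that a module like elo.py would typically build on is sketched below. The function names and the K-factor of 32 are assumptions; the PR's update_user_elo_after_game and process_game_elo may use different constants and handle multi-player games differently.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Probability-like expected score of `rating` against `opponent`."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def update_rating(
    rating: float, opponent: float, score: float, k: float = 32.0
) -> float:
    """Standard Elo update; `score` is 1.0 for a win, 0.5 draw, 0.0 loss.

    Assumption: a fixed K-factor of 32.
    """
    return rating + k * (score - expected_score(rating, opponent))


# A human rated 1500 beats an AI opponent with a base rating of 1500.
new_rating = update_rating(1500.0, 1500.0, score=1.0)
```

Against an equally rated opponent the expected score is 0.5, so a win moves the rating by half of K.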

feat(frontend): WebSocket client for simulation

5365e31

- SimulationWebSocket: connect, startSimulation, sendClientMessage, on/off, disconnect
- WSMessageType, WSMessage, SimulationStartPayload
- generateSessionToken, getOrCreateSessionToken

feat(frontend): AuthContext and AuthHeader

4b99df9

feat(frontend): login and register pages

39f4004

feat(frontend): profile and history pages

b832dea

feat(frontend): pluggable game module system

605a085

feat(frontend): werewolf game module definition

01b2b1e

feat(frontend): werewolf WebSocket session and actions

1e137a2

feat(frontend): werewolf lobby, board, and supporting UI

fddf84a

feat(frontend): landing page and providers

14b645a

refactor(frontend): identity and memory hooks for Arena

d077814

chore(frontend): Next.js and deployment config

6066689

refactor: remove PD and public-goods stubs and legacy chat_server

1638b26

chore(examples): update werewolf example config and script

a861d84

chore: deps and gitignore

7f302c2

chore: add EC2 deployment scripts under scripts/ec2/

e8a9c73

refactor(frontend): remove unused experiment-root and orphaned chat components

1e016c3

chore(api): remove debug prints; fix(werewolf): ...

cb8c409

chore: ignore root package-lock, test_websocket, redis_agent_bot, local dev scripts

e198c66

nikitachaudharicodes wants to merge 87 commits into sotopia-lab:main from nikitachaudharicodes:frontend-sotopia
