autoresearch_agents

Standalone autonomous agent/prompt optimisation loop. No Claude Code dependency.

Inspired by Karpathy's autoresearch pattern. Model-agnostic via OpenRouter. MIT-licensed. One file. One metric. One loop.

Set the GOAL → Script runs the LOOP → You wake up to better agents

How It Works

LOOP (forever or N times):
  1. Read current agent config (agent.yaml)
  2. Ask optimiser LLM: "Analyse failures, propose ONE change"
  3. Validate the proposed YAML + check constraints
  4. Apply change → run eval suite → measure pass rate
  5. If improved → git commit, update baseline
  6. If worse    → git reset, try something else
  7. Repeat

The three primitives:

Editable asset → agent.yaml (the system prompt / config the optimiser modifies)
Scalar metric → eval pass rate from evals.yaml (binary assertions)
Git as memory → every improvement is a commit, every regression is a revert

Quick Start

# 1. Create workspace
mkdir my-agent-lab && cd my-agent-lab && git init

# 2. Copy the script
cp /path/to/autoresearch_agents.py .

# 3. Scaffold default files (creates agent.yaml, evals.yaml, program.md)
python autoresearch_agents.py --loops 0

# 4. Edit your files (see examples/ below)

# 5. Set API key and run
export OPENROUTER_API_KEY=sk-or-...
python autoresearch_agents.py --loops 50

Providers

Provider	Env Variable	Default Model	Notes
`openrouter`	`OPENROUTER_API_KEY`	`anthropic/claude-sonnet-4`	Recommended — access to all models
`anthropic`	`ANTHROPIC_API_KEY`	`claude-sonnet-4-20250514`	Direct Anthropic API
`openai`	`OPENAI_API_KEY`	`gpt-4o`	Direct OpenAI API

Pro tip: Use a stronger model as the optimiser and a cheaper model as the agent-under-test:

python autoresearch_agents.py \
  --model anthropic/claude-opus-4 \
  --agent-model anthropic/claude-haiku-4 \
  --loops 50

File Structure

my-agent-lab/
├── agent.yaml              ← THE EDITABLE ASSET (optimiser modifies this)
├── evals.yaml              ← Binary eval assertions (read-only)
├── program.md              ← Research instructions & constraints (read-only)
├── autoresearch_agents.py  ← This script
└── results.jsonl           ← Generated: append-only iteration log

Writing Evals

Assertion types:

Type	Description	Example
`contains`	Output must contain value	`{type: contains, value: "Paris"}`
`not_contains`	Output must NOT contain value	`{type: not_contains, value: "I don't know"}`
`contains_any`	Output must contain at least one	`{type: contains_any, values: ["yes", "correct"]}`
`contains_all`	Output must contain all values	`{type: contains_all, values: ["CO2", "ocean"]}`
`max_length`	Output must be under N chars	`{type: max_length, value: 500}`
`min_length`	Output must be over N chars	`{type: min_length, value: 100}`
`starts_with`	Output must start with value	`{type: starts_with, value: "Here"}`
`regex`	Output must match regex pattern	`{type: regex, value: "\\d{4}"}`

Safety

Lines marked # CONSTRAINT: in agent.yaml are never removed
evals.yaml is never modified by the optimiser
program.md defines boundaries the optimiser must respect
Git revert on any regression — your best config is always recoverable
Stagnation detection stops the loop if no progress after N iterations

Dependencies

pip install pyyaml httpx

That's it. No frameworks. No lock-in.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
autoresearch_agents.py		autoresearch_agents.py
readme.md		readme.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

autoresearch_agents

How It Works

Quick Start

Providers

File Structure

Writing Evals

Safety

Dependencies

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

autoresearch_agents

How It Works

Quick Start

Providers

File Structure

Writing Evals

Safety

Dependencies

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages