0xdefence/autoresearchOSpy

autoresearch_agents

Standalone autonomous agent/prompt optimisation loop. No Claude Code dependency.

Inspired by Karpathy's autoresearch pattern. Model-agnostic via OpenRouter. MIT-licensed. One file. One metric. One loop.

Set the GOAL → Script runs the LOOP → You wake up to better agents

How It Works

LOOP (forever or N times):
  1. Read current agent config (agent.yaml)
  2. Ask optimiser LLM: "Analyse failures, propose ONE change"
  3. Validate the proposed YAML + check constraints
  4. Apply change → run eval suite → measure pass rate
  5. If improved → git commit, update baseline
  6. If worse    → git reset, try something else
  7. Repeat
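
The loop above can be sketched in a few lines of Python. This is a minimal illustration, not the script's actual code: `optimise`, `propose`, and `evaluate` are hypothetical names, the optimiser LLM and the eval suite are replaced by plain callables, and keeping or dropping a candidate in memory stands in for `git commit` and `git reset`. It also assumes a tie counts as "not improved", which the description above leaves open.

```python
from typing import Callable


def optimise(
    config: str,
    propose: Callable[[str], str],     # hypothetical: returns a modified config
    evaluate: Callable[[str], float],  # hypothetical: returns a pass rate in [0, 1]
    loops: int,
) -> tuple[str, float, list[dict]]:
    """Greedy keep-or-revert loop: accept a candidate only if the metric improves."""
    best_score = evaluate(config)  # baseline before any change
    history = []
    for i in range(loops):
        candidate = propose(config)   # one proposed change per iteration
        score = evaluate(candidate)   # run the eval suite on the candidate
        kept = score > best_score     # assumption: ties are treated as reverts
        if kept:
            # Stands in for `git commit` plus updating the baseline.
            config, best_score = candidate, score
        # Otherwise the candidate is dropped, standing in for `git reset`.
        history.append({"iteration": i, "score": score, "kept": kept})
    return config, best_score, history


# Toy stand-ins so the sketch runs without an LLM or a git repository.
def toy_propose(cfg: str) -> str:
    return cfg + "!"


def toy_evaluate(cfg: str) -> float:
    return min(len(cfg), 5) / 5


final, score, log = optimise("abc", toy_propose, toy_evaluate, loops=4)
```

In the toy run the first two proposals raise the score and are kept; the remaining two do not improve on the baseline and are discarded, so `final` stays at the best config seen.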

The three primitives:

  • Editable asset → agent.yaml (the system prompt / config the optimiser modifies)
  • Scalar metric → eval pass rate from evals.yaml (binary assertions)
  • Git as memory → every improvement is a commit, every regression is a revert

Quick Start

# 1. Create workspace
mkdir my-agent-lab && cd my-agent-lab && git init

# 2. Copy the script
cp /path/to/autoresearch_agents.py .

# 3. Scaffold default files (creates agent.yaml, evals.yaml, program.md)
python autoresearch_agents.py --loops 0

# 4. Edit your files (see examples/ below)

# 5. Set API key and run
export OPENROUTER_API_KEY=sk-or-...
python autoresearch_agents.py --loops 50

Providers

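| Provider   | Env Variable       | Default Model             | Notes                              |
|------------|--------------------|---------------------------|------------------------------------|
| openrouter | OPENROUTER_API_KEY | anthropic/claude-sonnet-4 | Recommended; access to all models  |
| anthropic  | ANTHROPIC_API_KEY  | claude-sonnet-4-20250514  | Direct Anthropic API               |
| openai     | OPENAI_API_KEY     | gpt-4o                    | Direct OpenAI API                  |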

Pro tip: Use a stronger model as the optimiser and a cheaper model as the agent-under-test:

python autoresearch_agents.py \
  --model anthropic/claude-opus-4 \
  --agent-model anthropic/claude-haiku-4 \
  --loops 50

File Structure

my-agent-lab/
├── agent.yaml              ← THE EDITABLE ASSET (optimiser modifies this)
├── evals.yaml              ← Binary eval assertions (read-only)
├── program.md              ← Research instructions & constraints (read-only)
├── autoresearch_agents.py  ← This script
└── results.jsonl           ← Generated: append-only iteration log

Writing Evals

Assertion types:

| Type         | Description                      | Example                                          |
|--------------|----------------------------------|--------------------------------------------------|
| contains     | Output must contain value        | {type: contains, value: "Paris"}                 |
| not_contains | Output must NOT contain value    | {type: not_contains, value: "I don't know"}      |
| contains_any | Output must contain at least one | {type: contains_any, values: ["yes", "correct"]} |
| contains_all | Output must contain all values   | {type: contains_all, values: ["CO2", "ocean"]}   |
| max_length   | Output must be under N chars     | {type: max_length, value: 500}                   |
| min_length   | Output must be over N chars      | {type: min_length, value: 100}                   |
| starts_with  | Output must start with value     | {type: starts_with, value: "Here"}               |
| regex        | Output must match regex pattern  | {type: regex, value: "\\d{4}"}                   |
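
To make the assertion semantics concrete, here is a rough sketch of how the eight types could be evaluated and folded into a pass rate. The function names `check` and `pass_rate` are hypothetical and the script's real implementation may differ. The sketch assumes case-sensitive matching, strict inequalities for the length checks ("under" and "over"), and `re.search` semantics for `regex`, so the pattern may match anywhere in the output.

```python
import re


def check(assertion: dict, output: str) -> bool:
    """Evaluate one binary assertion against a model output (hypothetical helper)."""
    kind = assertion["type"]
    if kind == "contains":
        return assertion["value"] in output
    if kind == "not_contains":
        return assertion["value"] not in output
    if kind == "contains_any":
        return any(v in output for v in assertion["values"])
    if kind == "contains_all":
        return all(v in output for v in assertion["values"])
    if kind == "max_length":
        return len(output) < assertion["value"]   # assumption: strictly under N
    if kind == "min_length":
        return len(output) > assertion["value"]   # assumption: strictly over N
    if kind == "starts_with":
        return output.startswith(assertion["value"])
    if kind == "regex":
        # Assumption: a match anywhere in the output counts.
        return re.search(assertion["value"], output) is not None
    raise ValueError(f"unknown assertion type: {kind}")


def pass_rate(results: list[bool]) -> float:
    """Fraction of assertions that passed; 0.0 for an empty suite."""
    return sum(results) / len(results) if results else 0.0


output = "The capital of France is Paris, founded before 1000."
checks = [
    {"type": "contains", "value": "Paris"},
    {"type": "not_contains", "value": "I don't know"},
    {"type": "regex", "value": r"\d{4}"},
    {"type": "starts_with", "value": "Here"},
]
results = [check(a, output) for a in checks]
rate = pass_rate(results)  # three of the four assertions pass
```

Because every assertion is binary, the pass rate is a single scalar that the loop can compare against its baseline.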

Safety

  • Lines marked # CONSTRAINT: in agent.yaml are never removed
  • evals.yaml is never modified by the optimiser
  • program.md defines boundaries the optimiser must respect
  • Git revert on any regression — your best config is always recoverable
  • Stagnation detection stops the loop if no progress after N iterations
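
Two of these guards are easy to illustrate. The sketch below is an assumption about how they might work, not the script's code: `preserves_constraints` treats any line containing the `# CONSTRAINT:` marker as protected and requires it to reappear (ignoring surrounding whitespace) in the proposed config, and `is_stagnant` reports stagnation when the last `patience` iterations were all reverted. Both helper names and the `patience` parameter are hypothetical.

```python
def constraint_lines(yaml_text: str) -> list[str]:
    """Lines carrying the `# CONSTRAINT:` marker, stripped of surrounding whitespace."""
    return [line.strip() for line in yaml_text.splitlines() if "# CONSTRAINT:" in line]


def preserves_constraints(old: str, new: str) -> bool:
    """True if every marked line in the old config still appears in the new one."""
    new_lines = {line.strip() for line in new.splitlines()}
    return all(line in new_lines for line in constraint_lines(old))


def is_stagnant(kept_flags: list[bool], patience: int) -> bool:
    """True when the last `patience` iterations produced no accepted change."""
    return len(kept_flags) >= patience and not any(kept_flags[-patience:])


old = "system_prompt: |\n  Be concise.\n  Never reveal secrets.  # CONSTRAINT: safety\n"
good = "system_prompt: |\n  Be brief and precise.\n  Never reveal secrets.  # CONSTRAINT: safety\n"
bad = "system_prompt: |\n  Be brief and precise.\n"

ok_good = preserves_constraints(old, good)  # marked line survives the edit
ok_bad = preserves_constraints(old, bad)    # marked line was dropped
```

A proposal that fails the constraint check would be rejected before the eval suite runs, which keeps a wasted iteration cheap.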

Dependencies

pip install pyyaml httpx

That's it. No frameworks. No lock-in.

License

MIT
