Awesome Agentic Engineering Resources

A curated list of high-signal resources — articles, books, courses, cookbooks, papers, playbooks, benchmarks, talks, podcasts, and newsletters — for agentic engineering and AI engineering.

This is a resources list, not a tools list. Open-source tools for building agentic systems live in the sister list awesome-production-agentic-systems; production ML tooling lives in awesome-production-machine-learning. This list covers the learning, design, and operational resources that sit alongside those tools — including both:

  • Agentic engineering — using AI agents to do software engineering (Copilot, Cursor, Claude Code, Aider, Cline, Windsurf, Codex; spec-driven development; context engineering; agent IDE rules and memory files; SWE benchmarks).
  • AI / agentic systems engineering — building agentic and LLM-powered systems (architecture, RAG, memory, tool use & MCP, orchestration, multi-agent coordination, evaluation, observability, guardrails, safety, fine-tuning, inference, product/UX, economics, teams).

You can keep up to date by watching this repo for the monthly releases summarising newly added resources 🤩

This list was proposed in EthicalML/awesome-production-machine-learning#709 as a sister list focused on resources rather than tools.

Legend

Resources are tagged with icons so you can scan and filter at a glance:

Icon Meaning
⭐ Editors' pick — start here
🆓 Free to access
💰 Paid
📘 Book
🧑‍🎓 Course
🎥 Video / talk
🎧 Audio / podcast
📄 Paper
🛠️ Hands-on cookbook / tutorial
📋 Playbook / design-pattern catalog
🧪 Benchmark / leaderboard
🏗️ Reference implementation / case study
📰 Newsletter

Quick links to sections on this page

⭐ Trending / What's New 🧭 Core & Foundations 🗓️ Milestones Timeline
👥 Communities 🧑‍🎓 Courses 📘 Books
✍️ Articles & Essays 🛠️ Tutorials & Cookbooks 📋 Playbooks & Patterns
📄 Papers & Research 🧪 Benchmarks & Leaderboards 🏗️ Reference Implementations
🎥 Talks & Conferences 🎧 Podcasts 📰 Newsletters
🛡️ Governance, Safety & Responsible AI 🎨 Product, UX & Economics of AI 🧑‍🤝‍🧑 Teams, Hiring & Org Design

Topic Coverage Matrix

Resources are organised as a matrix: the top-level sections above (rows) are resource types, and each section is sub-divided by topic. The 21 topics, T1–T21, are shared across sections. This lets you read vertically ("what papers exist on RAG?") or horizontally ("where do I find resources on Coding Agents?").

Topics:

# Topic
T1 Coding Agents & AI-Assisted Development (Copilot, Cursor, Claude Code, Aider, Cline, Windsurf, Codex)
T2 Spec-Driven Development & Context Engineering (AGENTS.md, spec-kit, rules files)
T3 Agent IDE Rules, Memory Files & Developer Workflows
T4 SWE Benchmarks & Coding Evaluation
T5 Autonomous Software Agents & Long-Horizon Engineering Tasks
T6 LLM Application Architecture & System Design
T7 Prompt Engineering
T8 Retrieval-Augmented Generation (RAG)
T9 Memory Systems & Long-Context
T10 Tool Use, Function Calling & MCP
T11 Orchestration, Planning & Design Patterns
T12 Multi-Agent Systems & Coordination
T13 Evaluation & Testing
T14 Observability, Tracing & Debugging
T15 Guardrails & Security (prompt injection, jailbreaks, red-teaming)
T16 Safety, Alignment & Responsible AI
T17 Fine-tuning, Post-training, RLHF & Reasoning Training
T18 Inference, Serving, Cost & Latency
T19 Voice, Multi-modal & Embodied Agents
T20 Product, UX & Human-AI Interaction Design
T21 Economics, Teams, Hiring & Org Design

Coverage (● = populated, ○ = opportunistic / partial, — = out of scope for that row):

Row \ Topic T1 T2 T3 T4 T5 T6 T7 T8 T9 T10 T11 T12 T13 T14 T15 T16 T17 T18 T19 T20 T21
Core & Foundations ● ● ○ ○ ○ ● ● ● ○ ● ● ○ ● ○ ○ ○ ○ ○ ○ ○ ○
Communities ● ○ ○ ○ ○ ● ● ● ○ ● ● ○ ● ● ○ ● ● ● ○ ● ●
Courses ● ○ ○ ● ○ ● ● ● ○ ● ● ● ● ● ● ● ● ● ○ ○ ○
Books ● ○ ○ — ○ ● ● ● ○ ● ● ○ ● ○ ● ● ● ● ○ ● ●
Articles & Essays ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
Tutorials & Cookbooks ● ● ● ○ ● ● ● ● ● ● ● ● ● ● ● ○ ● ● ● ○ —
Playbooks & Patterns ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ○ ● ○ ● ●
Papers & Research ● ○ — ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ○
Benchmarks ● — — ● ● ○ ○ ● ○ ● ● ● ● ○ ● ● ○ ● ● ○ —
Reference Impls ● ● ● ● ● ● ○ ● ● ● ● ● ● ● ● ○ ● ● ● ● ●
Talks & Conferences ● ● ○ ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
Podcasts ● ○ ○ ○ ● ● ● ● ○ ● ● ● ● ● ● ● ● ● ○ ● ●
Newsletters ● ○ ○ ○ ○ ● ● ● ○ ● ● ○ ● ● ● ● ● ● ○ ● ●

The Trending / What's New, Milestones Timeline, Governance & Responsible AI, Product / UX / Economics, and Teams, Hiring & Org Design sections collapse across topics and are presented as curated lists rather than matrix cells.

Contributing to the list

Please review our CONTRIBUTING.md before submitting a PR — it explains the one-line description style, how to pick the right row/topic cell, and the quality bar for inclusion. Thank you to the community for supporting the list's growth 🚀

Want to receive recurring updates on this repo and other advancements?

You can join the Machine Learning Engineer newsletter, where over 70,000 ML professionals and enthusiasts receive weekly curated articles & tutorials on production Machine Learning.
Also check out Awesome Production Agentic Systems and Awesome Production Machine Learning, the sister lists of open-source tools for agentic systems and production ML respectively.

Main Content

⭐ Trending / What's New

Rotating pinned items: the most-discussed agentic & AI-engineering resources of the current cycle. Refreshed regularly — see CONTRIBUTING.md for nomination criteria.

  • ⭐ 🆓 Building effective agents — Anthropic (2024). The most-cited reference for agent design patterns (augmented LLM, prompt chaining, routing, parallelisation, orchestrator-workers, evaluator-optimiser, autonomous agents). Start here before any other agent reading.
  • ⭐ 🆓 How we built our multi-agent research system — Anthropic (2025). Production retrospective on Claude's multi-agent research mode: orchestrator/subagent split, prompt engineering for agents, evaluation and failure modes.
  • ⭐ 🆓 A practical guide to building agents — OpenAI (2025). 30-page PDF covering when (and when not) to build agents, tool design, guardrails, and human-in-the-loop patterns.
  • ⭐ 🆓 The bitter lesson of AI agents / Agentic Coding: The Future of Software Development with Agents — Armin Ronacher (2025). Widely shared essays on what it actually feels like to ship with agentic coding tools day-to-day.
  • 🆓 Claude Code: Best practices for agentic coding — Anthropic (2025). CLAUDE.md, slash-commands, headless mode, custom permissions — the canonical how-to-use-Claude-Code reference.
  • 🆓 How to build an agent — Thorsten Ball / Amp (2025). Viral step-by-step implementation of a tool-using coding agent in ~400 lines of Go, demystifying "what is an agent" in code.
  • 🆓 The new code — Sean Grove / OpenAI on Latent Space (2025). Specs-as-code: the spec is the new artefact, models are the compiler. Heavily cited in the AGENTS.md / spec-kit discussion.
  • 🆓 AGENTS.md — Community standard (2025) for per-repo agent instructions, now read by Claude Code, Codex, Aider, Cursor, Cline, Windsurf and others.
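
The loop these resources keep describing, a model that alternates between requesting tool calls and returning a final answer, is small enough to sketch. A minimal, hypothetical version in Python (`model` stands in for any LLM API call; all names are illustrative, not taken from any resource above):

```python
# Minimal tool-using agent loop, in the spirit of "How to build an agent":
# each turn, the model either requests a tool call or returns an answer.
# `model` is any callable mapping the conversation so far to a dict.

def run_agent(model, tools, user_message, max_steps=10):
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):
        reply = model(messages)           # e.g. {"tool": "add", "args": {...}}
        if "answer" in reply:             # model is done: return final text
            return reply["answer"]
        result = tools[reply["tool"]](**reply["args"])  # execute requested tool
        messages.append({"role": "tool", "content": str(result)})
    raise RuntimeError("agent did not finish within max_steps")
```

Swap in a real LLM client for `model` and a dict of real functions for `tools`; everything else in the published agent designs (routing, orchestrator-workers) is elaboration on this loop.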

🧭 Core & Foundations

Canonical "what is agentic engineering / AI engineering" reading. Start here.

T1 · Coding Agents & AI-Assisted Development

T2 · Spec-Driven Development & Context Engineering

  • ⭐ 🆓 The new code — Sean Grove (OpenAI) on Latent Space. The canonical "specs are the new code" essay.
  • 🆓 AGENTS.md — Community standard for per-repo agent instructions.
  • 🆓 spec-kit — GitHub's toolkit and essay set on spec-driven development with coding agents.
  • 🆓 The rise of "context engineering" — LangChain. Why prompt engineering became context engineering.
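
For orientation, an AGENTS.md is plain markdown that agents read before touching a repo. A hypothetical minimal example (contents invented for illustration, not from the standard itself):

```markdown
# AGENTS.md

## Build & test
- Install dependencies: `npm install`
- Run the test suite before every commit: `npm test`

## Conventions
- TypeScript strict mode; avoid `any`.
- Keep changes small; one concern per pull request.
```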

T6 · LLM Application Architecture & System Design

T7 · Prompt Engineering

T8 · Retrieval-Augmented Generation (RAG)

T10 · Tool Use, Function Calling & MCP

T11 · Orchestration, Planning & Design Patterns

T13 · Evaluation & Testing

πŸ—“οΈ Milestones Timeline

Dated, field-defining events that shaped agentic & AI engineering.

Date Event Reference
2017-06 Transformer architecture introduced Attention Is All You Need
2020-05 GPT-3 shows in-context learning at scale Language Models are Few-Shot Learners
2020-05 RAG framework introduced RAG for Knowledge-Intensive NLP
2021-06 GitHub Copilot preview launches — first mainstream AI coding assistant GitHub blog
2022-01 Chain-of-Thought prompting Wei et al.
2022-03 InstructGPT / RLHF Ouyang et al.
2022-09 Simon Willison coins "prompt injection" as a durable threat category SW blog
2022-10 ReAct: reasoning + acting agent loop Yao et al.
2022-11 ChatGPT release — mainstream adoption inflection OpenAI
2023-03 GPT-4 release OpenAI
2023-03 HuggingGPT / Toolformer-era tool use Toolformer
2023-03 LangChain & LlamaIndex hit mainstream —
2023-05 Voyager: open-ended agents in Minecraft Voyager
2023-10 SWE-bench released — real-world coding eval SWE-bench
2023-12 Mixture-of-experts open models (Mixtral) Mistral
2024-03 Devin demo — autonomous software agent pitch Cognition
2024-05 GPT-4o: native multi-modal + realtime voice OpenAI
2024-07 SWE-bench Verified launched OpenAI
2024-09 o1 launches the reasoning-model era OpenAI
2024-11 Model Context Protocol (MCP) announced Anthropic
2024-12 Anthropic's "Building effective agents" published Anthropic
2025-02 Claude Code launches (research preview) Anthropic
2025-05 AGENTS.md published as cross-agent standard agents.md
2025-06 GitHub spec-kit / "new code" essays formalise spec-driven dev spec-kit

👥 Communities

Discords, Slacks, forums, and meetups where practitioners gather.

  • 🆓 MLOps Community — Slack + podcast + meetups; the biggest practitioner community at the ops/engineering intersection. Active agent and LLM-ops channels.
  • 🆓 LangChain Discord — Heavy day-to-day Q&A on agent orchestration, RAG, evaluation, MCP.
  • 🆓 LlamaIndex Discord — RAG-centric builder community with active reference-impl discussion.
  • 🆓 r/LocalLLaMA — The definitive open-weights / local-inference forum; fastest signal for new models, quantisation, and serving.
  • 🆓 r/MachineLearning — Academic and practitioner mix; where new papers and threads get dissected.
  • 🆓 Hacker News — Filter for "LLM", "agent", "Claude", "Cursor" — where engineering-side essays trend.
  • 🆓 EleutherAI Discord — Open research community; strong training/interpretability discussion.
  • 🆓 Hugging Face Discord & Forums — Transformers, TRL, PEFT, model-hub discussions.
  • 🆓 AI Engineer World's Fair / Latent Space Discord — Practitioner community anchoring the AI Engineer conference series.
  • 🆓 AI Dev Board — Community-curated hub for AI engineering resources and discussions.
  • 🆓 Cursor Community Forum — User-driven forum for Cursor rules, MCP, and workflows.
  • 🆓 Anthropic Discord — Official Claude / Claude Code / MCP community.

πŸ§‘β€πŸŽ“ Courses

Structured courses β€” free and paid, university and industry.

T1 · Coding Agents & AI-Assisted Development

T4 · SWE Benchmarks & Coding Evaluation

  • 🧑‍🎓 🆓 Evaluating and Debugging Generative AI — DeepLearning.AI + W&B. Covers coding-eval mechanics.
  • 🧑‍🎓 🆓 Mastering LLMs: Evals — Hamel Husain & Shreya Shankar (Maven). Companion evals-for-LLMs curriculum.
  • 🧑‍🎓 🆓 SWE-bench tutorial — Princeton NLP. Free, self-paced walk-through of running and scoring coding evals.

T6 · LLM Application Architecture & System Design

T7 · Prompt Engineering

T8 · Retrieval-Augmented Generation (RAG)

T10 · Tool Use, Function Calling & MCP

T11 · Orchestration, Planning & Design Patterns

T12 · Multi-Agent Systems

T13 · Evaluation & Testing

T14 · Observability, Tracing & Debugging

  • 🧑‍🎓 🆓 LLMOps — DeepLearning.AI + Google Cloud.
  • 🧑‍🎓 🆓 Evaluating LLMs with Arize — Arize course hub.
  • 🧑‍🎓 🆓 LangSmith Academy — LangChain. Free self-paced LangSmith courses covering tracing and evals.

T15 · Guardrails & Security

T16 · Safety, Alignment & Responsible AI

T17 · Fine-tuning, Post-training & RLHF

T18 · Inference, Serving, Cost & Latency

📘 Books

Published and in-progress books covering agentic & AI engineering.

T1 · Coding Agents & AI-Assisted Development

  • ⭐ 📘 💰 AI-Assisted Programming — Tom Taulli (O'Reilly, 2024). Practical coverage of Copilot/Cursor/Claude workflows.
  • 📘 💰 Prompt Engineering for Generative AI — James Phoenix & Mike Taylor (O'Reilly, 2024). Includes heavy coverage of code-generation prompting patterns.

T6 · LLM Application Architecture & System Design

T7 · Prompt Engineering

  • 📘 💰 Prompt Engineering for LLMs — John Berryman & Albert Ziegler (O'Reilly, 2024). From Copilot's original tech lead.
  • 📘 🆓 The Prompt Report — Schulhoff et al. (2024). A 76-page survey that effectively functions as a book-length prompting reference.

T8 · RAG

T10 · Tool Use & MCP

T11 · Orchestration & Design Patterns

T13 · Evaluation

T15 · Guardrails & Security

T16 · Safety, Alignment & Responsible AI

  • 📘 💰 Human Compatible — Stuart Russell (2019). The foundational alignment argument.
  • 📘 💰 The Alignment Problem — Brian Christian (2020). The canonical popular-press primer.

T17 · Fine-tuning & Post-training

T18 · Inference & Serving

T20 · Product & UX

T21 · Economics, Teams & Org

✍️ Articles & Essays

Long-form writing from canonical authors and engineering teams.

T1 · Coding Agents & AI-Assisted Development

T2 · Spec-Driven Development & Context Engineering

T3 · Agent IDE Rules, Memory Files & Workflows

T4 · SWE Benchmarks & Coding Evaluation

T5 · Autonomous Software Agents

T6 · LLM Application Architecture

T7 · Prompt Engineering

T8 · Retrieval-Augmented Generation (RAG)

T9 · Memory Systems & Long-Context

T10 · Tool Use, Function Calling & MCP

T11 · Orchestration & Design Patterns

T12 · Multi-Agent Systems & Coordination

T13 · Evaluation & Testing

T14 · Observability, Tracing & Debugging

T15 · Guardrails & Security

T16 · Safety, Alignment & Responsible AI

T17 · Fine-tuning, Post-training & RLHF

T18 · Inference, Serving, Cost & Latency

T19 · Voice, Multi-modal & Embodied Agents

T20 · Product, UX & Human-AI Interaction

T21 · Economics, Teams, Hiring & Org Design

πŸ› οΈ Tutorials & Cookbooks

Hands-on, code-first guides and official cookbooks from model providers and framework authors.

T1 · Coding Agents & AI-Assisted Development

T2 · Spec-Driven Development

  • 🛠️ 🆓 GitHub spec-kit — The official spec-driven-development toolkit.
  • 🛠️ 🆓 AGENTS.md examples — Example AGENTS.md files for common stacks.

T3 · Agent IDE Rules & Workflows

T5 · Autonomous Software Agents

T6 · LLM Application Architecture

T7 · Prompt Engineering

T8 · Retrieval-Augmented Generation (RAG)

T9 · Memory Systems

T10 · Tool Use & MCP

T11 · Orchestration & Patterns

T12 · Multi-Agent Systems

T13 · Evaluation & Testing

T14 · Observability

T15 · Guardrails & Security

T17 · Fine-tuning & Post-training

T18 · Inference & Serving

T19 · Voice & Multimodal

📋 Playbooks & Design-Pattern Catalogs

Opinionated, prescriptive guides distilling design patterns and operational practices.

📄 Papers & Research

Foundational papers, surveys, and benchmark papers. Includes a dated milestone-papers table.

Milestone Papers

Date Keywords Institution Paper
2017-06 Transformer Google Attention Is All You Need
2018-10 BERT Google BERT: Pre-training of Deep Bidirectional Transformers
2020-05 GPT-3, ICL OpenAI Language Models are Few-Shot Learners
2020-05 RAG Meta RAG for Knowledge-Intensive NLP Tasks
2021-06 LoRA Microsoft LoRA: Low-Rank Adaptation of LLMs
2022-01 CoT Google Chain-of-Thought Prompting
2022-03 InstructGPT / RLHF OpenAI Training LMs to follow instructions with human feedback
2022-10 ReAct Princeton / Google ReAct: Synergizing Reasoning and Acting
2022-12 Constitutional AI Anthropic Constitutional AI
2023-02 Toolformer Meta Toolformer: LMs Can Teach Themselves to Use Tools
2023-03 Reflexion Northeastern Reflexion
2023-03 Self-Refine CMU Self-Refine: Iterative Refinement
2023-05 Tree of Thoughts Princeton Tree of Thoughts
2023-05 QLoRA UW QLoRA: Efficient Finetuning of Quantized LLMs
2023-05 Voyager NVIDIA / Caltech Voyager: Open-Ended Embodied Agent
2023-05 DPO Stanford DPO: Your LM Is Secretly a Reward Model
2023-06 LLM-as-Judge UC Berkeley Judging LLM-as-a-Judge
2023-07 Generative Agents Stanford / Google Generative Agents: Interactive Simulacra
2023-07 Lost in the Middle Stanford Lost in the Middle
2023-07 GCG CMU Universal and Transferable Adversarial Attacks
2023-09 Agent survey Fudan The Rise and Potential of LLM-based Agents
2023-10 SWE-bench Princeton SWE-bench: Can LMs Resolve Real-World Issues?
2023-10 AutoGen Microsoft AutoGen: Enabling Multi-Agent Conversations
2023-11 GAIA Meta / HF GAIA: Benchmark for General AI Assistants
2023-12 RAG Survey Tongji RAG for LLMs: A Survey
2024-02 SWE-agent Princeton SWE-agent: Agent-Computer Interfaces
2024-05 Many-shot jailbreaking Anthropic Many-shot Jailbreaking
2024-06 Prompt Report Maryland The Prompt Report
2024-06 τ-bench Sierra τ-bench: Tool-Agent-User benchmark
2024-09 o1 / reasoning OpenAI Learning to Reason with LLMs

T1 · Coding Agents & T4 · SWE Benchmarks

T5 · Autonomous SWE Agents

T6 · App Architecture

T7 · Prompt Engineering

T8 · RAG

T9 · Memory

T10 · Tool Use & MCP

T11 · Orchestration & Patterns

T12 · Multi-Agent

T13 · Evaluation

T14 · Observability

T15 · Guardrails & Security

T16 · Safety & Alignment

T17 · Fine-tuning & Post-training

T18 · Inference & Serving

T19 · Voice & Multimodal

T20 · Product & UX

🧪 Benchmarks & Leaderboards

Public benchmarks and leaderboards for coding agents, tool use, RAG, evaluation, and more.

T1 / T4 · Coding Agents & SWE Benchmarks

  • ⭐ 🧪 🆓 SWE-bench — Real-world GitHub-issue resolution benchmark; Verified subset is the de-facto industry standard.
  • 🧪 🆓 Terminal-Bench — Stanford / Laude. Long-horizon terminal task benchmark.
  • 🧪 🆓 LiveCodeBench — Rolling contamination-free coding benchmark.
  • 🧪 🆓 BigCodeBench — Practical programming with diverse function calls.
  • 🧪 🆓 HumanEval+ / EvalPlus — Strengthened HumanEval.
  • 🧪 🆓 MLE-bench — OpenAI. Kaggle-style ML engineering benchmark.

T5 · Autonomous Agents

  • 🧪 🆓 GAIA — General AI Assistants benchmark.
  • 🧪 🆓 AgentBench — Tsinghua. Broad agent capability benchmark.
  • 🧪 🆓 WebArena / VisualWebArena — Web-navigation agents.
  • 🧪 🆓 OSWorld — Desktop OS-controlling agents.
  • 🧪 🆓 MLE-bench — ML-engineering agents.

T8 · RAG

  • 🧪 🆓 RAGAS — Framework and leaderboard for RAG eval.
  • 🧪 🆓 MTEB — Massive Text Embedding Benchmark.
  • 🧪 🆓 BEIR — Zero-shot IR benchmark.
  • 🧪 🆓 ARES — Automated RAG evaluation.
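
Retrieval benchmarks such as BEIR and MTEB ultimately aggregate ranking metrics over labelled query-document pairs; recall@k is the simplest of these. A self-contained sketch:

```python
# recall@k: the fraction of a query's relevant documents that appear in
# the top-k retrieved results.

def recall_at_k(retrieved, relevant, k):
    """retrieved: doc ids in ranked order; relevant: set of relevant doc ids."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)
```

Benchmark suites average this (and related metrics like nDCG@k) over thousands of queries per dataset.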

T10 · Tool Use & Function Calling

T11 · Orchestration / T12 · Multi-Agent

  • 🧪 🆓 AgentBench — General agent-capability.
  • 🧪 🆓 AgentBoard — HKUST. Analytic, fine-grained agent eval.

T13 · Evaluation

  • 🧪 🆓 HELM — Stanford CRFM. Holistic evaluation.
  • 🧪 🆓 Chatbot Arena / LMSYS Arena — Human-preference leaderboard.
  • 🧪 🆓 MMLU-Pro — Harder MMLU.
  • 🧪 🆓 MT-Bench — LLM-as-judge multi-turn.
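
Chatbot Arena builds its leaderboard from pairwise human preferences; the classic Elo update it popularised (the live leaderboard has since moved to related Bradley-Terry fitting) is only a few lines. A sketch, with the K-factor chosen for illustration:

```python
# One Elo update from a single pairwise comparison: `score` is 1.0 if
# model A won, 0.0 if it lost, 0.5 for a tie. K controls step size.

def elo_update(rating_a, rating_b, score, k=32):
    expected_a = 1 / (1 + 10 ** ((rating_b - rating_a) / 400))
    new_a = rating_a + k * (score - expected_a)
    new_b = rating_b + k * ((1 - score) - (1 - expected_a))
    return new_a, new_b
```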

T15 · Guardrails & Security

T16 · Safety & Alignment

  • 🧪 🆓 TruthfulQA — Truthfulness benchmark.
  • 🧪 🆓 BBQ — Bias benchmark.
  • 🧪 🆓 ToxiGen — Toxicity.

T18 · Inference

  • 🧪 🆓 MLPerf Inference — MLCommons. Industry-standard serving benchmark.
  • 🧪 🆓 LLMPerf — Anyscale. Throughput/latency tool.

T19 · Voice & Multimodal

  • 🧪 🆓 MMMU — Multimodal multidiscipline benchmark.
  • 🧪 🆓 VideoMME — Video understanding.
  • 🧪 🆓 Dynabench speech — Live speech-model benchmarks.

πŸ—οΈ Reference Implementations & Case Studies

Public production write-ups and canonical reference repositories that teach by example.

T1 / T3 · Coding Agents & IDE Rules

  • ⭐ 🏗️ 🆓 Claude Code — Anthropic's reference agentic CLI.
  • 🏗️ 🆓 Aider — Reference terminal coding agent with detailed engineering blog.
  • 🏗️ 🆓 Cline — Open-source autonomous coding agent.
  • 🏗️ 🆓 OpenHands — All Hands AI. Open-source autonomous SWE agent.

T2 · Spec-Driven Dev

  • 🏗️ 🆓 GitHub spec-kit — Reference spec-driven toolkit.

T5 · Autonomous SWE Agents

  • 🏗️ 🆓 SWE-agent — Princeton NLP. Reference agent for SWE-bench.
  • 🏗️ 🆓 AutoCodeRover — NUS.
  • 🏗️ 🆓 Agentless — Minimal agentless baseline that beat prior agents on SWE-bench Lite.

T6 · App Architecture

  • 🏗️ 🆓 Open Interpreter — Reference local code-execution agent.
  • 🏗️ 🆓 Quivr — Reference full-stack RAG assistant.
  • 🏗️ 🆓 LangChain templates — Reference app scaffolds.

T8 · RAG

  • ⭐ 🏗️ 🆓 LlamaIndex — Reference RAG framework; docs double as case studies.
  • 🏗️ 🆓 RAGFlow — Production-grade RAG reference.
  • 🏗️ 🆓 Verba — Weaviate reference RAG app.
  • 🏗️ 🆓 GraphRAG — Microsoft Research.

T9 · Memory

  • 🏗️ 🆓 Letta (MemGPT) — Reference agentic-memory implementation.
  • 🏗️ 🆓 Mem0 — Reference memory layer.
  • 🏗️ 🆓 Zep — Long-term memory store.

T10 · Tool Use & MCP

T11 / T12 · Orchestration & Multi-Agent

  • 🏗️ 🆓 LangGraph — Reference graph-based orchestration.
  • 🏗️ 🆓 AutoGen — Microsoft.
  • 🏗️ 🆓 CrewAI — Reference role-based multi-agent.
  • 🏗️ 🆓 Pydantic AI — Type-safe agent framework.

T13 · Evaluation

T14 · Observability

  • 🏗️ 🆓 Langfuse — Open-source LLM observability.
  • 🏗️ 🆓 Arize Phoenix — Open-source tracing + evals.
  • 🏗️ 🆓 OpenLLMetry — OTel-based LLM instrumentation.

T15 · Guardrails & Security

  • 🏗️ 🆓 Guardrails AI — Reference guardrails framework.
  • 🏗️ 🆓 NVIDIA NeMo Guardrails — Programmable guardrails.
  • 🏗️ 🆓 Rebuff — Prompt-injection defence reference.

T17 · Fine-tuning

  • 🏗️ 🆓 Unsloth — Fast LoRA/QLoRA reference.
  • 🏗️ 🆓 Axolotl — Reference fine-tuning framework.
  • 🏗️ 🆓 LLaMA-Factory — Unified fine-tuning toolkit.
  • 🏗️ 🆓 Hugging Face alignment-handbook — Reference RLHF/DPO recipes.

T18 · Inference & Serving

  • ⭐ 🏗️ 🆓 vLLM — Reference high-throughput LLM serving.
  • 🏗️ 🆓 SGLang — Structured generation serving.
  • 🏗️ 🆓 llama.cpp — Reference CPU/GPU local inference.
  • 🏗️ 🆓 TensorRT-LLM — NVIDIA reference optimised serving.

T19 · Voice & Multimodal

  • 🏗️ 🆓 LiveKit Agents — Voice-agent reference.
  • 🏗️ 🆓 Pipecat — Daily's voice-agent framework.
  • 🏗️ 🆓 Ultravox — Real-time speech LM.

T20 · Product & UX

  • 🏗️ 🆓 Vercel AI SDK — Reference AI-UI patterns and streaming.
  • 🏗️ 🆓 Open WebUI — Reference local chat UI.
  • 🏗️ 🆓 assistant-ui — Reference React components for AI chat.

🎥 Talks, Workshops & Conferences

Recorded talks, workshops, and conference series worth watching.

Conference series

  • ⭐ 🎥 🆓 AI Engineer Summit / World's Fair — The definitive practitioner conference; full talks on YouTube.
  • 🎥 🆓 NeurIPS / ICML / ICLR — Core ML research venues; most papers include recorded talks.
  • 🎥 🆓 COLM — Conference on Language Modeling. New dedicated LM venue.
  • 🎥 🆓 MLSys — Core ML-systems conference (inference, serving).
  • 🎥 🆓 LlamaCon — Meta's open-source LLM conference.

Canonical talks

T1 · Coding Agents

T4 · SWE Benchmarks

T6 · App Architecture

T7 · Prompt Engineering

T8 · RAG

T10 · MCP

T11 / T12 · Orchestration & Multi-Agent

T13 · Evaluation

T14 · Observability

T15 / T16 · Security & Safety

T17 · Fine-tuning

T18 · Inference

T19 · Voice & Multimodal

T20 · Product & UX

T21 · Economics & Teams

🎧 Podcasts

Recurring podcasts with strong agentic & AI-engineering coverage.

  • ⭐ 🎧 🆓 Latent Space — swyx & Alessio. The AI-engineering podcast of record; guests include most major AI-lab engineers.
  • ⭐ 🎧 🆓 Practical AI — Daniel Whitenack & Chris Benson. Long-running, practitioner-first.
  • 🎧 🆓 MLOps Community podcast — Demetrios Brinkmann. Ops-side operationalisation case studies.
  • 🎧 🆓 Gradient Dissent — Weights & Biases. Applied-ML interviews.
  • 🎧 🆓 The TWIML AI Podcast — Sam Charrington. Longest-running ML interview series.
  • 🎧 🆓 No Priors — Sarah Guo & Elad Gil. Founders / researchers.
  • 🎧 🆓 Cognitive Revolution — Nathan Labenz. Weekly AI engineering + strategy.
  • 🎧 🆓 Dwarkesh Podcast — Dwarkesh Patel. Deep interviews with top researchers.
  • 🎧 🆓 Machine Learning Street Talk — Tim Scarfe. Technical deep-dives.
  • 🎧 🆓 Lex Fridman Podcast — Long-form interviews with AI-lab CEOs and researchers.
  • 🎧 🆓 Unsupervised Learning — Redpoint. AI-founder / operator conversations.
  • 🎧 🆓 Interconnects — Nathan Lambert. RLHF / post-training focus.
  • 🎧 🆓 Pragmatic Engineer — Gergely Orosz. AI-engineering org/hiring coverage.

📰 Newsletters

Weekly and monthly curated newsletters.

  • ⭐ 📰 🆓 The Batch — Andrew Ng / DeepLearning.AI. Weekly AI-engineering digest.
  • ⭐ 📰 🆓 Import AI — Jack Clark (Anthropic co-founder). Policy + research.
  • ⭐ 📰 🆓 Latent Space — swyx. The AI-engineering newsletter of record.
  • 📰 🆓 Simon Willison's Weblog — RSS/email. Daily real-time coverage of tools and agents.
  • 📰 🆓 Ahead of AI — Sebastian Raschka. LLM research + fine-tuning deep-dives.
  • 📰 🆓 The Pragmatic Engineer — Gergely Orosz. AI-engineering hiring/org coverage.
  • 📰 🆓 Interconnects — Nathan Lambert. RLHF / post-training.
  • 📰 🆓 Last Week in AI — Weekly recap.
  • 📰 🆓 TLDR AI — Daily headlines.
  • 📰 🆓 Ben's Bites — Daily digest; founder-friendly.
  • 📰 🆓 Chip Huyen's Blog — Occasional long-form on AI engineering.
  • 📰 🆓 Eugene Yan — Pattern / eval / RAG deep-dives.
  • 📰 🆓 Hamel's Blog — Evals + applied LLMs.
  • 📰 🆓 Machine Learning Engineer Newsletter — Alejandro Saucedo. Weekly production-ML curation.
  • 📰 🆓 MLOps Community newsletter — MLOps Community.
  • 📰 🆓 The Data Exchange — Ben Lorica.

πŸ›‘οΈ Governance, Safety & Responsible AI

Policy frameworks, safety research, red-teaming resources, and responsible-AI guidance.

Policy & frameworks

Lab safety & responsible scaling

Security & red-teaming

Responsible AI practice

Papers & research

🎨 Product, UX & Economics of AI

Going beyond engineering: designing for AI, human-AI interaction, and the economics of LLM applications.

Design & UX

Economics & business

Product strategy

πŸ§‘β€πŸ€β€πŸ§‘ Teams, Hiring & Org Design

How organisations structure AI-engineering work, hire for it, and operate sustainably.


How to suggest a resource

Please use one of the issue templates (resource suggestion, broken link, or trending nomination) or open a pull request following the guidance in CONTRIBUTING.md. The curation methodology and update cadence are documented in NOTES.md.

Update cadence

Weekly: PR triage and broken-link fixes. Monthly: trending rotation and new-resource batches. Quarterly: full thoroughness pass against the checklist in NOTES.md.

License

CC0 — To the extent possible under law, the contributors have waived all copyright and related or neighboring rights to this work.
