A curated list of high-signal resources (articles, books, courses, cookbooks, papers, playbooks, benchmarks, talks, podcasts, and newsletters) for agentic engineering and AI engineering.
This is a resources list, not a tools list. Open-source tools for building agentic systems live in the sister list awesome-production-agentic-systems; production ML tooling lives in awesome-production-machine-learning. This list covers the learning, design, and operational resources that sit alongside those tools, across two areas:
- Agentic engineering: using AI agents to do software engineering (Copilot, Cursor, Claude Code, Aider, Cline, Windsurf, Codex; spec-driven development; context engineering; agent IDE rules and memory files; SWE benchmarks).
- AI / agentic systems engineering: building agentic and LLM-powered systems (architecture, RAG, memory, tool use & MCP, orchestration, multi-agent coordination, evaluation, observability, guardrails, safety, fine-tuning, inference, product/UX, economics, teams).
You can keep up to date by watching this repo for the monthly releases summarising newly added resources 🤩
This list was proposed in EthicalML/awesome-production-machine-learning#709 as a sister list focused on resources rather than tools.
Resources are tagged with icons so you can scan and filter at a glance:
| Icon | Meaning |
|---|---|
| ⭐ | Editors' pick: start here |
| 🆓 | Free to access |
| 💰 | Paid |
| 📚 | Book |
| 🧑‍🎓 | Course |
| 🎥 | Video / talk |
| 🎧 | Audio / podcast |
| 📄 | Paper |
| 🛠️ | Hands-on cookbook / tutorial |
| 📘 | Playbook / design-pattern catalog |
| 🧪 | Benchmark / leaderboard |
| 🏗️ | Reference implementation / case study |
| 📰 | Newsletter |
Resources are organised as a matrix: the top-level sections above (rows) are resource types, and each section is sub-divided by topic. The 21 topics, T1 to T21, are shared across sections. This lets you read vertically ("what papers exist on RAG?") or horizontally ("where do I find resources on Coding Agents?").
Topics:
| # | Topic |
|---|---|
| T1 | Coding Agents & AI-Assisted Development (Copilot, Cursor, Claude Code, Aider, Cline, Windsurf, Codex) |
| T2 | Spec-Driven Development & Context Engineering (AGENTS.md, spec-kit, rules files) |
| T3 | Agent IDE Rules, Memory Files & Developer Workflows |
| T4 | SWE Benchmarks & Coding Evaluation |
| T5 | Autonomous Software Agents & Long-Horizon Engineering Tasks |
| T6 | LLM Application Architecture & System Design |
| T7 | Prompt Engineering |
| T8 | Retrieval-Augmented Generation (RAG) |
| T9 | Memory Systems & Long-Context |
| T10 | Tool Use, Function Calling & MCP |
| T11 | Orchestration, Planning & Design Patterns |
| T12 | Multi-Agent Systems & Coordination |
| T13 | Evaluation & Testing |
| T14 | Observability, Tracing & Debugging |
| T15 | Guardrails & Security (prompt injection, jailbreaks, red-teaming) |
| T16 | Safety, Alignment & Responsible AI |
| T17 | Fine-tuning, Post-training, RLHF & Reasoning Training |
| T18 | Inference, Serving, Cost & Latency |
| T19 | Voice, Multi-modal & Embodied Agents |
| T20 | Product, UX & Human-AI Interaction Design |
| T21 | Economics, Teams, Hiring & Org Design |
Coverage (β = populated, β = opportunistic / partial, β = out of scope for that row):
| Row \ Topic | T1 | T2 | T3 | T4 | T5 | T6 | T7 | T8 | T9 | T10 | T11 | T12 | T13 | T14 | T15 | T16 | T17 | T18 | T19 | T20 | T21 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Core & Foundations | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β |
| Communities | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β |
| Courses | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β |
| Books | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β |
| Articles & Essays | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β |
| Tutorials & Cookbooks | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β |
| Playbooks & Patterns | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β |
| Papers & Research | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β |
| Benchmarks | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β |
| Reference Impls | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β |
| Talks & Conferences | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β |
| Podcasts | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β |
| Newsletters | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β | β |
The Trending / What's New, Milestones Timeline, Governance & Responsible AI, Product / UX / Economics, and Teams, Hiring & Org Design sections collapse across topics and are presented as curated lists rather than matrix cells.
Please review our CONTRIBUTING.md before submitting a PR; it explains the one-line description style, how to pick the right row/topic cell, and the quality bar for inclusion. Thank you to the community for supporting the list's growth.
You can join the Machine Learning Engineer newsletter. Join over 70,000 ML professionals and enthusiasts who receive weekly curated articles & tutorials on production Machine Learning.
Also check out Awesome Production Agentic Systems and Awesome Production Machine Learning, the sister lists of open-source tools for agentic systems and production ML respectively.
Rotating pinned items: the most-discussed agentic & AI-engineering resources of the current cycle. Refreshed regularly; see CONTRIBUTING.md for nomination criteria.
- ⭐ 🆓 Building effective agents - Anthropic (2024). The most-cited reference for agent design patterns (augmented LLM, prompt chaining, routing, parallelisation, orchestrator-workers, evaluator-optimiser, autonomous agents). Start here before any other agent reading.
- ⭐ 🆓 How we built our multi-agent research system - Anthropic (2025). Production retrospective on Claude's multi-agent research mode: orchestrator/subagent split, prompt engineering for agents, evaluation and failure modes.
- ⭐ 🆓 A practical guide to building agents - OpenAI (2025). 30-page PDF covering when (and when not) to build agents, tool design, guardrails, and human-in-the-loop patterns.
- ⭐ 🆓 The bitter lesson of AI agents / Agentic Coding: The Future of Software Development with Agents - Armin Ronacher (2025). Widely-shared essays on what it actually feels like to ship with agentic coding tools day-to-day.
- 🆓 Claude Code: Best practices for agentic coding - Anthropic (2025). CLAUDE.md, slash-commands, headless mode, custom permissions; the canonical how-to-use-Claude-Code reference.
- 🆓 How to build an agent - Thorsten Ball / Amp (2025). Viral step-by-step implementation of a tool-using coding agent in ~400 lines of Go, demystifying "what is an agent" in code.
- 🆓 The new code - Sean Grove / OpenAI on Latent Space (2025). Specs-as-code: the spec is the new artefact, models are the compiler. Heavily cited in the AGENTS.md / spec-kit discussion.
- 🆓 AGENTS.md - Community standard (2025) for per-repo agent instructions, now read by Claude Code, Codex, Aider, Cursor, Cline, Windsurf and others.
Canonical "what is agentic engineering / AI engineering" reading. Start here.
- ⭐ 🆓 Building effective agents - Anthropic. The reference taxonomy of agent design patterns (workflows vs. agents).
- ⭐ 🆓 Claude Code: Best practices for agentic coding - Anthropic. CLAUDE.md, tools, slash-commands, headless mode.
- 🆓 How to build an agent - Thorsten Ball. A working coding agent in ~400 lines; the clearest "agents are not magic" walkthrough.
- 🆓 Here's how I use LLMs to help me write code - Simon Willison. Grounded, practice-first account of daily LLM-assisted development.
- ⭐ 🆓 The new code - Sean Grove (OpenAI) on Latent Space. The canonical "specs are the new code" essay.
- 🆓 AGENTS.md - Community standard for per-repo agent instructions.
- 🆓 spec-kit - GitHub's toolkit and essay set on spec-driven development with coding agents.
- 🆓 The rise of "context engineering" - LangChain. Why prompt engineering became context engineering.
- ⭐ 📚 💰 AI Engineering - Chip Huyen (O'Reilly, 2025). The textbook for building LLM applications end-to-end.
- ⭐ 🆓 Patterns for Building LLM-based Systems & Products - Eugene Yan. Evaluation, RAG, fine-tuning, caching, guardrails, defensive UX, collecting feedback: the reference pattern catalogue.
- 🆓 Emerging Architectures for LLM Applications - a16z. The widely-shared reference diagram for the LLM app stack.
- 🆓 What We Learned from a Year of Building with LLMs - Yan, Bischof, Frye, Husain, Liu, Shankar (2024). Tactical, operational, and strategic lessons distilled from shipping.
- ⭐ 🆓 Prompt Engineering - Lilian Weng (OpenAI). The systematic taxonomy.
- 🆓 Prompt Engineering Guide - DAIR.AI. Continuously updated, with per-technique deep-dives.
- 🆓 OpenAI: Prompt engineering - OpenAI official guide.
- 🆓 Anthropic: Prompt engineering overview - Anthropic's practical guide for Claude.
- ⭐ 📄 🆓 Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks - Lewis et al. (2020). The original RAG paper.
- ⭐ 🆓 Advanced RAG Techniques / Pinecone Learn - Pinecone. The hub for RAG primers and patterns.
- 🆓 Retrieval-Augmented Generation for LLMs: A Survey - Gao et al. (2023). The reference survey.
- 🆓 RAG is more than just embedding search - Jason Liu. Systems-view RAG: query understanding, tool routing, evaluation.
- ⭐ 🆓 Introducing the Model Context Protocol - Anthropic (2024). The canonical introduction to MCP.
- ⭐ 🆓 Model Context Protocol: Specification - Open protocol docs and SDKs.
- 📄 🆓 Toolformer: Language Models Can Teach Themselves to Use Tools - Schick et al. (2023). The foundational tool-use paper.
- 🆓 Function calling guide - OpenAI. The canonical reference for structured tool calls.
- ⭐ 🆓 Building effective agents - Anthropic. The orchestration pattern taxonomy.
- 🆓 LLM Powered Autonomous Agents - Lilian Weng. The canonical deep-dive on planning, memory, and tool use in agent loops.
- 📄 🆓 ReAct: Synergizing Reasoning and Acting in Language Models - Yao et al. (2022). The foundational reason+act loop.
- 📄 🆓 The Rise and Potential of LLM Based Agents: A Survey - Xi et al. (2023). Survey of agent architectures and components.
- ⭐ 🆓 Your AI Product Needs Evals - Hamel Husain. The most-cited essay on why and how to build evals for LLM products.
- 🆓 Task-Specific LLM Evals that Do & Don't Work - Eugene Yan. A pragmatic survey of eval techniques per task type.
- 📄 🆓 Judging LLM-as-a-Judge - Zheng et al. (2023). The foundational LLM-as-judge paper (MT-Bench, Chatbot Arena).
- 🆓 Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences - Shankar et al. (2024). How to make LLM-judges trustworthy.
Dated, field-defining events that shaped agentic & AI engineering.
| Date | Event | Reference |
|---|---|---|
| 2017-06 | Transformer architecture introduced | Attention Is All You Need |
| 2020-05 | GPT-3 shows in-context learning at scale | Language Models are Few-Shot Learners |
| 2020-05 | RAG framework introduced | RAG for Knowledge-Intensive NLP |
| 2021-06 | GitHub Copilot preview launches, the first mainstream AI coding assistant | GitHub blog |
| 2022-01 | Chain-of-Thought prompting | Wei et al. |
| 2022-03 | InstructGPT / RLHF | Ouyang et al. |
| 2022-09 | Simon Willison coins "prompt injection" as a durable threat category | SW blog |
| 2022-10 | ReAct: reasoning + acting agent loop | Yao et al. |
| 2022-11 | ChatGPT release, the mainstream adoption inflection | OpenAI |
| 2023-03 | GPT-4 release | OpenAI |
| 2023-03 | HuggingGPT / Toolformer-era tool use | Toolformer |
| 2023-03 | LangChain & LlamaIndex hit mainstream | - |
| 2023-05 | Voyager: open-ended agents in Minecraft | Voyager |
| 2023-10 | SWE-bench released, a real-world coding eval | SWE-bench |
| 2023-12 | Mixture-of-experts open models (Mixtral) | Mistral |
| 2024-03 | Devin demo, the autonomous software agent pitch | Cognition |
| 2024-05 | GPT-4o: native multi-modal + realtime voice | OpenAI |
| 2024-08 | SWE-bench Verified launched | OpenAI |
| 2024-09 | o1 reveals reasoning-model era | OpenAI |
| 2024-11 | Model Context Protocol (MCP) announced | Anthropic |
| 2024-12 | Anthropic's "Building effective agents" published | Anthropic |
| 2025-02 | Claude Code research preview launched | Anthropic |
| 2025-05 | AGENTS.md published as cross-agent standard | agents.md |
| 2025-06 | GitHub spec-kit / "new code" essays formalise spec-driven dev | spec-kit |
Discords, Slacks, forums, and meetups where practitioners gather.
- 🆓 MLOps Community - Slack + podcast + meetups; the biggest practitioner community at the ops/engineering intersection. Active agent and LLM-ops channels.
- 🆓 LangChain Discord - Heavy day-to-day Q&A on agent orchestration, RAG, evaluation, MCP.
- 🆓 LlamaIndex Discord - RAG-centric builder community with active reference-impl discussion.
- 🆓 r/LocalLLaMA - The definitive open-weights / local-inference forum; fastest signal for new models, quantisation, and serving.
- 🆓 r/MachineLearning - Academic and practitioner mix; where new papers and threads get dissected.
- 🆓 Hacker News - Filter for "LLM", "agent", "Claude", "Cursor"; this is where engineering-side essays trend.
- 🆓 EleutherAI Discord - Open research community; strong training/interpretability discussion.
- 🆓 Hugging Face Discord & Forums - Transformers, TRL, PEFT, model-hub discussions.
- 🆓 AI Engineer World's Fair / Latent Space Discord - Practitioner community anchoring the AI Engineer conference series.
- 🆓 AI Dev Board - Community-curated hub for AI engineering resources and discussions.
- 🆓 Cursor Community Forum - User-driven forum for Cursor rules, MCP, and workflows.
- 🆓 Anthropic Discord - Official Claude / Claude Code / MCP community.
Structured courses: free and paid, university and industry.
- ⭐ 🧑‍🎓 🆓 AI Python for Beginners - DeepLearning.AI (Andrew Ng). Gateway to AI-assisted coding.
- 🧑‍🎓 🆓 Pair Programming with a Large Language Model - DeepLearning.AI + Google.
- 🧑‍🎓 🆓 GitHub Copilot Fundamentals - Microsoft Learn. Official training path.
- 🧑‍🎓 🆓 Evaluating and Debugging Generative AI - DeepLearning.AI + W&B. Covers coding-eval mechanics.
- 🧑‍🎓 🆓 Mastering LLMs: Evals - Hamel Husain & Shreya Shankar (Maven). Companion evals-for-LLMs curriculum.
- 🧑‍🎓 🆓 SWE-bench tutorial - Princeton NLP. Free, self-paced walk-through of running and scoring coding evals.
- ⭐ 🧑‍🎓 🆓 LLM Bootcamp - Full Stack Deep Learning. Free 2-day bootcamp on building LLM apps end-to-end.
- 🧑‍🎓 🆓 Building Systems with the ChatGPT API - DeepLearning.AI + OpenAI.
- 🧑‍🎓 🆓 CS25: Transformers United - Stanford. Seminal deep-dive seminar series.
- ⭐ 🧑‍🎓 🆓 ChatGPT Prompt Engineering for Developers - Andrew Ng & Isa Fulford (OpenAI).
- 🧑‍🎓 🆓 Anthropic Prompt Engineering Interactive Tutorial - Anthropic. Hands-on, notebook-based.
- 🧑‍🎓 🆓 Prompt Engineering Guide (DAIR.AI) - Self-paced, continuously updated.
- 🧑‍🎓 🆓 Advanced Retrieval for AI with Chroma - DeepLearning.AI.
- 🧑‍🎓 🆓 Building and Evaluating Advanced RAG Applications - DeepLearning.AI + LlamaIndex + TruEra.
- 🧑‍🎓 🆓 LangChain Chat with Your Data - DeepLearning.AI + LangChain.
- 🧑‍🎓 💰 Systematically Improving RAG Applications - Jason Liu on Maven.
- 🧑‍🎓 🆓 Functions, Tools and Agents with LangChain - DeepLearning.AI + LangChain.
- 🧑‍🎓 🆓 MCP: Build Rich-Context AI Apps with Anthropic - DeepLearning.AI + Anthropic.
- 🧑‍🎓 🆓 Introduction to MCP - Anthropic official quickstart.
- 🧑‍🎓 🆓 AI Agents in LangGraph - DeepLearning.AI + LangChain.
- 🧑‍🎓 🆓 AI Agentic Design Patterns with AutoGen - DeepLearning.AI + Microsoft.
- 🧑‍🎓 🆓 Hugging Face Agents Course - Hugging Face. Free, certifying course on agent fundamentals.
- 🧑‍🎓 🆓 Multi AI Agent Systems with crewAI - DeepLearning.AI + crewAI.
- 🧑‍🎓 🆓 Practical Multi AI Agents and Advanced Use Cases with crewAI - DeepLearning.AI.
- 🧑‍🎓 🆓 Building Agentic RAG with LlamaIndex - DeepLearning.AI + LlamaIndex.
- ⭐ 🧑‍🎓 💰 AI Evals For Engineers & PMs - Hamel Husain & Shreya Shankar on Maven. The industry-standard evals cohort course.
- 🧑‍🎓 🆓 Quality and Safety for LLM Applications - DeepLearning.AI + WhyLabs.
- 🧑‍🎓 🆓 Automated Testing for LLMOps - DeepLearning.AI + CircleCI.
- 🧑‍🎓 🆓 LLMOps - DeepLearning.AI + Google Cloud.
- 🧑‍🎓 🆓 Evaluating LLMs with Arize - Arize course hub.
- 🧑‍🎓 🆓 LangSmith Academy - LangChain. Free self-paced LangSmith courses covering tracing and evals.
- 🧑‍🎓 🆓 Red Teaming LLM Applications - DeepLearning.AI + Giskard.
- 🧑‍🎓 🆓 Safe and Reliable AI via Guardrails - DeepLearning.AI + Guardrails AI.
- 🧑‍🎓 🆓 Prompt Injection Attacks (Learn Prompting) - Learn Prompting. Open course covering injection/jailbreak taxonomies.
- 🧑‍🎓 🆓 AI Safety Fundamentals - BlueDot Impact. The standard entry curriculum.
- 🧑‍🎓 🆓 ARENA (Alignment Research Engineer Accelerator) - Hands-on alignment / interpretability.
- 🧑‍🎓 🆓 Intro to AI Safety, Remastered - Richard Ngo / BlueDot. Free reading curriculum.
- ⭐ 🧑‍🎓 🆓 Finetuning Large Language Models - DeepLearning.AI + Lamini.
- 🧑‍🎓 🆓 Reinforcement Learning from Human Feedback - DeepLearning.AI + Google Cloud.
- 🧑‍🎓 🆓 Hugging Face NLP Course (incl. RLHF chapter) - Hugging Face.
- 🧑‍🎓 🆓 Efficiently Serving LLMs - DeepLearning.AI + Predibase.
- 🧑‍🎓 🆓 Quantization Fundamentals with Hugging Face - DeepLearning.AI + HF.
- 🧑‍🎓 🆓 CUDA Mode lectures - Community lectures on GPU inference internals.
Published and in-progress books covering agentic & AI engineering.
- ⭐ 📚 💰 AI-Assisted Programming - Tom Taulli (O'Reilly, 2024). Practical coverage of Copilot/Cursor/Claude workflows.
- 📚 💰 Prompt Engineering for Generative AI - James Phoenix & Mike Taylor (O'Reilly, 2024). Includes heavy coverage of code-generation prompting patterns.
- ⭐ 📚 💰 AI Engineering: Building Applications with Foundation Models - Chip Huyen (O'Reilly, 2025). The reference textbook for the field.
- 📚 💰 Designing Machine Learning Systems - Chip Huyen (O'Reilly, 2022). The prior-generation canonical ML-systems text; still essential for data/infra context.
- 📚 💰 Generative AI on AWS - Chris Fregly, Antje Barth, Shelbee Eigenbrode (O'Reilly, 2023).
- 📚 🆓 Prompt Engineering for LLMs - John Berryman & Albert Ziegler (O'Reilly, 2024). From Copilot's original tech-lead.
- 📚 💰 The Prompt Report - Schulhoff et al. (2024). A 76-page survey that effectively functions as a book-length prompting reference.
- 📚 💰 Building LLM Apps - Valentina Alto (Wiley, 2024). RAG-heavy application text.
- 📚 🆓 RAG-Driven Generative AI - Denis Rothman (Packt, 2024).
- 📚 💰 Building Intelligent Apps with OpenAI - Olivier Caelen & Marie-Alice Blete (O'Reilly, 2024). Heavy function-calling coverage.
- 📚 💰 Generative AI with LangChain - Ben Auffarth (Packt, 2023). Orchestration patterns end-to-end.
- 📚 💰 Prompt Engineering for Generative AI - Phoenix & Taylor (O'Reilly, 2024). Chapter-length eval coverage.
- 📚 💰 The Developer's Playbook for Large Language Model Security - Steve Wilson (O'Reilly, 2024). OWASP LLM Top 10 project lead's book.
- 📚 💰 Generative AI Security - Ken Huang et al. (Apress, 2024).
- 📚 💰 Human Compatible - Stuart Russell (2019). The foundational alignment argument.
- 📚 💰 The Alignment Problem - Brian Christian (2020). The canonical popular-press primer.
- ⭐ 📚 💰 Build a Large Language Model (From Scratch) - Sebastian Raschka (Manning, 2024). The reference hands-on text.
- 📚 💰 Hands-On Large Language Models - Jay Alammar & Maarten Grootendorst (O'Reilly, 2024).
- 📚 💰 Efficient Processing of Deep Neural Networks - Sze et al. (Morgan & Claypool). Hardware/inference reference.
- 📚 💰 Designing Machine Learning Systems - Chip Huyen. Includes pragmatic product/UX chapters.
- 📚 💰 Human-AI Interaction Design - IxDF topic hub.
- 📚 💰 Managing Machine Learning Projects - Simon Thompson (Manning).
- 📚 🆓 The Pragmatic Engineer's AI coverage - Gergely Orosz. Regularly-updated editorial that functions as a rolling book on AI-engineering org design.
Long-form writing from canonical authors and engineering teams.
- ⭐ 🆓 Here's how I use LLMs to help me write code - Simon Willison.
- 🆓 Agentic Coding: The Future of Software Development - Armin Ronacher.
- 🆓 Revenge of the junior developer - Steve Yegge (Sourcegraph).
- 🆓 The death of the stubborn developer - Steve Yegge.
- ⭐ 🆓 The new code - Sean Grove / Latent Space.
- 🆓 Context Engineering - LangChain.
- 🆓 The rise of "context engineering" - LangChain.
- 🆓 Spec-driven development with AI - GitHub Blog.
- ⭐ 🆓 Claude Code: Best practices for agentic coding - Anthropic.
- 🆓 Cursor rules directory - Community catalogue of `.cursorrules` files.
- 🆓 My Claude Code setup - Widely-shared CLAUDE.md + slash-command playbook.
- 🆓 Aider: Tips for using with large codebases - Aider docs.
- ⭐ 🆓 Introducing SWE-bench Verified - OpenAI.
- 🆓 Why we built Terminal-Bench - Stanford / Laude.
- 🆓 Measuring an AI system's ability to do ML R&D - METR.
- 🆓 The leaderboard illusion - Singh et al. on bench-gaming.
- ⭐ 🆓 How we built our multi-agent research system - Anthropic.
- 🆓 Devin, a software engineer - Cognition.
- 🆓 Don't build multi-agents - Cognition. Contrarian but important counterpoint to multi-agent maximalism.
- 🆓 SWE-agent: Agent-Computer Interfaces - Princeton NLP writeup.
- ⭐ 🆓 Patterns for Building LLM-based Systems & Products - Eugene Yan.
- 🆓 Emerging Architectures for LLM Applications - a16z.
- 🆓 What We Learned from a Year of Building with LLMs - Yan/Bischof/Frye/Husain/Liu/Shankar.
- 🆓 Twelve factor agents - HumanLayer. The "12-factor app" equivalent for agent apps.
- ⭐ 🆓 Prompt Engineering - Lilian Weng.
- 🆓 Prompting is programming - Eugene Yan.
- 🆓 A guide to prompting Claude - Anthropic.
- 🆓 The prompt report - Learn Prompting team summary of their 76-page survey.
- ⭐ 🆓 RAG is more than just embedding search - Jason Liu.
- 🆓 How to improve your RAG system's performance - Anyscale.
- 🆓 Advanced RAG Techniques - Pinecone.
- 🆓 Practical considerations in RAG application design - Eugene Yan.
- ⭐ 🆓 Lost in the Middle: How Language Models Use Long Contexts - Liu et al.
- 🆓 Memory for agents - LangChain.
- 🆓 Extending Context Length in LLMs - Hugging Face.
- 🆓 The agentic memory stack - Letta (MemGPT).
- ⭐ 🆓 Introducing the Model Context Protocol - Anthropic.
- 🆓 Function calling with LLMs: a practical guide - DAIR.AI.
- 🆓 Tool use is eating the world - Latent Space.
- 🆓 Designing MCP servers that agents actually use - Phil Schmid.
- ⭐ 🆓 LLM Powered Autonomous Agents - Lilian Weng.
- 🆓 Building effective agents - Anthropic.
- 🆓 Agent design patterns - Andrew Ng, The Batch series.
- 🆓 AI agent frameworks - Latent Space comparative review.
- ⭐ 🆓 How we built our multi-agent research system - Anthropic.
- 🆓 Multi-agent workflows - LangChain.
- 🆓 Don't build multi-agents - Cognition.
- 🆓 AutoGen: Enabling next-gen LLM applications - Microsoft.
- ⭐ 🆓 Your AI product needs evals - Hamel Husain.
- 🆓 Task-specific LLM evals that do & don't work - Eugene Yan.
- 🆓 Creating a LLM-as-a-Judge that drives business results - Hamel Husain.
- 🆓 LLM evals: everything I learned in 12 months - Shreya Shankar.
- ⭐ 🆓 So you want to build an LLM observability platform - Hamel Husain (subsection of evals post; foundational).
- 🆓 The OpenTelemetry Gen AI semantic conventions - OTel.
- 🆓 How Honeycomb uses LLMs for product experiences - Phillip Carter.
- 🆓 Logfire: observability for the LLM era - Pydantic.
- ⭐ 🆓 Prompt injection series - Simon Willison. Canonical ongoing series.
- 🆓 OWASP Top 10 for LLM Applications - OWASP.
- 🆓 Universal and Transferable Adversarial Attacks on Aligned LLMs - Zou et al. (GCG attack).
- 🆓 Red teaming LLMs - Hugging Face.
- ⭐ 🆓 Core Views on AI Safety - Anthropic.
- 🆓 Anthropic's Responsible Scaling Policy - Anthropic.
- 🆓 Preparedness Framework - OpenAI.
- 🆓 Scalable oversight via debate & recursive reward modelling - DeepMind Safety Research.
- ⭐ 🆓 Ahead of AI - Sebastian Raschka. The canonical fine-tuning / post-training deep-dives.
- 🆓 The Novice's LLM Training Guide - Community reference.
- 🆓 DPO: Your language model is secretly a reward model - Rafailov et al.
- 🆓 The alignment handbook - Hugging Face.
- ⭐ 🆓 Transformer Inference Arithmetic - Kipply.
- 🆓 LLM Inference Speed of Light - Arseny Kapoulkine.
- 🆓 Everything I've learned about efficient LLM inference - Baseten engineering blog.
- 🆓 GPU performance for LLM inference - vLLM team blog.
- ⭐ 🆓 Hello GPT-4o - OpenAI.
- 🆓 Building a voice agent with LiveKit - LiveKit Agents docs.
- 🆓 Voice-first LLM products - Latent Space.
- 🆓 Moshi: a speech-text foundation model - Kyutai.
- ⭐ 🆓 Maggie Appleton essays - Canonical AI-UX thinking.
- 🆓 Microsoft HAX guidelines for human-AI interaction - Microsoft Research.
- 🆓 Generative AI: Design Patterns (NNGroup) - Nielsen Norman Group.
- 🆓 Building products with AI: UX lessons / thesephist.com essays - Linus Lee.
- ⭐ 🆓 AI engineering org design - Gergely Orosz, Pragmatic Engineer.
- 🆓 Building an AI team - Eugene Yan.
- 🆓 a16z AI canon - a16z.
- 🆓 16 Changes to the Way Enterprises Build Software with AI - a16z.
Hands-on, code-first guides and official cookbooks from model providers and framework authors.
- ⭐ 🛠️ 🆓 Claude Code cookbook - Anthropic.
- 🛠️ 🆓 Aider tutorials - Aider docs.
- 🛠️ 🆓 Continue.dev recipes - Continue.
- 🛠️ 🆓 GitHub spec-kit - The official spec-driven-development toolkit.
- 🛠️ 🆓 AGENTS.md examples - Example `AGENTS.md` files for common stacks.
- 🛠️ 🆓 awesome-cursorrules - Curated `.cursorrules` examples.
- 🛠️ 🆓 Claude Code slash-commands cookbook - Anthropic.
- 🛠️ 🆓 SWE-agent quickstart - Princeton NLP.
- 🛠️ 🆓 OpenHands (formerly OpenDevin) - All Hands AI.
- ⭐ 🛠️ 🆓 OpenAI Cookbook - The reference recipe library for OpenAI APIs.
- 🛠️ 🆓 Anthropic Cookbook - Claude recipes.
- 🛠️ 🆓 Gemini API Cookbook - Google.
- 🛠️ 🆓 Hugging Face Open-Source AI Cookbook - Hugging Face.
- 🛠️ 🆓 Anthropic prompt-engineering interactive tutorial - Notebook-based.
- 🛠️ 🆓 Prompt Engineering Guide notebooks - DAIR.AI.
- ⭐ 🛠️ 🆓 LlamaIndex tutorials - LlamaIndex.
- 🛠️ 🆓 LangChain RAG from scratch - LangChain.
- 🛠️ 🆓 Pinecone RAG handbook - Pinecone.
- 🛠️ 🆓 Advanced RAG notebooks - Nir Diamant. 30+ advanced RAG recipes.
- 🛠️ 🆓 Mem0 quickstart - Mem0.
- 🛠️ 🆓 Letta (MemGPT) cookbook - Letta.
- 🛠️ 🆓 LangGraph memory - LangChain.
- ⭐ 🛠️ 🆓 MCP quickstart - Anthropic.
- 🛠️ 🆓 awesome-mcp-servers - Community reference-servers catalogue.
- 🛠️ 🆓 OpenAI function calling cookbook - OpenAI.
- 🛠️ 🆓 LangGraph tutorials - LangChain.
- 🛠️ 🆓 Anthropic building-effective-agents examples - Anthropic.
- 🛠️ 🆓 LlamaIndex agent tutorials - LlamaIndex.
- 🛠️ 🆓 CrewAI examples - CrewAI.
- 🛠️ 🆓 AutoGen notebook gallery - Microsoft.
- 🛠️ 🆓 LangGraph multi-agent examples - LangChain.
- ⭐ 🛠️ 🆓 Hamel Husain's evals repo - Companion code to the evals course.
- 🛠️ 🆓 LangSmith evals tutorials - LangChain.
- 🛠️ 🆓 RAGAS tutorials - RAG-specific eval cookbook.
- 🛠️ 🆓 Langfuse cookbook - Langfuse.
- 🛠️ 🆓 Arize Phoenix tutorials - Arize.
- 🛠️ 🆓 Logfire LLM tracing tutorials - Pydantic.
- 🛠️ 🆓 Guardrails AI cookbook - Guardrails AI.
- 🛠️ 🆓 NVIDIA NeMo Guardrails - NVIDIA.
- 🛠️ 🆓 Prompt injection CTFs (Gandalf) - Lakera. Hands-on red-team practice.
- ⭐ 🛠️ 🆓 Unsloth notebooks - Fast fine-tuning recipes.
- 🛠️ 🆓 Axolotl examples - Axolotl.
- 🛠️ 🆓 Hugging Face TRL tutorials - TRL.
- 🛠️ 🆓 vLLM examples - vLLM.
- 🛠️ 🆓 TensorRT-LLM tutorials - NVIDIA.
- 🛠️ 🆓 llama.cpp server - ggerganov.
- 🛠️ 🆓 OpenAI Realtime API cookbook - OpenAI.
- 🛠️ 🆓 LiveKit Agents examples - LiveKit.
- 🛠️ 🆓 Pipecat - Daily. Voice-agent framework with extensive cookbook.
Opinionated, prescriptive guides distilling design patterns and operational practices.
- ⭐ 📘 🆓 Building effective agents - Anthropic. The canonical pattern taxonomy (T11).
- ⭐ 📘 🆓 Patterns for Building LLM-based Systems & Products - Eugene Yan (T6).
- ⭐ 📘 🆓 What We Learned from a Year of Building with LLMs - Yan/Bischof/Frye/Husain/Liu/Shankar (T6/T13).
- 📘 🆓 12-Factor Agents - HumanLayer. Opinionated operational principles for agent apps (T6/T11).
- 📘 🆓 A practical guide to building agents - OpenAI PDF (T11).
- 📘 🆓 Claude Code: best practices for agentic coding - Anthropic (T1/T3).
- 📘 🆓 LangGraph design patterns - LangChain (T11/T12).
- 📘 🆓 Instructor's RAG patterns - Jason Liu (T8).
- 📘 🆓 OpenAI's prompt-engineering playbook - OpenAI (T7).
- 📘 🆓 Anthropic's prompt engineering overview - Anthropic (T7).
- 📘 🆓 RAG-Fusion, HyDE, and other advanced retrieval patterns - Nir Diamant (T8).
- 📘 🆓 LLM observability playbook - Hamel Husain (T13/T14).
- 📘 🆓 OWASP Top 10 for LLM Applications - OWASP. The security-pattern catalogue (T15).
- 📘 🆓 MITRE ATLAS - Adversarial Threat Landscape for AI Systems (T15).
- 📘 🆓 NIST AI Risk Management Framework - NIST (T16).
- 📘 🆓 The LLM inference playbook - Anyscale (T18).
- 📘 🆓 Prompt-injection defence patterns - Simon Willison (T15).
- 📘 🆓 a16z AI canon - a16z (T20/T21).
- 📘 🆓 UX design patterns for AI products - Nielsen Norman Group (T20).
Foundational papers, surveys, and benchmark papers. Includes a dated milestone-papers table.
- 📄 🆓 SWE-bench: Can LMs Resolve Real-World GitHub Issues? - Jimenez et al.
- 📄 🆓 SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering - Yang et al.
- 📄 🆓 AutoCodeRover: Autonomous Program Improvement - Zhang et al.
- 📄 🆓 LiveCodeBench - Jain et al.
- 📄 🆓 BigCodeBench - Zhuo et al.
- 📄 🆓 Voyager: An Open-Ended Embodied Agent with LLMs - Wang et al.
- 📄 🆓 Agentless: Demystifying LLM-based Software Engineering Agents - Xia et al.
- 📄 🆓 OpenHands / OpenDevin - All Hands AI.
- 📄 🆓 Emerging Architectures for LLM Applications - a16z.
- 📄 🆓 The Prompt Report - Schulhoff et al.
- 📄 🆓 Chain-of-Thought Prompting Elicits Reasoning - Wei et al.
- 📄 🆓 Tree of Thoughts - Yao et al.
- 📄 🆓 Self-Consistency Improves CoT - Wang et al.
- 📄 🆓 Large Language Models are Zero-Shot Reasoners - Kojima et al. ("Let's think step by step").
- 📄 🆓 Retrieval-Augmented Generation for Knowledge-Intensive NLP - Lewis et al.
- 📄 🆓 RAG for LLMs: A Survey - Gao et al.
- 📄 🆓 Self-RAG: Learning to Retrieve, Generate, and Critique - Asai et al.
- 📄 🆓 Precise Zero-Shot Dense Retrieval without Relevance Labels (HyDE) - Gao et al.
- 📄 🆓 Dense Passage Retrieval - Karpukhin et al.
- 📄 🆓 MemGPT: Towards LLMs as Operating Systems - Packer et al.
- 📄 🆓 Lost in the Middle - Liu et al.
- 📄 🆓 Generative Agents: Interactive Simulacra of Human Behavior - Park et al.
- 📄 🆓 Toolformer - Schick et al.
- 📄 🆓 Gorilla: LLM Connected with Massive APIs - Patil et al.
- 📄 🆓 MRKL Systems - Karpas et al.
- 📄 🆓 Berkeley Function-Calling Leaderboard - UC Berkeley.
- 📄 🆓 ReAct: Synergizing Reasoning and Acting - Yao et al.
- 📄 🆓 Reflexion: Language Agents with Verbal Reinforcement Learning - Shinn et al.
- 📄 🆓 Self-Refine: Iterative Refinement with Self-Feedback - Madaan et al.
- 📄 🆓 The Rise and Potential of LLM-based Agents: A Survey - Xi et al.
- 📄 🆓 AutoGen - Wu et al.
- 📄 🆓 CAMEL: Communicative Agents for Mind Exploration - Li et al.
- 📄 🆓 A Survey on LLM-based Autonomous Agents - Wang et al.
- 📄 🆓 MetaGPT - Hong et al.
- 📄 🆓 Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena - Zheng et al.
- 📄 🆓 HELM: Holistic Evaluation of Language Models - Liang et al.
- 📄 🆓 Who Validates the Validators? - Shankar et al.
- 📄 🆓 OpenTelemetry Semantic Conventions for Generative AI - OTel.
- 📄 🆓 Universal and Transferable Adversarial Attacks on Aligned LLMs (GCG) - Zou et al.
- 📄 🆓 Many-shot Jailbreaking - Anthropic.
- 📄 🆓 Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection - Greshake et al.
- 📄 🆓 Constitutional AI - Bai et al.
- 📄 🆓 Scalable Agent Alignment via Reward Modeling - Leike et al.
- 📄 🆓 Concrete Problems in AI Safety - Amodei et al.
- 📄 🆓 LoRA: Low-Rank Adaptation - Hu et al.
- 📄 🆓 QLoRA - Dettmers et al.
- 📄 🆓 Direct Preference Optimization (DPO) - Rafailov et al.
- 📄 🆓 Training LMs to follow instructions with human feedback (InstructGPT) - Ouyang et al.
- 📄 🆓 Constitutional AI / RLAIF - Bai et al.
- 📄 🆓 Efficient Memory Management for LLM Serving with PagedAttention (vLLM) - Kwon et al.
- 📄 🆓 FlashAttention - Dao et al.
- 📄 🆓 SGLang: Efficient Execution of Structured Language Model Programs - Zheng et al.
- 📄 🆓 Fast Inference from Transformers via Speculative Decoding - Leviathan et al.
- 📄 🆓 Robust Speech Recognition via Large-Scale Weak Supervision (Whisper) - Radford et al.
- 📄 🆓 Moshi - Kyutai.
- 📄 🆓 Seamless: Multilingual Expressive and Streaming Speech Translation - Meta.
- 📄 🆓 Guidelines for Human-AI Interaction - Amershi et al. (Microsoft Research, CHI 2019).
Public benchmarks and leaderboards for coding agents, tool use, RAG, evaluation, and more.
- ⭐ 🧪 🆓 SWE-bench - Real-world GitHub-issue resolution benchmark; Verified subset is the de-facto industry standard.
- 🧪 🆓 Terminal-Bench - Stanford / Laude. Long-horizon terminal task benchmark.
- 🧪 🆓 LiveCodeBench - Rolling contamination-free coding benchmark.
- 🧪 🆓 BigCodeBench - Practical programming with diverse function calls.
- 🧪 🆓 HumanEval+ / EvalPlus - Strengthened HumanEval.
- 🧪 🆓 MLE-bench - OpenAI. Kaggle-style ML engineering benchmark.
- 🧪 🆓 GAIA - General AI Assistants benchmark.
- 🧪 🆓 AgentBench - Tsinghua. Broad agent capability benchmark.
- 🧪 🆓 WebArena / VisualWebArena - Web-navigation agents.
- 🧪 🆓 OSWorld - Desktop OS-controlling agents.
- 🧪 🆓 MLE-bench - ML-engineering agents.
- 🧪 🆓 RAGAS - Framework and leaderboard for RAG eval.
- 🧪 🆓 MTEB - Massive Text Embedding Benchmark.
- 🧪 🆓 BEIR - Zero-shot IR benchmark.
- 🧪 🆓 ARES - Automated RAG evaluation.
- 🧪 🆓 Berkeley Function-Calling Leaderboard (BFCL) - UC Berkeley.
- 🧪 🆓 τ-bench - Sierra. Tool-agent-user interaction benchmark.
- 🧪 🆓 API-Bank - Alibaba. Tool-augmented assistants.
- 🧪 🆓 AgentBench - General agent-capability.
- 🧪 🆓 AgentBoard - HKUST. Analytic, fine-grained agent eval.
- 🧪 🆓 HELM - Stanford CRFM. Holistic evaluation.
- 🧪 🆓 Chatbot Arena / LMSYS Arena - Human-preference leaderboard.
- 🧪 🆓 MMLU-Pro - Harder MMLU.
- 🧪 🆓 MT-Bench - LLM-as-judge multi-turn.
- 🧪 🆓 AdvBench / HarmBench - CAIS. Adversarial / red-team benchmarks.
- 🧪 🆓 JailbreakBench - Chao et al.
- 🧪 🆓 PurpleLlama CyberSecEval - Meta.
- 🧪 🆓 TruthfulQA - Truthfulness benchmark.
- 🧪 🆓 BBQ - Bias benchmark.
- 🧪 🆓 ToxiGen - Toxicity.
- 🧪 🆓 MLPerf Inference - MLCommons. Industry-standard serving benchmark.
- 🧪 🆓 LLMPerf - Anyscale. Throughput/latency tool.
- 🧪 🆓 MMMU - Multimodal multidiscipline benchmark.
- 🧪 🆓 VideoMME - Video understanding.
- 🧪 🆓 Dynabench speech - Live speech-model benchmarks.
Public production write-ups and canonical reference repositories that teach by example.
- ⭐ 🏗️ 🆓 Claude Code - Anthropic's reference agentic CLI.
- 🏗️ 🆓 Aider - Reference terminal coding agent with detailed engineering blog.
- 🏗️ 🆓 Cline - Open-source autonomous coding agent.
- 🏗️ 🆓 OpenHands - All Hands AI. Open-source autonomous SWE agent.
- 🏗️ 🆓 GitHub spec-kit - Reference spec-driven toolkit.
- 🏗️ 🆓 SWE-agent - Princeton NLP. Reference agent for SWE-bench.
- 🏗️ 🆓 AutoCodeRover - NUS.
- 🏗️ 🆓 Agentless - Minimal agentless baseline that beat prior agents on SWE-bench Lite.
- 🏗️ 🆓 Open Interpreter - Reference local code-execution agent.
- 🏗️ 🆓 Quivr - Reference full-stack RAG assistant.
- 🏗️ 🆓 LangChain templates - Reference app scaffolds.
- ⭐ 🏗️ 🆓 LlamaIndex - Reference RAG framework; docs double as case studies.
- 🏗️ 🆓 RAGFlow - Production-grade RAG reference.
- 🏗️ 🆓 Verba - Weaviate reference RAG app.
- 🏗️ 🆓 GraphRAG - Microsoft Research.
- 🏗️ 🆓 Letta (MemGPT) - Reference agentic-memory implementation.
- 🏗️ 🆓 Mem0 - Reference memory layer.
- 🏗️ 🆓 Zep - Long-term memory store.
- 🏗️ 🆓 awesome-mcp-servers - Community catalogue of MCP server implementations.
- 🏗️ 🆓 Anthropic MCP reference servers - The canonical reference MCP servers.
- 🏗️ 🆓 LangGraph - Reference graph-based orchestration.
- 🏗️ 🆓 AutoGen - Microsoft.
- 🏗️ 🆓 CrewAI - Reference role-based multi-agent.
- 🏗️ 🆓 Pydantic AI - Type-safe agent framework.
- 🏗️ 🆓 EleutherAI lm-evaluation-harness - Standard offline-eval harness.
- 🏗️ 🆓 DeepEval - Reference eval framework.
- 🏗️ 🆓 RAGAS - RAG-specific evaluation.
- 🏗️ 🆓 Langfuse - Open-source LLM observability.
- 🏗️ 🆓 Arize Phoenix - Open-source tracing + evals.
- 🏗️ 🆓 OpenLLMetry - OTel-based LLM instrumentation.
- 🏗️ 🆓 Guardrails AI - Reference guardrails framework.
- 🏗️ 🆓 NVIDIA NeMo Guardrails - Programmable guardrails.
- 🏗️ 🆓 Rebuff - Prompt-injection defence reference.
- 🏗️ 🆓 Unsloth - Fast LoRA/QLoRA reference.
- 🏗️ 🆓 Axolotl - Reference fine-tuning framework.
- 🏗️ 🆓 LLaMA-Factory - Unified fine-tuning toolkit.
- 🏗️ 🆓 Hugging Face alignment-handbook - Reference RLHF/DPO recipes.
- ⭐ 🏗️ 🆓 vLLM - Reference high-throughput LLM serving.
- 🏗️ 🆓 SGLang - Structured generation serving.
- 🏗️ 🆓 llama.cpp - Reference CPU/GPU local inference.
- 🏗️ 🆓 TensorRT-LLM - NVIDIA reference optimised serving.
- 🏗️ 🆓 LiveKit Agents - Voice-agent reference.
- 🏗️ 🆓 Pipecat - Daily's voice-agent framework.
- 🏗️ 🆓 Ultravox - Real-time speech LM.
- 🏗️ 🆓 Vercel AI SDK - Reference AI-UI patterns and streaming.
- 🏗️ 🆓 Open WebUI - Reference local chat UI.
- 🏗️ 🆓 assistant-ui - Reference React components for AI chat.
Recorded talks, workshops, and conference series worth watching.
- ⭐ 🎥 🆓 AI Engineer Summit / World's Fair - The definitive practitioner conference; full talks on YouTube.
- 🎥 🆓 NeurIPS / ICML / ICLR - Core ML research venues; most papers include recorded talks.
- 🎥 🆓 COLM - Conference on Language Modeling. New dedicated LM venue.
- 🎥 🆓 MLSys - Core ML-systems conference (inference, serving).
- 🎥 🆓 LlamaCon - Meta's open-source LLM conference.
- ⭐ 🎥 🆓 Intro to LLMs - Andrej Karpathy. The reference "how LLMs work" talk.
- ⭐ 🎥 🆓 Let's build GPT: from scratch, in code - Andrej Karpathy.
- ⭐ 🎥 🆓 1hr Talk: Intro to LLMs (Nov 2023) - Andrej Karpathy. Later updated as "Deep Dive into LLMs".
- 🎥 🆓 State of GPT - Andrej Karpathy (Microsoft Build 2023).
- 🎥 🆓 Stanford CS25: Transformers United - Full lecture series.
- 🎥 🆓 Mastering Claude Code - Anthropic (Boris Cherny).
- 🎥 🆓 Cursor: Building the AI-first IDE - Cursor team channel.
- 🎥 🆓 The future of AI coding - Latent Space talk archives.
- 🎥 🆓 SWE-bench at NeurIPS - Carlos Jimenez.
- 🎥 🆓 State of AI Engineering - Latent Space keynotes.
- 🎥 🆓 Emerging architectures for LLM applications - a16z (video + post).
- 🎥 🆓 Anthropic: Prompt Engineering for Business Performance - Anthropic.
- 🎥 🆓 ChatGPT Prompt Engineering for Developers - Andrew Ng + OpenAI.
- 🎥 🆓 Systematically improving RAG applications - Jason Liu.
- 🎥 🆓 RAG at scale - LangChain channel series.
- 🎥 🆓 Model Context Protocol deep dive - Anthropic.
- 🎥 🆓 MCP at AI Engineer Summit - AI Engineer.
- 🎥 🆓 Andrew Ng: What's next for AI agentic workflows - Sequoia AI Ascent 2024.
- 🎥 🆓 LangGraph: multi-agent workflows - LangChain.
- 🎥 🆓 Evaluating LLM-based applications - Josh Tobin (DBRX Summit).
- 🎥 🆓 LLM Evals: MT-Bench and Chatbot Arena - LMSYS.
- 🎥 🆓 OpenTelemetry for LLMs - KubeCon / OTel community talks.
- 🎥 🆓 Simon Willison on prompt injection - Talks + essays hub.
- 🎥 🆓 Anthropic AI safety research - Anthropic channel.
- 🎥 🆓 Let's reproduce GPT-2 / build the GPT tokenizer - Karpathy channel.
- 🎥 🆓 Fine-tuning workshop - Hamel Husain channel.
- 🎥 🆓 vLLM: high-throughput LLM serving - Anyscale / UC Berkeley talks.
- 🎥 🆓 CUDA Mode lectures - Community GPU/kernel series.
- 🎥 🆓 OpenAI Realtime API demos - OpenAI.
- 🎥 🆓 LiveKit voice-agent talks - LiveKit.
- 🎥 🆓 AI UX: the next frontier - NNGroup.
- 🎥 🆓 Linus Lee: tools for thought - Talks archive.
- 🎥 🆓 a16z AI portfolio talks - a16z.
- 🎥 🆓 The Pragmatic Engineer on AI teams - Gergely Orosz.
Recurring podcasts with strong agentic & AI-engineering coverage.
- ⭐ 🎧 🆓 Latent Space - swyx & Alessio. The AI-engineering podcast of record; guests include most major AI-lab engineers.
- ⭐ 🎧 🆓 Practical AI - Daniel Whitenack & Chris Benson. Long-running, practitioner-first.
- 🎧 🆓 MLOps Community podcast - Demetrios Brinkmann. Operations-focused case studies.
- 🎧 🆓 Gradient Dissent - Weights & Biases. Applied-ML interviews.
- 🎧 🆓 The TWIML AI Podcast - Sam Charrington. Longest-running ML interview series.
- 🎧 🆓 No Priors - Sarah Guo & Elad Gil. Founders / researchers.
- 🎧 🆓 Cognitive Revolution - Nathan Labenz. Weekly AI engineering + strategy.
- 🎧 🆓 Dwarkesh Podcast - Dwarkesh Patel. Deep interviews with top researchers.
- 🎧 🆓 Machine Learning Street Talk - Tim Scarfe. Technical deep-dives.
- 🎧 🆓 Lex Fridman Podcast - Long-form interviews with AI-lab CEOs and researchers.
- 🎧 🆓 Unsupervised Learning - Redpoint. AI-founder / operator conversations.
- 🎧 🆓 Interconnects - Nathan Lambert. RLHF / post-training focus.
- 🎧 🆓 Pragmatic Engineer - Gergely Orosz. AI-engineering org/hiring coverage.
Weekly and monthly curated newsletters.
- ⭐ 📰 🆓 The Batch - Andrew Ng / DeepLearning.AI. Weekly AI-engineering digest.
- ⭐ 📰 🆓 Import AI - Jack Clark (Anthropic co-founder). Policy + research.
- ⭐ 📰 🆓 Latent Space - swyx. The AI-engineering newsletter of record.
- 📰 🆓 Simon Willison's Weblog - RSS/email. Daily real-time coverage of tools and agents.
- 📰 🆓 Ahead of AI - Sebastian Raschka. LLM research + fine-tuning deep-dives.
- 📰 🆓 The Pragmatic Engineer - Gergely Orosz. AI-engineering hiring/org coverage.
- 📰 🆓 Interconnects - Nathan Lambert. RLHF / post-training.
- 📰 🆓 Last Week in AI - Weekly recap.
- 📰 🆓 TLDR AI - Daily headlines.
- 📰 🆓 Ben's Bites - Daily digest; founder-friendly.
- 📰 🆓 Chip Huyen's Blog - Occasional long-form on AI engineering.
- 📰 🆓 Eugene Yan - Pattern / eval / RAG deep-dives.
- 📰 🆓 Hamel's Blog - Evals + applied LLMs.
- 📰 🆓 Machine Learning Engineer Newsletter - Alejandro Saucedo. Weekly production-ML curation.
- 📰 🆓 MLOps Community newsletter - MLOps Community.
- 📰 🆓 The Data Exchange - Ben Lorica.
Policy frameworks, safety research, red-teaming resources, and responsible-AI guidance.
- ⭐ 🆓 NIST AI Risk Management Framework (AI RMF 1.0) - NIST. The foundational US framework.
- 🆓 NIST Generative AI Profile (NIST-AI-600-1) - NIST.
- 🆓 EU AI Act - European Commission. Official text + implementation timeline.
- 🆓 UK AI Safety Institute reports - UK AISI.
- 🆓 OECD AI Principles - International reference.
- ⭐ 🆓 Anthropic Responsible Scaling Policy - Anthropic.
- 🆓 Anthropic Core Views on AI Safety - Anthropic.
- 🆓 OpenAI Preparedness Framework - OpenAI.
- 🆓 Google DeepMind: Frontier Safety Framework - Google DeepMind.
- ⭐ 🆓 OWASP Top 10 for LLM Applications - OWASP.
- 🆓 MITRE ATLAS - Adversarial threat landscape for AI systems.
- 🆓 NIST Adversarial ML Taxonomy (NIST AI 100-2) - NIST.
- 🆓 HarmBench - CAIS.
- 🆓 Simon Willison's prompt-injection series - Simon Willison.
- 🆓 Microsoft Responsible AI Standard - Microsoft.
- 🆓 Google Responsible AI practices - Google.
- 🆓 Fairlearn - Open-source fairness toolkit.
- 🆓 Partnership on AI - Multi-stakeholder org with published frameworks and incident database.
- 📄 🆓 Concrete Problems in AI Safety - Amodei et al.
- 📄 🆓 Constitutional AI - Bai et al.
- 📄 🆓 Red Teaming Language Models with Language Models - Perez et al.
- 📄 🆓 Sleeper Agents - Hubinger et al. (Anthropic).
Going beyond engineering: designing for AI, human-AI interaction, and the economics of LLM applications.
- ⭐ 🆓 Guidelines for Human-AI Interaction - Amershi et al. (Microsoft Research). The canonical design heuristics.
- 🆓 NNGroup: Generative AI design patterns - Nielsen Norman Group.
- 🆓 Google's People + AI Guidebook - Google PAIR.
- 🆓 Apple Human Interface Guidelines - Generative AI - Apple.
- 🆓 Maggie Appleton - Essays on the UX of agentic, malleable software.
- 🆓 Linus Lee - Essays on interfaces for tools of thought.
- ⭐ 🆓 a16z: The Economic Case for Generative AI - a16z.
- 🆓 Latent Space on unit economics - Latent Space.
- 🆓 Stanford AI Index Report - Stanford HAI. Annual deep economic + research snapshot.
- 🆓 Epoch AI - Data on compute, cost, and scaling trends.
- 🆓 Artificial Analysis - Cross-provider pricing/latency/quality dashboards.
- 🆓 16 Changes to the Way Enterprises Build Software with AI - a16z.
- 🆓 AI product strategy - Lenny's Newsletter (AI tag).
- 🆓 Every Inc - Prose-heavy essays on AI product + consumer LLM UX.
How organisations structure AI-engineering work, hire for it, and operate sustainably.
- ⭐ 🆓 The Pragmatic Engineer - AI tag - Gergely Orosz. AI-engineering hiring + org design.
- 🆓 Building the AI Engineer role - swyx / Latent Space. The foundational essay defining "AI Engineer" as a discipline.
- 🆓 What is an AI Engineer? - Applied LLMs consortium.
- 🆓 Eugene Yan: Team size and velocity - Eugene Yan.
- 🆓 Shreya Shankar: Operationalizing ML - Shreya Shankar.
- 🆓 Staff Engineer - AI org posts - Will Larson & community.
- 🆓 a16z AI canon - a16z. Curated reading list for people building AI teams.
- 🆓 Emmanuel Ameisen: Building ML Powered Applications - Book + blog on AI-team building.
- 🆓 DeepLearning.AI AI Engineer Hiring Report - The Batch periodic coverage.
- 🆓 Chip Huyen: Machine learning in production - Org-design questions from production ML.
- 🆓 GitHub: The AI-native developer - GitHub's research on workflows / productivity.
Please use one of the issue templates (resource suggestion, broken link, or trending nomination) or open a pull request following the guidance in CONTRIBUTING.md. The curation methodology and update cadence are documented in NOTES.md.
Weekly: PR triage and broken-link fixes. Monthly: trending rotation and new-resource batches. Quarterly: full thoroughness pass against the checklist in NOTES.md.


