# FixFlow AI

Presentation Link: https://www.canva.com/design/DAHD6XI3ZZ0/sODZGuj9uNZ03-MiELBCdQ/edit?utm_content=DAHD6XI3ZZ0&utm_campaign=designshare&utm_medium=link2&utm_source=sharebutton
Video Link: https://youtu.be/oCD0hucRmcU
An agentic platform that resolves software incidents end-to-end: from incident ingestion to root-cause analysis, minimal patch generation, sandbox validation, and delivery-ready reporting.
## Table of Contents

- Project Overview
- Problem Statement
- System Workflow
- Agent Responsibilities
- Knowledge Graph Design
- Docker Sandbox Validation
- MCP Integrations
- Reporting and Outputs
- Risk Scoring Strategy
- Quick Start
- Tech Stack
- Repository Structure
- Configuration
## Project Overview

FixFlow AI is a multi-agent engineering system that automates the full incident resolution lifecycle:
- Accept incidents from GitHub and Slack
- Build structured incident context
- Analyze code and dependencies
- Generate a minimal patch
- Validate changes in an isolated Docker sandbox
- Produce reports and handoff artifacts for teams
The platform uses a stateful agent graph and a code knowledge graph to improve accuracy, reduce regressions, and support iterative retries when fixes fail validation.
## Problem Statement

Most incident-handling workflows are still manual and slow; responders must:
- Parse issue text and logs
- Identify root cause in a large codebase
- Propose a safe fix
- Run tests and evaluate regressions
- Communicate findings to stakeholders
These steps are difficult to scale, especially for teams handling multiple incidents in parallel. FixFlow AI addresses this by coordinating specialized agents with strict validation and risk-aware output decisions.
## System Workflow

1. An incident arrives through a GitHub or Slack webhook.
2. The Incident Parser extracts structured fields (severity, stack traces, service, symptoms).
3. The Supervisor routes work to specialist agents.
4. The Codebase Analyst and Knowledge Retriever build root-cause context.
5. The Critic checks the fix strategy for quality and minimal scope.
6. The Fix Writer generates a focused patch.
7. The Validation Agent runs sandbox checks and compares baseline vs. patched behavior.
8. The Synthesis Agent creates a resolution narrative.
9. The Risk Scorer assigns LOW, MEDIUM, or HIGH risk.
10. The system generates a report and integration-ready output.
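The retry loop at the heart of this workflow can be sketched as a small state machine. This is plain Python, not the actual LangGraph wiring; the node names mirror the agents above, but every handler body here is a stand-in.

```python
# Minimal sketch of the supervisor's routing loop. Each handler mutates
# shared state and returns the name of the next node (None = done).

def parse(state):
    state["context"] = {"severity": state["incident"].get("severity", "unknown")}
    return "analyze"

def analyze(state):
    state["root_cause"] = "identified"  # stand-in for real analysis
    return "fix"

def fix(state):
    state["patch"] = "minimal diff"     # stand-in for patch generation
    return "validate"

def validate(state):
    # On failure, route back to "fix" for an iterative retry.
    state["attempts"] = state.get("attempts", 0) + 1
    passed = state["attempts"] >= 1     # stand-in for sandbox checks
    return "report" if passed else "fix"

def report(state):
    state["report"] = f"resolved after {state['attempts']} attempt(s)"
    return None                          # terminal node

HANDLERS = {"parse": parse, "analyze": analyze, "fix": fix,
            "validate": validate, "report": report}

def run_workflow(incident, max_steps=20):
    state, node = {"incident": incident}, "parse"
    while node is not None and max_steps > 0:
        node = HANDLERS[node](state)
        max_steps -= 1
    return state

result = run_workflow({"severity": "high"})
```

The `max_steps` guard bounds the retry loop so a fix that never validates cannot spin forever.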
## Agent Responsibilities

| Agent | File | Responsibility |
|---|---|---|
| Supervisor | src/agents/supervisor.py | Controls orchestration and retry routing |
| Incident Parser | src/agents/incident_parser.py | Converts raw incident text to structured context |
| Codebase Analyst | src/agents/codebase_analyst.py | Performs code-level investigation |
| Knowledge Retriever | src/agents/knowledge_retriever.py | Pulls historical and graph context |
| Critic | src/agents/critic.py | Reviews plans and catches weak reasoning |
| Fix Writer | src/agents/fix_writer.py | Generates focused remediation patches |
| Validation | src/agents/validation.py | Executes validation and result checks |
| KG Builder | src/agents/kg_builder.py | Builds and updates graph artifacts |
| Synthesis | src/agents/synthesis.py | Produces consolidated resolution output |
| Risk Scorer | src/agents/risk_scorer.py | Determines risk policy for outcomes |
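The structured context the Incident Parser hands downstream might look like the following. The field names are assumptions drawn from the workflow description above, not the actual schema in src/agents/incident_parser.py.

```python
# Illustrative shape of a parsed incident; fields mirror the ones the
# parser is described as extracting (severity, stack traces, service,
# symptoms), but names and types here are hypothetical.
from dataclasses import dataclass, field
from typing import List

@dataclass
class IncidentContext:
    incident_id: str
    severity: str                       # e.g. "low" | "medium" | "high"
    service: str                        # affected service name
    symptoms: str                       # free-text summary of the failure
    stack_traces: List[str] = field(default_factory=list)

ctx = IncidentContext("INC-004", "high", "billing", "worker crash loop")
```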
## Knowledge Graph Design

FixFlow uses pluggable graph backends:
- Neo4j backend for persistent graph queries
- NetworkX fallback for local/offline operation
Graph components are implemented under src/graph, including backend factory, query interface, and base graph abstractions.
Key graph use cases:
- Blast radius estimation
- Historical incident lookup
- Dependency relationship tracing
- Context enrichment for better fix planning
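The backend factory and the blast-radius use case can be sketched as below. Class and method names are illustrative, not the actual interface under src/graph; the in-memory class plays the NetworkX-fallback role.

```python
# Hedged sketch of a pluggable graph backend with an in-memory fallback.

class InMemoryGraph:
    """Fallback backend: a plain adjacency dict (the NetworkX role)."""
    def __init__(self):
        self.edges = {}

    def add_dependency(self, src, dst):
        self.edges.setdefault(src, set()).add(dst)

    def blast_radius(self, node):
        # Transitive set of modules reachable from the changed node.
        seen, stack = set(), [node]
        while stack:
            for nxt in self.edges.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

def make_backend(neo4j_uri=None):
    # Factory: prefer Neo4j when a URI is configured, else fall back.
    if neo4j_uri:
        raise NotImplementedError("wire up the Neo4j driver here")
    return InMemoryGraph()

g = make_backend()
g.add_dependency("billing.api", "billing.core")
g.add_dependency("billing.core", "shared.db")
```

Keeping both backends behind one interface lets the same blast-radius query run offline against the in-memory graph or persistently against Neo4j.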
## Docker Sandbox Validation

Validation is performed in isolated containers to prevent host contamination and improve reproducibility.
- Python sandbox image: docker/python.Dockerfile
- Node sandbox image: docker/node.Dockerfile
- Orchestrator: src/sandbox/docker_runner.py
Validation flow:
- Capture baseline test result
- Apply patch in sandbox context
- Re-run tests
- Detect regressions and classify outcome
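A minimal sketch of the baseline-vs-patched comparison, assuming the Docker CLI is on PATH; the image tag, mount layout, and test command are placeholders rather than the project's real docker/*.Dockerfile setup.

```python
# Run the test suite inside a throwaway container and report pass/fail.
import subprocess

def run_tests_in_sandbox(image, workdir, cmd=("pytest", "-q")):
    proc = subprocess.run(
        ["docker", "run", "--rm",
         "-v", f"{workdir}:/app", "-w", "/app",  # mount the code under test
         image, *cmd],
        capture_output=True, text=True)
    return proc.returncode == 0

def classify(baseline_passed, patched_passed):
    # Outcome taxonomy for comparing the baseline and patched runs.
    if patched_passed:
        return "FIXED" if not baseline_passed else "NO_REGRESSION"
    return "REGRESSION" if baseline_passed else "STILL_FAILING"
```

Capturing the baseline first matters: a suite that already failed before the patch should classify as STILL_FAILING, not as a regression introduced by the fix.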
## MCP Integrations

The system integrates with MCP bridges for external tooling:
- GitHub integration for issue and PR workflows
- Slack integration for incident ingestion and notifications
Relevant modules:
- src/mcp/github_server.py
- src/mcp/github_tools.py
- src/mcp/slack_server.py
- src/mcp/slack_tools.py
- src/mcp/client_bridge.py
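One way an MCP-style bridge can expose platform actions is as a registry of named tools. The tool names and payloads below are hypothetical and the handlers only echo their inputs; the real schemas live in the src/mcp modules above.

```python
# Illustrative tool registry: agents dispatch by tool name, the bridge
# maps names to handlers. All names and payload shapes are made up.

TOOLS = {}

def tool(name):
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("slack.post_message")
def post_message(channel, text):
    # A real bridge would call the Slack API; this stub just echoes.
    return {"ok": True, "channel": channel, "text": text}

@tool("github.open_pr")
def open_pr(repo, title, branch):
    # Stub: a real bridge would create the PR and return its URL.
    return {"ok": True, "repo": repo, "title": title, "branch": branch}

def dispatch(name, **kwargs):
    return TOOLS[name](**kwargs)
```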
## Reporting and Outputs

Report generation lives in src/reports/report_generator.py with template support under src/reports/templates.
Output channels include:
- Incident resolution JSON artifacts (for traceability)
- PR-friendly markdown summaries
- Slack message summaries for operational visibility
Sample report data can be found in reports/INC-004_report.json.
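Turning one resolution record into both a JSON artifact and a PR-friendly markdown summary can be sketched as below. Field names are illustrative; the real generator renders Jinja2 templates from src/reports/templates.

```python
# Sketch: one resolution record, two lightweight output formats.
import json

def render_outputs(resolution):
    # JSON artifact for traceability, markdown summary for PRs/Slack.
    artifact = json.dumps(resolution, indent=2, sort_keys=True)
    summary = (
        f"### {resolution['incident_id']}: {resolution['title']}\n"
        f"- Risk: {resolution['risk']}\n"
        f"- Validation: {resolution['validation']}\n")
    return artifact, summary

artifact, summary = render_outputs({
    "incident_id": "INC-004",           # matches the sample report above
    "title": "Example incident",        # placeholder title
    "risk": "LOW",
    "validation": "NO_REGRESSION",
})
```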
## Risk Scoring Strategy

Risk scoring combines:
- Scope of code impact
- Validation confidence
- Incident severity
- Change complexity
Policy examples:
- LOW: suggest automated continuation
- MEDIUM: human review recommended
- HIGH: report-only with escalation guidance
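A hedged sketch of how the four signals could combine into a tier; the weights and thresholds here are illustrative, not the shipped policy.

```python
# Toy scoring function: each signal contributes points, the total maps
# to a tier. All cutoffs are made-up values for illustration.

def score_risk(files_changed, validation_passed, severity, lines_changed):
    points = 0
    points += 2 if files_changed > 3 else 0                  # scope of code impact
    points += 0 if validation_passed else 3                  # validation confidence
    points += {"low": 0, "medium": 1, "high": 2}[severity]   # incident severity
    points += 1 if lines_changed > 100 else 0                # change complexity
    if points <= 1:
        return "LOW"
    return "MEDIUM" if points <= 3 else "HIGH"
```

Note the asymmetry: a failed validation alone (3 points) is enough to rule out LOW, which matches the policy of never auto-continuing an unvalidated fix.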
## Quick Start

```bash
# 1) Create or activate the virtual environment
python -m venv venv
venv\Scripts\activate   # Windows; use `source venv/bin/activate` on macOS/Linux

# 2) Install dependencies
pip install -r requirements.txt

# 3) (Optional) Start Neo4j via Docker Compose
docker-compose up -d

# 4) Run the demo incident flow
python demo.py --incident INC-004 --slack-channel C0AL8NG5J79

# 5) Run the batch demo
python demo_batch.py
```

## Tech Stack

| Component | Technology |
|---|---|
| Orchestration | LangGraph |
| LLM Provider | Cerebras Llama 3.3 70B |
| Graph | Neo4j, NetworkX |
| Sandbox | Docker |
| Integrations | MCP tools (GitHub, Slack) |
| API | FastAPI |
| Reporting | Jinja2 templates |
## Repository Structure

```text
src/
  agents/    # Agent implementations and workflow state
  api/       # FastAPI app and webhook routes
  graph/     # Graph backends and query interface
  llm/       # LLM client abstractions and providers
  mcp/       # MCP bridges and platform tool wrappers
  reports/   # Report generation and templates
  sandbox/   # Containerized validation utilities
  utils/     # Shared utilities
tests/       # Test and integration helper scripts
docker/      # Dockerfiles for runtime/sandbox images
reports/     # Generated incident report outputs
```
## Configuration

Create an environment file and configure required values.
Suggested variables:
- CEREBRAS_API_KEY
- GITHUB_TOKEN
- SLACK_BOT_TOKEN
- NEO4J_URI
- NEO4J_USER
- NEO4J_PASSWORD
The runtime configuration entrypoint is src/config.py.
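A minimal sketch of reading the variables above from the environment with a fail-fast check. The split between required and optional values, and the defaults, are assumptions; the real entrypoint is src/config.py and may differ.

```python
# Load settings from the environment; missing required keys fail loudly
# at startup instead of deep inside an agent run.
import os

REQUIRED = ["CEREBRAS_API_KEY", "GITHUB_TOKEN", "SLACK_BOT_TOKEN"]
OPTIONAL = {
    "NEO4J_URI": "bolt://localhost:7687",  # assumed local default
    "NEO4J_USER": "neo4j",                 # assumed driver default
}

def load_config(env=os.environ):
    missing = [k for k in REQUIRED if not env.get(k)]
    if missing:
        raise RuntimeError(f"missing required settings: {missing}")
    cfg = {k: env[k] for k in REQUIRED}
    for key, default in OPTIONAL.items():
        cfg[key] = env.get(key, default)
    return cfg

cfg = load_config({"CEREBRAS_API_KEY": "x", "GITHUB_TOKEN": "y",
                   "SLACK_BOT_TOKEN": "z"})
```

NEO4J_PASSWORD deliberately gets no default here: secrets should come only from the environment file.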