First off, thank you for considering contributing to SMP! 🎉
SMP is not a standard web application; it is the core memory and safety guardrail layer for autonomous AI agents. Because this codebase is read, parsed, and modified by both humans and AI agents, strict adherence to architectural consistency, immutability, and explicit typing is absolutely critical.
This guide will walk you through the setup, coding standards, and workflows required to contribute successfully.
- Development Environment Setup
- Mental Model & Architecture
- Coding Standards
- How to Add New Features
- Testing Guidelines
- Git & PR Workflow
- Python 3.11+ (strict requirement for `X | Y` unions, `tomllib`, and `msgspec` optimizations)
- Docker (required for spawning agent sandboxes and running Testcontainers for DBs)
- Neo4j Desktop or the Neo4j Docker image (must include the Graph Data Science (GDS) plugin for Louvain and PageRank)
- Clone & Create a Virtual Environment:
  git clone https://github.com/your-org/structural-memory.git
  cd structural-memory
  python3.11 -m venv .venv
  source .venv/bin/activate  # On Windows: .venv\Scripts\activate
- Install Dependencies in Editable Mode:
  pip install -e ".[dev]"
- Configure the Environment: Copy the example environment file and configure your database endpoints.
  cp .env.example .env
  Note: Make sure `NEO4J_URI` points to an instance with the GDS plugin enabled.
Before writing code, please read `ARCHITECTURE.md`.
Key rules to remember:
- No LLMs at Query Time: Do not add external API calls to OpenAI/Anthropic inside the query engine (`SeedWalkEngine`). Relevance is calculated via graph math (Vector + PageRank + HeatScore).
- Immutability First: Data flowing through the system must be immutable. We use `msgspec.Struct` with `frozen=True`.
- Agents are untrusted: Any endpoint touching the filesystem must go through the Sandbox and `smp/guard/check`.
We use automated tools to enforce our standards, but here are the specific rules you must follow:
- No `typing` module fallbacks: Use modern Python 3.11+ syntax.
  - ❌ `Optional[str]`, `Union[int, str]`, `List[str]`, `Dict[str, Any]`
  - ✅ `str | None`, `int | str`, `list[str]`, `dict[str, Any]`
- Msgspec Structs: All data models must use `msgspec`. Do not use `dataclasses` or `pydantic` (they are too slow for massive graph serialization).
  import msgspec

  class RankedResult(msgspec.Struct, frozen=True):
      node_id: str
      node_type: str
      vector_score: float
      is_seed: bool = False
- Every file must start with `from __future__ import annotations`.
- Always use absolute imports for local modules.
  - ❌ `from .models import WalkNode`
  - ✅ `from smp.engine.models import WalkNode`
- Group imports: Standard Library $\rightarrow$ Third-Party $\rightarrow$ Local. Separate groups with a blank line.
- Classes: `PascalCase`
- Functions/Methods/Variables: `snake_case`
- Private Members: Prefix with a single underscore (`_private_method`).
- Docstrings: Use Google-style docstrings. Because SMP parses docstrings for the `smp/enrich` pipeline, docstrings must be clear, concise, and written in the imperative mood.
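To make the naming and docstring rules concrete, here is a minimal sketch of a Google-style docstring in the imperative mood. The function `clamp_score` is a hypothetical example and does not exist in SMP.

```python
from __future__ import annotations


def clamp_score(score: float, lower: float = 0.0, upper: float = 1.0) -> float:
    """Clamp a relevance score to a closed interval.

    Args:
        score: Raw score to clamp.
        lower: Smallest value to return.
        upper: Largest value to return.

    Returns:
        The score limited to the range [lower, upper].

    Raises:
        ValueError: If lower is greater than upper.
    """
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    return max(lower, min(score, upper))
```

The summary line starts with a verb in the imperative ("Clamp", not "Clamps"), and the `Args`, `Returns`, and `Raises` sections follow the Google layout.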
SMP does not use a massive if/else statement for protocol routing. We use a Dispatcher Pattern.
- Locate the correct handler file in `smp/protocol/handlers/` (e.g., `query.py`, `safety.py`, `sandbox.py`).
- Define your asynchronous handler function.
- Decorate it with `@rpc_method("smp/your/method")`.
- Define the input/output schema using `msgspec`.
Example:
# smp/protocol/handlers/telemetry.py
from __future__ import annotations

from typing import Any

from smp.core.models import ServerContext
from smp.protocol.dispatcher import rpc_method


@rpc_method("smp/telemetry/hot")
async def handle_telemetry_hot(params: dict[str, Any], ctx: ServerContext) -> dict[str, Any]:
    """Retrieve high-churn, high-impact nodes."""
    # Default to a 30-day window when the caller does not specify one.
    window = params.get("window_days", 30)
    return await ctx.engine.telemetry.get_hot_nodes(window)

If you add a new node type or relationship type (e.g., `IMPLEMENTS`):
- Update the schema documentation in `ARCHITECTURE.md`.
- Update the `NodeTypes` or `EdgeTypes` Enums in `smp/core/constants.py`.
- Add any required Neo4j index constraints in `smp/core/store.py` (e.g., `CREATE INDEX IF NOT EXISTS FOR (n:NewType) ON (n.id)`).
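The schema steps above can be sketched roughly as follows. This assumes the enums are string-valued; the real base class and existing members in `smp/core/constants.py` may differ. The `CALLS` member and the `index_statement` helper are hypothetical and only serve the illustration.

```python
from __future__ import annotations

from enum import Enum


class EdgeTypes(str, Enum):
    """Relationship types (only IMPLEMENTS comes from the guide)."""

    CALLS = "CALLS"  # hypothetical placeholder for an existing member
    IMPLEMENTS = "IMPLEMENTS"  # the newly added relationship type


def index_statement(label: str) -> str:
    """Build an idempotent index statement for a node label."""
    # Labels are interpolated into Cypher, so reject anything unusual.
    if not label.isidentifier():
        raise ValueError(f"unsafe label: {label!r}")
    return f"CREATE INDEX IF NOT EXISTS FOR (n:{label}) ON (n.id)"
```

Because the statement uses `IF NOT EXISTS`, running it on every startup should be safe even when the index is already present.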
We use pytest and pytest-asyncio. Graph databases and vector stores present unique testing challenges.
- Unit Tests: Should mock Neo4j and ChromaDB. Use these for testing logic (e.g., ranking math in `SeedWalkEngine._rank`).
- Integration Tests: Found in `tests/integration/`. These require actual databases. The CI pipeline uses Testcontainers to spin up ephemeral Neo4j and ChromaDB instances.
- Writing Cypher in Tests: When testing Cypher queries, always clean up the graph state in a `finally` block or use a fresh database schema per test.
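As a rough sketch of the mocking and cleanup advice above, the snippet below uses the standard library's `AsyncMock` in place of a database session and shows that a `finally` block still issues the cleanup query when the statement under test fails. The helper `run_with_cleanup`, the `TestNode` label, and the session's `run` method are assumptions for illustration, not SMP's test utilities.

```python
from __future__ import annotations

import asyncio
from typing import Any
from unittest.mock import AsyncMock

CLEANUP = "MATCH (n:TestNode) DETACH DELETE n"


async def run_with_cleanup(session: Any, cypher: str) -> Any:
    """Run one Cypher statement and always remove test nodes afterwards."""
    try:
        return await session.run(cypher)
    finally:
        await session.run(CLEANUP)


def demo() -> list[str]:
    """Return the queries a mocked session received, in call order."""
    session = AsyncMock()  # stands in for an async database session
    # First call fails (the statement under test), second call succeeds (cleanup).
    session.run.side_effect = [RuntimeError("boom"), None]
    try:
        asyncio.run(run_with_cleanup(session, "CREATE (:TestNode {id: 'n1'})"))
    except RuntimeError:
        pass  # the failure is expected; cleanup is what matters here
    return [call.args[0] for call in session.run.call_args_list]
```

In a real suite the same `try`/`finally` shape would typically live in a pytest fixture so each test does not have to repeat it.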
Running Tests:
# Run everything
pytest
# Run fast unit tests only (skips DB integration tests)
pytest -m "not integration"
Create a branch from `main` using the following convention:
- `feature/your-feature-name`
- `fix/issue-description`
- `docs/what-you-updated`
Write meaningful commit messages based on the Conventional Commits specification:
- `feat: add AST data-flow verification`
- `fix: resolve static linker namespacing bug`
- `refactor: migrate dataclasses to msgspec`
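If you want to check a commit message or PR title locally, a small validator along these lines may help. It is only a sketch: the list of allowed types is an assumption (the guide itself shows `feat`, `fix`, and `refactor`), and the pattern covers just the header line, not the body or footers of the full specification.

```python
from __future__ import annotations

import re

# Assumed set of allowed types; adjust to whatever the project accepts.
_TYPES = ("feat", "fix", "docs", "refactor", "test", "chore")

# type, optional (scope), optional "!" for breaking changes, then ": description"
_PATTERN = re.compile(rf"^(?:{'|'.join(_TYPES)})(?:\([\w./-]+\))?!?: \S.*$")


def is_conventional(title: str) -> bool:
    """Return True if the title looks like a Conventional Commits header."""
    return _PATTERN.match(title) is not None
```

For example, `is_conventional("feat: add AST data-flow verification")` is accepted, while a title without a type prefix or without a space after the colon is rejected.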
Before opening a Pull Request, run these four commands and make sure each one passes. CI will fail if any of them does not:
# 1. Linting (Ruff)
ruff check .
# 2. Formatting (Ruff)
ruff format .
# 3. Type Checking (Mypy - Strict Mode)
mypy smp/
# 4. Testing (Pytest)
pytest
- Push your branch to your fork.
- Open a PR against the `main` branch.
- Fill out the PR template provided in `.github/PULL_REQUEST_TEMPLATE.md`.
- Ensure your PR title matches the Conventional Commits format (e.g., `feat: implement eBPF trace extraction`).
- Wait for a maintainer (or a designated Reviewer Agent) to review your code!