Ask in natural language, get a planned, verified, and executed analysis of spatial omics data.
Stable release from PyPI:
pip install stat-agent
With the full set of analysis skill dependencies (squidpy, scvi-tools, torch, liana, cell2location, …):
pip install "stat-agent[skills]"
Some skills require packages that aren't on PyPI or that conflict with other packages; install them separately as needed:
# STAGATE (requires PyG + matching torch_geometric/torch_sparse/torch_scatter wheels)
pip install git+https://github.com/QIFEIDKN/STAGATE_pyG.git
Launch the web UI:
stat-web # serves on http://localhost:8889
# or
./start_web.sh # also starts a Jupyter Lab alongside
In the UI:
- Enter the path to your dataset directory.
- Configure your LLM provider and paste an API key.
- Click Load Dataset.
- Ask questions in the chat panel:
  - "Annotate cell types using the breast-cancer reference."
  - "Find spatially variable genes."
  - "Show CD8A expression in slice 1."
  - "Run RCTD deconvolution and overlay the dominant cell type."
STAT auto-detects your data layout. Place files in a single directory.
Single-slice
dataset/
├── tissue.h5ad # Required: AnnData with x, y in obs
└── he.tif # Optional: H&E image (pixel coords = cell coords)
Multi-slice
dataset/
├── tissue_slice_0.h5ad
├── he_slice_0.tif
├── tissue_slice_1.h5ad
└── he_slice_1.tif
Multi-omics (gene + protein)
dataset/
├── tissue.h5ad # Gene expression
├── tissue_protein.h5ad # Protein expression
├── he.tif
└── protein_CD3.tif
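Before loading, it can help to check which of the three layouts above a directory resembles. The sketch below is a hypothetical filename heuristic based only on the example trees; it is not STAT's detection logic, and the function names and the `_slice_<n>` / `_protein` patterns are assumptions taken from those examples.

```python
import re
from pathlib import Path

# Pattern assumed from the multi-slice example tree (e.g. tissue_slice_0.h5ad).
SLICE_RE = re.compile(r"_slice_(\d+)\.h5ad$")


def guess_layout(filenames):
    """Guess a layout label from file names alone (illustrative heuristic)."""
    h5ad = [name for name in filenames if name.endswith(".h5ad")]
    if not h5ad:
        return "unknown"
    if any(SLICE_RE.search(name) for name in h5ad):
        return "multi-slice"
    if any(name.endswith("_protein.h5ad") for name in h5ad):
        return "multi-omics"
    return "single-slice"


def guess_layout_from_dir(dataset_dir):
    """Apply guess_layout to the file names found directly in dataset_dir."""
    return guess_layout(sorted(p.name for p in Path(dataset_dir).iterdir()))
```

For example, `guess_layout(["tissue.h5ad", "he.tif"])` yields `"single-slice"`. Real datasets may mix patterns, so treat the result as a sanity check only.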
Coordinate convention. Cell coordinates (x, y) in adata.obs map directly to image pixel (x, y). No coordinate transformation. Note the array indexing swap: image array img[y, x] corresponds to cell (x, y).
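The indexing swap is easy to get wrong, so here is a minimal sketch of looking up the pixel under a cell. The helper name `pixel_at` is hypothetical, and rounding fractional coordinates to the nearest pixel is an assumption of this example, not something STAT prescribes.

```python
def pixel_at(img, x, y):
    """Return the image value under cell (x, y): rows are indexed by y, columns by x."""
    col, row = int(round(x)), int(round(y))  # assumption: round to the nearest pixel
    return img[row][col]


# A tiny 2-row x 3-column "image" as nested lists.
# A NumPy array is indexed in the same order, as img[y, x].
img = [
    [10, 11, 12],  # y = 0
    [20, 21, 22],  # y = 1
]

print(pixel_at(img, x=2, y=1))  # → 22 (column 2 of row 1)
```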
Required AnnData fields: adata.obs['x'], adata.obs['y'], and the expression matrix adata.X. adata.obs['celltype'] is optional — annotation skills will populate it.
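A quick pre-flight check for the required `obs` columns might look like the sketch below. The helper `missing_obs_fields` is hypothetical (it is not part of STAT) and takes plain column names so that it runs without any extra dependencies.

```python
REQUIRED_OBS = ("x", "y")  # required per the text above; 'celltype' is optional


def missing_obs_fields(obs_columns):
    """Return the required obs column names that are absent, in a stable order."""
    present = set(obs_columns)
    return [name for name in REQUIRED_OBS if name not in present]


print(missing_obs_fields(["x", "celltype"]))  # → ['y']
```

With a loaded AnnData object, the call would be `missing_obs_fields(adata.obs.columns)`; an empty list means both coordinates are present.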
Skills are auto-discovered from stat_agent/skills/{slug}/SKILL.md. Each skill carries metadata (modalities, data level, prerequisites) and a templated code body. The current catalog:
| Skill | Summary |
|---|---|
| Cell Type Annotation with scANVI | Annotate cell types in spatial transcriptomics data using scANVI transfer learning from a reference scRNA-seq dataset. |
| Fast Cell Type Annotation (Clustering + LLM) | Annotate cell types using unsupervised clustering, marker genes, and LLM-based annotation. |
| Cell Type Annotation via Spatial Mapping (Tangram) | Map single-cell reference annotations onto spatial transcriptomics data using Tangram deep learning alignment. |

| Skill | Summary |
|---|---|
| Cell Type Deconvolution (RCTD) | Perform cell type deconvolution (or annotation on spot) on spatial transcriptomics data (Visium spots) using RCTD with a single-cell refere… |
| Bayesian Cell Type Deconvolution (Cell2location) | Reference-based Bayesian deconvolution of spot-level spatial transcriptomics using Cell2location. |
| Fast Spot Deconvolution (FlashDeconv) | Ultra-fast reference-based cell type deconvolution for spot-level spatial data using FlashDeconv. |

| Skill | Summary |
|---|---|
| Spatial Domain Detection (SpaGCN) | Identify spatial domains in spot-level spatial transcriptomics data using SpaGCN, integrating gene expression, spatial location, and H&E hi… |
| Spatial Domain Detection (STAGATE) | Identify spatial domains using STAGATE (Spatial-Transcriptomics Graph Attention Auto-Encoder). |
| Spatial Domain Detection (GraphST) | Identify spatial domains in spot-level data using GraphST (Graph Self-supervised Transformer). |

| Skill | Summary |
|---|---|
| Spatial Statistics Analysis | Compute spatial statistics including Moran's I (spatial autocorrelation of genes), Ripley's K (spatial point pattern of cell types), co-occ… |
| Neighborhood Enrichment Analysis | Compute neighborhood enrichment z-scores to identify which cell types are spatially co-localized or depleted from each other's neighborhood… |
| Spatial Niche Detection | Identify spatial cellular niches using Harmonics hierarchical model. |
| Spatially Variable Genes (SpatialDE) | Identify spatially variable genes using SpatialDE Gaussian process regression. |

| Skill | Summary |
|---|---|
| Differential Gene Expression Analysis | Find differentially expressed marker genes between groups using scanpy rank_genes_groups with Wilcoxon test. |
| GO Enrichment Analysis | Find enriched Gene Ontology (GO) terms for a user-provided gene list. |
| Over-Representation & Pathway Enrichment Analysis (ORA) | Test whether a gene list is enriched for specific pathways or gene sets using Over-Representation Analysis (Fisher's exact test). |
| Per-Cell Pathway Activity Scoring (ssGSEA) | Compute per-cell pathway activity scores using single-sample Gene Set Enrichment Analysis (ssGSEA). |
| Two-Group Pathway Enrichment Comparison | Compare pathway / gene-set enrichment between two user-provided gene lists (typically markers of two cell populations, clusters, or conditi… |

| Skill | Summary |
|---|---|
| Cell-Cell Communication Analysis (LIANA+) | Analyze cell-cell communication using LIANA+ to identify significant ligand-receptor interactions between cell types. |
| Cell-Cell Communication Analysis (CellPhoneDB) | Analyze cell-cell communication using CellPhoneDB statistical method to identify significant ligand-receptor interactions between cell type… |

| Skill | Summary |
|---|---|
| Batch Integration (Harmony) | Integrate multiple spatial transcriptomics slices using Harmony batch correction. |
| Batch Integration (BBKNN) | Correct batch effects across multiple slices using BBKNN (Batch Balanced K-Nearest Neighbors). |
| Batch Integration (Scanorama) | Correct batch effects across multiple slices using Scanorama panoramic stitching. |

| Skill | Summary |
|---|---|
| Spatial Alignment (STalign) | Align two cell-level spatial transcriptomics slices using STalign. |
| Slice Registration (PASTE) | Align multiple spatial transcriptomics slices using PASTE (Probabilistic Alignment of ST Experiments). |

| Skill | Summary |
|---|---|
| Expression-based CNV Inference (infercnvpy) | Infer copy number variations (CNVs) from gene expression data using infercnvpy. |
| Pseudotime Trajectory Analysis (Palantir / DPT) | Infer cell developmental trajectories and pseudotime ordering using expression-based methods. |
Adding a new skill. Create stat_agent/skills/<your-slug>/SKILL.md with YAML frontmatter (name, title, description, filter_requirements, prerequisites, optional default_skill), then write the analysis instructions and code template in the body. The registry will pick it up at startup.
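As a starting point, the sketch below scaffolds a `SKILL.md` with the frontmatter keys named above and lists the skill directories that contain one. The helper names are hypothetical, and the values written for `filter_requirements` and `prerequisites` are empty placeholders: their real schema is not described here, so copy it from an existing skill. The optional `default_skill` key is left out.

```python
import json
from pathlib import Path


def render_skill_md(name, title, description):
    """Build SKILL.md text: YAML frontmatter followed by a body stub."""
    lines = [
        "---",
        # json.dumps yields double-quoted strings, which are also valid YAML scalars.
        f"name: {json.dumps(name)}",
        f"title: {json.dumps(title)}",
        f"description: {json.dumps(description)}",
        "filter_requirements: {}  # placeholder; schema assumed, check an existing skill",
        "prerequisites: []  # placeholder; schema assumed, check an existing skill",
        "---",
        "",
        "Analysis instructions and code template go here.",
        "",
    ]
    return "\n".join(lines)


def scaffold_skill(skills_root, slug, name, title, description):
    """Create <skills_root>/<slug>/SKILL.md and return its path."""
    skill_dir = Path(skills_root) / slug
    skill_dir.mkdir(parents=True, exist_ok=True)
    path = skill_dir / "SKILL.md"
    path.write_text(render_skill_md(name, title, description), encoding="utf-8")
    return path


def list_skill_slugs(skills_root):
    """List the slugs of directories under skills_root that contain a SKILL.md."""
    return sorted(p.parent.name for p in Path(skills_root).glob("*/SKILL.md"))
```

Calling `scaffold_skill("stat_agent/skills", "my-skill", "my_skill", "My Skill", "One-line summary.")` would create the file at the path pattern given above. `list_skill_slugs` only mirrors that path pattern and makes no claim about how the registry itself scans the directory.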
STAT supports five providers via a unified LLMBackend. In the web UI's Configure LLM panel, pick a Provider from the dropdown, then type the bare Model ID as it appears at that provider's API — no prefix needed. (Older saved configs that include a prefix like anthropic/… still work for backward compatibility.)
For programmatic use, export the corresponding environment variable before launching stat-web. Every model ID below has been verified end-to-end against the live provider API.
| Provider | Where to get a key | Env var | Default model | Other verified IDs |
|---|---|---|---|---|
| OpenAI | https://platform.openai.com/api-keys | OPENAI_API_KEY | gpt-5.4 | gpt-5.5, gpt-4o |
| Anthropic | https://console.anthropic.com/settings/keys | ANTHROPIC_API_KEY | claude-opus-4-7 | claude-opus-4-6, claude-sonnet-4-6 |
| Google Gemini | https://aistudio.google.com/app/apikey | GOOGLE_API_KEY | gemini-3.1-pro-preview | gemini-2.5-pro |
| DeepSeek | https://platform.deepseek.com/api_keys | DEEPSEEK_API_KEY | deepseek-v4-pro | deepseek-v4-flash |
| Poe (multi-model gateway) | https://poe.com/api_key | POE_API_KEY | claude-sonnet-4.5 | claude-opus-4.7, gpt-5.5, gemini-3.1-pro, deepseek-v4-pro-el |
Poe caveat. claude-opus-4.6 and claude-sonnet-4.6 on Poe force extended thinking on the bot side and are not yet supported through STAT; use claude-opus-4.7 instead, or switch to the direct Anthropic provider.
Tip. For long-context analysis (multi-slice integration, large reference profiles), prefer models with 200k+ context: claude-opus-4-7, claude-opus-4-6, gpt-5.5, gemini-3.1-pro-preview.
Verify before a long run. Use the Test Connection button in the Configure LLM panel; it sends a one-token round-trip through the same LLMBackend code path as the agent and reports the exact error if anything is off.
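For programmatic setups, a small pre-launch check can confirm which provider keys are visible in the environment. This is a sketch: the variable names come from the provider table, while `configured_providers` is a hypothetical helper and not part of STAT. It only checks that a key is set, not that the key is valid.

```python
import os

# Environment variable names as listed in the provider table.
PROVIDER_ENV_VARS = {
    "OpenAI": "OPENAI_API_KEY",
    "Anthropic": "ANTHROPIC_API_KEY",
    "Google Gemini": "GOOGLE_API_KEY",
    "DeepSeek": "DEEPSEEK_API_KEY",
    "Poe": "POE_API_KEY",
}


def configured_providers(env=None):
    """Return providers whose API-key variable is set to a non-empty value."""
    env = os.environ if env is None else env
    return [
        provider
        for provider, var in PROVIDER_ENV_VARS.items()
        if env.get(var, "").strip()
    ]


if __name__ == "__main__":
    found = configured_providers()
    print("Providers with a key set:", ", ".join(found) or "none")
```

Run it in the same shell session that will launch stat-web, so that both see the same environment.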
The analyses, figures, and benchmarks from the STAT paper live in a separate repository: https://github.com/chenyhvvvv/STAT-PaperRepro
BSD-3-Clause © STAT contributors.