STAT — Spatial Transcriptomics Analytical agenT

Ask in natural language, get a planned, verified, and executed analysis of spatial omics data.

Installation

Stable release from PyPI:

pip install stat-agent

With the full set of analysis skill dependencies (squidpy, scvi-tools, torch, liana, cell2location, …):

pip install "stat-agent[skills]"

Some skills require packages that aren't on PyPI or that conflict with other packages; install these separately as needed:

# STAGATE (requires PyG + matching torch_geometric/torch_sparse/torch_scatter wheels)
pip install git+https://github.com/QIFEIDKN/STAGATE_pyG.git

Quick start

Web interface

stat-web                    # serves on http://localhost:8889
# or
./start_web.sh              # also starts a Jupyter Lab alongside

In the UI:

1. Enter the path to your dataset directory.
2. Configure your LLM provider and paste an API key.
3. Click Load Dataset.
4. Ask questions in the chat panel:
- "Annotate cell types using the breast-cancer reference."
- "Find spatially variable genes."
- "Show CD8A expression in slice 1."
- "Run RCTD deconvolution and overlay the dominant cell type."

Data format

STAT auto-detects your data layout. Place files in a single directory.

Single-slice

dataset/
├── tissue.h5ad          # Required: AnnData with x, y in obs
└── he.tif               # Optional: H&E image (pixel coords = cell coords)

Multi-slice

dataset/
├── tissue_slice_0.h5ad
├── he_slice_0.tif
├── tissue_slice_1.h5ad
└── he_slice_1.tif

Multi-omics (gene + protein)

dataset/
├── tissue.h5ad          # Gene expression
├── tissue_protein.h5ad  # Protein expression
├── he.tif
└── protein_CD3.tif
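
The three layouts above differ only in file naming, so a quick pre-flight check can tell you which one a directory resembles before you load it. The sketch below is a hypothetical helper, not STAT's actual detection logic: the function name `guess_layout` and the exact matching rules (`_slice_<n>.h5ad`, `_protein.h5ad`) are assumptions inferred from the example trees.

```python
import re

# Assumed naming patterns, taken from the example directory trees above.
SLICE_RE = re.compile(r"_slice_(\d+)\.h5ad$")
PROTEIN_SUFFIX = "_protein.h5ad"


def guess_layout(filenames):
    """Guess which documented layout a list of file names resembles."""
    h5ad = [name for name in filenames if name.endswith(".h5ad")]
    if any(SLICE_RE.search(name) for name in h5ad):
        return "multi-slice"
    if any(name.endswith(PROTEIN_SUFFIX) for name in h5ad):
        return "multi-omics"
    if h5ad:
        return "single-slice"
    return "unknown"  # no .h5ad file at all


single = guess_layout(["tissue.h5ad", "he.tif"])
multi = guess_layout(["tissue_slice_0.h5ad", "he_slice_0.tif",
                      "tissue_slice_1.h5ad", "he_slice_1.tif"])
omics = guess_layout(["tissue.h5ad", "tissue_protein.h5ad",
                      "he.tif", "protein_CD3.tif"])
```

To check a real directory, pass `[p.name for p in pathlib.Path("dataset").iterdir()]`.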

Coordinate convention. Cell coordinates (x, y) in adata.obs map directly to image pixel (x, y); no coordinate transformation is applied. Note the array indexing swap: image arrays are indexed row-first, so img[y, x] corresponds to cell (x, y).
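
The indexing swap is easy to get wrong, so here is a minimal NumPy sketch of reading the image value under each cell. The helper name `pixel_at_cells` is hypothetical, and rounding coordinates to the nearest pixel is an assumption; the README does not say whether coordinates are stored as integers.

```python
import numpy as np


def pixel_at_cells(img, xs, ys):
    """Return the image value under each cell at coordinates (x, y).

    Image arrays are indexed [row, column], i.e. [y, x].
    """
    cols = np.rint(np.asarray(xs)).astype(int)  # x selects the column
    rows = np.rint(np.asarray(ys)).astype(int)  # y selects the row
    return img[rows, cols]


# Toy image: height 4 (y axis), width 6 (x axis).
img = np.zeros((4, 6), dtype=np.uint8)
img[1, 5] = 255  # bright pixel at x=5, y=1

values = pixel_at_cells(img, xs=[5.0, 0.0], ys=[1.0, 0.0])
```

In this toy image, `img[5, 1]` would raise an `IndexError`, which is the typical symptom of swapping the two axes.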

Required AnnData fields: adata.obs['x'], adata.obs['y'], and the expression matrix adata.X. adata.obs['celltype'] is optional — annotation skills will populate it.
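
Before loading a dataset, it can help to confirm that the required fields are present. The sketch below uses a hypothetical helper, `missing_required_fields`, that works on any object exposing `.obs` and `.X`. The membership test `"x" in adata.obs` checks column names on a pandas DataFrame, which is what AnnData uses for `obs`. A plain stand-in object is used here so the example runs without anndata installed.

```python
from types import SimpleNamespace

REQUIRED_OBS = ("x", "y")  # required obs columns listed above


def missing_required_fields(adata):
    """List the required fields that are absent from an AnnData-like object."""
    missing = [f"obs['{key}']" for key in REQUIRED_OBS if key not in adata.obs]
    if getattr(adata, "X", None) is None:
        missing.append("X")
    return missing


# Stand-ins for AnnData objects (illustrative only).
ok = SimpleNamespace(obs={"x": [10.0], "y": [20.0]}, X=[[1.0, 0.0]])
bad = SimpleNamespace(obs={"x": [10.0]}, X=None)

ok_missing = missing_required_fields(ok)
bad_missing = missing_required_fields(bad)
```

With a real file, the same call would follow `adata = anndata.read_h5ad("dataset/tissue.h5ad")`.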

Built-in skills

Skills are auto-discovered from stat_agent/skills/{slug}/SKILL.md. Each skill carries metadata (modalities, data level, prerequisites) and a templated code body. The current catalog:

Cell type annotation

Skill	Summary
Cell Type Annotation with scANVI	Annotate cell types in spatial transcriptomics data using scANVI transfer learning from a reference scRNA-seq dataset.
Fast Cell Type Annotation (Clustering + LLM)	Annotate cell types using unsupervised clustering, marker genes, and LLM-based annotation.
Cell Type Annotation via Spatial Mapping (Tangram)	Map single-cell reference annotations onto spatial transcriptomics data using Tangram deep learning alignment.

Spot deconvolution

Skill	Summary
Cell Type Deconvolution (RCTD)	Perform cell type deconvolution (or annotation on spot) on spatial transcriptomics data (Visium spots) using RCTD with a single-cell refere…
Bayesian Cell Type Deconvolution (Cell2location)	Reference-based Bayesian deconvolution of spot-level spatial transcriptomics using Cell2location.
Fast Spot Deconvolution (FlashDeconv)	Ultra-fast reference-based cell type deconvolution for spot-level spatial data using FlashDeconv.

Spatial domains

Skill	Summary
Spatial Domain Detection (SpaGCN)	Identify spatial domains in spot-level spatial transcriptomics data using SpaGCN, integrating gene expression, spatial location, and H&E hi…
Spatial Domain Detection (STAGATE)	Identify spatial domains using STAGATE (Spatial-Transcriptomics Graph Attention Auto-Encoder).
Spatial Domain Detection (GraphST)	Identify spatial domains in spot-level data using GraphST (Graph Self-supervised Transformer).

Spatial statistics & niches

Skill	Summary
Spatial Statistics Analysis	Compute spatial statistics including Moran's I (spatial autocorrelation of genes), Ripley's K (spatial point pattern of cell types), co-occ…
Neighborhood Enrichment Analysis	Compute neighborhood enrichment z-scores to identify which cell types are spatially co-localized or depleted from each other's neighborhood…
Spatial Niche Detection	Identify spatial cellular niches using the Harmonics hierarchical model.
Spatially Variable Genes (SpatialDE)	Identify spatially variable genes using SpatialDE Gaussian process regression.

Differential expression & pathway

Skill	Summary
Differential Gene Expression Analysis	Find differentially expressed marker genes between groups using scanpy rank_genes_groups with Wilcoxon test.
GO Enrichment Analysis	Find enriched Gene Ontology (GO) terms for a user-provided gene list.
Over-Representation & Pathway Enrichment Analysis (ORA)	Test whether a gene list is enriched for specific pathways or gene sets using Over-Representation Analysis (Fisher's exact test).
Per-Cell Pathway Activity Scoring (ssGSEA)	Compute per-cell pathway activity scores using single-sample Gene Set Enrichment Analysis (ssGSEA).
Two-Group Pathway Enrichment Comparison	Compare pathway / gene-set enrichment between two user-provided gene lists (typically markers of two cell populations, clusters, or conditi…

Cell-cell communication

Skill	Summary
Cell-Cell Communication Analysis (LIANA+)	Analyze cell-cell communication using LIANA+ to identify significant ligand-receptor interactions between cell types.
Cell-Cell Communication Analysis (CellPhoneDB)	Analyze cell-cell communication using the CellPhoneDB statistical method to identify significant ligand-receptor interactions between cell type…

Multi-slice integration

Skill	Summary
Batch Integration (Harmony)	Integrate multiple spatial transcriptomics slices using Harmony batch correction.
Batch Integration (BBKNN)	Correct batch effects across multiple slices using BBKNN (Batch Balanced K-Nearest Neighbors).
Batch Integration (Scanorama)	Correct batch effects across multiple slices using Scanorama panoramic stitching.

Slice alignment & registration

Skill	Summary
Spatial Alignment (STalign)	Align two cell-level spatial transcriptomics slices using STalign.
Slice Registration (PASTE)	Align multiple spatial transcriptomics slices using PASTE (Probabilistic Alignment of ST Experiments).

CNV inference & trajectory

Skill	Summary
Expression-based CNV Inference (infercnvpy)	Infer copy number variations (CNVs) from gene expression data using infercnvpy.
Pseudotime Trajectory Analysis (Palantir / DPT)	Infer cell developmental trajectories and pseudotime ordering using expression-based methods.

Adding a new skill. Create stat_agent/skills/<your-slug>/SKILL.md with YAML frontmatter (name, title, description, filter_requirements, prerequisites, optional default_skill), then write the analysis instructions and code template in the body. The registry will pick it up at startup.
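
As a rough illustration of the file this describes, the sketch below scaffolds a `SKILL.md` skeleton with the frontmatter keys named above. Everything beyond the key names and the path pattern is an assumption: the helper `scaffold_skill` is hypothetical, setting `name` to the slug is a guess, and the value shapes for `filter_requirements` and `prerequisites` are placeholders because this README does not specify their schema. Check an existing skill in `stat_agent/skills/` for the real format.

```python
import tempfile
from pathlib import Path

REQUIRED_KEYS = ("name", "title", "description",
                 "filter_requirements", "prerequisites")


def scaffold_skill(skills_root, slug, title, description):
    """Write a placeholder SKILL.md under <skills_root>/<slug>/."""
    skill_dir = Path(skills_root) / slug
    skill_dir.mkdir(parents=True, exist_ok=True)
    lines = [
        "---",
        f"name: {slug}",             # assumption: name mirrors the slug
        f"title: {title}",
        f"description: {description}",
        "filter_requirements: {}  # placeholder, schema not documented here",
        "prerequisites: []        # placeholder, schema not documented here",
        "---",
        "",
        "Analysis instructions and the code template go here.",
        "",
    ]
    path = skill_dir / "SKILL.md"
    path.write_text("\n".join(lines), encoding="utf-8")
    return path


# Demo in a throwaway directory instead of the real stat_agent/skills/.
with tempfile.TemporaryDirectory() as root:
    created = scaffold_skill(root, "my-skill", "My Skill", "One-line summary.")
    text = created.read_text(encoding="utf-8")
    relative = created.relative_to(root).as_posix()
```

The optional `default_skill` key is left out of the skeleton; add it only if the skill needs it.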

LLM providers

STAT supports five providers via a unified LLMBackend. In the web UI's Configure LLM panel, pick a Provider from the dropdown, then type the bare Model ID as it appears at that provider's API — no prefix needed. (Older saved configs that include a prefix like anthropic/… still work for backward compatibility.)

For programmatic use, export the corresponding environment variable before launching stat-web. Every model ID below has been verified end-to-end against the live provider API.

Provider	Where to get a key	Env var	Default model	Other verified IDs
OpenAI	https://platform.openai.com/api-keys	`OPENAI_API_KEY`	`gpt-5.4`	`gpt-5.5`, `gpt-4o`
Anthropic	https://console.anthropic.com/settings/keys	`ANTHROPIC_API_KEY`	`claude-opus-4-7`	`claude-opus-4-6`, `claude-sonnet-4-6`
Google Gemini	https://aistudio.google.com/app/apikey	`GOOGLE_API_KEY`	`gemini-3.1-pro-preview`	`gemini-2.5-pro`
DeepSeek	https://platform.deepseek.com/api_keys	`DEEPSEEK_API_KEY`	`deepseek-v4-pro`	`deepseek-v4-flash`
Poe (multi-model gateway)	https://poe.com/api_key	`POE_API_KEY`	`claude-sonnet-4.5`	`claude-opus-4.7`, `gpt-5.5`, `gemini-3.1-pro`, `deepseek-v4-pro-el`
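
Before launching stat-web, a short pre-flight check can confirm which provider keys are exported in the current shell. This is a sketch only: the helper `configured_providers` is hypothetical and not part of STAT. The environment variable names are the ones listed in the table above.

```python
import os

# Environment variable per provider, copied from the table above.
PROVIDER_ENV_VARS = {
    "OpenAI": "OPENAI_API_KEY",
    "Anthropic": "ANTHROPIC_API_KEY",
    "Google Gemini": "GOOGLE_API_KEY",
    "DeepSeek": "DEEPSEEK_API_KEY",
    "Poe": "POE_API_KEY",
}


def configured_providers(env=None):
    """Return providers whose API-key variable is set and non-blank."""
    env = os.environ if env is None else env
    return [
        name
        for name, var in PROVIDER_ENV_VARS.items()
        if env.get(var, "").strip()
    ]


# Example with an explicit mapping instead of the real environment.
example = configured_providers({"ANTHROPIC_API_KEY": "sk-test",
                                "POE_API_KEY": "   "})
```

Calling `configured_providers()` with no argument inspects the real environment. This only checks that a key is present, not that it is valid.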

Poe caveat. claude-opus-4.6 and claude-sonnet-4.6 on Poe force extended-thinking on the bot side and are not yet supported through STAT — use claude-opus-4.7 instead, or switch to the direct Anthropic provider.

Tip. For long-context analysis (multi-slice integration, large reference profiles), prefer models with 200k+ context: claude-opus-4-7, claude-opus-4-6, gpt-5.5, gemini-3.1-pro-preview.

Verify before a long run. Use the Test Connection button in the Configure LLM panel — it sends a one-token round-trip through the same LLMBackend code path as the agent and reports the exact error if anything is off.

Reproducing the paper

The analyses, figures, and benchmarks from the STAT paper live in a separate repository: https://github.com/chenyhvvvv/STAT-PaperRepro
