MolecularAI · pregHosh · Apr 17, 2026
diff --git a/contrib/reinvent-doc/core_concept/README.md b/contrib/reinvent-doc/core_concept/README.md
@@ -50,6 +50,15 @@ REINVENT4 addresses this at two levels:
 - **Diversity Filter**: penalises repeated Murcko scaffolds during the run. Molecules are bucketed by scaffold; once a bucket fills, further molecules with that scaffold are penalised. Only molecules above `minscore` enter memory. Types: `IdenticalMurckoScaffold` (recommended), `IdenticalTopologicalScaffold`, `ScaffoldSimilarity` (Tanimoto-based), `PenalizeSameSmiles` (exact SMILES repetition).
 - **Inception (Experience Replay)**: replays the highest-scoring molecules seen so far alongside the current batch in the loss computation. Useful when high-scoring molecules are rare — prevents the agent from forgetting them between epochs. Memory can be pre-seeded with known actives. Reinvent only.
 
+### NaviDiv: Advanced Diversity Control
+
+Beyond the built-in diversity filter, **NaviDiv** offers finer-grained control over several diversity dimensions at once. It provides six complementary metrics (Scaffold, Ngram, Fragments, Cluster, Ring, and Functional Group diversity) and can be used in two ways:
+
+1. **Live constraints during RL:** add NaviDiv penalty components to your scoring function to steer the optimisation toward diverse chemotypes while still optimising your task objective.
+2. **Post-hoc analysis:** inspect generated molecules in the NaviDiv Streamlit dashboard after the run to understand diversity patterns, detect mode collapse, and confirm that multiple chemical scaffolds were explored.
+
+
+
 
 
 ----
@@ -70,3 +79,5 @@ References:
 6. Loeffler, H. H.; He, J.; Tibo, A.; Janet, J. P.; Voronov, A.; Mervin, L. H.; Engkvist, O. Reinvent 4: Modern AI-Driven Generative Molecule Design. *J. Cheminform.* **2024**, *16* (1), 20. [https://doi.org/10.1186/s13321-024-00812-5](https://doi.org/10.1186/s13321-024-00812-5)
 
 7. Guo, J.; Schwaller, P. Augmented Memory: Sample-Efficient Generative Molecular Design with Reinforcement Learning. *JACS Au* **2024**, *4* (6), 2160–2172. [https://doi.org/10.1021/jacsau.4c00066](https://doi.org/10.1021/jacsau.4c00066)
+
+8. Azzouzi, M.; Worakul, T.; Corminboeuf, C. NaviDiv: A Web App for Monitoring Chemical Diversity in Generative Molecular Design. *Digital Discovery* **2026**. [https://doi.org/10.1039/D5DD00487J](https://doi.org/10.1039/D5DD00487J)
diff --git a/contrib/reinvent-doc/tutorials/README.md b/contrib/reinvent-doc/tutorials/README.md
@@ -29,7 +29,7 @@ reinvent config.toml -l run.log -s 42
 4. [Scoring Function Design](scoring_function.md) — how to formulate objectives, choose transforms and weights, use built-in components, and write custom ones
 5. [Scoring](scoring.md) — evaluate an existing SMILES list against a scoring function without running RL; useful for validating your scoring setup
 6. [Common Workflows](workflows.md) — end-to-end strategies combining sampling, TL, and RL for different scenarios
-7. [Monitoring and Analysis](monitoring.md) — TensorBoard metrics during TL/RL, CSV output columns, and DataWarrior visualisation
+7. [Monitoring and Analysis](monitoring.md) — TensorBoard metrics during TL/RL, CSV output columns, DataWarrior visualisation, and NaviDiv for diversity monitoring
 
 ## Example Config Files
 

diff --git a/contrib/reinvent-doc/tutorials/monitoring.md b/contrib/reinvent-doc/tutorials/monitoring.md
@@ -131,6 +131,30 @@ Generator-specific input columns (`Input_Scaffold`, `Warheads`, etc.) are prepen
 
 ---
 
+## Diversity Monitoring with NaviDiv
+
+For detailed analysis of chemical diversity in your generated molecules, NaviDiv provides six complementary metrics: Scaffold, Ngram, Fragments, Cluster, Ring, and Functional Group diversity. You can use NaviDiv in two ways:
+
+1. **Live diversity constraints during RL** — add NaviDiv penalty components to your scoring function to steer the optimisation toward diverse chemotypes while still optimising your task objective.
+2. **Post-hoc analysis** — inspect your RL output CSV in the NaviDiv Streamlit dashboard to visualise diversity patterns, detect mode collapse, and confirm that multiple chemical series were explored.
+
+Quick start:
+
+```bash
+# Install NaviDiv into your reinvent4 environment
+git clone https://github.com/LCMD-epfl/NaviDiv.git
+cd NaviDiv
+pip install -e .
+
+# Analyse your RL output
+streamlit run app.py
+# Then load your RL output CSV in the dashboard and run the diversity scorers
+```
+
+For details on NaviDiv configuration, tuning parameters, and interpreting results, see the [NaviDiv repository](https://github.com/LCMD-epfl/NaviDiv) and the [NaviDiv paper](https://doi.org/10.1039/D5DD00487J).
+
+---
+
 ## Visualising with DataWarrior
 
 [DataWarrior](https://openmolecules.org/datawarrior/) is a free desktop tool that can render SMILES directly and is well-suited for browsing REINVENT output CSVs.

diff --git a/contrib/tutorials/NaviDiv/README.md b/contrib/tutorials/NaviDiv/README.md
@@ -0,0 +1,203 @@
+# NaviDiv: Chemical Diversity Monitoring for REINVENT4
+
+**NaviDiv** is a framework for analysing and controlling chemical diversity during generative molecular design. It plugs into [REINVENT4](https://github.com/MolecularAI/REINVENT4) as a live diversity-penalty scoring component and ships a standalone Streamlit dashboard for post-hoc analysis.
+
+[![DOI](https://img.shields.io/badge/DOI-10.1039%2FD5DD00487J-blue)](https://doi.org/10.1039/D5DD00487J)
+[![Repository](https://img.shields.io/badge/github-LCMD--epfl%2FNaviDiv-black)](https://github.com/LCMD-epfl/NaviDiv)
+
+
+![NaviDiv Overview](asset/figure1_navidiv.svg)
+
+
+## Features
+
+NaviDiv provides six complementary diversity metrics:
+
+| Scorer | What it measures |
+|---|---|
+| **Scaffold** | Bemis-Murcko scaffold diversity (wire-frame / framework variants) |
+| **Ngram** | SMILES n-gram sequence-pattern diversity |
+| **Fragments** | BRICS fragment-decomposition diversity |
+| **Cluster** | Tanimoto-similarity cluster diversity |
+| **RingScorer** | Ring-system diversity |
+| **FGscorer** | Functional-group diversity |
+
+All scorers can run standalone (app or script) or as live REINVENT4 penalty components during RL optimisation.
+
+## Installation
+
+This guide assumes REINVENT4 is already installed in a conda environment named `reinvent4`. Clone NaviDiv and install it into that environment:
+
+```bash
+conda activate reinvent4
+git clone https://github.com/mohammedazzouzi15/NaviDiv.git
+cd NaviDiv
+pip install -e .
+```
+
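+Then, from the NaviDiv checkout, set `NAVIDIV_ROOT` (used by the run scripts) and add NaviDiv's `src/navidiv/reinvent` directory to `PYTHONPATH`: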
+
+```bash
+export NAVIDIV_ROOT="$(pwd)"
+export PYTHONPATH="${PYTHONPATH}:${NAVIDIV_ROOT}/src/navidiv/reinvent"
+```
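+
+If `NAVIDIV_ROOT` or `PYTHONPATH` is wrong, the run scripts below are likely to fail. As a minimal sketch, the hypothetical helper `check_navidiv_env` (not part of NaviDiv or REINVENT4) checks both variables against the directory layout assumed in this README; it does not test whether the Python packages themselves import.
+
+```bash
+# Hypothetical helper, not shipped with NaviDiv: sanity-check the variables
+# exported above. Assumes the plugin lives in ${NAVIDIV_ROOT}/src/navidiv/reinvent.
+check_navidiv_env() {
+    if [ -z "${NAVIDIV_ROOT:-}" ]; then
+        echo "NAVIDIV_ROOT is not set" >&2
+        return 1
+    fi
+    local plugin_dir="${NAVIDIV_ROOT}/src/navidiv/reinvent"
+    if [ ! -d "$plugin_dir" ]; then
+        echo "missing directory: $plugin_dir" >&2
+        return 1
+    fi
+    # Wrap in colons so that only a complete PYTHONPATH entry matches.
+    case ":${PYTHONPATH:-}:" in
+        *":${plugin_dir}:"*) ;;
+        *) echo "PYTHONPATH does not contain $plugin_dir" >&2; return 1 ;;
+    esac
+    echo "NaviDiv environment OK"
+}
+```
+
+Call `check_navidiv_env` after the two `export` lines: it prints `NaviDiv environment OK`, or reports the first problem it finds and returns a non-zero status.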
+
+## Usage
+
+Two independent workflows are available — run either or both:
+
+---
+
+### A — Streamlit app: post-hoc diversity analysis
+
+Explore generated molecules interactively. A bundled 200-molecule sample CSV is in `examples/app/sample_molecules.csv`.
+
+**Option 1 — Streamlit dashboard:**
+
+```bash
+conda activate reinvent4
+export NAVIDIV_ROOT=/path/to/NaviDiv
+cd /path/to/REINVENT4   # the paths below are relative to the REINVENT4 root
+bash contrib/tutorials/NaviDiv/examples/app/run_app.sh
+# Opens http://localhost:8501
+# When prompted, load: contrib/tutorials/NaviDiv/examples/app/sample_molecules.csv
+```
+
+Recommended dashboard workflow:
+1. **Load File** — enter the CSV path and click Load
+2. **Run t-SNE** — 2D chemical-space projection
+3. **Run individual scorers** — Scaffold, Ngram, Fragments, Cluster, …
+4. **Run All Scorers** — full diversity report written to `scorer_output/`
+5. **Per Step tab** — diversity trends over optimisation steps (requires a `step` column)
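+
+Before loading a CSV, you can check that it has the columns you need, such as the `step` column used by the **Per Step** tab. The helper below is a hypothetical sketch (`csv_has_columns` is not a NaviDiv command) and assumes a plain comma-separated header without quoted field names.
+
+```bash
+# Hypothetical helper, not part of NaviDiv: return 0 only if the CSV header
+# contains every column name passed after the file path.
+# Assumes an unquoted, comma-separated header line.
+csv_has_columns() {
+    local csv="$1"; shift
+    local header col
+    header="$(head -n 1 "$csv" | tr -d '\r')"
+    for col in "$@"; do
+        # Split the header into one field per line, then require an exact match.
+        if ! printf '%s\n' "$header" | tr ',' '\n' | grep -qxF -- "$col"; then
+            echo "missing column: $col" >&2
+            return 1
+        fi
+    done
+}
+```
+
+For example, `csv_has_columns path/to/rl_output.csv SMILES step` assumes the SMILES column is named `SMILES`; adjust the names to match your file.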
+
+**Option 2 — programmatic script (no browser):**
+
+```bash
+conda activate reinvent4
+cd /path/to/REINVENT4/contrib/tutorials/NaviDiv/examples/app
+python run_demo.py
+# Prints diversity scores for all metrics and writes outputs to ./demo_output/
+```
+
+---
+
+### B — REINVENT4 run: live diversity constraints during RL
+
+Add diversity penalty components to a REINVENT4 staged-learning run. All configs and scripts are self-contained in `examples/reinvent/`.
+
+**Quick test (10 steps, ~1 min):**
+
+```bash
+conda activate reinvent4
+export NAVIDIV_ROOT=/path/to/NaviDiv
+export PYTHONPATH="${PYTHONPATH}:${NAVIDIV_ROOT}/src/navidiv/reinvent"
+
+cd /path/to/REINVENT4/contrib/tutorials/NaviDiv/examples/reinvent
+EXAMPLE="$(pwd)"
+
+python3 "${NAVIDIV_ROOT}/src/navidiv/reinvent/run_reinvent_2.py" \
+    --config-name test \
+    --config-path "${EXAMPLE}/conf_folder" \
+    name=quick_test \
+    wd="${EXAMPLE}/runs/test" \
+    input_generator.file_path="${EXAMPLE}/InputGenerator_custom.py" \
+    reinvent_common.prior_filename="${EXAMPLE}/priors/formed.prior" \
+    reinvent_common.agent_filename="${EXAMPLE}/priors/formed.prior" \
+    reinvent_common.max_steps=10 \
+    diversity_scorer=All_weak_constraints
+```
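+
+To smoke-test every strategy rather than only `All_weak_constraints`, the quick-test command can be looped over the config names from the strategy reference table further down. This is a hypothetical wrapper (`quick_test_all` is not shipped with NaviDiv); it assumes `NAVIDIV_ROOT`, `PYTHONPATH` and `EXAMPLE` are set as in the block above and that each config file name without `.yaml` is a valid `diversity_scorer` value. Each run gets its own `runs/test/<strategy>` folder, and `DRY_RUN=1` prints the commands without running them.
+
+```bash
+# Hypothetical sweep, not part of NaviDiv: 10-step quick test per strategy.
+quick_test_all() {
+    local s
+    local -a strategies cmd
+    strategies=(All_constraints All_weak_constraints scaffold_only
+                fragement_only ngram_only similarity_only)
+    for s in "${strategies[@]}"; do
+        # Same overrides as the quick test, with a per-strategy name and wd.
+        cmd=(python3 "${NAVIDIV_ROOT}/src/navidiv/reinvent/run_reinvent_2.py"
+            --config-name test
+            --config-path "${EXAMPLE}/conf_folder"
+            "name=quick_${s}"
+            "wd=${EXAMPLE}/runs/test/${s}"
+            "input_generator.file_path=${EXAMPLE}/InputGenerator_custom.py"
+            "reinvent_common.prior_filename=${EXAMPLE}/priors/formed.prior"
+            "reinvent_common.agent_filename=${EXAMPLE}/priors/formed.prior"
+            reinvent_common.max_steps=10
+            "diversity_scorer=${s}")
+        if [ "${DRY_RUN:-0}" = 1 ]; then
+            printf '%s\n' "${cmd[*]}"
+        else
+            "${cmd[@]}" || return 1   # stop at the first failing strategy
+        fi
+    done
+}
+```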
+
+**Full demo (all 6 diversity strategies, 100 steps each):**
+
+```bash
+conda activate reinvent4
+export NAVIDIV_ROOT=/path/to/NaviDiv
+
+cd /path/to/REINVENT4/contrib/tutorials/NaviDiv/examples/reinvent
+bash run.sh
+```
+
+Strategies run one after another; t-SNE projection and the full diversity analysis are then applied automatically. Results are written to `runs/demo/`.
+
+## Diversity strategy reference
+
+| Config file | Strategy | Best for |
+|---|---|---|
+| `All_constraints.yaml` | All metrics, moderate constraints | Balanced exploration + optimisation |
+| `All_weak_constraints.yaml` | All metrics, light constraints | Property-first, some diversity |
+| `scaffold_only.yaml` | Scaffold diversity | Exploring different core frameworks |
+| `fragement_only.yaml` | Fragment diversity | Exploring molecular building blocks |
+| `ngram_only.yaml` | N-gram sequence diversity | Varying SMILES sequence patterns |
+| `similarity_only.yaml` | Cluster-based diversity | Preventing near-duplicate generation |
+
+### Key tuning parameters
+
+| Parameter | Effect |
+|---|---|
+| `count_perc_ratio` | Lower = stricter diversity constraint |
+| `Total Number of Molecules with Substructure` | Cap on molecules sharing a motif |
+| `score_every` | Diversity evaluation frequency (lower = more control, slower) |
+| `diff_median_score` | Min score improvement required to accept a molecule |
+
+## Output structure
+
+```text
+runs/demo/
+└── scaffold_only/
+    ├── scaffold_only_1.csv          # Generated molecules + all scores
+    ├── scaffold_only_1_TSNE.csv     # With 2D t-SNE coordinates
+    ├── scorer_output/               # NaviDiv diversity analysis files
+    └── logs/                        # REINVENT4 training logs
+```
+
+Load `*_TSNE.csv` in the NaviDiv app for interactive exploration.
+
+## Post-run analysis
+
+```bash
+conda activate reinvent4
+export NAVIDIV_ROOT=/path/to/NaviDiv
+
+# t-SNE projection (if not run automatically)
+python3 "${NAVIDIV_ROOT}/src/navidiv/get_tsne.py" \
+    --df_path runs/demo/scaffold_only/scaffold_only_1.csv \
+    --step 20
+
+# Comprehensive diversity report
+python3 "${NAVIDIV_ROOT}/src/navidiv/run_all_scorers.py" \
+    --df_path runs/demo/scaffold_only/scaffold_only_1_TSNE.csv \
+    --output_path runs/demo/scaffold_only/scorer_output
+```
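+
+The two commands above process one strategy at a time. As a sketch, the hypothetical `analyse_all_runs` function (not part of NaviDiv) repeats them for every strategy folder under a results directory. It assumes the naming shown in **Output structure** (`<strategy>/<strategy>_1.csv`, with `get_tsne.py` writing `<strategy>_1_TSNE.csv` next to its input) and reuses `--step 20` from the example. Set `DRY_RUN=1` to print the commands instead of running them.
+
+```bash
+# Hypothetical batch wrapper, not part of NaviDiv.
+analyse_all_runs() {
+    local root="${1:-runs/demo}"
+    local dir name csv
+    local -a tsne_cmd score_cmd
+    for dir in "$root"/*/; do
+        [ -d "$dir" ] || continue          # glob matched nothing
+        dir="${dir%/}"
+        name="$(basename "$dir")"
+        csv="${dir}/${name}_1.csv"
+        if [ ! -f "$csv" ]; then
+            echo "skipping ${dir}: no ${name}_1.csv" >&2
+            continue
+        fi
+        tsne_cmd=(python3 "${NAVIDIV_ROOT}/src/navidiv/get_tsne.py"
+            --df_path "$csv" --step 20)
+        score_cmd=(python3 "${NAVIDIV_ROOT}/src/navidiv/run_all_scorers.py"
+            --df_path "${dir}/${name}_1_TSNE.csv"
+            --output_path "${dir}/scorer_output")
+        if [ "${DRY_RUN:-0}" = 1 ]; then
+            printf '%s\n' "${tsne_cmd[*]}" "${score_cmd[*]}"
+        else
+            # Scoring reads the t-SNE output, so only run it if t-SNE succeeded.
+            "${tsne_cmd[@]}" && "${score_cmd[@]}" || return 1
+        fi
+    done
+}
+```
+
+For example, run `analyse_all_runs runs/demo` after the full demo; folders without a matching CSV are skipped with a warning.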
+
+## Citation
+
+If you use NaviDiv, please cite:
+
+```bibtex
+@article{azzouzi_navidiv:_2026,
+	title = {{NaviDiv}: a web app for monitoring chemical diversity in generative molecular design},
+	shorttitle = {{NaviDiv}},
+	url = {https://pubs.rsc.org/en/content/articlelanding/2026/dd/d5dd00487j},
+	doi = {10.1039/D5DD00487J},
+	urldate = {2026-04-16},
+	journal = {Digital Discovery},
+	author = {Azzouzi, Mohammed and Worakul, Thanapat and Corminboeuf, Clémence},
+	year = {2026},
+}
+
+@article{loeffler_reinvent_2024,
+	title = {Reinvent 4: {Modern} {AI}–driven generative molecule design},
+	volume = {16},
+	issn = {1758-2946},
+	shorttitle = {Reinvent 4},
+	url = {https://doi.org/10.1186/s13321-024-00812-5},
+	doi = {10.1186/s13321-024-00812-5},
+	number = {1},
+	urldate = {2026-04-17},
+	journal = {Journal of Cheminformatics},
+	author = {Loeffler, Hannes H. and He, Jiazhen and Tibo, Alessandro and Janet, Jon Paul and Voronov, Alexey and Mervin, Lewis H. and Engkvist, Ola},
+	month = feb,
+	year = {2024},
+	pages = {20},
+}
+
+```