feat: automated Mermaid architecture diagram generation and CI checks #15
Pull request overview
Adds an automated mechanism to generate and CI-enforce a high-level Mermaid architecture diagram (docs/architecture.mmd) derived from repository contents.
Changes:
- Introduces scripts/generate_architecture_diagram.py to infer components/dependencies and generate a Mermaid diagram, with a --check drift-detection mode.
- Adds GitHub Actions workflows to (a) auto-regenerate and commit the diagram on pushes and (b) enforce freshness on pull requests.
- Documents local usage in README.md and checks in an initial generated docs/architecture.mmd.
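A minimal sketch of how a --check drift-detection mode like the one described might behave, assuming the generator produces its output as a single string. The helper name check_or_write and its parameters are hypothetical and not taken from the PR; the exact string comparison follows what the review comments say CI relies on.

```python
import tempfile
from pathlib import Path


def check_or_write(generated: str, output_path: Path, check: bool) -> int:
    """Return an exit code: 0 when the diagram is current, 1 on drift.

    Hypothetical helper; the real script may be structured differently.
    """
    current = output_path.read_text(encoding="utf-8") if output_path.exists() else None
    if check:
        # Exact string comparison: any difference at all counts as drift.
        return 0 if current == generated else 1
    if current != generated:
        output_path.parent.mkdir(parents=True, exist_ok=True)
        output_path.write_text(generated, encoding="utf-8")
    return 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "docs" / "architecture.mmd"
        print(check_or_write("graph TD\n", target, check=True))   # file missing: drift
        print(check_or_write("graph TD\n", target, check=False))  # writes the file
        print(check_or_write("graph TD\n", target, check=True))   # now up to date
```

Because the comparison is exact, any non-determinism in the generator surfaces as a spurious failure, which is why the ordering issues raised in the comments below matter.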
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| scripts/generate_architecture_diagram.py | New generator script that scans the repo and produces a deterministic Mermaid diagram with --check. |
| README.md | Documents the automated diagram workflows and local commands. |
| docs/architecture.mmd | Initial generated Mermaid architecture diagram output. |
| .github/workflows/update-architecture-diagram.yml | Push workflow to regenerate and commit updated diagram. |
| .github/workflows/check-architecture-diagram.yml | PR workflow to verify the diagram is up to date (--check). |
    def iter_repo_files(root: Path) -> Iterable[Path]:
        for path in root.rglob("*"):
            if path.is_file() and not should_skip(path.relative_to(root)):
                yield path
Path.rglob() iteration order is filesystem-dependent, so the set/order of file_infos (and later Counter.most_common() tie-breaking) can make the generated Mermaid output non-deterministic across runners. Since CI uses --check with an exact string compare, please sort the discovered paths (or otherwise enforce a stable ordering) before processing.
Fixed in cd2c30e. iter_repo_files now wraps root.rglob("*") with sorted(), ensuring a stable, filesystem-independent path discovery order so the generated diagram is fully deterministic across CI runners.
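The fix described above can be sketched as follows. This is an illustration rather than the committed code: should_skip and SKIP_DIRS here are hypothetical stand-ins, since the PR's own skip logic is not shown in this conversation.

```python
import tempfile
from pathlib import Path
from typing import Iterable

# Hypothetical skip list; the real should_skip in the PR is not shown here.
SKIP_DIRS = {".git", "node_modules", "__pycache__"}


def should_skip(rel_path: Path) -> bool:
    return any(part in SKIP_DIRS for part in rel_path.parts)


def iter_repo_files(root: Path) -> Iterable[Path]:
    # sorted() materializes the walk and orders the paths, so downstream
    # results no longer depend on the order the filesystem returns entries.
    for path in sorted(root.rglob("*")):
        if path.is_file() and not should_skip(path.relative_to(root)):
            yield path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Create files in a deliberately unsorted order.
        for name in ["b.py", "a.py", "pkg/z.py"]:
            target = root / name
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text("x", encoding="utf-8")
        print([p.relative_to(root).as_posix() for p in iter_repo_files(root)])
```

Sorting costs one full materialization of the walk, which is usually acceptable for a repository-sized tree.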
    for (src, dst), weight in deps.most_common(30):
        src_id = sanitize_node_id(src)
        dst_id = sanitize_node_id(dst)
        lines.append(f" {src_id} -. {weight} refs .-> {dst_id}")
deps.most_common(30) is not deterministic when multiple edges share the same weight (tie order depends on insertion order), which can cause --check to fail intermittently. Consider emitting edges in a fully-stable sort order (e.g., sort by (-weight, src, dst) after building the counter).
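One way to apply the suggested (-weight, src, dst) ordering is sketched below. The function name top_edges and the sample edges are hypothetical; only the sort key comes from the review comment.

```python
from collections import Counter
from typing import List, Tuple

Edge = Tuple[str, str]


def top_edges(deps: Counter, limit: int = 30) -> List[Tuple[Edge, int]]:
    # Sort by descending weight, then by source and destination names, so
    # ties resolve the same way regardless of insertion order.
    ordered = sorted(deps.items(), key=lambda item: (-item[1], item[0][0], item[0][1]))
    return ordered[:limit]


# Two counters with identical contents but different insertion order.
a = Counter()
a[("web", "api")] += 2
a[("api", "db")] += 2
b = Counter()
b[("api", "db")] += 2
b[("web", "api")] += 2

print(a.most_common(1) == b.most_common(1))  # False: ties follow insertion order
print(top_edges(a, 1) == top_edges(b, 1))    # True: ties follow the sort key
```

Sorting the full item list before truncating matters: truncating first would still let insertion order decide which tied edges make the cut.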
    for pattern in IMPORT_PATTERNS:
        for match in pattern.findall(text):
            token = str(match).replace("\\", "/").split("/")[0].split(".")[0].lower()
            if token:
                tokens.add(token)
The token extraction drops Python relative imports like from .foo import ... / from ..foo import ... because splitting on . yields an empty first segment; this will systematically under-detect dependencies for common Python package layouts. Consider stripping leading dots before tokenizing (or adding a dedicated relative-import handling path).
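A minimal sketch of the first option, stripping leading dots before tokenizing. The helper name import_token is hypothetical; the rest of the expression mirrors the snippet under review.

```python
def import_token(match: str) -> str:
    """Reduce a matched import target to its top-level name (hypothetical helper)."""
    normalized = str(match).replace("\\", "/")
    # Strip leading dots so relative imports such as ".foo" or "..foo.bar"
    # keep their first real segment instead of collapsing to "".
    normalized = normalized.lstrip(".")
    return normalized.split("/")[0].split(".")[0].lower()


for target in ["os.path", ".foo", "..foo.bar", "Pkg/module"]:
    print(target, "->", import_token(target))
```

A bare relative import such as "from . import x" still yields an empty token under this sketch, so the existing "if token:" guard remains necessary.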
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9bc8af35ff
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
    summary = ", ".join(
        f"{lang}:{count}" for lang, count in lang_counts.most_common(2)
    )
Make language summaries deterministic
Counter.most_common(2) uses insertion order to break ties, and here that insertion order comes from iter_repo_files(ROOT) traversal order; filesystem iteration order is not stable across environments. If a component has tied language counts, the summary can flip between runs, which makes --check fail intermittently and can trigger unnecessary auto-commits from the update workflow. Please sort tied language entries explicitly before truncating to the top 2.
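A sketch of explicit tie-breaking before truncation, assuming lang_counts is a Counter keyed by language name. The function name language_summary is hypothetical; the output format matches the snippet under review.

```python
from collections import Counter


def language_summary(lang_counts: Counter, limit: int = 2) -> str:
    # Order by descending count, then alphabetically by language name, and
    # only then truncate, so tied languages cannot swap places between runs.
    ranked = sorted(lang_counts.items(), key=lambda item: (-item[1], item[0]))
    return ", ".join(f"{lang}:{count}" for lang, count in ranked[:limit])


# Same counts, discovered in different traversal orders.
first = Counter(["Python", "Go", "Shell"])
second = Counter(["Shell", "Go", "Python"])
print(language_summary(first))
print(language_summary(second))
```

Both calls produce the same string because the alphabetical key decides which of the tied languages survive the cut.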
Summary
- Updated the language-summary logic in build_diagram to enforce deterministic tie-breaking by sorting language counts with (-count, language) before selecting the top two entries. This replaces the prior Counter.most_common(2) behavior that could vary when counts were tied. (scripts/generate_architecture_diagram.py L137-L143)
- Committed the fix as 4afc93d and created a follow-up PR with title: “fix: deterministic language tie-breaking in architecture summary”.
Testing
- ✅ python3 scripts/generate_architecture_diagram.py
- ✅ python3 scripts/generate_architecture_diagram.py --check
    lines.append(" docs[(Documentation)]")
    lines.append(" repo --> docs")
    for doc in sorted(docs)[:8]:
        doc_id = sanitize_node_id("doc_" + doc.name)
Generate unique doc node IDs from full paths
This ID is derived from doc.name (basename) only, so two docs like backend/README.md and frontend/README.md both become the same Mermaid node ID and collapse into one node. That silently drops files from the diagram and misrepresents repository structure whenever duplicate doc filenames exist. Use the relative path (or include parent segments) when building doc_id.
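The collision and the suggested remedy can be illustrated with a short sketch. The sanitize_node_id below is a hypothetical stand-in (the PR's own sanitizer is not shown), and doc_node is an illustrative helper name.

```python
import re
from pathlib import PurePosixPath
from typing import Tuple


def sanitize_node_id(raw: str) -> str:
    # Hypothetical stand-in for the PR's sanitizer: replace anything that is
    # not a letter, digit, or underscore.
    return re.sub(r"[^0-9A-Za-z_]", "_", raw)


def doc_node(doc: PurePosixPath) -> Tuple[str, str]:
    # The ID comes from the full relative path; the label stays the basename.
    return sanitize_node_id("doc_" + doc.as_posix()), doc.name


docs = [PurePosixPath("backend/README.md"), PurePosixPath("frontend/README.md")]
for doc in sorted(docs):
    node_id, label = doc_node(doc)
    print(f'{node_id}["{label}"]')
```

With a sanitizer like this one, distinct paths can still map to the same ID in rare cases (for example a/b.md and a_b.md), so a stricter scheme may be worth considering if that matters.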
Summary
- Addressed the requested review feedback at scripts/generate_architecture_diagram.py line 152 by changing documentation node ID generation from basename-only to full relative path (doc.as_posix()), which prevents Mermaid node collisions for files with the same name in different directories.
- Kept documentation node labels unchanged (doc.name) so diagram readability is preserved while IDs are now unique/stable.
- Committed the change as 4445c2e and created a follow-up PR message via make_pr summarizing this fix on top of the original PR.
Testing
- ✅ python3 scripts/generate_architecture_diagram.py
- ✅ python3 scripts/generate_architecture_diagram.py --check
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…output Agent-Logs-Url: https://github.com/bell-kevin/kevinBellTemplateRepo/sessions/d296905a-fa8f-49ae-a856-9e40064fd3e9 Co-authored-by: bell-kevin <8269880+bell-kevin@users.noreply.github.com>
@codex review
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
    if ext in {".md", ".rst", ".adoc", ".txt"}:
        docs.append(rel_path)
    language = CODE_EXTENSIONS.get(ext)
    if language:
Doc detection is currently extension-based (.md/.rst/.adoc/.txt), so common documentation files without an extension (e.g. LICENSE, COPYING) won't be included in the diagram (this repo’s LICENSE is currently omitted). Consider explicitly including a small allowlist of well-known doc filenames (case-insensitive) or treating root-level files like LICENSE* as docs.
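A sketch of the allowlist approach, assuming paths are relative to the repository root. DOC_BASENAMES and is_doc are hypothetical names, and the allowlist contents are illustrative rather than a recommendation for this repo.

```python
from pathlib import PurePosixPath

DOC_EXTENSIONS = {".md", ".rst", ".adoc", ".txt"}
# Hypothetical allowlist; extend it to match the repositories being scanned.
DOC_BASENAMES = {"license", "copying", "notice", "authors"}


def is_doc(rel_path: PurePosixPath) -> bool:
    if rel_path.suffix.lower() in DOC_EXTENSIONS:
        return True
    # Extensionless files match by lowercased name, so LICENSE and License
    # are treated the same way.
    return rel_path.suffix == "" and rel_path.name.lower() in DOC_BASENAMES


for name in ["LICENSE", "Copying", "docs/guide.rst", "src/main.py"]:
    print(name, is_doc(PurePosixPath(name)))
```

This sketch does not cover suffixed variants such as LICENSE-MIT; handling those would need a prefix match instead of an exact name match.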
    - name: Check out repository
      uses: actions/checkout@v4
      with:
        fetch-depth: 0
fetch-depth: 0 is likely unnecessary for regenerating + committing a single generated file, and it increases checkout time and .git size. Unless you specifically need full history here, consider using the default depth (or fetch-depth: 1) and reserving full history for cases like the tag-release workflow where it’s required.
    - fetch-depth: 0
    + fetch-depth: 1
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a1a5f8eeb6
    git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
    git add docs/architecture.mmd
    git commit -m "chore: update architecture diagram"
    git push
Use a push method that retriggers diagram checks
This step pushes auto-generated commits with the repository GITHUB_TOKEN, but GitHub does not start new workflow runs for events created by GITHUB_TOKEN. As a result, when a PR first fails scripts/generate_architecture_diagram.py --check and this job later fixes docs/architecture.mmd, the follow-up commit will not rerun check-architecture-diagram.yml, so the latest PR SHA is left without a fresh verification run. That breaks the stated guarantee that PRs are validated against the current diagram.
    lines.append(" docs[(Documentation)]")
    lines.append(" repo --> docs")
    for doc in sorted(docs)[:8]:
        doc_id = sanitize_node_id("doc_" + doc.name)
Derive documentation node IDs from full relative paths
The Mermaid node ID for docs is based only on doc.name, so files with the same basename in different directories (for example, README.md in multiple folders) map to the same node and overwrite each other in the graph. This causes the generated architecture diagram to omit documents and misrepresent repository structure. Use a path-unique identifier (such as doc.as_posix()) when building doc_id.
- Path.rglob() non-determinism issue at lines 80-83 of scripts/generate_architecture_diagram.py
- iter_repo_files to enforce stable ordering
- docs/architecture.mmd with the fixed script