Skip to content

feat: automated Mermaid architecture diagram generation and CI checks#15

Merged
bell-kevin merged 4 commits intomainfrom
kevin/add-automated-mermaid-diagram-generation
Apr 6, 2026
Merged

feat: automated Mermaid architecture diagram generation and CI checks#15
bell-kevin merged 4 commits intomainfrom
kevin/add-automated-mermaid-diagram-generation

Conversation

@bell-kevin
Copy link
Copy Markdown
Owner

@bell-kevin bell-kevin commented Apr 6, 2026

  • Identify the Path.rglob() non-determinism issue at lines 80-83 of scripts/generate_architecture_diagram.py
  • Sort the discovered paths in iter_repo_files to enforce stable ordering
  • Regenerate docs/architecture.mmd with the fixed script
  • Run final validation (CodeQL + code review - no issues found)

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an automated mechanism to generate and CI-enforce a high-level Mermaid architecture diagram (docs/architecture.mmd) derived from repository contents.

Changes:

  • Introduces scripts/generate_architecture_diagram.py to infer components/dependencies and generate a Mermaid diagram, with --check drift-detection mode.
  • Adds GitHub Actions workflows to (a) auto-regenerate+commit the diagram on pushes and (b) enforce freshness on pull requests.
  • Documents local usage in README.md and checks in an initial generated docs/architecture.mmd.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
scripts/generate_architecture_diagram.py New generator script that scans the repo and produces a deterministic Mermaid diagram with --check.
README.md Documents the automated diagram workflows and local commands.
docs/architecture.mmd Initial generated Mermaid architecture diagram output.
.github/workflows/update-architecture-diagram.yml Push workflow to regenerate and commit updated diagram.
.github/workflows/check-architecture-diagram.yml PR workflow to verify the diagram is up to date (--check).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread scripts/generate_architecture_diagram.py
Comment on lines +80 to +83
def iter_repo_files(root: Path) -> Iterable[Path]:
for path in root.rglob("*"):
if path.is_file() and not should_skip(path.relative_to(root)):
yield path
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Path.rglob() iteration order is filesystem-dependent, so the set/order of file_infos (and later Counter.most_common() tie-breaking) can make the generated Mermaid output non-deterministic across runners. Since CI uses --check with an exact string compare, please sort the discovered paths (or otherwise enforce a stable ordering) before processing.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot apply changes based on this feedback

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in cd2c30e. iter_repo_files now wraps root.rglob("*") with sorted(), ensuring a stable, filesystem-independent path discovery order so the generated diagram is fully deterministic across CI runners.

Comment on lines +156 to +160
for (src, dst), weight in deps.most_common(30):
src_id = sanitize_node_id(src)
dst_id = sanitize_node_id(dst)
lines.append(f" {src_id} -. {weight} refs .-> {dst_id}")

Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deps.most_common(30) is not deterministic when multiple edges share the same weight (tie order depends on insertion order), which can cause --check to fail intermittently. Consider emitting edges in a fully-stable sort order (e.g., sort by (-weight, src, dst) after building the counter).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot apply changes based on this feedback

Comment on lines +111 to +115
for pattern in IMPORT_PATTERNS:
for match in pattern.findall(text):
token = str(match).replace("\\", "/").split("/")[0].split(".")[0].lower()
if token:
tokens.add(token)
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The token extraction drops Python relative imports like from .foo import ... / from ..foo import ... because splitting on . yields an empty first segment; this will systematically under-detect dependencies for common Python package layouts. Consider stripping leading dots before tokenizing (or adding a dedicated relative-import handling path).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot apply changes based on this feedback

Comment thread .github/workflows/update-architecture-diagram.yml
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9bc8af35ff

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +141 to +143
summary = ", ".join(
f"{lang}:{count}" for lang, count in lang_counts.most_common(2)
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Make language summaries deterministic

Counter.most_common(2) uses insertion order to break ties, and here that insertion order comes from iter_repo_files(ROOT) traversal order; filesystem iteration order is not stable across environments. If a component has tied language counts, the summary can flip between runs, which makes --check fail intermittently and can trigger unnecessary auto-commits from the update workflow. Please sort tied language entries explicitly before truncating to the top 2.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@codex address that feedback

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

  • Updated the language-summary logic in build_diagram to enforce deterministic tie-breaking by sorting language counts with (-count, language) before selecting the top two entries. This replaces the prior Counter.most_common(2) behavior that could vary when counts were tied. scripts/generate_architecture_diagram.pyL137-L143
  • Committed the fix as 4afc93d and created a follow-up PR with title: “fix: deterministic language tie-breaking in architecture summary”.

Testing

  • python3 scripts/generate_architecture_diagram.py
  • python3 scripts/generate_architecture_diagram.py --check

View task →

lines.append(" docs[(Documentation)]")
lines.append(" repo --> docs")
for doc in sorted(docs)[:8]:
doc_id = sanitize_node_id("doc_" + doc.name)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Generate unique doc node IDs from full paths

This ID is derived from doc.name (basename) only, so two docs like backend/README.md and frontend/README.md both become the same Mermaid node ID and collapse into one node. That silently drops files from the diagram and misrepresents repository structure whenever duplicate doc filenames exist. Use the relative path (or include parent segments) when building doc_id.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@codex address that feedback

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

  • Addressed the requested review feedback at scripts/generate_architecture_diagram.py line 152 by changing documentation node ID generation from basename-only to full relative path (doc.as_posix()), which prevents Mermaid node collisions for files with the same name in different directories.
  • Kept documentation node labels unchanged (doc.name) so diagram readability is preserved while IDs are now unique/stable.
  • Committed the change as 4445c2e and created a follow-up PR message via make_pr summarizing this fix on top of the original PR.

Testing

  • python3 scripts/generate_architecture_diagram.py
  • python3 scripts/generate_architecture_diagram.py --check

View task →

bell-kevin and others added 2 commits April 5, 2026 21:48
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@bell-kevin bell-kevin requested a review from Copilot April 6, 2026 03:53
@bell-kevin
Copy link
Copy Markdown
Owner Author

@codex review

@bell-kevin bell-kevin merged commit 51c48a8 into main Apr 6, 2026
3 of 5 checks passed
@bell-kevin bell-kevin deleted the kevin/add-automated-mermaid-diagram-generation branch April 6, 2026 03:56
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +176 to +179
if ext in {".md", ".rst", ".adoc", ".txt"}:
docs.append(rel_path)
language = CODE_EXTENSIONS.get(ext)
if language:
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doc detection is currently extension-based (.md/.rst/.adoc/.txt), so common documentation files without an extension (e.g. LICENSE, COPYING) won't be included in the diagram (this repo’s LICENSE is currently omitted). Consider explicitly including a small allowlist of well-known doc filenames (case-insensitive) or treating root-level files like LICENSE* as docs.

Copilot uses AI. Check for mistakes.
- name: Check out repository
uses: actions/checkout@v4
with:
fetch-depth: 0
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fetch-depth: 0 is likely unnecessary for regenerating + committing a single generated file, and it increases checkout time and .git size. Unless you specifically need full history here, consider using the default depth (or fetch-depth: 1) and reserving full history for cases like the tag-release workflow where it’s required.

Suggested change
fetch-depth: 0
fetch-depth: 1

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a1a5f8eeb6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
git add docs/architecture.mmd
git commit -m "chore: update architecture diagram"
git push
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Use a push method that retriggers diagram checks

This step pushes auto-generated commits with the repository GITHUB_TOKEN, but GitHub does not start new workflow runs for events created by GITHUB_TOKEN. As a result, when a PR first fails scripts/generate_architecture_diagram.py --check and this job later fixes docs/architecture.mmd, the follow-up commit will not rerun check-architecture-diagram.yml, so the latest PR SHA is left without a fresh verification run. That breaks the stated guarantee that PRs are validated against the current diagram.

Useful? React with 👍 / 👎.

lines.append(" docs[(Documentation)]")
lines.append(" repo --> docs")
for doc in sorted(docs)[:8]:
doc_id = sanitize_node_id("doc_" + doc.name)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Derive documentation node IDs from full relative paths

The Mermaid node ID for docs is based only on doc.name, so files with the same basename in different directories (for example, README.md in multiple folders) map to the same node and overwrite each other in the graph. This causes the generated architecture diagram to omit documents and misrepresent repository structure. Use a path-unique identifier (such as doc.as_posix()) when building doc_id.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants