Conversation
There was a problem hiding this comment.
Pull request overview
This pull request adds AI-generated scaffolding content to the learning resources project, implementing a systematic approach to populate missing or stub topic pages with structured content and learning resources. The PR includes a Python script to remove dead links and performs YAML frontmatter reformatting across all topic files.
Key Changes
- Dead link removal script: Adds
kill_the_dead_links.pyto validate and remove inaccessible URLs from learning resources - AI scaffolding documentation: Updates README.md with guidelines for AI-generated content including marking, structure, and resource selection criteria
- YAML formatting standardization: Reformats all topic frontmatter from inline flow style to block style (e.g.,
- type: external_linkinstead of- type: "external_link")
Reviewed changes
Copilot reviewed 80 out of 82 changed files in this pull request and generated 18 comments.
Show a summary per file
| File | Description |
|---|---|
| scripts/kill_the_dead_links.py | New Python script to validate URLs and remove dead links from topic learning resources |
| scripts/requirements.txt | Dependencies for the dead links script (python-frontmatter, requests) |
| README.md | Documents AI scaffolding approach, content guidelines, and resource selection criteria |
| .github/copilot-instructions.md | Adds references to design principles and data architecture documentation |
| docs/design-principles.md | Formatting improvements with consistent blank lines |
| docs/data-architecture.md | Formatting improvements with consistent section spacing |
| src/topics/*.md (87 files) | YAML frontmatter reformatting and addition of AI-generated content with descriptions, learning resources, and cross-references |
Comments suppressed due to low confidence (1)
scripts/kill_the_dead_links.py:23
- Except block directly handles BaseException.
except:
| def kill_the_dead_links(path): | ||
| print(f"Processing file \"{path}\"..") | ||
| post = frontmatter.load(path) | ||
| if "learning_resources" in post.metadata: | ||
| for resource in post.metadata["learning_resources"]: | ||
| print(f"Requesting URL \"{resource['url']}\"..") | ||
| try: | ||
| response = requests.get(resource["url"], timeout=5) | ||
| response.raise_for_status() | ||
| except: | ||
| print(f"Dead link found: {resource['url']}") | ||
| post.metadata["learning_resources"].remove(resource) |
There was a problem hiding this comment.
Modifying a list while iterating over it with remove() can cause items to be skipped. When an item is removed, the iterator's index doesn't adjust, potentially skipping the next element. Use a list comprehension or iterate over a copy: post.metadata["learning_resources"] = [resource for resource in post.metadata["learning_resources"] if not is_dead(resource)]
| def kill_the_dead_links(path): | |
| print(f"Processing file \"{path}\"..") | |
| post = frontmatter.load(path) | |
| if "learning_resources" in post.metadata: | |
| for resource in post.metadata["learning_resources"]: | |
| print(f"Requesting URL \"{resource['url']}\"..") | |
| try: | |
| response = requests.get(resource["url"], timeout=5) | |
| response.raise_for_status() | |
| except: | |
| print(f"Dead link found: {resource['url']}") | |
| post.metadata["learning_resources"].remove(resource) | |
| def is_dead(resource): | |
| print(f"Requesting URL \"{resource['url']}\"..") | |
| try: | |
| response = requests.get(resource["url"], timeout=5) | |
| response.raise_for_status() | |
| return False | |
| except: | |
| print(f"Dead link found: {resource['url']}") | |
| return True | |
| def kill_the_dead_links(path): | |
| print(f"Processing file \"{path}\"..") | |
| post = frontmatter.load(path) | |
| if "learning_resources" in post.metadata: | |
| post.metadata["learning_resources"] = [ | |
| resource for resource in post.metadata["learning_resources"] | |
| if not is_dead(resource) | |
| ] |
README.md
Outdated
| - Learning resources need to come from sources widely accepted in expert communities | ||
| - Original documentation is preferrable | ||
| - Youtube videos from widely accepted channels with a long history of reliable information are acceptable | ||
| - Freely accessible ressources are preferred to paid-for offers |
There was a problem hiding this comment.
Typo: "ressources" should be "resources".
| - Freely accessible ressources are preferred to paid-for offers | |
| - Freely accessible resources are preferred to paid-for offers |
README.md
Outdated
| - Provide the structure as laid out in the data architecture | ||
| - Add learning resources for all levels | ||
| - Learning resources need to come from sources widely accepted in expert communities | ||
| - Original documentation is preferrable |
There was a problem hiding this comment.
Typo: "preferrable" should be "preferable".
| - Original documentation is preferrable | |
| - Original documentation is preferable |
README.md
Outdated
| - Add learning resources for all levels | ||
| - Learning resources need to come from sources widely accepted in expert communities | ||
| - Original documentation is preferrable | ||
| - Youtube videos from widely accepted channels with a long history of reliable information are acceptable |
There was a problem hiding this comment.
Capitalization: "Youtube" should be "YouTube" (proper brand name).
| - Youtube videos from widely accepted channels with a long history of reliable information are acceptable | |
| - YouTube videos from widely accepted channels with a long history of reliable information are acceptable |
README.md
Outdated
| - `src/` - This folder contains definition files that outline the archetypes, topics, and learning resources in Markdown format. | ||
| - `src/archetypes/` - Contains archetype definitions, which describe the competencies and skills expected at various stages of a software engineer's career. | ||
| - `src/topics/` - Contains definitions of topics that are relevant to the archetypes, detailing the specific skills, knowledge areas and any learning resources associated with them. | ||
| - `src/topics/` - Contains definitions of topics that are relevant to the archetypes, detailing the specific skills, knowledge areas and any learning resources associated with them. |
There was a problem hiding this comment.
Trailing whitespace on this line. Consider removing it to maintain consistency.
| - `src/topics/` - Contains definitions of topics that are relevant to the archetypes, detailing the specific skills, knowledge areas and any learning resources associated with them. | |
| - `src/topics/` - Contains definitions of topics that are relevant to the archetypes, detailing the specific skills, knowledge areas and any learning resources associated with them. |
| {{ render_description() }} | ||
|
|
||
| - No newline at end of file | ||
| {{ render_learning_resources() }} No newline at end of file |
There was a problem hiding this comment.
{{ render_learning_resources() }} outputs link title and url from frontmatter without validation/escaping, allowing injection of malicious links (e.g., javascript: URLs) or HTML in titles. An attacker could submit a resource that executes JS when clicked. Fix by strictly validating URLs to http/https schemes and escaping titles/URLs in the macro before rendering.
| *Generated by AI, pending community review.* | ||
|
|
||
| ## Learning Materials | ||
| {{ render_description() }} |
There was a problem hiding this comment.
{{ render_description() }} returns frontmatter description directly without sanitization, enabling stored XSS if HTML/JS is included in the description. A malicious PR could add <script> in frontmatter that executes when the page is viewed. Fix by escaping/sanitizing in render_description() and/or enabling HTML sanitization in MkDocs Material.
| {{ render_description() }} | ||
|
|
||
| - No newline at end of file | ||
| {{ render_learning_resources() }} No newline at end of file |
There was a problem hiding this comment.
{{ render_learning_resources() }} emits titles/URLs from frontmatter without escaping or URL scheme checks, allowing injection of javascript: links or HTML in titles. A crafted resource could cause script execution on click. Fix by whitelisting http/https schemes and escaping titles/URLs inside render_learning_resources().
| *Generated by AI, pending community review.* | ||
|
|
||
| ## Learning Materials | ||
| {{ render_description() }} |
There was a problem hiding this comment.
The newly added {{ render_description() }} macro outputs raw frontmatter without sanitization, creating a stored XSS risk if HTML/JS is placed in description. An attacker could add <script> in a PR and have it execute for readers. Fix by escaping/sanitizing in render_description() and configure MkDocs to disallow raw HTML or only allow a safe subset.
| {{ render_description() }} | ||
|
|
||
| - No newline at end of file | ||
| {{ render_learning_resources() }} No newline at end of file |
There was a problem hiding this comment.
{{ render_learning_resources() }} renders title and url directly from frontmatter, allowing injection of unsafe links (e.g., javascript:) or HTML in titles. A malicious resource could trigger script execution on click. Fix by validating URL schemes (only http/https) and escaping titles/URLs inside the macro.
pburls
left a comment
There was a problem hiding this comment.
Something doesn't look to have worked correctly. Some working links from the FSI topics were removed.
| learning_resources: | ||
| - type: "external_link" | ||
| title: "Accounting Equation: What It Is and How You Calculate It" | ||
| url: "https://www.investopedia.com/terms/a/accounting-equation.asp" |
There was a problem hiding this comment.
issue: I checked two of the investopedia links that were removed and they both worked. Is it possible that the script incorrectly removed them due to some transient issue?
I added Christian's fork containing the AI scaffoldings as remote.
I created a branch from his main.
I manually performed some cleanup:
Finally, I wrote a dead links killer script and ran it against all the topics. Dead links are now dead for good!
This changed the frontmatter of all the topics in terms of YAML, formally but not substantially.
I browsed through all the topics: they all look good!