Skip to content
This repository was archived by the owner on Dec 12, 2025. It is now read-only.

Topics scaffoldings#70

Open
fabioscagliola wants to merge 10 commits intomainfrom
scaffoldings
Open

Topics scaffoldings#70
fabioscagliola wants to merge 10 commits intomainfrom
scaffoldings

Conversation

@fabioscagliola
Copy link
Copy Markdown
Contributor

@fabioscagliola fabioscagliola commented Dec 2, 2025

I added Christian's fork containing the AI scaffoldings as remote.
I created a branch from his main.
I manually performed some cleanup:

  • I removed redundant headings as "learning resources" is added automatically
  • I removed redundant descriptions from bodies as they now come from the frontmatter

Finally, I wrote a dead links killer script and ran it against all the topics. Dead links are now dead for good!
This changed the frontmatter of all the topics in terms of YAML, formally but not substantially.
I browsed through all the topics: they all look good!

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request adds AI-generated scaffolding content to the learning resources project, implementing a systematic approach to populate missing or stub topic pages with structured content and learning resources. The PR includes a Python script to remove dead links and performs YAML frontmatter reformatting across all topic files.

Key Changes

  • Dead link removal script: Adds kill_the_dead_links.py to validate and remove inaccessible URLs from learning resources
  • AI scaffolding documentation: Updates README.md with guidelines for AI-generated content including marking, structure, and resource selection criteria
  • YAML formatting standardization: Reformats all topic frontmatter from inline flow style to block style (e.g., - type: external_link instead of - type: "external_link")

Reviewed changes

Copilot reviewed 80 out of 82 changed files in this pull request and generated 18 comments.

Show a summary per file
File Description
scripts/kill_the_dead_links.py New Python script to validate URLs and remove dead links from topic learning resources
scripts/requirements.txt Dependencies for the dead links script (python-frontmatter, requests)
README.md Documents AI scaffolding approach, content guidelines, and resource selection criteria
.github/copilot-instructions.md Adds references to design principles and data architecture documentation
docs/design-principles.md Formatting improvements with consistent blank lines
docs/data-architecture.md Formatting improvements with consistent section spacing
src/topics/*.md (87 files) YAML frontmatter reformatting and addition of AI-generated content with descriptions, learning resources, and cross-references
Comments suppressed due to low confidence (1)

scripts/kill_the_dead_links.py:23

  • Except block directly handles BaseException.
            except:

Comment on lines +14 to +25
def kill_the_dead_links(path):
print(f"Processing file \"{path}\"..")
post = frontmatter.load(path)
if "learning_resources" in post.metadata:
for resource in post.metadata["learning_resources"]:
print(f"Requesting URL \"{resource['url']}\"..")
try:
response = requests.get(resource["url"], timeout=5)
response.raise_for_status()
except:
print(f"Dead link found: {resource['url']}")
post.metadata["learning_resources"].remove(resource)
Copy link

Copilot AI Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modifying a list while iterating over it with remove() can cause items to be skipped. When an item is removed, the iterator's index doesn't adjust, potentially skipping the next element. Use a list comprehension or iterate over a copy: post.metadata["learning_resources"] = [resource for resource in post.metadata["learning_resources"] if not is_dead(resource)]

Suggested change
def kill_the_dead_links(path):
print(f"Processing file \"{path}\"..")
post = frontmatter.load(path)
if "learning_resources" in post.metadata:
for resource in post.metadata["learning_resources"]:
print(f"Requesting URL \"{resource['url']}\"..")
try:
response = requests.get(resource["url"], timeout=5)
response.raise_for_status()
except:
print(f"Dead link found: {resource['url']}")
post.metadata["learning_resources"].remove(resource)
def is_dead(resource):
print(f"Requesting URL \"{resource['url']}\"..")
try:
response = requests.get(resource["url"], timeout=5)
response.raise_for_status()
return False
except:
print(f"Dead link found: {resource['url']}")
return True
def kill_the_dead_links(path):
print(f"Processing file \"{path}\"..")
post = frontmatter.load(path)
if "learning_resources" in post.metadata:
post.metadata["learning_resources"] = [
resource for resource in post.metadata["learning_resources"]
if not is_dead(resource)
]

Copilot uses AI. Check for mistakes.
README.md Outdated
- Learning resources need to come from sources widely accepted in expert communities
- Original documentation is preferrable
- Youtube videos from widely accepted channels with a long history of reliable information are acceptable
- Freely accessible ressources are preferred to paid-for offers
Copy link

Copilot AI Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: "ressources" should be "resources".

Suggested change
- Freely accessible ressources are preferred to paid-for offers
- Freely accessible resources are preferred to paid-for offers

Copilot uses AI. Check for mistakes.
README.md Outdated
- Provide the structure as laid out in the data architecture
- Add learning resources for all levels
- Learning resources need to come from sources widely accepted in expert communities
- Original documentation is preferrable
Copy link

Copilot AI Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: "preferrable" should be "preferable".

Suggested change
- Original documentation is preferrable
- Original documentation is preferable

Copilot uses AI. Check for mistakes.
README.md Outdated
- Add learning resources for all levels
- Learning resources need to come from sources widely accepted in expert communities
- Original documentation is preferrable
- Youtube videos from widely accepted channels with a long history of reliable information are acceptable
Copy link

Copilot AI Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Capitalization: "Youtube" should be "YouTube" (proper brand name).

Suggested change
- Youtube videos from widely accepted channels with a long history of reliable information are acceptable
- YouTube videos from widely accepted channels with a long history of reliable information are acceptable

Copilot uses AI. Check for mistakes.
README.md Outdated
- `src/` - This folder contains definition files that outline the archetypes, topics, and learning resources in Markdown format.
- `src/archetypes/` - Contains archetype definitions, which describe the competencies and skills expected at various stages of a software engineer's career.
- `src/topics/` - Contains definitions of topics that are relevant to the archetypes, detailing the specific skills, knowledge areas and any learning resources associated with them.
- `src/topics/` - Contains definitions of topics that are relevant to the archetypes, detailing the specific skills, knowledge areas and any learning resources associated with them.
Copy link

Copilot AI Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trailing whitespace on this line. Consider removing it to maintain consistency.

Suggested change
- `src/topics/` - Contains definitions of topics that are relevant to the archetypes, detailing the specific skills, knowledge areas and any learning resources associated with them.
- `src/topics/` - Contains definitions of topics that are relevant to the archetypes, detailing the specific skills, knowledge areas and any learning resources associated with them.

Copilot uses AI. Check for mistakes.
{{ render_description() }}

- No newline at end of file
{{ render_learning_resources() }} No newline at end of file
Copy link

Copilot AI Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{{ render_learning_resources() }} outputs link title and url from frontmatter without validation/escaping, allowing injection of malicious links (e.g., javascript: URLs) or HTML in titles. An attacker could submit a resource that executes JS when clicked. Fix by strictly validating URLs to http/https schemes and escaping titles/URLs in the macro before rendering.

Copilot uses AI. Check for mistakes.
*Generated by AI, pending community review.*

## Learning Materials
{{ render_description() }}
Copy link

Copilot AI Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{{ render_description() }} returns frontmatter description directly without sanitization, enabling stored XSS if HTML/JS is included in the description. A malicious PR could add <script> in frontmatter that executes when the page is viewed. Fix by escaping/sanitizing in render_description() and/or enabling HTML sanitization in MkDocs Material.

Copilot uses AI. Check for mistakes.
{{ render_description() }}

- No newline at end of file
{{ render_learning_resources() }} No newline at end of file
Copy link

Copilot AI Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{{ render_learning_resources() }} emits titles/URLs from frontmatter without escaping or URL scheme checks, allowing injection of javascript: links or HTML in titles. A crafted resource could cause script execution on click. Fix by whitelisting http/https schemes and escaping titles/URLs inside render_learning_resources().

Copilot uses AI. Check for mistakes.
*Generated by AI, pending community review.*

## Learning Materials
{{ render_description() }}
Copy link

Copilot AI Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The newly added {{ render_description() }} macro outputs raw frontmatter without sanitization, creating a stored XSS risk if HTML/JS is placed in description. An attacker could add <script> in a PR and have it execute for readers. Fix by escaping/sanitizing in render_description() and configure MkDocs to disallow raw HTML or only allow a safe subset.

Copilot uses AI. Check for mistakes.
{{ render_description() }}

- No newline at end of file
{{ render_learning_resources() }} No newline at end of file
Copy link

Copilot AI Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{{ render_learning_resources() }} renders title and url directly from frontmatter, allowing injection of unsafe links (e.g., javascript:) or HTML in titles. A malicious resource could trigger script execution on click. Fix by validating URL schemes (only http/https) and escaping titles/URLs inside the macro.

Copilot uses AI. Check for mistakes.
@fabioscagliola fabioscagliola changed the title Scaffoldings Topics scaffoldings Dec 3, 2025
Copy link
Copy Markdown
Contributor

@pburls pburls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something doesn't look to have worked correctly. Some working links from the FSI topics were removed.

learning_resources:
- type: "external_link"
title: "Accounting Equation: What It Is and How You Calculate It"
url: "https://www.investopedia.com/terms/a/accounting-equation.asp"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: I checked two of the investopedia links that were removed and they both worked. Is it possible that the script incorrectly removed them due to some transient issue?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants