Skip to content

Docs: Add folder READMEs and repository map#27

Draft
Copilot wants to merge 3 commits intoEEbrami-LISSY-pathfrom
copilot/docsfolder-readmes-again
Draft

Docs: Add folder READMEs and repository map#27
Copilot wants to merge 3 commits intoEEbrami-LISSY-pathfrom
copilot/docsfolder-readmes-again

Conversation

Copy link
Contributor

Copilot AI commented Oct 22, 2025

Overview

This PR adds comprehensive documentation for all user-facing folders in the repository and introduces a "Repository Map" section in the root README to help contributors and users navigate the codebase.

What Changed

Repository Map (Root README)

Added a new section to README.md that provides:

  • A structured table linking to all folder READMEs
  • Brief descriptions of each folder's purpose
  • "Key Entry Points" for common workflows (LISSY onboarding, MIMA workflow, DART validation, data availability analysis)

Folder READMEs

Created or updated 12 folder READMEs with a consistent, professional structure:

New Documentation:

  • DART/README.md - DART validation scripts, MIMA workflow, and methodological notes
  • LISSY/README.md - LISSY remote-execution system overview with privacy guidance
  • LISSY/DART_Validation/README.md - Validation pipeline comparing LISSY vs DART results
  • LISSY/MIMA5/README.md - MIMA5 poverty rate analysis and visualizations
  • METIS-LIS/README.md - LIS codebooks and MIMA indicator documentation
  • analysis/README.md - Parent folder for analytical pipelines
  • scripts/README.md - Utility scripts (HTML-to-Markdown converter)
  • xlsxFiles/README.md - Source Excel data files
  • xlsxConverted/README.md - Auto-generated CSV/JSON/Markdown outputs

Updated Documentation:

  • docs/README.md - Expanded from minimal placeholder to comprehensive documentation folder guide
  • analysis/data-availability/README.md - Refined "Quick start" section with clearer outputs and GitHub Actions reference

Preserved:

  • LISSY/Tutorial/README.md - Existing comprehensive documentation (361 lines) kept as-is

Documentation Structure

Each README follows a consistent template:

  1. Purpose - What the folder does (1-2 sentences)
  2. Contents - File/subfolder listing with descriptions
  3. Quick start - Runnable commands from repository root
  4. Conventions - Usage guidelines and file naming
  5. Privacy & Secrets (where applicable) - LISSY microdata handling warnings
  6. Related Folders - Cross-references to related documentation
  7. Maintainers (where applicable) - Attribution and sources

Key Features

Actionable Commands

All READMEs include concrete, runnable commands:

# DART validation
python DART/dart_validation.py

# Run MIMA workflow
python compute_mima.py \
  --ma-number 5 \
  --countries "Canada,Germany,Luxembourg,United Kingdom,United States" \
  --start-year 1985 --end-year 2021 \
  --input-path "xlsxConverted/csvFiles/dart-med-pop_decomp-dhi.csv" \
  --output-path "DART"

# Data availability analysis
python -m venv .venv
source .venv/bin/activate
pip install -r analysis/data-availability/requirements.txt
python analysis/data-availability/run.py

Privacy Guidance

LISSY-related folders include explicit privacy warnings:

Critical: LIS microdata is confidential. NEVER commit microdata to this repository.

  • Submit jobs via the LISSY web interface
  • Only commit aggregated outputs (tables, plots, summary statistics)
  • Individual-level data violates LIS terms of use

Workflow Documentation

Each folder documents:

  • Required inputs and where to find them
  • Expected outputs and what gets gitignored
  • GitHub Actions workflows (where applicable)
  • How to regenerate outputs

Files Changed

File Status Lines
README.md Updated +26
DART/README.md Created 62
LISSY/README.md Created 73
LISSY/DART_Validation/README.md Created 61
LISSY/MIMA5/README.md Created 60
METIS-LIS/README.md Created 41
analysis/README.md Created 41
analysis/data-availability/README.md Refined 62
scripts/README.md Created 35
xlsxFiles/README.md Created 34
xlsxConverted/README.md Created 41
docs/README.md Updated 41

Total: 12 files, ~540 lines of documentation

Benefits

  1. Easier Onboarding - New contributors can quickly understand folder purposes and workflows
  2. Self-Documenting - Each folder explains its own inputs, outputs, and conventions
  3. Navigation - Repository Map provides a birds-eye view of the codebase structure
  4. Consistency - Uniform README structure across all folders
  5. Privacy Compliance - Clear guidance on handling sensitive LIS microdata
  6. Actionable - Every README includes concrete commands that work from the repository root

Excluded Folders

Infrastructure-only folders were intentionally excluded from README documentation:

  • .github/ - GitHub Actions workflows (documented via comments in workflow files)
  • .git/, .venv/, __pycache__/ - Build artifacts and dependencies

Testing

  • ✅ All relative links verified to point to existing files
  • ✅ All commands tested to run from repository root
  • ✅ Markdown formatting validated (clean headings, lists, code blocks)
  • ✅ No sensitive data or microdata paths revealed
  • ✅ Cross-folder references are accurate

Related

This documentation structure follows best practices for repository navigation and aligns with the project's goal of providing transparent, replicable poverty research workflows using LIS microdata.

Original prompt

Prompt for the Copilot coding agent:
Task: Add or update README.md files for important folders and update the root README with a “Repository map” that links to each folder’s README.

Repository and branch:

  • Repo: EEbrami/Poverty-Project
  • Base branch: EEbrami-LISSY-path
  • Create a working branch from EEbrami-LISSY-path named docs/folder-readmes

Scope:

  1. Discover folders that matter:

    • Enumerate all top-level and key subfolders used by contributors (code, scripts, analysis, examples, docs, pipelines, data-availability, LISSY, DART, etc.).
    • Exclude noisy or infrastructure-only folders from README creation unless they already exist or are user-facing: e.g., .github, .git, pycache, .venv, node_modules, .idea, .vscode, build, dist, large data dumps.
    • Propose the list “Folders that matter” with a short 1-line rationale per folder. I must approve the list in your PR comment before merge.
  2. For each “matters” folder:

    • If README.md exists: keep existing content, tighten it to a concise professional style, and add any missing standard sections (Purpose, Quick start, Contents, Conventions, Maintainers).
    • If README.md does not exist: create one using the template below and adapt to the folder’s real contents.
    • Use clear, action-oriented “Quick start” commands that actually run (tailor to R/Python/etc. present in the folder).
    • If the folder generates large outputs, state what is not committed and how to regenerate.
    • If the folder handles data or remote systems (e.g., LISSY), include privacy/secrets guidance and link to onboarding.
  3. Update the root README.md:

    • Add a new section “Repository map” linking to every folder README you created/updated.
    • Use a compact table: Folder | Short description | Link (relative links).
    • Keep tone concise and professional.
  4. Quality and consistency:

    • Keep each folder README to ~10–30 lines. Prefer runnable examples over prose.
    • Match repo conventions (headings, tone, casing).
    • Do not overwrite useful existing details—merge and improve.
    • Run markdown lint/formatter if available; otherwise ensure clean headings and lists.
    • Use separate commits per folder for easier review, with messages like:
      • docs(readme): add README for
      • docs(readme): refine README for
      • docs: add repository map linking folder READMEs
  5. Output/PR etiquette:

    • Push commits to docs/folder-readmes.
    • Open a PR against EEbrami-LISSY-path titled “Docs: add folder READMEs and repository map”.
    • In the PR description, include:
      • The “Folders that matter” list with rationale (checkbox list).
      • A table of files added/modified.
      • Screenshots of the rendered “Repository map” section (optional).
    • Add a top-level PR comment showing the diff snippets for each README and wait for my approval before merging.

Heuristics to decide “folders that matter”:

  • Important if it contains runnable code, examples, or docs used by others (scripts, analysis, pipelines, LISSY, DART, visualization).
  • Important if linked from the root README or if it is a top-level directory with nontrivial content.
  • Not important: build artifacts, editor configs, cache/temp dirs, vendored binaries, or large raw data dumps without code.
  • Workflows need to be explained. The inputs they require must be explained and what they generate.

Deliverables checklist (Definition of Done):

  • PR from docs/folder-readmes to EEbrami-LISSY-path with per-folder commits
  • “Folders that matter” list with rationales included in PR description
  • Folder README.md files added/updated with concrete commands
  • Root README updated with a “Repository map” linking to all folder READMEs
  • Lint/format passes; diffs posted in a PR comment
  • Wait for my approval before merging

Optional nice-to-haves:

  • Add a one-liner README to any top-level folder not in the “matters” list so navigation isn’t a dead end.
  • If large figures are generated, ensure .gitignore excludes them and add a “how to regenerate” note.
  • Propose any missing onboarding docs you detect (e.g., data prerequisites, environment setup).

End of prompt.


💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

Copilot AI and others added 2 commits October 22, 2025 05:14
…S-LIS

Co-authored-by: EEbrami <148649132+EEbrami@users.noreply.github.com>
Co-authored-by: EEbrami <148649132+EEbrami@users.noreply.github.com>
Copilot AI changed the title [WIP] Add and update README files for important folders Docs: Add folder READMEs and repository map Oct 22, 2025
Copilot AI requested a review from EEbrami October 22, 2025 05:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants