
feat: add structured machine summaries to evaluations #444

Open
syahmiharith wants to merge 10 commits into santifer:main from syahmiharith:machine-summary-evaluations

Conversation

@syahmiharith
Contributor

@syahmiharith syahmiharith commented Apr 24, 2026

Closes #442.

Summary:

  • Adds a standardized fenced YAML Machine Summary block to interactive and batch evaluation reports.
  • Updates analyze-patterns.mjs to prefer the structured block while preserving legacy markdown fallbacks.
  • Documents the patterns command and adds a parser self-test to the quick test suite.

Tests:

  • node --check analyze-patterns.mjs
  • node analyze-patterns.mjs --self-test
  • node test-all.mjs --quick

Summary by CodeRabbit

  • New Features

    • Reports now include a machine-readable "Machine Summary" YAML block for automation and pattern analysis.
    • Added a pattern-analysis CLI with a self-test mode and an npm script to run it.
  • Documentation

    • Guides and report templates updated to specify the structured summary format and script usage.
  • Bug Fixes

    • Parser improvements: deduplicates gaps and more robustly selects global scores.
  • Tests

    • Test suite runs the parser self-check as part of validation.
  • Chores

    • Small CI/workflow input-name updates.

@coderabbitai

coderabbitai Bot commented Apr 24, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a standardized machine-readable ## Machine Summary YAML block to evaluation reports; updates analyzer (analyze-patterns.mjs) to parse it (with legacy fallback), adds a --self-test flag, an npm patterns script, test integration, docs, and minor CI input renames.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Machine Summary spec & templates (batch/batch-prompt.md, modes/oferta.md, modes/batch.md) | Define and require a ## Machine Summary YAML section in saved reports with an exact schema (typed fields, enums, numeric-only score, empty-array rules, confidence guidance); place block after A–G and global score. |
| Pattern analysis CLI & logic (analyze-patterns.mjs) | Add YAML parsing (js-yaml) and fenced ## Machine Summary extraction; allowlist/map fields into normalized report (company, role, scores.global, legitimacyTier, finalDecision, riskLevel, confidence, nextAction, topStrengths); map hard_stops/soft_gaps to report.gaps and dedupe gaps case-insensitively; prevent overwriting Machine Summary-derived values; fallback score logic using tracker or reportData?.scores?.global; add --self-test CLI flag that asserts parser behavior and returns non‑zero on failure. |
| Scripts & tests (package.json, test-all.mjs) | Add patterns npm script (node analyze-patterns.mjs); test runner uses process.execPath and includes analyze-patterns.mjs --self-test in test set. |
| Docs & README updates (docs/SCRIPTS.md, CLAUDE.md) | Document the new patterns script, options (--summary, --min-threshold, --self-test), exit codes, and update CLAUDE.md Main Files table to note reports/ contain the ## Machine Summary block. |
| CI workflow param names (.github/workflows/welcome.yml) | Rename workflow inputs passed to actions/first-interaction@v3 from repo-token/pr-message/issue-message to repo_token/pr_message/issue_message. |
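
To make the "allowlist and dedupe" row above concrete, here is a minimal sketch of both steps. It is not the code from the PR: `ALLOWED_FIELDS`, `pickAllowed`, and `dedupeGaps` are hypothetical names, and the field list is an assumed subset (the actual allowlist is not shown in this conversation).

```javascript
// Hypothetical allowlist: an assumed subset of Machine Summary keys.
const ALLOWED_FIELDS = new Set(['company', 'role', 'score', 'hard_stops', 'soft_gaps']);

// Keep only allowlisted keys; anything that is not a plain mapping yields {}.
function pickAllowed(parsed) {
  if (parsed === null || typeof parsed !== 'object' || Array.isArray(parsed)) return {};
  return Object.fromEntries(
    Object.entries(parsed).filter(([key]) => ALLOWED_FIELDS.has(key))
  );
}

// Merge gap lists, dropping entries that differ only by case or padding.
// The first spelling encountered is the one that is kept.
function dedupeGaps(gaps) {
  const seen = new Set();
  const result = [];
  for (const gap of gaps) {
    const text = String(gap).trim();
    const key = text.toLowerCase();
    if (!text || seen.has(key)) continue;
    seen.add(key);
    result.push(text);
  }
  return result;
}

const summary = pickAllowed({
  company: 'Acme',
  hard_stops: ['No Kubernetes'],
  soft_gaps: ['no kubernetes ', 'Salary below range'],
  injected: 'ignored',
});
const gaps = dedupeGaps([...summary.hard_stops, ...summary.soft_gaps]);
console.log(gaps); // [ 'No Kubernetes', 'Salary below range' ]
```

Filtering by key before mapping means unexpected keys emitted by an agent never reach the normalized report, which is presumably the intent of the allowlist described in the summary.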

Sequence Diagram(s)

sequenceDiagram
    participant CLI as CLI (analyze-patterns.mjs)
    participant FS as File System (reports/*.md)
    participant Parser as Machine Summary Parser
    participant Analyzer as Pattern Analyzer
    CLI->>FS: read report files
    FS->>Parser: provide report contents
    Parser->>Parser: extract fenced "## Machine Summary" YAML (or fallback to legacy markdown parsing)
    Parser->>Analyzer: emit normalized report object (company, role, scores.global, gaps, decision, etc.)
    Analyzer->>CLI: produce summaries / set exit code
    CLI-->>Analyzer: --self-test (optional)
    Analyzer-->>CLI: success/failure exit

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 44.44% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly and concisely describes the main objective: adding structured machine summaries to evaluations, which is the core change across multiple files. |
| Linked Issues check | ✅ Passed | All code requirements from issue #442 are met: Machine Summary YAML block added to evaluation reports [oferta.md, batch.md, batch-prompt.md], analyze-patterns.mjs updated to parse it with allowlisting, documentation added to SCRIPTS.md, and self-test added to test suite. |
| Out of Scope Changes check | ✅ Passed | All changes align with issue #442 scope. Incidental changes to test-all.mjs (process.execPath fix), package.json (patterns script), CLAUDE.md (documentation), and welcome.yml workflow (parameter names) are minor supporting modifications that enable the core feature. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@analyze-patterns.mjs`:
- Around line 104-106: The current comment-stripping line in the loop (using
rawLine.replace(/\s+#.*$/, '') assigned to line) will remove content after a
whitespace-preceded '#' even when that '#' is inside quoted strings and corrupt
values parsed later by parseScalar; replace that single-regex approach with
logic that scans rawLine character-by-character to find the first '#' that is
not inside a single- or double-quoted region (and not escaped) and truncate from
that position (or simply leave the line unchanged if no such comment delimiter
is found); alternatively, switch this parsing path to use js-yaml for safe
comment handling and parsing instead of manual stripping—update the code around
rawLine -> line handling and ensure parseScalar receives intact quoted strings.
- Around line 100-137: Replace the hand-rolled parsing branch that picks between
JSON.parse(raw) and parseYamlLikeSummary(raw) with a call to the js-yaml loader
(yamlLoad(raw) / yaml.load) so the code becomes: parsed = yamlLoad(raw);
validate parsed is a non-null object, then filter its entries by
MACHINE_SUMMARY_FIELDS as before; remove or stop using parseYamlLikeSummary and
ensure the yaml loader is imported/aliased as yamlLoad where used.

In `@batch/batch-prompt.md`:
- Around line 168-195: The Machine Summary templates are inconsistent for the
hard_stops field: the main YAML example shows a placeholder item while the
embedded report template uses an empty list, which may cause worker LLMs to
default to [] even when blockers exist; update both Machine Summary occurrences
in this file so they match (either change the YAML example to show hard_stops:
[] or change the embedded report template to include a placeholder item) and
ensure the field name hard_stops and the numeric score/enum semantics remain
unchanged so parsers like analyze-patterns.mjs and modes/oferta.md keep working.

In `@modes/oferta.md`:
- Around line 145-171: The Machine Summary YAML example under the "Machine
Summary" section is inconsistent: the top example shows hard_stops with a
placeholder list item while the embedded report template hardcodes hard_stops:
[] — align them so workers won't omit real blockers; update the "Machine
Summary" example to use hard_stops: [] (and apply the same change for soft_gaps
and top_strengths where applicable) or alternatively change the embedded report
template to show the placeholder list item format, ensuring both occurrences use
the same representation for empty vs. populated lists.
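
The first analyze-patterns.mjs finding above asks for a scan that ignores `#` inside quoted strings. A minimal sketch of that idea follows; `stripTrailingComment` is a hypothetical helper, not code from the PR. It assumes that any quote character opens a quoted region, which is looser than real YAML (an apostrophe in an unquoted value would be misread), so the js-yaml route suggested in the second finding remains the safer choice.

```javascript
// Hypothetical helper: remove a trailing "# comment" unless the '#' sits
// inside a single- or double-quoted region of the line.
function stripTrailingComment(rawLine) {
  let quote = null; // the quote character currently open, or null
  for (let i = 0; i < rawLine.length; i++) {
    const ch = rawLine[i];
    if (quote) {
      // Inside double quotes a backslash escapes the next character.
      if (quote === '"' && ch === '\\') { i++; continue; }
      if (ch === quote) quote = null;
      continue;
    }
    if (ch === '"' || ch === "'") { quote = ch; continue; }
    // Outside quotes, '#' starts a comment only at line start or after whitespace.
    if (ch === '#' && (i === 0 || /\s/.test(rawLine[i - 1]))) {
      return rawLine.slice(0, i).trimEnd();
    }
  }
  return rawLine; // no comment delimiter found: leave the line unchanged
}

console.log(stripTrailingComment('role: "C# Developer" # reviewed'));
console.log(stripTrailingComment('company: Acme # note'));
```

Unlike the single-regex approach the reviewer flags, this leaves a value such as "C# Developer" intact while still removing the comment that follows it.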
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 50d223ac-a2c4-4ba6-820b-0238663c35b2

📥 Commits

Reviewing files that changed from the base of the PR and between b8a3a12 and 81b4e89.

📒 Files selected for processing (8)
  • CLAUDE.md
  • analyze-patterns.mjs
  • batch/batch-prompt.md
  • docs/SCRIPTS.md
  • modes/batch.md
  • modes/oferta.md
  • package.json
  • test-all.mjs

Comment thread analyze-patterns.mjs Outdated
Comment thread analyze-patterns.mjs Outdated
Comment thread batch/batch-prompt.md
Comment thread modes/oferta.md

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@analyze-patterns.mjs`:
- Line 351: The current fallback expression for score (parseFloat(e.score) ||
reportData?.scores?.global || 0) treats a legitimate 0 as falsy and will
substitute the global score; fix this by explicitly checking e.score for
null/undefined (or checking that parseFloat(e.score) is a valid number) before
falling back—modify the logic around the score variable (referencing e.score and
reportData?.scores?.global) so that a real 0 is preserved rather than
overwritten by the global or default value.
- Around line 188-215: The scalar fields assigned from parseMachineSummary
(report.remote, report.seniority, report.domain, report.teamSize) can become
arrays if the YAML emits sequences; update the assignment logic in the
machineSummary handling block (where parseMachineSummary() is used) to
defensively coerce or validate values: for each of remote, seniority, domain,
teamSize, if the machineSummary value is a string assign it, otherwise if it's a
number coerce toString(), else leave the existing report value (skip
arrays/objects). This prevents downstream callers like classifyRemote(raw) which
call raw.toLowerCase() from receiving non-string types; use the same pattern
already used for topStrengths/normalizeList but as a simple string
coercion/guard for these scalar fields.
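
The zero-score finding above comes down to `||` treating `0` as "missing". One way to keep a genuine 0, sketched under assumptions: `resolveScore` is a hypothetical function name, and the report's global score is assumed to already be a number when present.

```javascript
// Hypothetical helper: prefer the tracker score, then the report's global
// score, then 0, without letting a legitimate 0 fall through.
function resolveScore(trackerScore, reportData) {
  const parsed = Number.parseFloat(trackerScore);
  if (Number.isFinite(parsed)) return parsed; // keeps 0 as a real score

  const globalScore = reportData?.scores?.global;
  return Number.isFinite(globalScore) ? globalScore : 0;
}

const report = { scores: { global: 4.2 } };
console.log(resolveScore('0', report));       // 0
console.log(resolveScore(undefined, report)); // 4.2
console.log(resolveScore('N/A', null));       // 0
```

The explicit `Number.isFinite` check replaces the truthiness test, so only a missing or unparseable tracker score triggers the fallback.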
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 136a1f7f-13ba-4286-aeed-20df826cdab6

📥 Commits

Reviewing files that changed from the base of the PR and between 6e2896a and 347ff7d.

📒 Files selected for processing (3)
  • analyze-patterns.mjs
  • batch/batch-prompt.md
  • modes/oferta.md

Comment thread analyze-patterns.mjs
Comment thread analyze-patterns.mjs Outdated

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@analyze-patterns.mjs`:
- Around line 83-87: The normalizeList function currently coerces object inputs
into the string "[object Object]" which pollutes report.gaps; update
normalizeList to detect plain objects (i.e., typeof value === 'object' && value
!== null && !Array.isArray(value)) and return [] for those cases so mappings are
discarded rather than stringified; keep existing handling for arrays,
null/undefined/empty string, and other primitives. Ensure you modify the
normalizeList function (the one used to sanitize hard_stops/soft_gaps) so
malformed mapping inputs are rejected before they reach report.gaps.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4ee50b5a-7ac0-4dc3-a0ac-bb671853c649

📥 Commits

Reviewing files that changed from the base of the PR and between 347ff7d and 1ad6475.

📒 Files selected for processing (1)
  • analyze-patterns.mjs

Comment thread analyze-patterns.mjs

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@analyze-patterns.mjs`:
- Around line 83-94: normalizeScalar currently drops boolean-ish values (e.g.,
remote: true) which can cause downstream failures; update normalizeScalar to
accept booleans by coercing true/false to "true"/"false" (or String(value)) for
typeof value === 'boolean' so it returns a string rather than null, ensuring
functions like classifyRemote that call .toLowerCase() won’t throw on boolean
inputs; modify the normalizeScalar function accordingly to handle boolean values
alongside string and number cases.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: f4209083-206b-4fad-b741-38ef7acebcf4

📥 Commits

Reviewing files that changed from the base of the PR and between 1ad6475 and 01a2e21.

📒 Files selected for processing (1)
  • analyze-patterns.mjs

Comment thread analyze-patterns.mjs
Comment on lines +83 to +94
function normalizeList(value) {
  if (Array.isArray(value)) return value.map(v => String(v).trim()).filter(Boolean);
  if (value === null || value === undefined || value === '') return [];
  if (typeof value === 'object') return [];
  return [String(value).trim()].filter(Boolean);
}

function normalizeScalar(value) {
  if (typeof value === 'string') return value.trim() || null;
  if (typeof value === 'number' && Number.isFinite(value)) return String(value);
  return null;
}

🧹 Nitpick | 🔵 Trivial

Normalizers look good; optional boolean coercion.

normalizeList now correctly discards mapping inputs, and normalizeScalar safely filters non-scalar types before they reach downstream consumers like classifyRemote (which calls .toLowerCase()). One edge case: if an agent ever emits remote: true or confidence: yes (the latter is still a plain string under YAML 1.2 core schema, but a boolean under YAML 1.1 legacy), normalizeScalar will drop it silently. If you want to be forgiving of boolean-ish values, a one-liner handles it:

🛡️ Optional defensive coercion for booleans
 function normalizeScalar(value) {
   if (typeof value === 'string') return value.trim() || null;
   if (typeof value === 'number' && Number.isFinite(value)) return String(value);
+  if (typeof value === 'boolean') return String(value);
   return null;
 }

Low-priority given the spec mandates strings and js-yaml@4.1.1 uses the core schema by default.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@analyze-patterns.mjs` around lines 83 - 94, normalizeScalar currently drops
boolean-ish values (e.g., remote: true) which can cause downstream failures;
update normalizeScalar to accept booleans by coercing true/false to
"true"/"false" (or String(value)) for typeof value === 'boolean' so it returns a
string rather than null, ensuring functions like classifyRemote that call
.toLowerCase() won’t throw on boolean inputs; modify the normalizeScalar
function accordingly to handle boolean values alongside string and number cases.

@Qodo-Free-For-OSS

Hi, parseMachineSummary hard-codes LF newlines in its regex, so reports with CRLF line endings will not match and the Machine Summary YAML block will be silently ignored.

Severity: action required | Category: reliability

How to fix: Support CRLF in regex

Agent prompt to fix - you can give this to your LLM of choice:

Issue description

parseMachineSummary() matches the Machine Summary fenced block using a regex that assumes LF-only (\n) newlines. On Windows or repos checked out with CRLF (\r\n), the match fails and the Machine Summary is ignored.

Issue Context

This causes structured fields (e.g., score, hard_stops, soft_gaps) not to be extracted even when present, reducing reliability of the new feature.

Fix Focus Areas

  • analyze-patterns.mjs[96-111]

Suggested change

Update the regex to accept CRLF, e.g. replace \n with \r?\n (or more generally (?:\r?\n)+ where appropriate). Consider adding a self-test case using \r\n line endings to prevent regression.
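
A minimal sketch of the suggested change, under stated assumptions: the actual regex in parseMachineSummary() is not shown in this thread, so the pattern below is a guess at the block layout (a "## Machine Summary" heading, optional blank lines, then a fenced yaml block). `extractMachineSummary` and `MACHINE_SUMMARY_RE` are hypothetical names. The fence marker is built with `repeat` only to keep literal backtick runs out of the example.

```javascript
const FENCE = '`'.repeat(3); // three backticks

// Every newline is written as \r?\n so LF and CRLF files both match.
const MACHINE_SUMMARY_RE = new RegExp(
  '^## Machine Summary[ \\t]*(?:\\r?\\n)+' +      // heading, then blank line(s)
  FENCE + '(?:yaml|yml)?[ \\t]*\\r?\\n' +         // opening fence with optional language tag
  '([\\s\\S]*?)\\r?\\n' + FENCE,                  // body (lazy) up to the closing fence
  'm'
);

// Returns the YAML body with line endings normalized to LF, or null.
function extractMachineSummary(markdown) {
  const match = MACHINE_SUMMARY_RE.exec(markdown);
  return match ? match[1].replace(/\r\n/g, '\n') : null;
}

// Self-test style check covering both line-ending conventions.
const lfReport = ['# Report', '', '## Machine Summary', '', FENCE + 'yaml',
  'score: 4.2', 'hard_stops: []', FENCE, ''].join('\n');
const crlfReport = lfReport.replace(/\n/g, '\r\n');

console.log(extractMachineSummary(lfReport) === extractMachineSummary(crlfReport)); // true
```

Normalizing the captured body to LF before handing it to a YAML parser keeps downstream behavior identical regardless of how the report was checked out.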


Found by Qodo. Free code review for open-source maintainers.

deepak-glitch pushed a commit to deepak-glitch/career-ops that referenced this pull request May 3, 2026
… PDFs, archived 1 below-threshold)

- Scan: scan.mjs (Level 1/2) found 0 new from 73 companies
- Level 3 WebSearch (Greenhouse FDE + small-ATS Breezy/Remotive + Ashby) → 4 new URLs
- Pipeline: 3 reports written (santifer#442 Amplitude FDE II 3.4/5, santifer#443 AMFG AI Engineer 1.7/5, santifer#444 Continued AI Engineer 3.7/5)
- 1 URL closed: Mercor (Ashby ApiJobPosting null on 2026-05-03); archived in pipeline.md
- 2 PDFs generated for score >= 3.0 (Amplitude, Continued)
- merge-tracker.mjs: +3 added | verify-pipeline.mjs: 0 errors / 0 warnings
- cleanup-low-scores.mjs: 1 archived to reports/below-threshold/ (santifer#443 AMFG)

https://claude.ai/code/session_1777781731

Development

Successfully merging this pull request may close these issues.

feat: add structured insight summary for evaluations and downstream pattern analysis

2 participants