feat: add structured machine summaries to evaluations #444
syahmiharith wants to merge 10 commits into santifer:main
📝 Walkthrough
Adds a standardized machine-readable
Changes
Sequence Diagram(s)
sequenceDiagram
participant CLI as CLI (analyze-patterns.mjs)
participant FS as File System (reports/*.md)
participant Parser as Machine Summary Parser
participant Analyzer as Pattern Analyzer
CLI->>FS: read report files
FS->>Parser: provide report contents
Parser->>Parser: extract fenced "## Machine Summary" YAML (or fallback to legacy markdown parsing)
Parser->>Analyzer: emit normalized report object (company, role, scores.global, gaps, decision, etc.)
Analyzer->>CLI: produce summaries / set exit code
CLI-->>Analyzer: --self-test (optional)
Analyzer-->>CLI: success/failure exit
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@analyze-patterns.mjs`:
- Around line 104-106: The current comment-stripping line in the loop (using
rawLine.replace(/\s+#.*$/, '') assigned to line) will remove content after a
whitespace-preceded '#' even when that '#' is inside quoted strings and corrupt
values parsed later by parseScalar; replace that single-regex approach with
logic that scans rawLine character-by-character to find the first '#' that is
not inside a single- or double-quoted region (and not escaped) and truncate from
that position (or simply leave the line unchanged if no such comment delimiter
is found); alternatively, switch this parsing path to use js-yaml for safe
comment handling and parsing instead of manual stripping—update the code around
rawLine -> line handling and ensure parseScalar receives intact quoted strings.
- Around line 100-137: Replace the hand-rolled parsing branch that picks between
JSON.parse(raw) and parseYamlLikeSummary(raw) with a call to the js-yaml loader
(yamlLoad(raw) / yaml.load) so the code becomes: parsed = yamlLoad(raw);
validate parsed is a non-null object, then filter its entries by
MACHINE_SUMMARY_FIELDS as before; remove or stop using parseYamlLikeSummary and
ensure the yaml loader is imported/aliased as yamlLoad where used.
In `@batch/batch-prompt.md`:
- Around line 168-195: The Machine Summary templates are inconsistent for the
hard_stops field: the main YAML example shows a placeholder item while the
embedded report template uses an empty list, which may cause worker LLMs to
default to [] even when blockers exist; update both Machine Summary occurrences
in this file so they match (either change the YAML example to show hard_stops:
[] or change the embedded report template to include a placeholder item) and
ensure the field name hard_stops and the numeric score/enum semantics remain
unchanged so parsers like analyze-patterns.mjs and modes/oferta.md keep working.
In `@modes/oferta.md`:
- Around line 145-171: The Machine Summary YAML example under the "Machine
Summary" section is inconsistent: the top example shows hard_stops with a
placeholder list item while the embedded report template hardcodes hard_stops:
[] — align them so workers won't omit real blockers; update the "Machine
Summary" example to use hard_stops: [] (and apply the same change for soft_gaps
and top_strengths where applicable) or alternatively change the embedded report
template to show the placeholder list item format, ensuring both occurrences use
the same representation for empty vs. populated lists.
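For the first `analyze-patterns.mjs` finding (comment stripping inside quoted strings), a minimal sketch of the character-by-character scan could look like the following. `stripYamlComment` is a hypothetical name, not a function from this PR, and the sketch assumes one scalar per line as in the Machine Summary block. It is not a full YAML lexer; switching to js-yaml, as the same finding suggests, avoids the problem entirely.

```javascript
// Hypothetical helper (not part of the PR): remove a trailing YAML comment
// without touching '#' characters that sit inside quoted strings.
function stripYamlComment(rawLine) {
  let quote = null; // the quote character currently open, or null
  for (let i = 0; i < rawLine.length; i++) {
    const ch = rawLine[i];
    if (quote) {
      // Inside double quotes, a backslash escapes the next character.
      if (quote === '"' && ch === '\\') { i++; continue; }
      if (ch === quote) {
        // Inside single quotes, '' is an escaped quote, not a terminator.
        if (quote === "'" && rawLine[i + 1] === "'") { i++; continue; }
        quote = null;
      }
      continue;
    }
    if (ch === '"' || ch === "'") { quote = ch; continue; }
    // A comment starts at '#' only at line start or after whitespace.
    if (ch === '#' && (i === 0 || /\s/.test(rawLine[i - 1]))) {
      return rawLine.slice(0, i).trimEnd();
    }
  }
  return rawLine; // no comment delimiter found: leave the line unchanged
}

console.log(stripYamlComment('role: "C# Developer" # note')); // role: "C# Developer"
console.log(stripYamlComment('company: Acme # inline'));      // company: Acme
```

One known limitation of this sketch: an apostrophe inside an unquoted value (for example `team: Director's office # note`) is treated as an opening quote, so the comment on that line is kept rather than stripped. That errs on the side of preserving data.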
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 50d223ac-a2c4-4ba6-820b-0238663c35b2
📒 Files selected for processing (8)
CLAUDE.md, analyze-patterns.mjs, batch/batch-prompt.md, docs/SCRIPTS.md, modes/batch.md, modes/oferta.md, package.json, test-all.mjs
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@analyze-patterns.mjs`:
- Line 351: The current fallback expression for score (parseFloat(e.score) ||
reportData?.scores?.global || 0) treats a legitimate 0 as falsy and will
substitute the global score; fix this by explicitly checking e.score for
null/undefined (or checking that parseFloat(e.score) is a valid number) before
falling back—modify the logic around the score variable (referencing e.score and
reportData?.scores?.global) so that a real 0 is preserved rather than
overwritten by the global or default value.
- Around line 188-215: The scalar fields assigned from parseMachineSummary
(report.remote, report.seniority, report.domain, report.teamSize) can become
arrays if the YAML emits sequences; update the assignment logic in the
machineSummary handling block (where parseMachineSummary() is used) to
defensively coerce or validate values: for each of remote, seniority, domain,
teamSize, if the machineSummary value is a string assign it, otherwise if it's a
number coerce toString(), else leave the existing report value (skip
arrays/objects). This prevents downstream callers like classifyRemote(raw) which
call raw.toLowerCase() from receiving non-string types; use the same pattern
already used for topStrengths/normalizeList but as a simple string
coercion/guard for these scalar fields.
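The score finding above comes down to `||` treating `0` as missing. A small sketch of one way to keep a legitimate zero, assuming the global score is already numeric when present; `resolveScore` is a hypothetical name and not code from this PR:

```javascript
// Hypothetical helper: prefer the entry's own score, fall back to the
// report-level global score, then to 0. A parsed 0 is a valid score.
function resolveScore(entryScore, globalScore) {
  const parsed = Number.parseFloat(entryScore);
  if (Number.isFinite(parsed)) return parsed; // keeps a real 0
  // Assumption: globalScore is a number (or undefined), not a string.
  return Number.isFinite(globalScore) ? globalScore : 0;
}

console.log(resolveScore('0', 3.7));       // 0
console.log(resolveScore(undefined, 3.7)); // 3.7
console.log(resolveScore('n/a', undefined)); // 0
```

With the original expression `parseFloat(e.score) || reportData?.scores?.global || 0`, the first call would have returned 3.7 instead of 0.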
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 136a1f7f-13ba-4286-aeed-20df826cdab6
📒 Files selected for processing (3)
analyze-patterns.mjs, batch/batch-prompt.md, modes/oferta.md
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@analyze-patterns.mjs`:
- Around line 83-87: The normalizeList function currently coerces object inputs
into the string "[object Object]" which pollutes report.gaps; update
normalizeList to detect plain objects (i.e., typeof value === 'object' && value
!== null && !Array.isArray(value)) and return [] for those cases so mappings are
discarded rather than stringified; keep existing handling for arrays,
null/undefined/empty string, and other primitives. Ensure you modify the
normalizeList function (the one used to sanitize hard_stops/soft_gaps) so
malformed mapping inputs are rejected before they reach report.gaps.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 4ee50b5a-7ac0-4dc3-a0ac-bb671853c649
📒 Files selected for processing (1)
analyze-patterns.mjs
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@analyze-patterns.mjs`:
- Around line 83-94: normalizeScalar currently drops boolean-ish values (e.g.,
remote: true) which can cause downstream failures; update normalizeScalar to
accept booleans by coercing true/false to "true"/"false" (or String(value)) for
typeof value === 'boolean' so it returns a string rather than null, ensuring
functions like classifyRemote that call .toLowerCase() won’t throw on boolean
inputs; modify the normalizeScalar function accordingly to handle boolean values
alongside string and number cases.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: f4209083-206b-4fad-b741-38ef7acebcf4
📒 Files selected for processing (1)
analyze-patterns.mjs
function normalizeList(value) {
  if (Array.isArray(value)) return value.map(v => String(v).trim()).filter(Boolean);
  if (value === null || value === undefined || value === '') return [];
  if (typeof value === 'object') return [];
  return [String(value).trim()].filter(Boolean);
}

function normalizeScalar(value) {
  if (typeof value === 'string') return value.trim() || null;
  if (typeof value === 'number' && Number.isFinite(value)) return String(value);
  return null;
}
🧹 Nitpick | 🔵 Trivial
Normalizers look good; optional boolean coercion.
normalizeList now correctly discards mapping inputs, and normalizeScalar safely filters non-scalar types before they reach downstream consumers like classifyRemote (which calls .toLowerCase()). One edge case: if an agent ever emits remote: true or confidence: yes (the latter is still a plain string under YAML 1.2 core schema, but a boolean under YAML 1.1 legacy), normalizeScalar will drop it silently. If you want to be forgiving of boolean-ish values, a one-liner handles it:
🛡️ Optional defensive coercion for booleans
 function normalizeScalar(value) {
   if (typeof value === 'string') return value.trim() || null;
   if (typeof value === 'number' && Number.isFinite(value)) return String(value);
+  if (typeof value === 'boolean') return String(value);
   return null;
 }
Low-priority given the spec mandates strings and js-yaml@4.1.1 uses the core schema by default.
Hi, parseMachineSummary hard-codes LF newlines in its regex, so reports with CRLF line endings will not match and the Machine Summary YAML block will be silently ignored.
Severity: action required | Category: reliability
How to fix: Support CRLF in regex
Found by Qodo. Free code review for open-source maintainers.
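A rough sketch of what CRLF support could look like. The actual regex in `parseMachineSummary` is not shown in this thread, so the pattern, the accepted fence labels, and the names `MACHINE_SUMMARY_RE` and `extractMachineSummary` are all assumptions made for illustration. The idea is simply to write every line break as `\r?\n` and normalize the captured body before handing it to the YAML loader.

```javascript
// The fence marker is built at runtime so this snippet contains no literal
// triple backticks of its own.
const FENCE = '`'.repeat(3);

// Assumed layout: a "## Machine Summary" heading, optional blank lines,
// then a fenced block. Every line break is matched as \r?\n.
const MACHINE_SUMMARY_RE = new RegExp(
  '^## Machine Summary[ \\t]*\\r?\\n' +      // heading line
  '(?:[ \\t]*\\r?\\n)*' +                    // optional blank lines
  FENCE + '(?:ya?ml|json)?[ \\t]*\\r?\\n' +  // opening fence with optional label
  '([\\s\\S]*?)\\r?\\n' +                    // block body (lazy)
  FENCE,                                     // closing fence
  'm'
);

// Hypothetical extractor: returns the raw block text with LF newlines,
// or null when no Machine Summary block is present.
function extractMachineSummary(markdown) {
  const match = MACHINE_SUMMARY_RE.exec(markdown);
  return match ? match[1].replace(/\r\n/g, '\n') : null;
}
```

Normalizing the body to LF keeps the downstream parser independent of how the report file was saved. Reports without the block still return `null`, so the legacy markdown fallback described in the PR summary would continue to apply.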
… PDFs, archived 1 below-threshold)
- Scan: scan.mjs (Level 1/2) found 0 new from 73 companies
- Level 3 WebSearch (Greenhouse FDE + small-ATS Breezy/Remotive + Ashby) → 4 new URLs
- Pipeline: 3 reports written (santifer#442 Amplitude FDE II 3.4/5, santifer#443 AMFG AI Engineer 1.7/5, santifer#444 Continued AI Engineer 3.7/5)
- 1 URL closed: Mercor (Ashby ApiJobPosting null on 2026-05-03); archived in pipeline.md
- 2 PDFs generated for score >= 3.0 (Amplitude, Continued)
- merge-tracker.mjs: +3 added | verify-pipeline.mjs: 0 errors / 0 warnings
- cleanup-low-scores.mjs: 1 archived to reports/below-threshold/ (santifer#443 AMFG)
https://claude.ai/code/session_1777781731
Closes #442.
Summary: adds a standardized fenced YAML Machine Summary block to interactive and batch evaluation reports; updates analyze-patterns.mjs to prefer the structured block while preserving legacy markdown fallbacks; documents the patterns command and adds a parser self-test to the quick test suite.
Tests: node --check analyze-patterns.mjs; node analyze-patterns.mjs --self-test; node test-all.mjs --quick.