add Pascal/Lazarus support and stabilize LLM semantic extraction#682

Open
JClarQ wants to merge 4 commits into safishamsi:v6 from JClarQ:v6

JClarQ commented May 3, 2026

Summary

This pull request adds support for FreePascal and Lazarus projects and makes the semantic extraction pipeline more resilient. It adds AST extraction for Pascal files and a JSON repair mechanism that recovers data from truncated or malformed LLM responses.

Key Changes

1. FreePascal & Lazarus Support

  • Language Detection: Updated detect.py to recognize .pas, .pp, .inc, .lpr, .lfm, and .lpi extensions.
  • Tree-sitter Integration: Integrated tree-sitter-language-pack via pyproject.toml for dynamic Pascal grammar support.
  • Pascal Extraction: Implemented unit dependency resolution by extracting module names from uses clauses.
  • Lazarus Integration: Added support for Lazarus project (.lpi) and form (.lfm) files.
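To illustrate what extracting module names from uses clauses involves, here is a minimal sketch. It is not the code in this PR: the PR works on the tree-sitter AST, while this sketch swaps in a regular expression so it stays self-contained. The function name extract_uses_units is hypothetical, and the comment stripping is naive (it does not account for comment markers or semicolons inside string literals).

```python
import re

# Matches a `uses` clause up to its terminating semicolon. Pascal keywords
# are case-insensitive, so the match is too.
_USES_RE = re.compile(r"\buses\b(.*?);", re.IGNORECASE | re.DOTALL)
# Strips { ... } and (* ... *) block comments and // line comments.
_COMMENT_RE = re.compile(r"\{.*?\}|\(\*.*?\*\)|//[^\n]*", re.DOTALL)
# Splits off the optional `in 'path.pas'` suffix allowed in program files.
_IN_RE = re.compile(r"\s+in\s+", re.IGNORECASE)


def extract_uses_units(source: str) -> list[str]:
    """Return the unit names listed in every `uses` clause, in source order.

    Hypothetical helper for illustration only; not part of graphify.
    """
    cleaned = _COMMENT_RE.sub(" ", source)
    units: list[str] = []
    for match in _USES_RE.finditer(cleaned):
        for entry in match.group(1).split(","):
            name = _IN_RE.split(entry.strip(), maxsplit=1)[0].strip()
            if name:
                units.append(name)
    return units


if __name__ == "__main__":
    sample = (
        "unit Main;\n"
        "interface\n"
        "uses\n"
        "  SysUtils, Classes, {forms} Forms;\n"
        "implementation\n"
        "uses MyUnit in 'src/myunit.pas';\n"
        "end.\n"
    )
    print(extract_uses_units(sample))
```

A Pascal unit can have one uses clause in its interface section and another in its implementation section, which is why the sketch collects every match rather than stopping at the first.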

2. Semantic Extraction Stabilization

  • Adaptive JSON Repair: Added a _repair_json utility to recover partial data from truncated LLM outputs (e.g., gpt-5.4-mini) by automatically closing unclosed strings and brackets.
  • Robust Parsing: Updated _parse_llm_json to attempt repairs before discarding data, increasing resilience against token limits.
  • Refined Retry Logic: Modified _extract_with_adaptive_retry to distinguish between recoverable truncation and logic errors, reducing log noise and preventing unnecessary retries.
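The repair-then-parse idea described above can be sketched as follows. This is an assumption about the general technique (closing unclosed strings and brackets), not the actual _repair_json or _parse_llm_json from this PR; the names repair_truncated_json and parse_with_repair are hypothetical. Truncations that leave a dangling key or a partial escape sequence are not recoverable here, and the sketch returns None in those cases.

```python
import json
from typing import Any, Optional


def repair_truncated_json(text: str) -> str:
    """Close any string, array, or object left open by a truncated response."""
    closers: list[str] = []  # stack of closing brackets still owed
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            closers.append("}")
        elif ch == "[":
            closers.append("]")
        elif ch in "}]" and closers:
            closers.pop()

    repaired = text
    if in_string:
        if escaped:
            # A dangling backslash would escape the quote we are about to add.
            repaired = repaired[:-1]
        repaired += '"'
    repaired = repaired.rstrip()
    if repaired.endswith(","):
        # A trailing comma before a closing bracket is invalid JSON.
        repaired = repaired[:-1]
    return repaired + "".join(reversed(closers))


def parse_with_repair(text: str) -> Optional[Any]:
    """Parse JSON, retrying once on a repaired copy; None if both attempts fail."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    try:
        return json.loads(repair_truncated_json(text))
    except json.JSONDecodeError:
        return None


if __name__ == "__main__":
    truncated = '{"nodes": [{"id": "a", "label": "Alpha"}, {"id": "b", "label": "Be'
    print(parse_with_repair(truncated))
```

Returning None instead of raising lets a caller tell "nothing recoverable" apart from a successful partial parse, which fits the description of retrying only when the truncation could not be repaired.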

3. CLI & Backend Improvements

  • OpenAI Support: Added openai as a supported backend with gpt-5.4-mini as the default model.
  • Model Overrides: Added support for the GRAPHIFY_MODEL environment variable to allow easy model switching without code changes.
  • Semantic Update Flag: Implemented the --semantic flag for the update command to allow manual triggering of LLM-based enrichment during project updates.
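As a rough sketch of how an environment override such as GRAPHIFY_MODEL could take precedence over a backend default, consider the following. The function resolve_model and the DEFAULT_MODELS table are hypothetical and are not taken from this PR; only the variable name and the openai default model come from the description above. How the override interacts with other configuration is not stated here, so the sketch assumes the environment variable simply wins when set.

```python
import os
from typing import Mapping, Optional

# Default per backend; the openai entry is the default named in this PR.
DEFAULT_MODELS = {"openai": "gpt-5.4-mini"}


def resolve_model(backend: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return GRAPHIFY_MODEL if set and non-empty, else the backend default.

    Hypothetical helper; `env` is injectable so the lookup is easy to test.
    """
    env = os.environ if env is None else env
    override = env.get("GRAPHIFY_MODEL", "").strip()
    if override:
        return override
    return DEFAULT_MODELS[backend]


if __name__ == "__main__":
    print(resolve_model("openai", {}))
    print(resolve_model("openai", {"GRAPHIFY_MODEL": "my-custom-model"}))
```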

Verification

  • Validated AST extraction and dependency mapping for complex Pascal projects.
  • Verified that the JSON repair utility successfully recovers data from intentionally truncated LLM streams.
  • Confirmed that the --semantic flag correctly triggers enrichment and merges findings into the knowledge graph.

JClarQ added 4 commits May 3, 2026 12:50
- Implement _repair_json to recover partial data from truncated LLM responses
- Update _parse_llm_json to attempt repair before failing
- Refine _extract_with_adaptive_retry to silence parse errors during recursive splitting
- Ensure errors are only reported when recursion depth is exhausted or for logic errors
@Qodo-Free-For-OSS

Hi, imports from Pascal uses clauses are emitted as edges targeting _make_id(unitName), but no nodes with those IDs exist because file node IDs are derived from paths. As a result, build_from_json() drops these edges, and the new unit dependency extraction never surfaces in the final graph.

Severity: action required | Category: correctness

How to fix: Resolve units to file IDs

Agent prompt to fix - you can give this to your LLM of choice:

Issue description

Imports from Pascal uses clauses are currently dropped because their edge targets don’t match any existing node IDs.

Issue Context

File node IDs are path-based, and build_from_json() only keeps edges whose endpoints exist as nodes.

Fix Focus Areas

  • graphify/extract.py[814-846]
  • graphify/extract.py[958-969]
  • graphify/build.py[75-103]

Implementation notes

  • Update _import_pascal() to resolve unit names to likely file paths (e.g., sibling UnitName.pas / unitname.pp / unitname.inc), and use _make_id(str(resolved_path)) so the target matches the imported unit’s file node ID.
  • If resolution fails, consider emitting a lightweight placeholder node for the unit name so the import edge survives (optional, but must conform to node schema).
  • Add a small test fixture covering uses Foo; connecting to foo.pas in the same directory.
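The sibling-file resolution suggested in the notes above could look roughly like this. It is a sketch under stated assumptions, not the fix itself: resolve_unit_path is a hypothetical helper, it only searches the importing file's own directory, and it matches case-insensitively because Pascal unit names are case-insensitive. The resolved path would then be passed to _make_id(str(resolved_path)), as the notes describe; that call is not reproduced here because its implementation is not shown in this thread.

```python
from pathlib import Path
from typing import Optional

# Extensions tried for a unit, in order of preference (from the notes above).
PASCAL_UNIT_SUFFIXES = (".pas", ".pp", ".inc")


def resolve_unit_path(unit_name: str, importing_file: Path) -> Optional[Path]:
    """Find a sibling file whose stem matches unit_name, ignoring case.

    Hypothetical helper; returns None when no sibling file matches, in which
    case a caller could fall back to a placeholder node.
    """
    wanted = unit_name.lower()
    candidates = [
        p
        for p in importing_file.parent.iterdir()
        if p.is_file()
        and p.suffix.lower() in PASCAL_UNIT_SUFFIXES
        and p.stem.lower() == wanted
    ]
    # Sort by suffix preference, then by name, so the result is deterministic.
    candidates.sort(
        key=lambda p: (PASCAL_UNIT_SUFFIXES.index(p.suffix.lower()), p.name)
    )
    return candidates[0] if candidates else None
```

For the fixture mentioned in the last note, a test could create main.pas and foo.pas in a temporary directory and check that resolving Foo from main.pas returns the path to foo.pas.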

Spotted by Qodo code review - free for open-source projects.

2 participants