feat(parsers): tree-sitter AST grammars for Java, C, C++, C#, Ruby, PHP #1313
Merged
gfargo merged 1 commit on Jun 22, 2026
Upgrades the six regex-first structural parsers to real AST extraction via lazy-loaded tree-sitter grammars (COCO-1239 / #1239).

Infrastructure:
- Extends LazyTreeSitterLanguageId with java, c, cpp, csharp, ruby, php
- Adds SHA-256-pinned manifest entries for all six grammars
- Wires lazy cache paths into the runtime language resolver
- Adds COCO_PREFETCH aliases: java, c, cpp/c++/cxx, cs/csharp/c#, rb/ruby, php

New extractors:
- javaTreeSitterParser: class/interface/enum/record/method/constructor; public|protected visibility → exported
- cCppTreeSitterParser: combined C+C++ parser that tries tree-sitter-c for .c/.h files (smaller grammar) and tree-sitter-cpp for .cpp/.cc etc. (superset); handles function_definition with nested declarator chains, struct/class/enum/namespace/preproc_def, and template declarations
- csTreeSitterParser: class/interface/struct/record/enum/method/constructor; public|protected|internal modifier → exported
- rubyTreeSitterParser: method/singleton_method/class/module; scope_resolution names unwrapped; all exported: true (Ruby has no static visibility gate)
- phpTreeSitterParser: uses the tree-sitter-php_only grammar so bare PHP code snippets parse without a <?php tag; function/method/class/interface/trait/enum; private modifier → exported: false

All new parsers follow the established registry pattern: they are prepended before the regex fallback and surrender gracefully (return undefined) when the .wasm isn't cached, so no behaviour change occurs for users who haven't prefetched.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
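The registry pattern described above (tree-sitter parser first, regex fallback second, `undefined` meaning "not handled, try the next one") can be sketched as a first-match chain. This is a minimal illustration under stated assumptions, not the code in `structuralParserRegistry.ts`: `ExtractedSymbol`, `StructuralParser`, `cachedGrammars`, `treeSitterParserFor`, `regexFallback` and `parseStructure` are hypothetical names, the tree-sitter step is a stub, and the lazy WASM cache is simulated with a `Set`.

```typescript
// Hypothetical sketch of a first-match parser chain; names are illustrative only.

interface ExtractedSymbol {
  name: string;
  kind: string;
  exported: boolean;
}

// A parser returns undefined to mean "I can't handle this, try the next one".
type StructuralParser = (source: string) => ExtractedSymbol[] | undefined;

// Stand-in for the lazy WASM cache: languages whose grammar has been prefetched.
const cachedGrammars = new Set<string>();

function treeSitterParserFor(lang: string): StructuralParser {
  return (_source) => {
    // Surrender when the .wasm isn't cached so the regex fallback runs instead.
    if (!cachedGrammars.has(lang)) return undefined;
    // Stub standing in for real AST extraction.
    return [{ name: `ast:${lang}`, kind: "stub", exported: true }];
  };
}

// Crude regex fallback: picks up `class Name` declarations only.
const regexFallback: StructuralParser = (source) =>
  Array.from(source.matchAll(/\bclass\s+(\w+)/g), (m) => ({
    name: m[1],
    kind: "class",
    exported: true,
  }));

// Tree-sitter parsers are prepended so they win whenever their grammar is cached.
const registry = new Map<string, StructuralParser[]>();
for (const lang of ["java", "cpp", "cs", "rb", "php"]) {
  registry.set(lang, [treeSitterParserFor(lang), regexFallback]);
}

function parseStructure(lang: string, source: string): ExtractedSymbol[] | undefined {
  for (const parser of registry.get(lang) ?? []) {
    const result = parser(source);
    if (result !== undefined) return result;
  }
  return undefined;
}
```

With `java` absent from `cachedGrammars`, `parseStructure("java", "public class Widget {}")` falls through to the regex parser and returns a `Widget` class symbol; once `java` is added, the stub tree-sitter parser answers first. In this sketch, returning `undefined` rather than an empty array is what lets the chain tell "no grammar available" apart from "parsed, found nothing".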
What
Upgrades the six regex-first structural fast-path parsers (Java, C, C++, C#, Ruby, PHP) to real tree-sitter AST extraction, matching the pattern already established for TypeScript, Python, Rust, and Go.
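Each extractor also decides whether a declaration counts as exported. The per-language rules this PR states for Java, C#, PHP and Ruby fit in one small predicate. The sketch below is illustrative only: `isExported`, `Lang` and `EXPORTING_MODIFIERS` are hypothetical names, and it assumes modifiers arrive as plain keyword strings. C and C++ are omitted because no visibility rule is given for them here.

```typescript
// Hypothetical summary of the visibility → exported rules stated in this PR.
type Lang = "java" | "csharp" | "php" | "ruby";

// Modifiers that make a Java or C# declaration count as exported.
const EXPORTING_MODIFIERS: Record<"java" | "csharp", ReadonlySet<string>> = {
  java: new Set(["public", "protected"]),
  csharp: new Set(["public", "protected", "internal"]),
};

function isExported(lang: Lang, modifiers: readonly string[]): boolean {
  // Ruby has no declaration-site visibility gate: everything is exported.
  if (lang === "ruby") return true;
  // PHP: exported unless the declaration is explicitly private.
  if (lang === "php") return !modifiers.includes("private");
  // Java / C#: exported only if at least one exporting modifier is present.
  const allowed = EXPORTING_MODIFIERS[lang];
  return modifiers.some((m) => allowed.has(m));
}
```

Under these rules a package-private Java method (no modifiers) is not exported, while a PHP function with no modifier is.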
Why
Plane: COCO-15
How
Infrastructure (lazy-load pipeline)
- `cache.ts`: extends `LazyTreeSitterLanguageId` with `java | c | cpp | csharp | ruby | php`
- `manifest.ts`: adds SHA-256-pinned WASM manifest entries for all six grammars (versions pinned at edit time, hashes verified against the CDN)
- `runtime.ts`: wires the six new lazy cache paths into the language resolver
- `prefetch.ts`: adds `COCO_PREFETCH` aliases: `java`, `c`, `cpp`/`c++`/`cxx`, `cs`/`csharp`/`c#`, `rb`/`ruby`, `php`

New tree-sitter extractors (each in `src/lib/parsers/default/__tree_sitter__/`)
- `javaTreeSitterParser.ts`: `class_declaration`, `interface_declaration`, `enum_declaration`, `record_declaration`, `method_declaration`, `constructor_declaration`; `public`|`protected` modifiers → exported
- `cCppTreeSitterParser.ts`: combined C+C++ parser; uses `tree-sitter-c` (smaller, 613 KB) for `.c`/`.h` files and `tree-sitter-cpp` (superset, 3.4 MB) for C++ extensions; handles nested declarator chains (`pointer_declarator` → `function_declarator` → `identifier`), qualified names (`Widget::draw`), and template declarations
- `csTreeSitterParser.ts`: `{class,interface,struct,record,enum,method,constructor}_declaration`; `public`|`protected`|`internal` modifiers → exported
- `rubyTreeSitterParser.ts`: `method`, `singleton_method` (`def self.name`), `class`, `module`; scope-resolution names unwrapped; all exported (Ruby has no declaration-site visibility gate)
- `phpTreeSitterParser.ts`: uses the `tree-sitter-php_only` WASM so bare PHP code snippets parse without a leading `<?php` tag; `function_definition`, `method_declaration`, `{class,interface,trait,enum}_declaration`; `private` visibility → not exported

Registry update (`structuralParserRegistry.ts`)
- `java`, `cpp`, `cs`, `rb`, `php`: new tree-sitter parsers prepended before the regex fallback
- Each new parser surrenders gracefully (returns `undefined`) when the `.wasm` isn't cached, so there is zero behaviour change for users who haven't run `COCO_PREFETCH`

WASM SHA-256 manifest pins
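The table lists only the leading eight hex digits of each pinned digest. The check such pins enable can be sketched as follows, assuming the manifest stores the full hex digest for each grammar and verification runs on the downloaded bytes. `GrammarPin`, `sha256Hex` and `verifyGrammar` are hypothetical names, not the actual `manifest.ts` API; only `createHash` from Node's `crypto` module is a real API here.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape of a pinned manifest entry; the real manifest.ts may differ.
interface GrammarPin {
  grammar: string;
  sha256: string; // full hex digest of the expected .wasm file
}

// Hex-encoded SHA-256 of the raw bytes.
function sha256Hex(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// Throws if the downloaded bytes don't match the pinned digest.
function verifyGrammar(pin: GrammarPin, wasmBytes: Uint8Array): void {
  const actual = sha256Hex(wasmBytes);
  if (actual !== pin.sha256.toLowerCase()) {
    throw new Error(
      `${pin.grammar}: SHA-256 mismatch (expected ${pin.sha256}, got ${actual})`,
    );
  }
}
```

In this sketch a caller would run `verifyGrammar` on the fetched bytes before writing them into the lazy cache, so an unexpected CDN artifact is rejected instead of loaded.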
| Grammar | SHA-256 (prefix) |
|---|---|
| tree-sitter-java | 4fdeac4c… |
| tree-sitter-c | c852c2a8… |
| tree-sitter-cpp | 174eb0de… |
| tree-sitter-c-sharp | 6f69e1ca… |
| tree-sitter-ruby | 09a96427… |
| tree-sitter-php (php_only) | fd1bcff3… |

Testing
- Type check (`tsc --noEmit --skipLibCheck`)
- … `.wasm` not cached (no behaviour change for existing users)

🤖 Generated by the harbor agent loop. Reviewed by a human before merge.