Conversation
| (new_v : a) : ast_check_result = | ||
| match get_std_pair fg old_v new_v with | ||
| | None -> | ||
| (* TODO: Repl_file and Documentation have no std AST, so we skip the |
There was a problem hiding this comment.
I believe this is a preexisting bug
| let dump (type a) (fg : a t) fmt (v : a) = | ||
| match get_std fg v with | ||
| | Some (Std_value (std_fg, std_v)) -> Std_ast.Printast.ast std_fg fmt std_v | ||
| | None -> Printast.ast fg fmt v |
There was a problem hiding this comment.
We now properly dump ast for Documentation and Repl_file
| List.map ~f:mk_cmt (Lexer.comments ()) | ||
| in | ||
| let tokens = | ||
| (* mld files can not always be lexed using the ocaml lexer *) |
There was a problem hiding this comment.
This is no longer true, parsing documentation no longer goes through this help function.
|
I know this part of the code was a bit convoluted, thanks for trying to simplify it. Looks a bit hard to review though |
Remove `equal_core_type` and `equal` from Extended_ast, and `ast` and `equal` from Normalize_extended_ast. These functions are not used anywhere in the codebase. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| (** A parsed fragment carrying both the extended AST (used for formatting) | ||
| and the standard AST (used to verify formatting preserves semantics, | ||
| see {!Std_ast}). *) | ||
| type ('a, 'b) paired = {extended: 'a; std: 'b} |
There was a problem hiding this comment.
I'm not sure to understand what this type provides. Do we want to keep the two AST paired anywhere outside of Translation_unit ?
What are the benefits from moving the code from Parse_with_comments into Extended_ast ?
There was a problem hiding this comment.
Parse_with_comments.parse should only be used for "ocaml code", code parsed by the ocaml parser/lexer. It should not be used for Documentation for instance (it's wrong in the main branch).
We need to dispatch on the syntax to call it only when appropriate, the dispatch is already done in Extended_ast, it feels natural to move it there.
There was a problem hiding this comment.
I'm not sure to understand what this type provides. Do we want to keep the two AST paired anywhere outside of
Translation_unit?
I've been trying to minimize behavior changes in the PR.
We could probably parse the Std_ast only when needed in equivalent, saving the parsing for files already properly formatted. If we go that direction, we would need to have access to the source as string so that we can parse the std_ast.
There was a problem hiding this comment.
We do have the source as string in Translation_unit, both before and after formatting. Perhaps Parse_with_comments could be extended with the dispatch ? (and possibly renamed)
There was a problem hiding this comment.
I can possible move back some parsing logic into Parse_with_comments. But I'm not sure it gives much benefit.
There was a problem hiding this comment.
I've added back Parse_with_comments.. Let me know what you think
There was a problem hiding this comment.
I'm not sure to see the point of keeping both Parse_with_comments and the paired type. I think I prefer to keep Parse_with_comments because otherwise, we just add a lot of code into Extended_ast that in fact work with something else than the extended ast.
I don't think the equivalent function make much sense. It should be 'a Std_ast.t -> ... -> 'a -> 'a -> ast_check_result and not work with the extended AST at all.
It also can reasonably be string -> string -> .. and we could unify Std_ast.t and Extended_ast.t:
val ext_parse : ('ext, 'std) t -> string -> 'ext
val std_parse : ('ext, 'std) t -> string -> 'std
val equivalent :
('ext, 'std) t
-> ...
-> string
-> string
(* or 'std -> 'std -> *)
-> ast_check_result
There was a problem hiding this comment.
The equivalent signature you propose forces to parse the same string multiple time when iterating to find a fix point. This is something Ive optimized long time ago in the early days of ocamlformat.
There was a problem hiding this comment.
equivalentcan't take a 'std because not all format have one.- Extracting comment is very much linked to the implementation of the parser. It seems unnatural to have the logic spread across to files.
- Parse_with_comment and paired are orthogonal.
because otherwise, we just add a lot of code into Extended_ast that in fact work with something else than the extended ast.
The Std_ast is pretty much an implementation detail of the check of some of the file format
There was a problem hiding this comment.
@Julow, think more about this. I wonder if we should inline fields from the "with_comment" record into Extended_ast.t.
- the prefix is only relevent for use_file (and structure maybe)
- comments is not relevent for documentation I think
- Source is a bit weird because it deal with "ocaml" token. which we don't exactly have/want for mll mly files for instance.
The change would mean that an Extended_ast.t contains every thing needed for printing and checking.
Source.ml only needs Parsetree types (locations, tokens), not the full Extended_ast module. Use `open Parsetree` directly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Make `normalize_code` a parameter of Normalize_std_ast functions instead of importing it from Normalize_extended_ast. This decouples the two normalization modules, with callers responsible for passing the normalize_code function. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extended_ast.t GADT now uses paired wrapper types that embed both the
extended and standard ASTs: `('a, 'b) paired = {extended: 'a; std: 'b}`.
For the 7 OCaml fragment types (Structure, Signature, Use_file,
Core_type, Module_type, Expression, Pattern), parsing produces both
ASTs in one operation. Repl_file and Documentation have no std
counterpart.
Parse.ast now internally calls Std_ast.Parse.ast. All code that
consumes the AST (map, Printast.ast, Fmt_ast, Cmts) works through the
paired type, accessing .extended for formatting.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add functions that use the embedded std ASTs from the paired GADT: - `get_std`: extract the std AST with its Std_ast.t witness - `dump` / `dump_normalized`: print std AST for debug output - `equivalent`: check AST preservation using the std ASTs, returning Ast_preserved | Docstrings_moved _ | Ast_changed These will replace the manual Normalize_std_ast calls in Translation_unit. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move the parsing pipeline (warnings, hash-bang, tokens, Source.t creation) from Parse_with_comments into Extended_ast: - Extended_ast.parse is the full parsing pipeline - Extended_ast.parse_toplevel dispatches Use_file vs Repl_file - Extended_ast.Parsed.t replaces Parse_with_comments.with_comments - Extended_ast.Warning50 replaces Parse_with_comments.Warning50 - Delete Parse_with_comments.ml/.mli Simplify Translation_unit: remove std_fg/std_parsed parameters from format and parse_and_format, use Extended_ast.equivalent for AST validation. Translation_unit no longer references Std_ast directly. Rewrite printast tool to use the unified Extended_ast.parse API. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Std_ast becomes a purely internal implementation detail. Remove constructors and functions that are no longer needed now that Translation_unit no longer references Std_ast directly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add docstring explaining that the standard parser is used to verify formatting preserves semantics: two programs are equivalent if the compiler parses them identically. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The PR is now split into more digest commits |
e6c6c94 to
9c3fbe0
Compare
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Re-create Parse_with_comments with the parsing pipeline (Parsed type, Warning50 exception, parse, parse_toplevel) that was absorbed into Extended_ast. Re-export Extended_ast.Parse in the .mli. Update all callers. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
While working on adding support for mll and mly files. I got very confused with the distinction between Extented_ast and Std_ast.
I started following what was done for
Repl_fileandDocumentationinStd_ast, that is, "no support". But then realized that this mean that we don't check the ast is preserved. (#2557 (comment))I figured it was more natural to keep the use of
Std_astan implementation detail of the equivalence check for some of the file format inExtended_astonly.This is an attempt at improving the current situation.
Commits
equal_core_type,equal,Normalize_extended_ast.equal/astopen Parsetreeinstead ofopen Extended_astnormalize_codea parameter instead of importing it('a, 'b) pairedrecord, Parse.ast produces both ASTs, update Fmt_ast/testsstd_fg/std_parsedthreadingRepl_file,Documentation,of_syntax,any_tSummary
Extended_astthe single entry point for parsing, validation, and AST operations('a, 'b) pairedrecord type, so the standard AST is produced alongside the extended one during parsingParse_with_commentsintoExtended_ast—Documentationbypasses the OCaml pipeline entirely (no hash-bang, w50, or token collection)Extended_ast.equivalentto check AST preservation,dump/dump_normalizedfor debug outputTranslation_unit: removestd_fg/std_parsedthreading, no longer referencesStd_astdirectlyStd_ast: remove unused constructors (Repl_file,Documentation),of_syntax, comment collectionSource->Extended_ast,Normalize_std_ast->Normalize_extended_ast)Extended_ast.equal,equal_core_type,Normalize_extended_ast.equal/ast)Test plan