Add ocamllex (.mll) file formatting support #2792
Draft
hhugo wants to merge 4 commits into simplify-extended-std-ast
Add support for formatting .mll (ocamllex) files with three modes controlled by the --reformat-mll option:
- ocaml-block (default): format only embedded OCaml code blocks, preserving surrounding ocamllex syntax and comments verbatim
- full: reformat the entire file structure with comment preservation using the Cmts infrastructure
- no: disable formatting

Parser (vendor/mll-parser/):
- mll_grammar.mly: Adapted from upstream ocamllex lex/parser.mly with identical precedence declarations
- mll_lexer.mll: Adapted from upstream lex/lexer.mll with stack-based action parsing and proper char/string/comment handling
- 100% on 1774 valid .mll files from opam, validated by ocamllex

Formatter (lib/Fmt_mll.ml):
- Full formatter with Cmts.fmt_before/fmt_after at all AST locations
- In-place formatter via Fmt_inplace module for ocaml-block mode

Infrastructure:
- Parse_with_comments: split into parse_ocaml (OCaml-specific) and parse_non_ocaml (for mll/other ASTs without OCaml lexer dependency)
- Extended_ast: map returns updated AST, Printast implemented
- mll_ast_equal: semantic comparison using OCaml AST normalization for code blocks (not string equality)
- Fmt_inplace module for source-level OCaml block replacement

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
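The "source-level OCaml block replacement" idea behind ocaml-block mode can be sketched as follows. This is an illustration only, not the code of Fmt_inplace: the `span` type and the `replace_spans` function are hypothetical names, and the offsets are assumed to be byte offsets of each code block body in the original source.

```ocaml
(* Hypothetical sketch, not the actual Fmt_inplace implementation.
   A span is the half-open byte range [start, stop) of a code block body. *)
type span = { start : int; stop : int }

(* Rebuild [src], substituting each span with its replacement text and
   copying everything outside the spans verbatim. *)
let replace_spans (src : string) (edits : (span * string) list) : string =
  let edits = List.sort (fun (a, _) (b, _) -> compare a.start b.start) edits in
  let buf = Buffer.create (String.length src) in
  let last =
    List.fold_left
      (fun pos (sp, repl) ->
        if sp.start < pos || sp.stop < sp.start || sp.stop > String.length src
        then invalid_arg "replace_spans: overlapping or out-of-range span";
        (* Untouched text between the previous span and this one. *)
        Buffer.add_substring buf src pos (sp.start - pos);
        (* Formatted replacement for the code block body. *)
        Buffer.add_string buf repl;
        sp.stop)
      0 edits
  in
  (* Tail of the file after the last span. *)
  Buffer.add_substring buf src last (String.length src - last);
  Buffer.contents buf
```

Because only the listed ranges are rewritten, the surrounding ocamllex syntax and comments stay byte-for-byte identical, which is the property the ocaml-block mode description relies on.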
Comments like (**) produce empty text after delimiter stripping. Instead of asserting, emit just the delimiters (pro $ epi).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
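A minimal sketch of the empty-comment case, using plain strings rather than ocamlformat's Fmt combinators. The function `fmt_comment` is a hypothetical stand-in; only the names `pro` and `epi` come from the commit message.

```ocaml
(* Hypothetical sketch: strings stand in for the real formatting
   combinators. [pro] and [epi] are the opening and closing delimiters. *)
let fmt_comment ~pro ~epi (src : string) : string =
  let plen = String.length pro and elen = String.length epi in
  let n = String.length src in
  if n < plen + elen
     || String.sub src 0 plen <> pro
     || String.sub src (n - elen) elen <> epi
  then invalid_arg "fmt_comment: not a delimited comment";
  (* Text left once both delimiters are stripped. *)
  let text = String.trim (String.sub src plen (n - plen - elen)) in
  if text = "" then
    (* Empty body: emit only the delimiters instead of failing. *)
    pro ^ epi
  else pro ^ " " ^ text ^ " " ^ epi
```

The empty body is treated as a valid input with a defined output, so a comment made only of delimiters round-trips unchanged.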
- Fix string escape handling in mll lexer: store raw source text for string literals instead of unescaping, so "\r\n" is preserved
- Fix comment placement between rule/and keywords and entry names: remove entry_keyword from the AST so comments attach to entry_name instead, matching how OCaml handles comments between let and bindings
- Drain comments after rule entry args so comments after "parse" on lines with arguments are not dropped
- Format headers and trailers as Structure (not Use_file) via a new fmt_code_structure formatter threaded through Fmt_mll and Fmt_inplace
- Fix Cmts.Wrapped.fmt: filter out empty groups produced when whitespace-only lines are removed, preventing unbounded blank line expansion with --wrap-comments
- Use Location.init_info instead of Location.init in mll_parser to avoid uncommenting a vendored declaration
- Remove entry_keyword field from mll AST (no longer needed)
- Simplify equal_mll: use normalize_code for code block comparison which checks comments inside code blocks
- Remove format_inplace_mll: Ocaml_block mode now goes through the normal format loop via fmt_file

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
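The empty-group fix can be illustrated with a small standalone sketch. It does not reproduce Cmts.Wrapped.fmt; `split_groups` and `paragraphs` are hypothetical helpers that assume comment text is grouped into paragraphs by empty lines before wrapping.

```ocaml
(* Hypothetical sketch of the grouping step, not the real Cmts.Wrapped.fmt. *)

(* Split a list of lines into groups separated by truly empty lines. *)
let split_groups (lines : string list) : string list list =
  let rec go cur acc = function
    | [] -> List.rev (List.rev cur :: acc)
    | "" :: rest -> go [] (List.rev cur :: acc) rest
    | l :: rest -> go (l :: cur) acc rest
  in
  go [] [] lines

(* Drop whitespace-only lines, then drop the groups this leaves empty.
   Without the final filter, each empty group would be rendered as an
   extra blank line. *)
let paragraphs (text : string) : string list list =
  String.split_on_char '\n' text
  |> split_groups
  |> List.map (List.filter (fun l -> String.trim l <> ""))
  |> List.filter (fun g -> g <> [])
```

In this sketch, a whitespace-only line between two empty lines forms a group of its own; once its only line is removed the group is empty, and the last filter is what keeps it from adding a blank line to the output.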
This PR is based on top of #2791
Add support for formatting ocamllex files with an "*.mll" extension.
This PR introduces a new config option, reformat-mll, with 3 possible values:
- no, to keep the previous behavior and not reformat mll files
- ocaml-block, to reformat only the OCaml code blocks and leave the rest untouched
- full, to reformat both the ocamllex syntax and the OCaml code blocks

The default value is ocaml-block, because OCaml formatting has been well tested.
In the future, we could decide to reformat the entire file by default.
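As a rough model of the three values, the sketch below encodes which parts of a .mll file each mode touches. The type and function names are hypothetical and do not reflect how the option is actually declared in Conf.

```ocaml
(* Hypothetical model of the reformat-mll option; names are illustrative. *)
type reformat_mll = No | Ocaml_block | Full

(* Parse the textual value of the option. *)
let reformat_mll_of_string = function
  | "no" -> Some No
  | "ocaml-block" -> Some Ocaml_block
  | "full" -> Some Full
  | _ -> None

(* Default chosen in this PR. *)
let default = Ocaml_block

(* The two kinds of content found in a .mll file. *)
type region = Ocaml_code | Lexer_syntax

(* Whether a region is rewritten under a given mode. *)
let is_reformatted mode region =
  match (mode, region) with
  | No, _ -> false
  | Ocaml_block, Ocaml_code -> true
  | Ocaml_block, Lexer_syntax -> false
  | Full, _ -> true
```

Switching the default to full later would only change `default` in this model; the per-region behavior of each mode would stay the same.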
Similar behavior could be implemented in a future PR for ocamlyacc and menhir (.mly) support.

I've extracted all mll files found in opam-repository packages to test the lexer and parser (both have been heavily inspired by the upstream implementation).
Summary
- mll-parser library to parse .mll (ocamllex) files into an AST
- Fmt_mll module for formatting the mll-specific constructs (rules, definitions, character sets, regexps)
- Fmt_inplace module for in-place formatting of embedded OCaml code blocks within .mll files (actions, headers, trailers)
- … .mll support into the existing formatting pipeline (Translation_unit, Extended_ast, Conf, Syntax)
- … .mll formatting
- … .mll files

Test plan
- dune runtest passes
- … .mll file with ocamlformat --impl foo.mll and verify output
- test/cli/mll.t cram tests
- test/passing/tests/basic.mll

🤖 Generated with Claude Code