Add ocamllex (.mll) file formatting support #2792

Draft
hhugo wants to merge 4 commits into simplify-extended-std-ast from mll-support
@hhugo (Collaborator) commented Mar 26, 2026

This PR is based on top of #2791

Add support for formatting ocamllex files with an "*.mll" extension.

This PR introduces a new config option, reformat-mll, with three possible values:

  • no, to keep the previous behavior and not reformat .mll files
  • ocaml-block, to reformat only the OCaml code blocks and leave the rest untouched
  • full, to reformat both the ocamllex syntax and the OCaml code blocks

The default value is ocaml-block, because OCaml formatting is already well tested.
In the future, we could decide to reformat the entire file by default.
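
As a rough illustration of the three values described above, the option could be modelled as a variant with a parser from the config string. This is only a sketch: the type name, function names, and descriptions are hypothetical and are not taken from the PR's Conf module.

```ocaml
(* Hypothetical model of the reformat-mll option; the PR's actual
   representation may differ. *)
type reformat_mll = No | Ocaml_block | Full

(* Parse the value as it would be written in a config file or on the CLI. *)
let reformat_mll_of_string = function
  | "no" -> Some No
  | "ocaml-block" -> Some Ocaml_block
  | "full" -> Some Full
  | _ -> None

(* The default stated in the PR description. *)
let default = Ocaml_block

let describe = function
  | No -> "leave the .mll file untouched"
  | Ocaml_block -> "reformat only the embedded OCaml code blocks"
  | Full -> "reformat the ocamllex syntax and the OCaml code blocks"
```

Returning an option from the parser keeps unknown values explicit instead of silently falling back to the default.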

Similar support for ocamlyacc and Menhir files (.mly) could be added in a future PR.

To test the lexer and parser, I extracted all .mll files found in opam-repository packages. Both the lexer and the parser are heavily inspired by the upstream ocamllex implementation.

Summary

  • Add a vendored mll-parser library to parse .mll (ocamllex) files into an AST
  • Introduce Fmt_mll module for formatting the mll-specific constructs (rules, definitions, character sets, regexps)
  • Add Fmt_inplace module for in-place formatting of embedded OCaml code blocks within .mll files (actions, headers, trailers)
  • Integrate .mll support into the existing formatting pipeline (Translation_unit, Extended_ast, Conf, Syntax)
  • Add CLI tests and a basic passing test for .mll formatting
  • Fix mll formatting bugs found by testing against 1774 opam .mll files

Test plan

  • dune runtest passes
  • Format a sample .mll file with ocamlformat --impl foo.mll and verify output
  • Review test/cli/mll.t cram tests
  • Review test/passing/tests/basic.mll

🤖 Generated with Claude Code

hhugo and others added 4 commits March 26, 2026 16:22
Add support for formatting .mll (ocamllex) files with three modes
controlled by the --reformat-mll option:

- ocaml-block (default): format only embedded OCaml code blocks,
  preserving surrounding ocamllex syntax and comments verbatim
- full: reformat the entire file structure with comment preservation
  using the Cmts infrastructure
- no: disable formatting

Parser (vendor/mll-parser/):
- mll_grammar.mly: Adapted from upstream ocamllex lex/parser.mly
  with identical precedence declarations
- mll_lexer.mll: Adapted from upstream lex/lexer.mll with
  stack-based action parsing and proper char/string/comment handling
- Parses 100% of the 1774 valid .mll files from opam (validity checked
  with ocamllex)

Formatter (lib/Fmt_mll.ml):
- Full formatter with Cmts.fmt_before/fmt_after at all AST locations
- In-place formatter via Fmt_inplace module for ocaml-block mode

Infrastructure:
- Parse_with_comments: split into parse_ocaml (OCaml-specific) and
  parse_non_ocaml (for mll/other ASTs without OCaml lexer dependency)
- Extended_ast: map returns updated AST, Printast implemented
- mll_ast_equal: semantic comparison using OCaml AST normalization
  for code blocks (not string equality)
- Fmt_inplace module for source-level OCaml block replacement
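
The idea of source-level block replacement can be sketched as splicing formatted text into byte ranges of the original source while copying everything else verbatim. This is a minimal illustration under assumed names (`patch`, `apply_patches`); it is not the Fmt_inplace implementation.

```ocaml
(* A replacement for the half-open byte range [start, stop) of the source.
   Hypothetical type, for illustration only. *)
type patch = { start : int; stop : int; text : string }

(* Apply non-overlapping patches, keeping all other bytes unchanged. *)
let apply_patches (src : string) (patches : patch list) : string =
  let sorted = List.sort (fun a b -> compare a.start b.start) patches in
  let buf = Buffer.create (String.length src) in
  let pos =
    List.fold_left
      (fun pos p ->
        if p.start < pos || p.stop < p.start || p.stop > String.length src
        then invalid_arg "apply_patches: overlapping or out-of-range patch";
        (* Copy the untouched text before the patch, then the new text. *)
        Buffer.add_substring buf src pos (p.start - pos);
        Buffer.add_string buf p.text;
        p.stop)
      0 sorted
  in
  (* Copy whatever follows the last patch. *)
  Buffer.add_substring buf src pos (String.length src - pos);
  Buffer.contents buf

(* Only the contents of the braces are rewritten. *)
let example =
  apply_patches "x {  1+1  } y" [ { start = 3; stop = 10; text = " 1 + 1 " } ]
```

Here `example` evaluates to `"x { 1 + 1 } y"`: the text outside the patched range is preserved byte for byte, which is the property the ocaml-block mode relies on.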

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comments like (**) produce empty text after delimiter stripping.
Instead of asserting, emit just the delimiters (pro $ epi).
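
A string-based sketch of that behavior, assuming a hypothetical helper `fmt_comment` (the real code works on Fmt values, where `pro $ epi` sequences the two delimiters):

```ocaml
(* [pro] and [epi] are the opening and closing delimiters; [text] is the
   comment body left after stripping them. Hypothetical helper. *)
let fmt_comment ~pro ~epi text =
  match String.trim text with
  | "" -> pro ^ epi (* empty body: emit only the delimiters, no assertion *)
  | body -> pro ^ " " ^ body ^ " " ^ epi
```

With these definitions, `fmt_comment ~pro:"(*" ~epi:"*)" ""` yields `"(**)"` rather than failing.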

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix string escape handling in mll lexer: store raw source text for
  string literals instead of unescaping, so "\r\n" is preserved
- Fix comment placement between rule/and keywords and entry names:
  remove entry_keyword from the AST so comments attach to entry_name
  instead, matching how OCaml handles comments between let and bindings
- Drain comments after rule entry args so comments after "parse" on
  lines with arguments are not dropped
- Format headers and trailers as Structure (not Use_file) via a new
  fmt_code_structure formatter threaded through Fmt_mll and Fmt_inplace
- Fix Cmts.Wrapped.fmt: filter out empty groups produced when
  whitespace-only lines are removed, preventing unbounded blank line
  expansion with --wrap-comments
- Use Location.init_info instead of Location.init in mll_parser to
  avoid uncommenting a vendored declaration
- Remove entry_keyword field from mll AST (no longer needed)
- Simplify equal_mll: use normalize_code for code block comparison
  which checks comments inside code blocks
- Remove format_inplace_mll: Ocaml_block mode now goes through the
  normal format loop via fmt_file
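
The blank-line expansion mentioned for Cmts.Wrapped.fmt can be reproduced with a toy model: split the comment text into groups of lines separated by blank lines, then join the groups with one blank line. The function names below are hypothetical and the code is not the Cmts implementation; it only shows why empty groups must be filtered out.

```ocaml
let is_blank line = String.trim line = ""

(* Split lines into groups separated by blank (whitespace-only) lines.
   Two consecutive blank lines produce an empty group. *)
let split_groups lines =
  let rec go acc cur = function
    | [] -> List.rev (List.rev cur :: acc)
    | l :: rest when is_blank l -> go (List.rev cur :: acc) [] rest
    | l :: rest -> go acc (l :: cur) rest
  in
  go [] [] lines

(* Join groups with exactly one blank line between them. *)
let render groups =
  String.concat "\n\n" (List.map (String.concat "\n") groups)

let reflow ~drop_empty text =
  let groups = split_groups (String.split_on_char '\n' text) in
  let groups =
    if drop_empty then List.filter (fun g -> g <> []) groups else groups
  in
  render groups
```

Without the filter, each pass over `"a\n\n  \nb"` adds blank lines (`"a\n\n\n\nb"`, then `"a\n\n\n\n\n\nb"`, and so on); with `~drop_empty:true` the result is `"a\n\nb"` and stays stable on further passes.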

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@hhugo hhugo marked this pull request as draft March 27, 2026 08:32