feat(yaml_parser): lex YAML properties#9471

Open
l0ngvh wants to merge 1 commit into biomejs:main from l0ngvh:yaml-lex-property

Conversation

@l0ngvh
Contributor

@l0ngvh l0ngvh commented Mar 13, 2026

Summary

Follow-up to #9220. With this PR, the YAML lexer can now lex YAML properties (anchors and tags) such as:

&anchor !!str

Making the parser consume these tokens is left to a separate PR.

Test Plan

Added new YAML snippets containing properties to the lexer test suite.

Docs

N/A

@changeset-bot

changeset-bot bot commented Mar 13, 2026

⚠️ No Changeset found

Latest commit: f3debf9

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

@github-actions github-actions bot added A-Parser Area: parser A-Formatter Area: formatter A-Tooling Area: internal tools L-Yaml Language: Yaml labels Mar 13, 2026
@l0ngvh l0ngvh force-pushed the yaml-lex-property branch from 3c6b1fd to f3debf9 Compare March 13, 2026 09:29
@github-actions github-actions bot removed A-Formatter Area: formatter A-Tooling Area: internal tools labels Mar 13, 2026
@l0ngvh l0ngvh changed the title feat(yaml): Lex YAML properties feat(yaml_parser): lex YAML properties Mar 13, 2026
@l0ngvh l0ngvh marked this pull request as ready for review March 13, 2026 15:21
@coderabbitai
Contributor

coderabbitai bot commented Mar 13, 2026

Walkthrough

The YAML lexer now supports block properties—anchors and tags—by introducing a new path to consume and lex properties starting with '!' or '&'. The mapping-start disambiguation logic has been refactored: consume_potential_mapping_start now accepts accumulated properties and a start coordinate, whilst consume_mapping_key has been renamed to consume_explicit_mapping_key. New helper functions handle anchor and tag property lexing, and corresponding tokens (ANCHOR_PROPERTY_LITERAL, TAG_PROPERTY_LITERAL) have been introduced. A comprehensive test suite validates property handling across various YAML contexts.
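As a rough illustration of the property path described in the walkthrough, here is a minimal, self-contained sketch of lexing a leading run of anchors and tags. It is not the code from this PR: `Kind`, `Token`, and `lex_properties` are hypothetical names, the token kinds only loosely mirror ANCHOR_PROPERTY_LITERAL and TAG_PROPERTY_LITERAL, and the sketch treats a whole tag such as `!!str` as a single run of characters rather than separating the handle from the suffix.

```rust
/// Hypothetical token kinds, loosely mirroring the names in the walkthrough.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Kind {
    AnchorProperty,
    TagProperty,
}

/// A token is a kind plus a half-open byte range into the source.
#[derive(Debug, PartialEq, Eq)]
struct Token {
    kind: Kind,
    start: usize,
    end: usize,
}

fn is_flow_indicator(c: u8) -> bool {
    matches!(c, b',' | b'[' | b']' | b'{' | b'}')
}

fn is_blank_or_break(c: u8) -> bool {
    matches!(c, b' ' | b'\t' | b'\n' | b'\r')
}

/// Lex the properties at the start of `src`.
/// Returns the tokens and the byte offset where the property run ends.
fn lex_properties(src: &str) -> (Vec<Token>, usize) {
    let bytes = src.as_bytes();
    let mut tokens = Vec::new();
    let mut pos = 0;
    loop {
        // Skip inline whitespace between properties.
        while pos < bytes.len() && matches!(bytes[pos], b' ' | b'\t') {
            pos += 1;
        }
        // A property starts with '&' (anchor) or '!' (tag); anything else ends the run.
        let kind = match bytes.get(pos) {
            Some(b'&') => Kind::AnchorProperty,
            Some(b'!') => Kind::TagProperty,
            _ => break,
        };
        let start = pos;
        pos += 1; // consume the indicator
        // Simplification: the body runs until whitespace or a flow indicator.
        while pos < bytes.len()
            && !is_blank_or_break(bytes[pos])
            && !is_flow_indicator(bytes[pos])
        {
            pos += 1;
        }
        tokens.push(Token { kind, start, end: pos });
    }
    (tokens, pos)
}
```

Calling `lex_properties("&anchor !!str value")` yields one anchor token and one tag token, and the returned offset points at `value`, which is where a lexer in this style would go on to decide between a plain scalar and a mapping key.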

Possibly related PRs

Suggested reviewers

  • ematipico
  • dyc3
🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'feat(yaml_parser): lex YAML properties' directly and clearly summarises the main change: adding lexing support for YAML properties like anchors and tags. |
| Description check | ✅ Passed | The description is relevant and provides context: it explains this is a follow-up PR, references the PR it builds upon, describes what properties are now lexed, and clarifies that parser handling is future work. |

Contributor

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
crates/biome_yaml_parser/src/lexer/mod.rs (1)

1131-1141: Minor note: is_tag_char is more permissive than the YAML spec.

The spec requires ns-uri-char (URI-safe characters) for tags, but this implementation accepts any non-blank character excluding ! and flow indicators. This is fine for lenient parsing—strict validation can be handled at the parser or analyzer level if needed.
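For comparison, a stricter check could look roughly like the sketch below. It follows my reading of the YAML 1.2 productions (ns-uri-char, and ns-tag-char as ns-uri-char minus `!` and the flow indicators); the function names `is_uri_byte`, `is_strict_tag_byte`, and `is_strict_tag_suffix` are hypothetical and do not exist in the crate.

```rust
/// Single-byte alternatives of ns-uri-char (everything except the `%XX` escape).
fn is_uri_byte(c: u8) -> bool {
    c.is_ascii_alphanumeric()
        || matches!(
            c,
            b'-' | b'#' | b';' | b'/' | b'?' | b':' | b'@' | b'&' | b'=' | b'+' | b'$'
                | b',' | b'_' | b'.' | b'!' | b'~' | b'*' | b'\'' | b'(' | b')' | b'['
                | b']'
        )
}

/// ns-tag-char: a URI byte that is neither `!` nor a flow indicator.
fn is_strict_tag_byte(c: u8) -> bool {
    is_uri_byte(c) && !matches!(c, b'!' | b',' | b'[' | b']' | b'{' | b'}')
}

/// Validate a whole tag suffix, including `%XX` percent escapes.
/// An empty suffix is rejected because at least one character is required.
fn is_strict_tag_suffix(s: &str) -> bool {
    let b = s.as_bytes();
    if b.is_empty() {
        return false;
    }
    let mut i = 0;
    while i < b.len() {
        if b[i] == b'%' {
            // A percent sign must be followed by exactly two hex digits.
            if i + 2 < b.len() && b[i + 1].is_ascii_hexdigit() && b[i + 2].is_ascii_hexdigit() {
                i += 3;
            } else {
                return false;
            }
        } else if is_strict_tag_byte(b[i]) {
            i += 1;
        } else {
            return false;
        }
    }
    true
}
```

Because percent escapes span three bytes, a per-character predicate alone cannot express the full rule, which is one reason to keep the lexer lenient and run a check like this in a later pass.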

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/biome_yaml_parser/src/lexer/mod.rs` around lines 1131 - 1141,
is_tag_char currently allows any non-blank char except '!' and flow indicators
which is more permissive than the YAML spec's ns-uri-char; either restrict
is_tag_char to only accept URI-safe characters per the spec or explicitly
document/flag the lenient behavior. Modify the is_tag_char implementation (and
keep is_anchor_char unchanged) to validate against the ns-uri-char character set
(or call a new helper like is_ns_uri_char(c)), or add a clear comment/TODO above
fn is_tag_char stating that tag validation is intentionally lenient and strict
URI validation is handled later.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3aaf1b66-38bb-4528-b220-1c9c683b5df6

📥 Commits

Reviewing files that changed from the base of the PR and between 058f7b5 and f3debf9.

📒 Files selected for processing (3)
  • crates/biome_yaml_parser/src/lexer/mod.rs
  • crates/biome_yaml_parser/src/lexer/tests/mod.rs
  • crates/biome_yaml_parser/src/lexer/tests/property.rs

@l0ngvh l0ngvh requested review from a team March 13, 2026 17:10
Member

@ematipico ematipico left a comment

The code seems fine, but I would like to see some parser tests too. Aren't they applicable?

I have now read the whole description. Apologies!

/// For example, `: abc` is a valid yaml mapping, which is equivalent
/// to `{null: abc}`
fn consume_mapping_key(&mut self, current: u8) -> LinkedList<LexToken> {
fn consume_explicit_mapping_key(&mut self, current: u8) -> LinkedList<LexToken> {
Member

It would be nice to keep the docstring. Let's update it

@@ -125,21 +125,32 @@ impl<'src> YamlLexer<'src> {
}

/// Consume and disambiguate a YAML value to determine whether it's a YAML block map or just a
Member

The logic changed. Does the doc still apply?

LexToken::new(COMMENT, start, self.current_coordinate)
}

fn consume_block_properties(&mut self) -> LinkedList<LexToken> {
Member

You know, I noticed we don't use #[inline] anywhere in the parser.

Let's keep it as is, but maybe we could set up some benchmarks and eventually see whether the attribute helps.

Labels

A-Parser Area: parser L-Yaml Language: Yaml

2 participants