fix(core): use byte offsets for position reporting in raw-scoped script rules #1081
Merged
jdkato merged 1 commit into vale-cli:v3 from Mar 4, 2026
…pt rules

Script rules with `scope: raw` return begin/end byte offsets in their match arrays, but AddAlert ignores these and performs a text search via FindLoc/initialPosition to determine the alert position. When the matched text appears multiple times in the document, this always reports the position of the first occurrence rather than the intended one.

Add locFromByteOffset() to compute line:column directly from the byte offsets the script provides, bypassing the text-search path. The new path activates when the alert carries valid byte offsets within a raw-scope block, falling back to the existing FindLoc path otherwise.

Relates to vale-cli#869, vale-cli#272.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
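The conversion the commit message describes can be illustrated with a minimal sketch. `lineColFromOffset` below is a hypothetical stand-in, not Vale's `locFromByteOffset()`, whose signature does not appear on this page. It assumes 1-based lines and columns and counts columns in bytes; whether Vale counts bytes or runes is not stated here.

```go
package main

import (
	"fmt"
	"strings"
)

// lineColFromOffset converts a 0-based byte offset into a 1-based
// line and column. Hypothetical helper for illustration only.
// Columns are counted in bytes (an assumption, see the lead-in).
func lineColFromOffset(doc string, offset int) (line, col int, ok bool) {
	// Reject offsets outside the document so a caller can fall back
	// to another lookup strategy.
	if offset < 0 || offset > len(doc) {
		return 0, 0, false
	}
	line, col = 1, 1
	for i := 0; i < offset; i++ {
		if doc[i] == '\n' {
			line++
			col = 1
		} else {
			col++
		}
	}
	return line, col, true
}

func main() {
	// Illustrative document, not the test.md from the issue.
	doc := "# TODO list\n\nSee the TODO here.\n"

	first := strings.Index(doc, "TODO")      // occurrence in the heading
	second := strings.LastIndex(doc, "TODO") // occurrence in the body

	l, c, _ := lineColFromOffset(doc, first)
	fmt.Printf("%d:%d\n", l, c) // 1:3

	l, c, _ = lineColFromOffset(doc, second)
	fmt.Printf("%d:%d\n", l, c) // 3:9
}
```

Because the position is derived only from the offset, repeated occurrences of the same text cannot be confused with one another. The `ok` result leaves room for the fallback to a text search that the commit message mentions.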
2287cbb to 5fa2949
Member
Thanks!
Fixes #1083
I found myself attempting to fix a raw "entire document" scope rule with a Tengo script. While I was working through the rule with Claude Code, the alert's line and column values were not always correct. Claude discovered a bug in the upstream project. Here's the proposed fix.
Summary
When a Tengo script rule uses `scope: raw`, Vale does not use the `begin` and `end` byte offsets returned by the script to calculate the alert's line and column. Instead, Vale extracts the matched text (`scope[begin:end]`) and performs a text search in the parsed document to determine the position. If the matched text appears multiple times in the document, Vale always reports the position of the first occurrence, regardless of which occurrence the script intended to flag.

Vale version
v3.12.0 (still reproduces on v3.13.1)
Steps to reproduce
1. Create a script rule

`styles/Example/FindTODO.yml`:

`styles/config/scripts/find-todo.tengo`:

2. Create a test document

`test.md` (the word "TODO" appears on lines 1 and 3):

    # TODO list for the project

    This paragraph has a TODO that should be flagged by the script rule.

In this document:
- `TODO` first appears at byte offset 2 (in the heading, line 1)
- `TODO` next appears at byte offset 54 (in the body text, line 3)

3. Run Vale
Expected behavior
The alert should be reported at line 3 (where byte offset 54 falls), since the script returned `begin: 54, end: 58`.

Actual behavior
The alert is reported at line 1, column 3: the position of the first occurrence of "TODO" in the document, not the occurrence at byte offset 54.
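The mismatch between expected and actual behavior can be reproduced outside Vale with a short sketch. `searchByText` and `lineOf` are hypothetical helpers written for this illustration, and the document string is a shortened example rather than the `test.md` above. The sketch only mimics the effect described here (looking up the extracted text instead of using the offset) and is not Vale's code.

```go
package main

import (
	"fmt"
	"strings"
)

// lineOf returns the 1-based line that contains the given byte offset.
func lineOf(doc string, offset int) int {
	return strings.Count(doc[:offset], "\n") + 1
}

// searchByText mimics a lookup that keeps only the matched text:
// it extracts doc[begin:end] and returns the offset of the FIRST
// occurrence of that text, discarding the original offsets.
func searchByText(doc string, begin, end int) int {
	return strings.Index(doc, doc[begin:end])
}

func main() {
	doc := "# TODO list\n\nSee the TODO here.\n"

	// The occurrence the script meant to flag: the one in the body.
	begin := strings.LastIndex(doc, "TODO")
	end := begin + len("TODO")

	found := searchByText(doc, begin, end)

	fmt.Println(lineOf(doc, found)) // 1: the first occurrence wins
	fmt.Println(lineOf(doc, begin)) // 3: the line the script pointed at
}
```

The text search has no way to tell the two occurrences apart, which is why the reported line depends on document order rather than on the offsets the script returned.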
Additional observations
- `begin: 0, end: 1` (matching just the first byte of the file) still reports the alert at `1:3`, confirming that byte offsets are completely ignored and Vale is searching for the extracted text.
- […] `1:3` and `3:25`, because Vale finds the first and second occurrences in order.

Root cause
The bug flows through three files:
1. `internal/check/script.go`: Script execution (correct)

The `Run` method correctly extracts `begin`/`end` from the Tengo script output and sets `a.Span` and `a.Match` correctly.

2. `internal/core/file.go`: `AddAlert()` discards byte offsets

For `scope: raw`, `lintBlock` is called with `lookup=true`. Inside `AddAlert`, because `lookup=true`, the code always falls through to `FindLoc()`, ignoring the byte offsets the script provided.

There is also a disambiguation attempt capped at 1000 characters:

For `scope: raw`, `ctx` is the entire document, so any file over 1000 characters skips this disambiguation entirely.

3. `internal/core/location.go`: `initialPosition()` searches for text

`FindLoc` delegates to `initialPosition()`, which builds a regex from `a.Match`, finds all occurrences, and always returns the first one, never consulting the `a.Span` byte offsets.

Fix
This PR adds a `HasByteOffsets` flag to the `Alert` struct and a `locFromByteOffset()` helper to compute line:column directly from byte offsets.

The approach:
- `internal/core/alert.go`: A new `HasByteOffsets bool` field on `Alert` signals that `Span` contains byte offsets into the raw document (not column ranges).
- `internal/check/script.go`: The script runner's `Run` method sets `HasByteOffsets: true` when building alerts from Tengo script matches, since scripts always return byte offsets.
- `internal/core/file.go`: `AddAlert` checks `a.HasByteOffsets` to decide whether to use `locFromByteOffset()` (direct byte-offset-to-line:column conversion) or the existing `FindLoc` text-search path.

This is more precise than checking `blk.Context == blk.Text` to detect raw-scope blocks, which would also match non-script alerts (code comment blocks, plain text blocks) where `Span` contains column ranges rather than byte offsets.

The test expectation for `Scripts.CustomMsg` in `checks.feature` is updated from `4:19` to `1:2`; the old value was a side effect of the buggy text search finding the matched text at the wrong position.

Related issues