Add bcp47_strict_language_tag validator #1489

Open
bfabio wants to merge 3 commits into go-playground:master from bfabio:bcp47_strict_language_tag
bfabio (Contributor) commented Nov 8, 2025

Add a strict BCP47 language tag validator that enforces RFC 5646 and, unlike language.Parse(), rejects Unicode extensions.

Fixes Or Enhances

Fix #1221.

Make sure that you've checked the boxes below before you submit PR:

  • Tests exist or have been written that cover this particular change.

@go-playground/validator-maintainers

@bfabio bfabio requested a review from a team as a code owner November 8, 2025 11:02
@coveralls

Coverage Status

coverage: 73.815% (+0.09%) from 73.73%
when pulling ba8f1be on bfabio:bcp47_strict_language_tag
into 6a38036 on go-playground:master.

@bfabio bfabio force-pushed the bcp47_strict_language_tag branch from 505d53f to 8a96018 Compare March 18, 2026 06:51
bfabio (Contributor, Author) commented Mar 18, 2026

@nodivbyzero rebased

bfabio added a commit to italia/publiccode-parser-go that referenced this pull request Mar 18, 2026
The `go-playground/validator` bcp47_language_tag validator accepted tags
that are not valid BCP47: go-playground/validator#1221.

Use the strict validator from go-playground/validator#1489 locally until
it lands upstream.

Fix #47.
bfabio added a commit to italia/publiccode-parser-go that referenced this pull request Mar 18, 2026
go-playground/validator's bcp47_language_tag accepted 3-letter ISO 639-2
codes (ita, fra, deu) and POSIX underscore tags (en_GB, hr_HR), which are
not valid BCP47.

validators/bcp47.go is a local copy of go-playground/validator#1489 (not
yet merged) and should be removed once that lands.

Fix #47.
bfabio added a commit to italia/publiccode-parser-go that referenced this pull request Mar 18, 2026
go-playground/validator's bcp47_language_tag accepted 3-letter ISO 639-2
codes (ita, fra, deu) and POSIX underscore tags (en_GB, hr_HR), which are
not valid BCP47.

validators/bcp47.go is a local copy of go-playground/validator#1489 (not
yet merged) and should be removed once that lands.

Fix #47.
baked_in.go Outdated
field := fl.Field()

if field.Kind() == reflect.String {
var languageTagRe = regexp.MustCompile(strings.Join([]string{
Contributor

This regex is compiled on every call. Compiling such a large pattern on each invocation is very expensive.
It's better to move it to a package-level var.
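
A minimal sketch of the package-level approach suggested above. The names `simpleTagRe` and `matchesSimpleTag` are hypothetical, and the pattern is a deliberately simplified stand-in (language, optional script, optional region), not the pattern from this PR. The point is only where `regexp.MustCompile` runs: once at package initialization rather than inside the validator.

```go
package main

import "regexp"

// simpleTagRe is a hypothetical, simplified pattern used for illustration
// only. It is NOT the pattern from this PR. Because it is a package-level
// variable, MustCompile runs once, when the package is initialized.
var simpleTagRe = regexp.MustCompile(
	`^[A-Za-z]{2,3}(-[A-Za-z]{4})?(-([A-Za-z]{2}|[0-9]{3}))?$`,
)

// matchesSimpleTag reuses the precompiled pattern on every call instead of
// compiling it again.
func matchesSimpleTag(s string) bool {
	return simpleTagRe.MatchString(s)
}
```

A compiled `*regexp.Regexp` is safe for concurrent use by multiple goroutines, so sharing one instance across validator calls should not need extra locking.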

Contributor Author

You're right, fixed

nodivbyzero (Contributor) commented

@bfabio Thanks for contributing!

Feels like isBCP47StrictLanguageTag still needs a bit more refactoring:

  1. Move regex to global
  2. Reduce ToLower/ToUpper
  3. Early rejection

Something like this:

var languageTagRe = regexp.MustCompile(strings.Join([]string{ ... }, ""))

func isBCP47StrictLanguageTag(fl FieldLevel) bool {
	field := fl.Field()

	if field.Kind() != reflect.String {
		panic(fmt.Sprintf("Bad field type %s", field.Type()))
	}

	raw := field.String()
	upper := strings.ToUpper(raw)
	lower := strings.ToLower(raw)
	lowerTagDash := lower + "-"

	m := languageTagRe.FindStringSubmatch(upper)
	if m == nil {
		return false
	}

	// reuse lower/upper everywhere instead of recomputing
	...
}

Also, is it possible to completely replace regex?

tag, err := language.Parse(languageTag)

and then validate against IANA constraints manually.

bfabio (Contributor, Author) commented Mar 20, 2026

Thanks for the review

> Feels like isBCP47StrictLanguageTag still needs a bit more refactoring:
>
> 1. Move regex to global

Done.

> 2. Reduce ToLower/ToUpper

I refactored it a little more so they're no longer necessary. I don't even remember why I upper-cased it; that was a while ago 🤷

> 3. Early rejection

Done.

> Also, is it possible to completely replace regex?
>
> tag, err := language.Parse(languageTag)
>
> and then validate against IANA constraints manually.

I'm not sure what you mean here, a custom parser?
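
One possible reading of the "replace the regex" question is a hand-written subtag check. The sketch below is an assumption about what that could look like, not code from this PR: `wellFormedLangScriptRegion`, `isAlpha`, and `isDigits` are hypothetical helpers, and they cover only the `language[-script][-region]` shape. Variants, extensions, private-use subtags, and any lookup against the IANA registry are left out.

```go
package main

import "strings"

// isAlpha reports whether s is non-empty and contains only ASCII letters.
func isAlpha(s string) bool {
	for i := 0; i < len(s); i++ {
		c := s[i]
		if !(c >= 'a' && c <= 'z') && !(c >= 'A' && c <= 'Z') {
			return false
		}
	}
	return len(s) > 0
}

// isDigits reports whether s is non-empty and contains only ASCII digits.
func isDigits(s string) bool {
	for i := 0; i < len(s); i++ {
		if s[i] < '0' || s[i] > '9' {
			return false
		}
	}
	return len(s) > 0
}

// wellFormedLangScriptRegion is a hypothetical helper that checks only the
// shape language[-script][-region]. It does not handle variants or
// extensions, and it does not consult the IANA subtag registry.
func wellFormedLangScriptRegion(tag string) bool {
	parts := strings.Split(tag, "-")

	// Language subtag: 2 or 3 letters.
	if n := len(parts[0]); n < 2 || n > 3 || !isAlpha(parts[0]) {
		return false
	}
	rest := parts[1:]

	// Optional script subtag: exactly 4 letters.
	if len(rest) > 0 && len(rest[0]) == 4 && isAlpha(rest[0]) {
		rest = rest[1:]
	}

	// Optional region subtag: 2 letters or 3 digits.
	if len(rest) > 0 {
		r := rest[0]
		if (len(r) == 2 && isAlpha(r)) || (len(r) == 3 && isDigits(r)) {
			rest = rest[1:]
		}
	}

	// Anything left over is outside the shape this sketch understands.
	return len(rest) == 0
}
```

A shape check like this rejects `en_GB` but still accepts `ita`, because a three-letter language subtag is syntactically well-formed. Rejecting it would need the separate registry step mentioned in the review.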

Development

Successfully merging this pull request may close these issues.

bcp47_language_tag doesn't fail on some non-BCP47 tags

3 participants