Parse error when parsing text containing '&' followed by non-ASCII characters #17

@ledsun

I found a parsing error in gammo 0.3.0 with a very small repro case.

Environment

  • Ruby: 4.0.1
  • Gammo: 0.3.0

Reproduction

```ruby
# frozen_string_literal: true

require "gammo"

html = String.new("ヘルスケア&パーソナルケア")
Gammo.new(html).parse
```

Expected result

I would expect this to be parsed as plain text.

Actual result

```
/home/ledsun/.rbenv/versions/4.0.1/lib/ruby/gems/4.0.0/gems/gammo-0.3.0/lib/gammo/tokenizer/escape.rb:101:in 'String#[]=': no implicit conversion of nil into String (TypeError)

        data[dst, dst1] = data[src, src1]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        from /home/ledsun/.rbenv/versions/4.0.1/lib/ruby/gems/4.0.0/gems/gammo-0.3.0/lib/gammo/tokenizer/escape.rb:101:in 'Gammo::Tokenizer::Escape#unescape_entity'
        from /home/ledsun/.rbenv/versions/4.0.1/lib/ruby/gems/4.0.0/gems/gammo-0.3.0/lib/gammo/tokenizer/escape.rb:63:in 'block in Gammo::Tokenizer::Escape#unescape'
        from /home/ledsun/.rbenv/versions/4.0.1/lib/ruby/gems/4.0.0/gems/gammo-0.3.0/lib/gammo/tokenizer/escape.rb:61:in 'String#each_byte'
        from /home/ledsun/.rbenv/versions/4.0.1/lib/ruby/gems/4.0.0/gems/gammo-0.3.0/lib/gammo/tokenizer/escape.rb:61:in 'Enumerator#with_index'
        from /home/ledsun/.rbenv/versions/4.0.1/lib/ruby/gems/4.0.0/gems/gammo-0.3.0/lib/gammo/tokenizer/escape.rb:61:in 'Gammo::Tokenizer::Escape#unescape'
        from /home/ledsun/.rbenv/versions/4.0.1/lib/ruby/gems/4.0.0/gems/gammo-0.3.0/lib/gammo/tokenizer/tokens.rb:50:in 'Gammo::Tokenizer::EscapedToken#load_data'
        from /home/ledsun/.rbenv/versions/4.0.1/lib/ruby/gems/4.0.0/gems/gammo-0.3.0/lib/gammo/tokenizer/tokens.rb:40:in 'Gammo::Tokenizer::EscapedToken#initialize'
        from /home/ledsun/.rbenv/versions/4.0.1/lib/ruby/gems/4.0.0/gems/gammo-0.3.0/lib/gammo/tokenizer.rb:354:in 'Gammo::Tokenizer#character_token'
        from /home/ledsun/.rbenv/versions/4.0.1/lib/ruby/gems/4.0.0/gems/gammo-0.3.0/lib/gammo/tokenizer.rb:92:in 'Gammo::Tokenizer#next_token'
        from /home/ledsun/.rbenv/versions/4.0.1/lib/ruby/gems/4.0.0/gems/gammo-0.3.0/lib/gammo/parser.rb:162:in 'Gammo::Parser#parse'
        from scripts/repro_gammo_escape_error.rb:7:in '<main>'
```
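
A possible cause, which I have not confirmed against the gammo source: the trace shows the loop running over `each_byte.with_index`, while the failing line slices with `String#[]`, which indexes a UTF-8 string by character. If the offsets passed to `unescape_entity` are byte offsets, they can point past the end of the string when it is counted in characters, so `data[src, src1]` returns `nil` and the assignment raises. The sketch below reproduces that mismatch in plain Ruby without gammo. The variable names are mine and the indices are chosen only to illustrate the idea; they are not taken from `escape.rb`.

```ruby
text = "ヘルスケア&パーソナルケア"

# Offset of "&" counted in bytes, the kind of index each_byte.with_index yields.
amp_byte_offset = text.bytes.index("&".ord)
# Offset of "&" counted in characters, which is what String#[] expects.
amp_char_index = text.index("&")

# The string has 13 characters but 37 bytes, so a byte offset just past "&"
# lies beyond the last character and String#[] returns nil.
slice = text[amp_byte_offset + 1, 1]

# Assigning that nil through String#[]= raises the TypeError from the report.
error = begin
  copy = text.dup
  copy[amp_char_index, 1] = slice
  nil
rescue TypeError => e
  e
end

# byteslice interprets the same offset as bytes and returns the character
# that follows "&" (3 bytes in UTF-8).
after_amp = text.byteslice(amp_byte_offset + 1, 3)

puts "byte offset of &: #{amp_byte_offset}, char index of &: #{amp_char_index}"
puts "text[byte_offset + 1, 1] => #{slice.inspect}"
puts "error: #{error.class}: #{error.message}"
puts "byteslice => #{after_amp}"
```

If this is what happens inside gammo, using byte-oriented calls such as `byteslice` consistently, or working on a binary-encoded copy of the data, might avoid the error. That is only a guess at a direction for a fix.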
