
net.html: unmatched closing tag breaks DOM parsing #26619

@erkode

Description

Describe the bug

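When parsing invalid HTML that contains a closing tag with no matching opening tag (for example, a second </a>), the parser keeps popping its open-tag stack in search of a matching opener and discards the enclosing tags along the way. Elements parsed afterwards are then no longer attached to the correct parent.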
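To make the failure mode concrete, here is a deliberately simplified model of a "pop until the opener is found" rule. The close_tag_naive function and the plain []string stack are hypothetical and do not reproduce the net.html internals (the real loop also stops at tags flagged as closed and has special handling for the root). The sketch only shows how a single stray </a> can empty the stack of open tags.

```v
// Hypothetical, simplified model of a parser's open-tag stack (not the
// net.html code): a closing tag pops entries until its opener is on top.
fn close_tag_naive(mut stack []string, name string) {
	// Discard open tags until the matching one is on top or the stack is empty.
	for stack.len > 0 && stack.last() != name {
		stack.pop()
	}
	// Remove the matching open tag itself, if one was found.
	if stack.len > 0 {
		stack.pop()
	}
}

mut stack := ['html', 'body', 'div', 'a']
close_tag_naive(mut stack, 'a') // the legitimate </a>
after_first := stack.clone()
close_tag_naive(mut stack, 'a') // the stray </a>: div, body and html are all popped
println('after first </a>: ${after_first}')
println('after stray </a>: ${stack}')
```

After the stray </a>, nothing is left on the stack, so any tag parsed next has no correct parent to attach to.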

Reproduction Steps

Run:

import net.html

fn main() {
	content := '<!doctype html>
<html>
<body>
  <div>
    <a href="#">x</a></a>
  </div>
  <article class="news-post">hello</article>
</body>
</html>'
	mut doc := html.parse(content)

	// this works because it uses the global tag index, not the open-tag stack that gets corrupted
	by_attr := doc.get_tags_by_attribute_value('class', 'news-post')
	println('by_attr: ${by_attr.len}')

	// this does not find the <article>
	by_class := doc.get_tags_by_class_name('news-post')
	println('by_class: ${by_class.len}')
}

Expected Behavior

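In my opinion, unmatched closing tags in invalid HTML should be safely ignored, leaving the rest of the DOM intact, so that both queries in the reproduction find the <article> element.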

Current Behavior

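An unmatched closing tag corrupts the parsed DOM: in the reproduction, get_tags_by_attribute_value('class', 'news-post') finds the <article>, but get_tags_by_class_name('news-post') does not.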

Possible Solution

Codex suggestion:

In the close-tag branch at https://github.com/vlang/v/blob/master/vlib/net/html/dom.v#L125, save the stack size before the pop loop. If the loop ends without finding a matching open tag, restore the saved size and continue with the next tag, so the unmatched closing tag is ignored:

 if is_close_tag(tag) {
 	temp_int = stack.peek()
 	temp_string = tag.name[1..]
+	old_stack_size := stack.size

 	for !is_null(temp_int) && temp_string != tag_list[temp_int].name
 		&& !tag_list[temp_int].closed {
 		dom.print_debug(temp_string + ' >> ' + tag_list[temp_int].name + ' ' +
 			(temp_string == tag_list[temp_int].name).str())
 		stack.pop()
 		temp_int = stack.peek()
 	}
+
+	if is_null(temp_int) || temp_string != tag_list[temp_int].name {
+		stack.size = old_stack_size
+		continue
+	}

 	temp_int = stack.peek()
 	temp_int = if !is_null(temp_int) { stack.pop() } else { root_index }
 	if is_null(temp_int) {
 		stack.push(root_index)
 	}
 	dom.print_debug('Removed ' + temp_string + ' -- ' + tag_list[temp_int].name)
 }
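The patch depends on being able to undo the pops made during a failed search. Below is a small, self-contained model of that idea; TagStack, close_tag and open_tags are hypothetical names, and the sketch assumes (as the patch's stack.size = old_stack_size line implies) that pop only decrements the size without clearing the popped slot. It is not the net.html implementation.

```v
// Hypothetical stand-in for an index-based stack like the one the patch
// assumes: pop() only shrinks `size` and leaves entries in the backing array.
struct TagStack {
mut:
	elements []string
	size     int
}

fn (mut s TagStack) push(name string) {
	if s.elements.len > s.size {
		s.elements[s.size] = name // reuse a slot left behind by an earlier pop
	} else {
		s.elements << name
	}
	s.size++
}

// pop assumes a non-empty stack; callers check s.size first.
fn (mut s TagStack) pop() string {
	s.size--
	return s.elements[s.size]
}

// open_tags returns the tags that are currently open, outermost first.
fn (s TagStack) open_tags() []string {
	return s.elements[..s.size].clone()
}

// close_tag models the proposed fix: remember the size, search for the
// opener, and restore the size (ignoring the tag) if none is found.
fn (mut s TagStack) close_tag(name string) bool {
	old_size := s.size
	for s.size > 0 {
		if s.pop() == name {
			return true // opener found; it and everything above it are closed
		}
	}
	s.size = old_size // unmatched closing tag: undo every pop
	return false
}

mut stack := TagStack{}
for name in ['html', 'body', 'div', 'a'] {
	stack.push(name)
}
first := stack.close_tag('a') // the legitimate </a>
second := stack.close_tag('a') // the stray </a>
println('first </a> matched: ${first}, stray </a> matched: ${second}, open: ${stack.open_tags()}')
```

Returning a bool from close_tag is a choice made for this sketch so a caller could count or log ignored tags; the patch itself simply continues with the next tag.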

Additional Information/Context

No response

V version

V 0.5.0

Environment details (OS name and version, etc.)

V full version V 0.5.0 0d00c76.5e0489f
OS linux, Linux version 6.6.87.2-microsoft-standard-WSL2 (root@439a258ad544) (gcc (GCC) 11.2.0, GNU ld (GNU Binutils) 2.37) #1 SMP PREEMPT_DYNAMIC Thu Jun 5 18:30:46 UTC 2025 (WSL 2)
Processor 6 cpus, 64bit, little endian, AMD Ryzen 5 3500 6-Core Processor
Memory 5.34GB/7.72GB
V executable /root/v/v
V last modified time 2026-02-17 00:36:34
V home dir OK, value: /root/v
VMODULES OK, value: /root/.vmodules
VTMP OK, value: /tmp/v_0
Current working dir OK, value: /root/workspace/testando
Git version git version 2.53.0
V git status weekly.2026.07-76-g5e0489fe-dirty
.git/config present true
cc version cc (GCC) 15.2.1 20260209
gcc version gcc (GCC) 15.2.1 20260209
clang version clang version 21.1.8
tcc version tcc version 0.9.28rc 2025-02-13 HEAD@f8bd136d (x86_64 Linux)
tcc git status thirdparty-linux-amd64 696c1d84
emcc version N/A
glibc version ldd (GNU libc) 2.43

Note

You can use the 👍 reaction to increase the issue's priority for developers.

Please note that only the 👍 reaction to the issue itself counts as a vote.
Other reactions and those to comments will not be taken into account.

Metadata

Assignees: No one assigned
Labels: Bug
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
