fix: prevent SSRF bypass via userinfo in URL validation by themavik · Pull Request #12213 · mindsdb/mindsdb

themavik · 2026-02-10T15:55:40Z

Summary

The _split_url function in mindsdb/utilities/security.py used urlparse().netloc to compare URL origins against allow/deny lists. Since netloc includes the userinfo component (e.g. user:pass@host), an attacker could craft a URL like:

http://attacker@127.0.0.1:4444/

This would produce netloc = "attacker@127.0.0.1:4444", which does not match the blocklisted origin "127.0.0.1:4444", effectively bypassing the SSRF protection.

Changes

Replaced parsed_url.netloc with parsed_url.hostname + parsed_url.port in _split_url(). This:

Strips userinfo — hostname never includes user@ or user:pass@
Preserves port-aware matching — origins with explicit ports still compare correctly
Maintains the existing validation check — the netloc emptiness guard is kept to reject malformed URLs

Before

return parsed_url.scheme.lower(), parsed_url.netloc.lower()

After

hostname = parsed_url.hostname or ""
port = parsed_url.port
host = f"{hostname}:{port}" if port else hostname
return parsed_url.scheme.lower(), host.lower()
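As a side note on the snippet above, `parsed_url.port` is either an `int` or `None`, and reading it can raise `ValueError` for a malformed port. The sketch below is not the PR's code. It uses a hypothetical helper, `host_with_port`, and shows one possible variation that tests `port is not None`, so that an explicit port of `0` is not silently dropped by a truthiness check.

```python
from urllib.parse import urlparse


def host_with_port(url: str) -> str:
    """Hypothetical helper (not part of mindsdb): build "host[:port]" without userinfo."""
    parsed = urlparse(url)
    hostname = parsed.hostname or ""
    # Accessing .port raises ValueError for a non-numeric or out-of-range port.
    port = parsed.port
    # `is not None` keeps an explicit port 0, which a bare `if port` would drop.
    host = f"{hostname}:{port}" if port is not None else hostname
    return host.lower()


print(host_with_port("http://attacker@127.0.0.1:4444/"))  # 127.0.0.1:4444
print(host_with_port("https://Example.com/path"))         # example.com
print(host_with_port("http://host:0/"))                   # host:0

try:
    host_with_port("http://host:99999/")
except ValueError as exc:
    print(f"rejected: {exc}")
```

Whether port `0` should be kept or rejected outright is a policy choice for the caller; the sketch only makes the difference visible.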

Reproduction

from urllib.parse import urlparse

url = "http://attacker@127.0.0.1:4444/"
parsed = urlparse(url)

# Before fix: netloc includes userinfo, bypass succeeds
print(parsed.netloc)    # "attacker@127.0.0.1:4444"

# After fix: hostname strips userinfo, bypass blocked
print(parsed.hostname)  # "127.0.0.1"
print(parsed.port)      # 4444
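To connect the reproduction to the deny-list check itself, here is a minimal sketch of the comparison before and after the change. `BLOCKED_ORIGINS`, `blocked_by_netloc`, and `blocked_by_hostname` are hypothetical names used only for illustration; the actual `validate_urls` logic in mindsdb may be structured differently.

```python
from urllib.parse import urlparse

# Hypothetical deny list of "host:port" strings, for illustration only.
BLOCKED_ORIGINS = {"127.0.0.1:4444"}


def blocked_by_netloc(url: str) -> bool:
    """Pre-fix behaviour: compare the raw netloc, which still carries userinfo."""
    return urlparse(url).netloc.lower() in BLOCKED_ORIGINS


def blocked_by_hostname(url: str) -> bool:
    """Post-fix behaviour: rebuild the host from hostname and port only."""
    parsed = urlparse(url)
    hostname = parsed.hostname or ""
    host = f"{hostname}:{parsed.port}" if parsed.port else hostname
    return host.lower() in BLOCKED_ORIGINS


url = "http://attacker@127.0.0.1:4444/"
print(blocked_by_netloc(url))    # False: "attacker@127.0.0.1:4444" is not in the list
print(blocked_by_hostname(url))  # True: "127.0.0.1:4444" matches the list
```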

Test Plan

Verify _split_url("http://attacker@127.0.0.1:4444/") returns ("http", "127.0.0.1:4444") (matches blocklist)
Verify _split_url("https://example.com/path") returns ("https", "example.com") (no port)
Verify _split_url("http://host:8080/") returns ("http", "host:8080") (explicit port preserved)
Verify _split_url("invalid") raises ValueError (no scheme/host)
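The test plan above could be expressed as plain assertions along the following lines. This is a sketch against a local stand-in, `split_url_sketch`, rather than the real `_split_url`. The guard condition and the error message are assumptions based on the description ("the netloc emptiness guard is kept"), not copied from `mindsdb/utilities/security.py`.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_url_sketch(url: str) -> Tuple[str, str]:
    """Stand-in for _split_url as described in this PR (guard details are assumed)."""
    parsed = urlparse(url)
    # Assumed guard: reject URLs that lack a scheme or a network location.
    if not parsed.scheme or not parsed.netloc:
        raise ValueError(f"URL must include a scheme and a host: {url}")
    hostname = parsed.hostname or ""
    port = parsed.port
    host = f"{hostname}:{port}" if port else hostname
    return parsed.scheme.lower(), host.lower()


def run_test_plan() -> None:
    # Userinfo is stripped, so the result matches the blocklisted origin.
    assert split_url_sketch("http://attacker@127.0.0.1:4444/") == ("http", "127.0.0.1:4444")
    # No port in the URL means no port in the result.
    assert split_url_sketch("https://example.com/path") == ("https", "example.com")
    # An explicit port is preserved.
    assert split_url_sketch("http://host:8080/") == ("http", "host:8080")
    # A string with no scheme or host is rejected.
    try:
        split_url_sketch("invalid")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for a URL without scheme/host")


run_test_plan()
```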

The `_split_url` function used `urlparse().netloc` to compare URL origins against allow/deny lists. Since `netloc` includes the userinfo component (e.g. `user:pass@host`), an attacker could craft a URL like `http://attacker@127.0.0.1:4444/` that would not match the blocklisted origin `http://127.0.0.1:4444/`, bypassing the SSRF protection. Replace `netloc` with `hostname` + `port` from `urlparse()`, which strips userinfo and only compares the actual host and port. This closes the bypass vector while preserving port-aware origin matching.

github-actions · 2026-02-10T15:55:52Z

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

entelligence-ai-pr-reviews · 2026-02-10T16:00:37Z

mindsdb/utilities/security.py

-    return parsed_url.scheme.lower(), parsed_url.netloc.lower()
+    hostname = parsed_url.hostname or ""
+    port = parsed_url.port
+    host = f"{hostname}:{port}" if port else hostname


Correctness: In _split_url, the concatenation f"{hostname}:{port}" for IPv6 addresses is ambiguous and causes collisions. For example, http://[2001:db8::1]:80 (Host 2001:db8::1, Port 80) and http://[2001:db8::1:80] (Host 2001:db8::1:80, No Port) both resolve to the same string 2001:db8::1:80, causing validate_urls to incorrectly treat different origins as identical. If a port is present, wrap IPv6 hostnames in brackets: host = f"[{hostname}]:{port}" if ":" in hostname else f"{hostname}:{port}".

In `mindsdb/utilities/security.py`, update `_split_url` where `host` is constructed (around the line building `host = f"{hostname}:{port}" if port else hostname`). Ensure IPv6 literals are wrapped in brackets when appending the port so hosts match allowed/disallowed lists. Use the suggestion: if `port` and `":" in hostname`, build `f"[{hostname}]:{port}"`, else `f"{hostname}:{port}"`; keep `host = hostname` when no port.
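The collision described in this review comment, and the effect of the suggested bracket wrapping, can be reproduced with the standard library alone. `join_naive` and `join_bracketed` are hypothetical helper names for this sketch. They mirror the two ways of building `host`, not the exact code committed to the PR.

```python
from urllib.parse import urlparse


def join_naive(url: str) -> str:
    """Concatenate hostname and port without brackets (the first version of the fix)."""
    parsed = urlparse(url)
    hostname = parsed.hostname or ""
    port = parsed.port
    return f"{hostname}:{port}" if port else hostname


def join_bracketed(url: str) -> str:
    """Wrap IPv6 literals in brackets when a port is appended (the reviewer's suggestion)."""
    parsed = urlparse(url)
    hostname = parsed.hostname or ""
    port = parsed.port
    if not port:
        return hostname
    if ":" in hostname:  # a colon in the hostname indicates an IPv6 literal
        return f"[{hostname}]:{port}"
    return f"{hostname}:{port}"


with_port = "http://[2001:db8::1]:80/"  # host 2001:db8::1, port 80
no_port = "http://[2001:db8::1:80]/"    # host 2001:db8::1:80, no port

print(join_naive(with_port), join_naive(no_port))
# 2001:db8::1:80 2001:db8::1:80   (two different origins, same string)

print(join_bracketed(with_port), join_bracketed(no_port))
# [2001:db8::1]:80 2001:db8::1:80 (now distinguishable)
```

As in the suggestion, a hostname without a port is left unbracketed, so any allow or deny list entries for IPv6 origins would need to follow the same convention.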

themavik · 2026-02-20T17:50:39Z

I have read the CLA Document and I hereby sign the CLA

entelligence-ai-pr-reviews bot reviewed Feb 10, 2026

github-actions bot added a commit that referenced this pull request Feb 20, 2026

@themavik has signed the CLA in #12213

5bfedd9

fix: wrap IPv6 hostname in brackets when appending port

1bfa96a

themavik wants to merge 2 commits into mindsdb:main from themavik:fix/12163-ssrf-bypass-url-validation
