fix: prevent SSRF bypass via userinfo in URL validation#12213
fix: prevent SSRF bypass via userinfo in URL validation#12213themavik wants to merge 2 commits intomindsdb:mainfrom
Conversation
The `_split_url` function used `urlparse().netloc` to compare URL origins against allow/deny lists. Since `netloc` includes the userinfo component (e.g. `user:pass@host`), an attacker could craft a URL like `http://attacker@127.0.0.1:4444/` that would not match the blocklisted origin `http://127.0.0.1:4444/`, bypassing the SSRF protection. Replace `netloc` with `hostname` + `port` from `urlparse()`, which strips userinfo and only compares the actual host and port. This closes the bypass vector while preserving port-aware origin matching.
|
All contributors have signed the CLA ✍️ ✅ |
mindsdb/utilities/security.py
Outdated
| return parsed_url.scheme.lower(), parsed_url.netloc.lower() | ||
| hostname = parsed_url.hostname or "" | ||
| port = parsed_url.port | ||
| host = f"{hostname}:{port}" if port else hostname |
There was a problem hiding this comment.
Correctness: In _split_url, the concatenation f"{hostname}:{port}" for IPv6 addresses is ambiguous and causes collisions. For example, http://[2001:db8::1]:80 (Host 2001:db8::1, Port 80) and http://[2001:db8::1:80] (Host 2001:db8::1:80, No Port) both resolve to the same string 2001:db8::1:80, causing validate_urls to incorrectly treat different origins as identical. If a port is present, wrap IPv6 hostnames in brackets: host = f"[{hostname}]:{port}" if ":" in hostname else f"{hostname}:{port}".
🤖 AI Agent Prompt for Cursor/Windsurf
📋 Copy this prompt to your AI coding assistant (Cursor, Windsurf, etc.) to get help fixing this issue
In `mindsdb/utilities/security.py`, update `_split_url` where `host` is constructed (around the line building `host = f"{hostname}:{port}" if port else hostname`). Ensure IPv6 literals are wrapped in brackets when appending the port so hosts match allowed/disallowed lists. Use the suggestion: if `port` and `":" in hostname`, build `f"[{hostname}]:{port}"`, else `f"{hostname}:{port}"`; keep `host = hostname` when no port.
|
I have read the CLA Document and I hereby sign the CLA |
Summary
Fixes #12163
The
_split_urlfunction inmindsdb/utilities/security.pyusedurlparse().netlocto compare URL origins against allow/deny lists. Sincenetlocincludes the userinfo component (e.g.user:pass@host), an attacker could craft a URL like:This would produce
netloc = "attacker@127.0.0.1:4444", which does not match the blocklisted origin"127.0.0.1:4444", effectively bypassing the SSRF protection.Changes
Replaced
parsed_url.netlocwithparsed_url.hostname+parsed_url.portin_split_url(). This:hostnamenever includesuser@oruser:pass@netlocemptiness guard is kept to reject malformed URLsBefore
After
Reproduction
Test Plan
_split_url("http://attacker@127.0.0.1:4444/")returns("http", "127.0.0.1:4444")(matches blocklist)_split_url("https://example.com/path")returns("https", "example.com")(no port)_split_url("http://host:8080/")returns("http", "host:8080")(explicit port preserved)_split_url("invalid")raisesValueError(no scheme/host)