Skip to content

Comments

fix: prevent SSRF bypass via userinfo in URL validation#12213

Open
themavik wants to merge 2 commits intomindsdb:mainfrom
themavik:fix/12163-ssrf-bypass-url-validation
Open

fix: prevent SSRF bypass via userinfo in URL validation#12213
themavik wants to merge 2 commits intomindsdb:mainfrom
themavik:fix/12163-ssrf-bypass-url-validation

Conversation

@themavik
Copy link

@themavik themavik commented Feb 10, 2026

Summary

Fixes #12163

The _split_url function in mindsdb/utilities/security.py used urlparse().netloc to compare URL origins against allow/deny lists. Since netloc includes the userinfo component (e.g. user:pass@host), an attacker could craft a URL like:

http://attacker@127.0.0.1:4444/

This would produce netloc = "attacker@127.0.0.1:4444", which does not match the blocklisted origin "127.0.0.1:4444", effectively bypassing the SSRF protection.

Changes

Replaced parsed_url.netloc with parsed_url.hostname + parsed_url.port in _split_url(). This:

  • Strips userinfohostname never includes user@ or user:pass@
  • Preserves port-aware matching — origins with explicit ports still compare correctly
  • Maintains the existing validation check — the netloc emptiness guard is kept to reject malformed URLs

Before

return parsed_url.scheme.lower(), parsed_url.netloc.lower()

After

hostname = parsed_url.hostname or ""
port = parsed_url.port
host = f"{hostname}:{port}" if port else hostname
return parsed_url.scheme.lower(), host.lower()

Reproduction

from urllib.parse import urlparse

url = "http://attacker@127.0.0.1:4444/"
parsed = urlparse(url)

# Before fix: netloc includes userinfo, bypass succeeds
print(parsed.netloc)    # "attacker@127.0.0.1:4444"

# After fix: hostname strips userinfo, bypass blocked
print(parsed.hostname)  # "127.0.0.1"
print(parsed.port)      # 4444

Test Plan

  • Verify _split_url("http://attacker@127.0.0.1:4444/") returns ("http", "127.0.0.1:4444") (matches blocklist)
  • Verify _split_url("https://example.com/path") returns ("https", "example.com") (no port)
  • Verify _split_url("http://host:8080/") returns ("http", "host:8080") (explicit port preserved)
  • Verify _split_url("invalid") raises ValueError (no scheme/host)

The `_split_url` function used `urlparse().netloc` to compare URL
origins against allow/deny lists.  Since `netloc` includes the
userinfo component (e.g. `user:pass@host`), an attacker could craft
a URL like `http://attacker@127.0.0.1:4444/` that would not match
the blocklisted origin `http://127.0.0.1:4444/`, bypassing the SSRF
protection.

Replace `netloc` with `hostname` + `port` from `urlparse()`, which
strips userinfo and only compares the actual host and port.  This
closes the bypass vector while preserving port-aware origin matching.
@github-actions
Copy link

github-actions bot commented Feb 10, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

return parsed_url.scheme.lower(), parsed_url.netloc.lower()
hostname = parsed_url.hostname or ""
port = parsed_url.port
host = f"{hostname}:{port}" if port else hostname
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correctness: In _split_url, the concatenation f"{hostname}:{port}" for IPv6 addresses is ambiguous and causes collisions. For example, http://[2001:db8::1]:80 (Host 2001:db8::1, Port 80) and http://[2001:db8::1:80] (Host 2001:db8::1:80, No Port) both resolve to the same string 2001:db8::1:80, causing validate_urls to incorrectly treat different origins as identical. If a port is present, wrap IPv6 hostnames in brackets: host = f"[{hostname}]:{port}" if ":" in hostname else f"{hostname}:{port}".

🤖 AI Agent Prompt for Cursor/Windsurf

📋 Copy this prompt to your AI coding assistant (Cursor, Windsurf, etc.) to get help fixing this issue

In `mindsdb/utilities/security.py`, update `_split_url` where `host` is constructed (around the line building `host = f"{hostname}:{port}" if port else hostname`). Ensure IPv6 literals are wrapped in brackets when appending the port so hosts match allowed/disallowed lists. Use the suggestion: if `port` and `":" in hostname`, build `f"[{hostname}]:{port}"`, else `f"{hostname}:{port}"`; keep `host = hostname` when no port.

@themavik
Copy link
Author

I have read the CLA Document and I hereby sign the CLA

github-actions bot added a commit that referenced this pull request Feb 20, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: SSRF Bypass via Improper URL Validation in MindsDB Editor

1 participant