fix(ssl): enhance SSL context cache for mTLS+rotation and bypass HTTP#3758
Merged
crivetimihai merged 12 commits intomainfrom Mar 23, 2026
Merged
fix(ssl): enhance SSL context cache for mTLS+rotation and bypass HTTP#3758crivetimihai merged 12 commits intomainfrom
crivetimihai merged 12 commits intomainfrom
Conversation
This was
linked to
issues
Mar 20, 2026
Member
|
Thanks @MohanLaksh. Addresses real performance issue (#3629, 49% CPU overhead from unnecessary SSL context creation) and mTLS cache correctness (#3579). Well-documented with clear problem/solution breakdown. |
713f2c4 to
807bfb7
Compare
- ssl_context_cache: support client_cert/client_key caching key + deterministic hash - ssl_context_cache: optional TTL and size limit via SSL_CONTEXT_CACHE_TTL/MAX_SIZE - ssl_context_cache: load_cert_chain for mTLS, clear timestamps on cache clear - tool_service: skip SSL context for http:// gateway URLs - tool_service: pass client_cert/client_key to get_cached_ssl_context - db: add Gateway client_cert/client_key columns (and new alembic migration) - schemas: add Gateway create/update/read fields for mTLS cert/key - test updates: new coverage for mTLS context loading and cache key divergence Closes #3629, #3579 Signed-off-by: Mohan Lakshmaiah <mohan.economist@gmail.com>
- Enhanced _make_gateway_payload() test helper to accept url, client_cert, and client_key parameters - Fixed SIGHUP test import path to use correct module location - All 18 tests now passing (12 SSL cache + 4 SIGHUP + 2 tool service) Closes #3629 Closes #3579 Signed-off-by: Mohan Lakshmaiah <mohan.economist@gmail.com>
- Added test for SSL_CONTEXT_CACHE_TTL ValueError validation - Added test for _is_expired with None timestamp - Added tests for _sighup_reload exception handling - Added tests for _sighup_handler logging and RuntimeError cases - Added pragma comments for lifespan-scoped functions (tested via simulation) - SSL context cache coverage: 98% - All 20 tests passing Related to #3629 and #3579 Signed-off-by: Mohan Lakshmaiah <mohan.economist@gmail.com>
- Fix mTLS data missing from tool cache: add client_cert/client_key to _build_tool_cache_payload gateway dict so mTLS works via cache lookups - Security: mask client_key (private key) in GatewayRead.masked() API responses, consistent with auth_value/oauth_config masking - Fix db.py type annotation: Gateway.client_cert/client_key columns use Text (str), not bytes — corrected Mapped[Optional[bytes]] to Mapped[Optional[str]] - Fix test_is_expired_returns_false_when_no_timestamp: enable TTL in test so it actually exercises the created_at-is-None branch (line 43) - Remove redundant mcpgateway/.gitignore (already covered by root) - Add tests for cache payload mTLS fields and client_key masking Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
…nfig - Extract _sighup_reload/_sighup_handler from lifespan closure into mcpgateway/handlers/signal_handlers.py for direct testability (100% coverage). Rewrite tests to call real code instead of simulating. - Encrypt client_key (TLS private key) before DB storage using EncryptionService, consistent with oauth_config secret handling. Decrypt at runtime in tool_service before passing to SSL context. - Wire up client_cert/client_key persistence in register_gateway and update_gateway — fields existed in DB model and schema but were never stored, making the mTLS feature non-functional end-to-end. - Remove unused ssl_context_cache_max_size/ssl_context_cache_ttl from config.py (dead code — ssl_context_cache.py reads env vars directly). - Add tests: _encrypt_client_key unit tests, register_gateway encryption integration test, encrypted client_key decryption in tool invocation path. Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
- ssl.SSLContext.load_cert_chain() only accepts file paths, not PEM strings. When client_cert/client_key contain inline PEM content (stored in DB), write to secure temp files, load, then immediately clean up. Detect PEM via "-----BEGIN " prefix; file paths pass through directly. - Validate that both client_cert and client_key are provided or neither. Previously, providing only one silently skipped mTLS setup, turning a config error into a confusing auth failure. - Add tests: PEM temp file handling, mixed path/PEM combinations, OSError cleanup resilience, validation error for mismatched pairs. ssl_context_cache.py remains at 100% coverage. Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
…A rotation - Thread client_cert/client_key through _initialize_gateway, connect_to_sse_server, connect_to_streamablehttp_server so mTLS is used during registration, activation, refresh, and health checks — not just tool invocation. - SIGHUP handler now also closes the MCP session pool so pooled connections reconnect with new TLS state after cert rotation. - Persist ca_certificate, ca_certificate_sig, signing_algorithm in update_gateway — fields were accepted by GatewayUpdate schema but silently ignored on save. Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
- Thread client_cert/client_key through _check_single_gateway_health so mTLS-required upstreams pass health checks (was using self.create_ssl_context with CA-only, ignoring client certs). - Add MCPSessionPool.drain_all() that closes all sessions without setting _closed=True, so the pool stays operational and creates fresh sessions on demand. SIGHUP now calls drain_mcp_session_pool() instead of close_mcp_session_pool(), which previously disabled pooling entirely until restart. Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
d0fc6a4 to
881c0a8
Compare
Member
Review CompleteRebased onto main (no conflicts) and addressed the following during review: Bugs Fixed
Security Fixes
Other Fixes
Test Coverage
Verified
|
crivetimihai
previously approved these changes
Mar 23, 2026
Member
crivetimihai
left a comment
There was a problem hiding this comment.
Approved after thorough review, bug fixes, security hardening, and end-to-end verification against live deployment.
Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
…drain - update_gateway: test CA + mTLS field persistence, masked client_key preservation (covers lines 2005-2018) - MCPSessionPool.drain_all: test pooled session drain, active session drain, QueueEmpty race condition (covers lines 1410-1428) - drain_mcp_session_pool: test singleton call and no-op when pool is None (covers lines 2151-2152) - tool_service: test decryption failure fallback path (covers lines 3941-3942) Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
…overage - Health check mTLS SSL context creation with encrypted client_key decryption (lines 3348-3349) - Health check decrypt exception fallback (lines 3343-3344) - _initialize_gateway passes client_cert/client_key to SSE and StreamableHTTP connect methods (lines 5227, 5393-5394) - update_gateway re-init decrypt exception fallback (lines 2164-2165) - refresh decrypt exception fallback (lines 4684-4685) Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
📋 PR Summary: SSL Context Caching with mTLS Support & HTTP Bypass
Branch:
fix/ssl-context-cache-mtls-rotationCloses: #3629, #3579
Files Changed: 12 files | +331 insertions | -24 deletions
🎯 Problem Statement
http://) URLs → 49% CPU overhead✅ Solution Overview
📁 Files Changed
ssl_context_cache.pytool_service.pymain.pyschemas.pydb.pyconfig.py.env.exampleself-signed-certificates.md615af4ab94b4_*.pytest_ssl_context_cache.pytest_tool_service_coverage.py.gitignore🔍 Key Changes by File
1.
ssl_context_cache.py— Core SSL Cache EnhancementBefore:
After:
New Features:
ctx.load_cert_chain()SSL_CONTEXT_CACHE_TTL)SSL_CONTEXT_CACHE_MAX_SIZE)2.
tool_service.py— HTTP Bypass & mTLS ResolutionHTTP Bypass (Lines 3986-3994):
mTLS Resolution (Lines 3918-3930):
Resolves: #3629 — ~49% CPU reduction for HTTP URLs
3.
main.py— SIGHUP Handler (Lines 1702-1741)Usage:
Resolves: #3579 NET-02 — Zero-downtime certificate rotation
4. Database Migration
File:
mcpgateway/alembic/versions/615af4ab94b4_add_gateway_client_cert_and_key.pyProperties:
20a0e0538ac5)Note: Fields already exist in
db.pymodel (lines 4609-4610). Migration ensures existing databases get the columns.Resolves: #3579 NET-01 — mTLS client certificate storage
5. API Schemas (
schemas.py)GatewayCreate (Line 2621-2622):
GatewayUpdate (Line 2972-2973):
GatewayRead (Line 3272-3273):
🧪 Test Coverage
test_get_cached_ssl_context_loads_client_cert_and_keytest_cache_key_different_for_client_cert_changestest_is_expired_returns_false_when_ttl_disabledtest_is_expired_returns_true_when_entry_ttl_elapsedtest_get_cached_ssl_context_clears_cache_when_over_limittest_tool_service_coverage.py🔧 Configuration
SSL_CONTEXT_CACHE_MAX_SIZESSL_CONTEXT_CACHE_TTL📊 Performance Impact
✅ Backward Compatibility