Fix authorization #4338
Conversation
Pull request overview
This PR refactors API key authorization in both the OpenAI API server and the proxy by replacing per-route Depends(check_api_key) enforcement with a shared ASGI AuthenticationMiddleware, and updates the proxy request forwarding to pass through original request headers/body (to preserve Authorization).
Changes:
- Removed the `check_api_key` dependency usage and introduced `AuthenticationMiddleware` (new `server_utils.py`) for token auth.
- Updated proxy forwarding to pass through inbound request headers/body, and added a custom streaming response wrapper to surface upstream non-200 responses.
- Added a JSON Content-Type validator dependency in the OpenAI API server for several endpoints.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| `lmdeploy/serve/proxy/proxy.py` | Removes route-level auth dependencies, forwards raw request headers/body to upstream nodes, adds streaming error handling, and conditionally installs auth middleware. |
| `lmdeploy/serve/openai/server_utils.py` | Adds a new ASGI `AuthenticationMiddleware` implementation for Bearer token validation. |
| `lmdeploy/serve/openai/api_server.py` | Removes `check_api_key`, adds `validate_json_request`, adjusts the validation exception handler, and conditionally installs auth middleware. |
Comments suppressed due to low confidence (1)
lmdeploy/serve/proxy/proxy.py:441
- NodeManager.stream_generate/generate now require a FastAPI Request and call raw_request.body(), but large parts of the proxy (e.g., DistServe prefill/decode paths) still call these methods with a request dict. This will raise AttributeError at runtime and break DistServe routing. Consider either (a) keeping dict-based helpers for internally-mutated requests (prefill/migration) and adding separate raw_request forwarders, or (b) updating DistServe call sites to build the outbound body/headers explicitly (including auth) without relying on Request.
```python
async def stream_generate(self, raw_request: Request, node_url: str, endpoint: str):
    try:
        target_url = node_url.rstrip('/') + endpoint
        headers = self._prepare_headers(raw_request)
        body_bytes = await raw_request.body()
        async with aiohttp.ClientSession() as session:
            async with session.post(target_url, headers=headers, data=body_bytes,
                                    timeout=self.aiotimeout) as response:
                if response.status != 200:
                    error_body = await response.read()
                    raise APIServerException(status_code=response.status, body=error_body)
                async for line in response.content:
                    if line.strip():
                        yield line + b'\n\n'
    except APIServerException:
        # raise APIServerException again to be caught by the outer layer
        raise
    except (Exception, GeneratorExit, aiohttp.ClientError) as e:  # noqa
        logger.error(f'catched an exception: {e}')
        # exception happened, reduce unfinished num
        yield self.handle_api_timeout(node_url)

async def generate(self, raw_request: Request, node_url: str, endpoint: str):
    try:
        target_url = node_url.rstrip('/') + endpoint
        headers = self._prepare_headers(raw_request)
        body_bytes = await raw_request.body()
        async with aiohttp.ClientSession() as session:
            async with session.post(target_url, headers=headers, data=body_bytes,
                                    timeout=self.aiotimeout) as response:
                return await response.text()
    except (Exception, GeneratorExit, aiohttp.ClientError, asyncio.CancelledError) as e:  # noqa # yapf: disable
        logger.error(f'catched an exception: {e}')
        return self.handle_api_timeout(node_url)
```
`lmdeploy/serve/openai/api_server.py` (Outdated)
```python
content_type = raw_request.headers.get('content-type', '').lower()
media_type = content_type.split(';', maxsplit=1)[0]
if media_type != 'application/json':
    raise RequestValidationError(errors=["Unsupported Media Type: Only 'application/json' is allowed"])
```
Copilot AI, Feb 9, 2026
validate_json_request raises RequestValidationError for unsupported Content-Type, but RequestValidationError is meant for pydantic validation issues and doesn't carry HTTP status/headers. This also maps the error to 422 instead of the more appropriate 415 Unsupported Media Type. Consider raising HTTPException(status_code=415, ...) (or returning a JSONResponse) and keeping RequestValidationError for schema/field validation only.
Suggested change:
```python
raise HTTPException(
    status_code=HTTPStatus.UNSUPPORTED_MEDIA_TYPE.value,
    detail="Unsupported Media Type: Only 'application/json' is allowed",
)
```
```python
        'body': exc.body
    }),
)
return JSONResponse(status_code=exc.status_code, content=exc.detail, headers=exc.headers)
```
validation_exception_handler assumes RequestValidationError has .status_code/.detail/.headers, but FastAPI's RequestValidationError exposes .errors() and .body (and does not define those attributes). This will raise AttributeError the first time a validation error occurs. Either restore the previous handler logic using exc.errors()/exc.body, or switch to handling HTTPException for the custom 415 case.
Suggested change:
```python
return JSONResponse(
    status_code=HTTPStatus.UNPROCESSABLE_ENTITY,
    content={'detail': exc.errors(), 'body': exc.body},
)
```
`lmdeploy/serve/proxy/proxy.py` (Outdated)
```python
await send({'type': 'http.response.start', 'status': e.status_code, 'headers': self.raw_headers})
await send({
    'type': 'http.response.body',
    'body': e.body,
```
ProxyStreamingResponse sends error responses using self.raw_headers (typically "text/event-stream"), ignoring APIServerException.headers/content-type. If the upstream returns e.g. 401/403 with JSON, clients will receive the wrong content-type. Consider passing upstream response headers into APIServerException and using them (or at least setting application/json) when sending the error response start.
Suggested change:
```python
# Use headers from APIServerException (defaulting to application/json) instead of streaming headers
error_headers = [
    (str(k).encode('latin-1'), str(v).encode('latin-1'))
    for k, v in (e.headers or {}).items()
]
error_body = e.body if isinstance(e.body, (bytes, bytearray)) else str(e.body).encode('utf-8')
await send({
    'type': 'http.response.start',
    'status': e.status_code,
    'headers': error_headers,
})
await send({
    'type': 'http.response.body',
    'body': error_body,
```
```python
if scope['type'] not in ('http', 'websocket') or scope['method'] == 'OPTIONS':
    # scope["type"] can be "lifespan" or "startup" for example,
    # in which case we don't need to do anything
    return self.app(scope, receive, send)
```
`AuthenticationMiddleware.__call__` unconditionally reads `scope['method']` when `scope['type']` is 'websocket'. WebSocket scopes typically don't include a 'method' key, so this can raise KeyError and break WebSocket connections. Use `scope.get('method')` (or only check OPTIONS when `scope['type'] == 'http'`).
Suggested change:
```python
if scope['type'] not in ('http', 'websocket'):
    # scope["type"] can be "lifespan" or "startup" for example,
    # in which case we don't need to do anything
    return self.app(scope, receive, send)
if scope['type'] == 'http' and scope.get('method') == 'OPTIONS':
    return self.app(scope, receive, send)
```
```diff
-@router.get('/is_sleeping', dependencies=[Depends(check_api_key)])
+@router.get('/is_sleeping', dependencies=[Depends(validate_json_request)])
```
The new validate_json_request dependency is attached to GET /is_sleeping. GET requests often omit Content-Type entirely, so this endpoint will start failing with a validation error even though it has no JSON body. Consider removing this dependency from GET routes (and any routes that don't require a JSON request body), or make validate_json_request conditional on HTTP method and/or presence of a body.
Suggested change:
```python
@router.get('/is_sleeping')
```
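The method/body-conditional variant the comment mentions could be sketched as below: skip the JSON check for requests that cannot carry a body, so GET routes like `/is_sleeping` keep working without a Content-Type header. The function name and signature are illustrative assumptions.

```python
# Hypothetical guard: only enforce application/json when the request
# method and Content-Length suggest a body is actually present.
def should_validate_content_type(method: str, content_length: str = '') -> bool:
    if method.upper() in ('GET', 'HEAD', 'OPTIONS'):
        return False
    try:
        return int(content_length or 0) > 0
    except ValueError:
        # malformed Content-Length: be strict and validate anyway
        return True
```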
```python
if api_keys is not None and (tokens := [key for key in api_keys if key]):
    from .server_utils import AuthenticationMiddleware

    app.add_middleware(AuthenticationMiddleware, tokens=tokens)
```
Authentication is now enforced via AuthenticationMiddleware, but it intentionally only checks paths starting with /v1. This means non-/v1 endpoints that previously required an API key (e.g. /generate, /update_weights, /sleep, /wakeup, /pooling, /is_sleeping) become unauthenticated when api_keys are configured. Consider expanding the middleware path check (or adding per-route protection) so all sensitive endpoints remain protected (keeping explicit allowlist exceptions like /health and docs).
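The broader path check suggested above could look like this minimal sketch: protect every route except an explicit allowlist, instead of only `/v1` prefixes. The allowlist entries and names here are assumptions for illustration, not the PR's actual list.

```python
# Hypothetical allowlist; everything else requires an API key.
UNPROTECTED_PATHS = ('/health', '/docs', '/openapi.json', '/redoc')


def requires_auth(path: str) -> bool:
    # Allow exact matches and sub-paths of allowlisted prefixes only
    return not any(path == p or path.startswith(p + '/') for p in UNPROTECTED_PATHS)
```

The middleware would call `requires_auth(scope['path'])` before checking the Bearer token, so `/generate`, `/update_weights`, and similar routes stay protected.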
```diff
 if request.stream is True:
-    response = node_manager.stream_generate(request_dict, node_url, '/v1/completions')
+    response = node_manager.stream_generate(raw_request, node_url, '/v1/completions')
     background_task = node_manager.create_background_tasks(node_url, start)
     return StreamingResponse(response, background=background_task, media_type='text/event-stream')
 else:
```
In the Hybrid /v1/completions streaming path, stream_generate can raise APIServerException (e.g., upstream returns non-200). Unlike /v1/chat/completions, this endpoint still uses StreamingResponse, so exceptions will likely surface as an unhandled 500 instead of a clean upstream status/body. Consider using ProxyStreamingResponse here as well (and in other streaming endpoints) for consistent error propagation.
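The error-mapping step a `ProxyStreamingResponse`-style wrapper would apply on this path can be sketched as follows; `APIServerException` is re-declared here only to keep the sketch self-contained, and the helper name is an assumption.

```python
# Hypothetical normalization: turn an upstream APIServerException into a
# clean (status, body) pair instead of letting it surface as a 500.
class APIServerException(Exception):
    def __init__(self, status_code: int, body):
        self.status_code = status_code
        self.body = body


def to_error_response(exc: APIServerException):
    # ASGI send() requires bytes, so coerce any str body
    body = exc.body if isinstance(exc.body, (bytes, bytearray)) else str(exc.body).encode('utf-8')
    return exc.status_code, body
```

The wrapper would catch the exception around the streaming loop and emit `http.response.start` with the returned status before the body.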
Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help it receive feedback more easily. If you do not understand some items, don't worry: just open the pull request and seek help from the maintainers.
Motivation
Please describe the motivation of this PR and the goal you want to achieve through this PR.
Modification
Please briefly describe what modification is made in this PR.
BC-breaking (Optional)
Does the modification introduce changes that break the backward-compatibility of the downstream repositories?
If so, please describe how it breaks the compatibility and how the downstream projects should modify their code to keep compatibility with this PR.
Use cases (Optional)
If this PR introduces a new feature, it is better to list some use cases here, and update the documentation.
Checklist