
Conversation

@lvhan028 lvhan028 commented Feb 9, 2026

Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help it receive feedback more easily. If you do not understand some items, don't worry, just open the pull request and seek help from the maintainers.

Motivation

Please describe the motivation of this PR and the goal you want to achieve through this PR.

Modification

Please briefly describe what modification is made in this PR.

BC-breaking (Optional)

Does the modification introduce changes that break the backward-compatibility of the downstream repositories?
If so, please describe how it breaks the compatibility and how the downstream projects should modify their code to keep compatibility with this PR.

Use cases (Optional)

If this PR introduces a new feature, it is better to list some use cases here, and update the documentation.

Checklist

  1. Pre-commit or other linting tools are used to fix the potential lint issues.
  2. The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
  3. If the modification has a dependency on downstream projects of a newer version, this PR should be tested with all supported versions of downstream projects.
  4. The documentation has been modified accordingly, like docstring or example tutorials.

Copilot AI review requested due to automatic review settings February 9, 2026 03:53

Copilot AI left a comment


Pull request overview

This PR refactors API key authorization in both the OpenAI API server and the proxy by replacing per-route Depends(check_api_key) enforcement with a shared ASGI AuthenticationMiddleware, and updates the proxy request forwarding to pass through original request headers/body (to preserve Authorization).

Changes:

  • Removed check_api_key dependency usage and introduced AuthenticationMiddleware (new server_utils.py) for token auth.
  • Updated proxy forwarding to pass through inbound request headers/body, and added a custom streaming response wrapper to surface upstream non-200 responses.
  • Added a JSON Content-Type validator dependency in the OpenAI API server for several endpoints.
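For context, a bearer-token ASGI middleware of the kind this PR introduces can be sketched as follows. This is a minimal illustration, not the actual server_utils.py code; the header parsing and the 401 payload are assumptions.

```python
class AuthenticationMiddleware:
    """Reject requests whose Authorization header lacks a known Bearer token.

    Minimal sketch only; the real server_utils.py implementation may differ.
    """

    def __init__(self, app, tokens):
        self.app = app
        self.tokens = set(tokens)

    def _is_authorized(self, scope) -> bool:
        # ASGI headers are a list of (bytes, bytes) pairs.
        headers = dict(scope.get('headers') or [])
        auth = headers.get(b'authorization', b'').decode('latin-1')
        scheme, _, token = auth.partition(' ')
        return scheme.lower() == 'bearer' and token in self.tokens

    async def __call__(self, scope, receive, send):
        if scope['type'] != 'http':
            return await self.app(scope, receive, send)
        if not self._is_authorized(scope):
            await send({'type': 'http.response.start', 'status': 401,
                        'headers': [(b'content-type', b'application/json')]})
            await send({'type': 'http.response.body',
                        'body': b'{"error": "Unauthorized"}'})
            return
        await self.app(scope, receive, send)
```

Installing it app-wide (rather than per-route `Depends`) is what makes the path-filtering and method-handling concerns raised in the comments below matter.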

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.

  • lmdeploy/serve/proxy/proxy.py: Removes route-level auth dependencies, forwards raw request headers/body to upstream nodes, adds streaming error handling, and conditionally installs auth middleware.
  • lmdeploy/serve/openai/server_utils.py: Adds a new ASGI AuthenticationMiddleware implementation for Bearer token validation.
  • lmdeploy/serve/openai/api_server.py: Removes check_api_key, adds validate_json_request, adjusts the validation exception handler, and conditionally installs auth middleware.
Comments suppressed due to low confidence (1)

lmdeploy/serve/proxy/proxy.py:441

  • NodeManager.stream_generate/generate now require a FastAPI Request and call raw_request.body(), but large parts of the proxy (e.g., DistServe prefill/decode paths) still call these methods with a request dict. This will raise AttributeError at runtime and break DistServe routing. Consider either (a) keeping dict-based helpers for internally-mutated requests (prefill/migration) and adding separate raw_request forwarders, or (b) updating DistServe call sites to build the outbound body/headers explicitly (including auth) without relying on Request.
    async def stream_generate(self, raw_request: Request, node_url: str, endpoint: str):
        try:
            target_url = node_url.rstrip('/') + endpoint
            headers = self._prepare_headers(raw_request)
            body_bytes = await raw_request.body()
            async with aiohttp.ClientSession() as session:
                async with session.post(target_url, headers=headers, data=body_bytes,
                                        timeout=self.aiotimeout) as response:
                    if response.status != 200:
                        error_body = await response.read()
                        raise APIServerException(status_code=response.status, body=error_body)
                    async for line in response.content:
                        if line.strip():
                            yield line + b'\n\n'
        except APIServerException:
            # raise APIServerException again to be caught by the outer layer
            raise
        except (Exception, GeneratorExit, aiohttp.ClientError) as e:  # noqa
            logger.error(f'caught an exception: {e}')
            # exception happened, reduce unfinished num
            yield self.handle_api_timeout(node_url)

    async def generate(self, raw_request: Request, node_url: str, endpoint: str):
        try:
            target_url = node_url.rstrip('/') + endpoint
            headers = self._prepare_headers(raw_request)
            body_bytes = await raw_request.body()
            async with aiohttp.ClientSession() as session:
                async with session.post(target_url, headers=headers, data=body_bytes,
                                        timeout=self.aiotimeout) as response:
                    return await response.text()
        except (Exception, GeneratorExit, aiohttp.ClientError, asyncio.CancelledError) as e:  # noqa  # yapf: disable
            logger.error(f'caught an exception: {e}')
            return self.handle_api_timeout(node_url)
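Option (a) from the comment above could keep a dict-based helper that builds the outbound headers/body explicitly for internally mutated requests. A minimal sketch of the body/header construction (the helper name and the `api_key` parameter are hypothetical):

```python
import json

def build_outbound_request(request_dict, api_key=None):
    """Build explicit headers/body for internally mutated requests
    (e.g. DistServe prefill/decode) instead of forwarding a raw Request.

    `api_key` stands in for however the proxy stores its own credential;
    forwarding it explicitly replaces the Authorization header that a
    raw-request forwarder would have passed through.
    """
    headers = {'Content-Type': 'application/json'}
    if api_key:
        headers['Authorization'] = f'Bearer {api_key}'
    body = json.dumps(request_dict).encode('utf-8')
    return headers, body
```

A dict-based stream_generate/generate variant could then post this `body` with these `headers` via aiohttp, exactly as the raw-request forwarders above do.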



    content_type = raw_request.headers.get('content-type', '').lower()
    media_type = content_type.split(';', maxsplit=1)[0]
    if media_type != 'application/json':
        raise RequestValidationError(errors=["Unsupported Media Type: Only 'application/json' is allowed"])

Copilot AI Feb 9, 2026


validate_json_request raises RequestValidationError for unsupported Content-Type, but RequestValidationError is meant for pydantic validation issues and doesn't carry HTTP status/headers. This also maps the error to 422 instead of the more appropriate 415 Unsupported Media Type. Consider raising HTTPException(status_code=415, ...) (or returning a JSONResponse) and keeping RequestValidationError for schema/field validation only.

Suggested change

-    raise RequestValidationError(errors=["Unsupported Media Type: Only 'application/json' is allowed"])
+    raise HTTPException(
+        status_code=HTTPStatus.UNSUPPORTED_MEDIA_TYPE.value,
+        detail="Unsupported Media Type: Only 'application/json' is allowed",
+    )

            'body': exc.body
        }),
    )
    return JSONResponse(status_code=exc.status_code, content=exc.detail, headers=exc.headers)

Copilot AI Feb 9, 2026


validation_exception_handler assumes RequestValidationError has .status_code/.detail/.headers, but FastAPI's RequestValidationError exposes .errors() and .body (and does not define those attributes). This will raise AttributeError the first time a validation error occurs. Either restore the previous handler logic using exc.errors()/exc.body, or switch to handling HTTPException for the custom 415 case.

Suggested change

-    return JSONResponse(status_code=exc.status_code, content=exc.detail, headers=exc.headers)
+    return JSONResponse(
+        status_code=HTTPStatus.UNPROCESSABLE_ENTITY,
+        content={'detail': exc.errors(), 'body': exc.body},
+    )

Comment on lines 83 to 86
    await send({'type': 'http.response.start', 'status': e.status_code, 'headers': self.raw_headers})
    await send({
        'type': 'http.response.body',
        'body': e.body,

Copilot AI Feb 9, 2026


ProxyStreamingResponse sends error responses using self.raw_headers (typically "text/event-stream"), ignoring APIServerException.headers/content-type. If the upstream returns e.g. 401/403 with JSON, clients will receive the wrong content-type. Consider passing upstream response headers into APIServerException and using them (or at least setting application/json) when sending the error response start.

Suggested change

-await send({'type': 'http.response.start', 'status': e.status_code, 'headers': self.raw_headers})
-await send({
-    'type': 'http.response.body',
-    'body': e.body,
+# Use headers from APIServerException (defaulting to application/json) instead of streaming headers
+error_headers = [(str(k).encode('latin-1'), str(v).encode('latin-1'))
+                 for k, v in (e.headers or {}).items()]
+error_body = e.body if isinstance(e.body, (bytes, bytearray)) else str(e.body).encode('utf-8')
+await send({
+    'type': 'http.response.start',
+    'status': e.status_code,
+    'headers': error_headers,
+})
+await send({
+    'type': 'http.response.body',
+    'body': error_body,

Comment on lines +45 to +48
    if scope['type'] not in ('http', 'websocket') or scope['method'] == 'OPTIONS':
        # scope["type"] can be "lifespan" or "startup" for example,
        # in which case we don't need to do anything
        return self.app(scope, receive, send)

Copilot AI Feb 9, 2026


AuthenticationMiddleware.__call__ unconditionally reads scope['method'] when scope['type'] is 'websocket'. WebSocket scopes typically don't include a 'method' key, so this can raise KeyError and break websocket connections. Use scope.get('method') (or only check OPTIONS when scope['type'] == 'http').

Suggested change

-if scope['type'] not in ('http', 'websocket') or scope['method'] == 'OPTIONS':
-    # scope["type"] can be "lifespan" or "startup" for example,
-    # in which case we don't need to do anything
-    return self.app(scope, receive, send)
+if scope['type'] not in ('http', 'websocket'):
+    # scope["type"] can be "lifespan" or "startup" for example,
+    # in which case we don't need to do anything
+    return self.app(scope, receive, send)
+if scope['type'] == 'http' and scope.get('method') == 'OPTIONS':
+    return self.app(scope, receive, send)



-@router.get('/is_sleeping', dependencies=[Depends(check_api_key)])
+@router.get('/is_sleeping', dependencies=[Depends(validate_json_request)])

Copilot AI Feb 9, 2026


The new validate_json_request dependency is attached to GET /is_sleeping. GET requests often omit Content-Type entirely, so this endpoint will start failing with a validation error even though it has no JSON body. Consider removing this dependency from GET routes (and any routes that don't require a JSON request body), or make validate_json_request conditional on HTTP method and/or presence of a body.

Suggested change

-@router.get('/is_sleeping', dependencies=[Depends(validate_json_request)])
+@router.get('/is_sleeping')
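Alternatively, the dependency could stay attached but become method-aware. A sketch of the core check (the function name is illustrative, not from the diff):

```python
def is_unsupported_media_type(method: str, content_type: str) -> bool:
    """True when a request should be rejected with 415: a body-carrying
    method whose Content-Type is not application/json.

    Bodyless methods such as GET pass regardless of a (possibly absent)
    Content-Type header, which keeps GET /is_sleeping working.
    """
    if method.upper() in ('GET', 'HEAD', 'OPTIONS'):
        return False
    # Strip parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(';', maxsplit=1)[0].strip().lower()
    return media_type != 'application/json'
```

validate_json_request could call this with `raw_request.method` and the `content-type` header and only then raise.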

Comment on lines 1493 to 1496
    if api_keys is not None and (tokens := [key for key in api_keys if key]):
        from .server_utils import AuthenticationMiddleware

        app.add_middleware(AuthenticationMiddleware, tokens=tokens)

Copilot AI Feb 9, 2026


Authentication is now enforced via AuthenticationMiddleware, but it intentionally only checks paths starting with /v1. This means non-/v1 endpoints that previously required an API key (e.g. /generate, /update_weights, /sleep, /wakeup, /pooling, /is_sleeping) become unauthenticated when api_keys are configured. Consider expanding the middleware path check (or adding per-route protection) so all sensitive endpoints remain protected (keeping explicit allowlist exceptions like /health and docs).
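One way to broaden the check is to invert the prefix test into an explicit allowlist. A minimal sketch (the allowlist contents are assumptions, not taken from the diff):

```python
# Illustrative set of endpoints that stay reachable without a token;
# the real allowlist for this server is an assumption.
PUBLIC_PATHS = {'/health', '/docs', '/redoc', '/openapi.json'}

def requires_auth(path: str) -> bool:
    """Protect every endpoint except an explicit allowlist, instead of
    gating only paths that start with /v1."""
    return path not in PUBLIC_PATHS
```

The middleware would then consult `requires_auth(scope['path'])` before enforcing the Bearer check, so /generate, /update_weights, /sleep, /wakeup, /pooling, and /is_sleeping remain protected.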

Comment on lines 818 to 822
 if request.stream is True:
-    response = node_manager.stream_generate(request_dict, node_url, '/v1/completions')
+    response = node_manager.stream_generate(raw_request, node_url, '/v1/completions')
     background_task = node_manager.create_background_tasks(node_url, start)
     return StreamingResponse(response, background=background_task, media_type='text/event-stream')
 else:

Copilot AI Feb 9, 2026


In the Hybrid /v1/completions streaming path, stream_generate can raise APIServerException (e.g., upstream returns non-200). Unlike /v1/chat/completions, this endpoint still uses StreamingResponse, so exceptions will likely surface as an unhandled 500 instead of a clean upstream status/body. Consider using ProxyStreamingResponse here as well (and in other streaming endpoints) for consistent error propagation.
