[Question] MLC concurrent request handling behavior #3407

@jimmyparadm

Description

❓ General Questions

I've installed MLC LLM using prebuilt wheels and am running the MLC server with mlc_llm serve. After sending the first request—while the server is streaming output tokens—if I send a second request, token generation for the first pauses. It only resumes after the second request's prefill phase completes and starts token generation. Is this the expected behavior? I expected the two concurrent requests to be handled independently, without the second interfering with the first. Are there any settings that can modify this? Thank you.
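The behavior can be reproduced by firing a second streaming request while the first is mid-generation. Below is a minimal sketch using only the Python standard library, assuming the server's default OpenAI-compatible endpoint at `http://127.0.0.1:8000/v1/chat/completions`; the model id and prompts are placeholders.

```python
import json
import threading
import urllib.request

# Assumed default endpoint for `mlc_llm serve`; adjust host/port as needed.
SERVER = "http://127.0.0.1:8000/v1/chat/completions"


def make_request_body(prompt: str) -> dict:
    """Build an OpenAI-compatible streaming chat completion payload."""
    return {
        "model": "default",  # placeholder model id (assumption)
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }


def send_stream(tag: str, prompt: str) -> None:
    """Send one streaming request and print chunks as they arrive."""
    req = urllib.request.Request(
        SERVER,
        data=json.dumps(make_request_body(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        for line in resp:
            print(tag, line.decode().strip())


if __name__ == "__main__":
    # Start the first request, then launch the second while the
    # first is still streaming; watch whether the first stream
    # stalls during the second request's prefill phase.
    t1 = threading.Thread(target=send_stream, args=("req1:", "Write a long story about a lighthouse."))
    t2 = threading.Thread(target=send_stream, args=("req2:", "Summarize the plot of Hamlet."))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
```

If the first stream's output visibly pauses between `t2.start()` and the second stream's first token, that matches the interleaving described above.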

Metadata

Labels: question (Question about the usage)
