[Question] MLC concurrent request handling behavior #3407

@jimmyparadm

Description

❓ General Questions

I've installed MLC LLM using prebuilt wheels and am running the MLC server with mlc_llm serve. After sending the first request—while the server is streaming output tokens—if I send a second request, token generation for the first pauses. It only resumes after the second request's prefill phase completes and starts token generation. Is this the expected behavior? I expected the two concurrent requests to be handled independently, without the second interfering with the first. Are there any settings that can modify this? Thank you.
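The behavior can be reproduced by firing a second streaming request while the first is mid-generation. Below is a minimal sketch using only the Python standard library, assuming the server's default OpenAI-compatible endpoint at `http://127.0.0.1:8000/v1/chat/completions`; the model id and prompts are placeholders.

```python
import json
import threading
import urllib.request

# Assumed default endpoint for `mlc_llm serve`; adjust host/port as needed.
SERVER = "http://127.0.0.1:8000/v1/chat/completions"


def make_request_body(prompt: str) -> dict:
    """Build an OpenAI-compatible streaming chat completion payload."""
    return {
        "model": "default",  # placeholder model id (assumption)
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }


def send_stream(tag: str, prompt: str) -> None:
    """Send one streaming request and print chunks as they arrive."""
    req = urllib.request.Request(
        SERVER,
        data=json.dumps(make_request_body(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        for line in resp:
            print(tag, line.decode().strip())


if __name__ == "__main__":
    # Start the first request, then launch the second while the
    # first is still streaming; watch whether the first stream
    # stalls during the second request's prefill phase.
    t1 = threading.Thread(target=send_stream, args=("req1:", "Write a long story about a lighthouse."))
    t2 = threading.Thread(target=send_stream, args=("req2:", "Summarize the plot of Hamlet."))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
```

If the first stream's output visibly pauses between `t2.start()` and the second stream's first token, that matches the interleaving described above.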

Metadata

Labels: question (Question about the usage)
