Open
Labels: question (Question about the usage)
Description
❓ General Questions
I've installed MLC LLM using prebuilt wheels and am running the MLC server with mlc_llm serve. After sending the first request—while the server is streaming output tokens—if I send a second request, token generation for the first pauses. It only resumes after the second request's prefill phase completes and starts token generation. Is this the expected behavior? I expected the two concurrent requests to be handled independently, without the second interfering with the first. Are there any settings that can modify this? Thank you.
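For reference, here is a minimal client-side sketch of how the behavior was observed. It assumes the server was started with `mlc_llm serve` on the default local address (`http://127.0.0.1:8000`) and that the OpenAI-compatible `/v1/chat/completions` streaming endpoint is used; the model name is a placeholder for whatever model the server was launched with. The timestamps printed per chunk make the pause in the first stream visible once the second request begins its prefill.

```python
# Reproduction sketch: two concurrent streaming requests against mlc_llm serve.
# Assumptions: server at http://127.0.0.1:8000 (default), OpenAI-compatible API,
# MODEL is a placeholder for the model name used when launching the server.
import json
import threading
import time

import requests

BASE_URL = "http://127.0.0.1:8000/v1/chat/completions"  # assumed default serve address
MODEL = "YOUR_MODEL"  # placeholder


def stream_request(tag: str, prompt: str) -> None:
    """Send one streaming chat request and print the arrival time of each chunk."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }
    with requests.post(BASE_URL, json=payload, stream=True, timeout=600) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if not line or not line.startswith(b"data: "):
                continue
            data = line[len(b"data: "):]
            if data == b"[DONE]":
                break
            chunk = json.loads(data)
            delta = chunk["choices"][0]["delta"].get("content", "")
            # Per-chunk timestamps reveal any stall in the first stream
            # while the second request is being prefilled.
            print(f"{time.monotonic():.3f} [{tag}] {delta!r}")


# Start the first request, wait until it is clearly decoding tokens,
# then fire the second request and watch whether the first stream stalls.
t1 = threading.Thread(target=stream_request, args=("req1", "Write a long story."))
t1.start()
time.sleep(3)  # let req1 start streaming tokens
t2 = threading.Thread(target=stream_request, args=("req2", "Explain transformers in detail."))
t2.start()
t1.join()
t2.join()
```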