Ignore seed when splitting batch in chunks with groupby #3047
Conversation
Hi @baberabb @StellaAthena

Sorry, missed this! Will take a closer look. Does this pass an individual …?
Yes, in the AIME25 benchmark for instance, we pass a …
My understanding is … And it wouldn't really break the … Please correct me if I'm misunderstanding anything here!
Hmm, so you're suggesting the following?

Should we also update other models (sglang?) to use the same logic, if they support continuous batching?
Yes! Left a comment, if that makes it clearer. And I would definitely appreciate it if we could use the same approach for sglang as well, if they support it. They use something similar to continuous batching, but I'm not as familiar with their API.
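To illustrate the approach under discussion, here is a minimal sketch, assuming vLLM's LLM.generate accepts a list of SamplingParams (one per prompt) and that SamplingParams takes a per-request seed; the model name is a placeholder:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")  # placeholder model

# One prompt repeated n_repeat=4 times, each copy with its own seed.
prompts = ["Solve: 7 * 8 = ?"] * 4
params = [SamplingParams(temperature=0.6, seed=s) for s in range(4)]

# A single generate() call serves all repeats in one continuous batch,
# instead of one vLLM call per seed.
outputs = llm.generate(prompts, params)
for out in outputs:
    print(out.outputs[0].text)
```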
You should add …
@baberabb @slimfrkha, thanks for the good work. I am thinking of working on this PR to migrate the lm_eval version in evalchemy to this commit. Could you share the test script you used to verify this PR using …? I set …
That PR is from a colleague of mine, who also opened another PR to enable CUDA Graphs with vLLM Data Parallel when dp > 1.

Okay, thanks for the quick reply @slimfrkha. Also, when you say "PR in evalchemy," are you referring to this PR? It seems incomplete to me, just minor fixes like updating the …

P.S. I resolved my issue by setting …
Yeah, that PR.
* feat(vllm_causallms): make collator ignore seed when splitting batch into chunks
* fix(collator): revert PR changes
* fix(vllm-causallm): update collator call with groupby None
* feat(sglang-causallms): make generation accept a list of sampling params

Co-authored-by: Baber <[email protected]>
Problem:
When evaluating reasoning models, we use n_repeat to generate several DIFFERENT answers for the same prompt. To ensure variability across generations of the same prompt, each repeat is given a different seed.
A recent PR in evalchemy changed evaluation to call vLLM once instead of once per n_repeat, which makes evaluations much faster.
The problem: because the Collator currently splits the batch on differing seed values, that optimization is negated.
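A minimal sketch of the failure mode (the request shape and key function are hypothetical stand-ins for the real Collator in lm_eval):

```python
from itertools import groupby

# 2 prompts, n_repeat=3: each repeat carries its own seed.
requests = [
    {"prompt": p, "gen_kwargs": {"temperature": 0.6, "seed": s}}
    for p in ["prompt_a", "prompt_b"]
    for s in range(3)
]

# Grouping on the full gen_kwargs (seed included) produces one chunk
# per seed value, so vLLM still gets called n_repeat times.
key = lambda r: tuple(sorted(r["gen_kwargs"].items()))
chunks = [list(g) for _, g in groupby(sorted(requests, key=key), key=key)]
print(len(chunks))  # 3 chunks instead of 1
```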
Solution:
In the Collator, ignore the seed when splitting the batch into chunks with groupby. The seed has no bearing on the correctness of vLLM generations (unlike other args such as temperature, which do change outputs).
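Continuing the sketch above, the fix amounts to excluding seed from the grouping key (again a hypothetical key function, not the literal Collator code):

```python
def group_key(request):
    # Drop "seed" so all repeats of a prompt land in one chunk; args
    # that change outputs (temperature, top_p, ...) still split chunks.
    kwargs = {k: v for k, v in request["gen_kwargs"].items() if k != "seed"}
    return tuple(sorted(kwargs.items()))

chunks = [
    list(g)
    for _, g in groupby(sorted(requests, key=group_key), key=group_key)
]
print(len(chunks))  # 1 chunk: a single batched vLLM call
```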
Key Changes
With n_repeat=64, a ~15x speedup when evaluating AIME25 for a ~7B/10B model.
Tests
Reproduced AIME25 results for an open-source reasoning model (Skywork-OR1) in evalchemy.