
feat: Vectorize decoding to remove ThreadPoolExecutor overhead#119

Open
CodeMaverick-143 wants to merge 4 commits into google-deepmind:main from CodeMaverick-143:main

Conversation

@CodeMaverick-143

This PR optimizes the decode method in PyTorchModel by replacing the per-sample ThreadPoolExecutor loop with a fully vectorized approach. This significantly reduces Python overhead during inference, particularly for larger batch sizes.

Changes

regress_lm/vocabs.py

Added get_allowed_tokens_mask to DecoderVocab.
This method computes a boolean mask of allowed next tokens for the entire batch in a single operation, running directly on the GPU. It supports standard numeric formatting and handles edge cases such as NaN and Inf.
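The batch-level mask described above can be sketched as a single tensor indexing operation. This is a minimal illustration, not the actual `regress_lm/vocabs.py` code: the function name follows the PR description, but the `states`/`transition_table` parameters and the state-machine framing are assumptions about how per-sample constraint logic might be vectorized.

```python
import torch

def get_allowed_tokens_mask(states: torch.Tensor,
                            transition_table: torch.Tensor) -> torch.Tensor:
    """Compute allowed next tokens for the whole batch in one operation.

    states: (batch,) int tensor, one decoding state per sample (e.g. "just
        emitted a sign token", "inside the mantissa").
    transition_table: (num_states, vocab_size) bool tensor where row s lists
        the tokens permitted from state s.
    Returns a (batch, vocab_size) bool mask on the same device as `states`,
    so the lookup runs directly on the GPU when the inputs live there.
    """
    return transition_table.to(states.device)[states]
```

A single advanced-indexing gather like this replaces one Python-level mask construction per sample, which is where the per-sample loop spent its time.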

regress_lm/pytorch/model.py

Refactored decode to use the new batch-level mask generation, removing the ThreadPoolExecutor and the per-sample decoding loop.
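For context, one decode step with a batch-level mask typically looks like the sketch below. This is a hypothetical illustration of the pattern, not the actual `regress_lm/pytorch/model.py` code; the function name and greedy-argmax choice are assumptions.

```python
import torch

def masked_greedy_step(logits: torch.Tensor,
                       allowed_mask: torch.Tensor) -> torch.Tensor:
    """One decode step with the constraint applied to the whole batch.

    logits: (batch, vocab_size) float tensor from the model.
    allowed_mask: (batch, vocab_size) bool tensor of permitted next tokens.
    Disallowed tokens are set to -inf before argmax, so the constraint is
    enforced in two tensor ops instead of a Python loop over samples.
    """
    masked = logits.masked_fill(~allowed_mask, float("-inf"))
    return masked.argmax(dim=-1)  # (batch,) next-token ids
```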

Performance

  • Removes thread creation and context-switching overhead.
  • Shifts token mask computation to GPU/tensor operations where possible.
  • Correctness verified using start-of-sequence and mid-sequence generation tests.
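A quick way to check both claims at once is to compare a per-sample loop against the vectorized lookup on the same inputs: equality of outputs verifies correctness, and timing each with `time.perf_counter` would show the speed difference asked about below. The helper names here are hypothetical; the table shape and batch size are arbitrary.

```python
import time
import torch

def per_sample_masks(states: torch.Tensor,
                     table: torch.Tensor) -> torch.Tensor:
    # Old style: one Python iteration (and mask lookup) per sample.
    return torch.stack([table[int(s)] for s in states])

def batch_masks(states: torch.Tensor, table: torch.Tensor) -> torch.Tensor:
    # Vectorized: a single indexing op for the whole batch.
    return table[states]

def bench(fn, *args, iters: int = 50) -> float:
    """Rough wall-clock timing for comparing the two approaches."""
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    return time.perf_counter() - start

table = torch.rand(16, 1024) > 0.5   # (num_states, vocab_size)
states = torch.randint(0, 16, (512,))  # one state per sample
# Compare bench(per_sample_masks, states, table)
# against bench(batch_masks, states, table).
```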

@google-cla

google-cla bot commented Dec 1, 2025

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@xingyousong
Collaborator

Hi @CodeMaverick-143 - thanks for the PR. Just wondering, what is the diff in speed after this change?

