
feat: Vectorize decoding to remove ThreadPoolExecutor overhead#119

Open
CodeMaverick-143 wants to merge 4 commits into google-deepmind:main from CodeMaverick-143:main

Conversation

@CodeMaverick-143

This PR optimizes the decode method in PyTorchModel by replacing the per-sample ThreadPoolExecutor loop with a fully vectorized approach. This significantly reduces Python overhead during inference, particularly for larger batch sizes.

Changes

regress_lm/vocabs.py

Added get_allowed_tokens_mask to DecoderVocab.
This method computes a boolean mask of allowed next tokens for the entire batch in a single operation, running directly on the GPU. It supports standard numeric formatting and handles edge cases such as NaN and Inf.
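The batch-level mask described above can be sketched as a single tensor indexing operation. This is a minimal illustration, not the actual `regress_lm/vocabs.py` code: the function name follows the PR description, but the `states`/`transition_table` parameters and the state-machine framing are assumptions about how per-sample constraint logic might be vectorized.

```python
import torch

def get_allowed_tokens_mask(states: torch.Tensor,
                            transition_table: torch.Tensor) -> torch.Tensor:
    """Compute allowed next tokens for the whole batch in one operation.

    states: (batch,) int tensor, one decoding state per sample (e.g. "just
        emitted a sign token", "inside the mantissa").
    transition_table: (num_states, vocab_size) bool tensor where row s lists
        the tokens permitted from state s.
    Returns a (batch, vocab_size) bool mask on the same device as `states`,
    so the lookup runs directly on the GPU when the inputs live there.
    """
    return transition_table.to(states.device)[states]
```

A single advanced-indexing gather like this replaces one Python-level mask construction per sample, which is where the per-sample loop spent its time.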

regress_lm/pytorch/model.py

Refactored decode to use the new batch-level mask generation, removing the ThreadPoolExecutor and the per-sample decoding loop.
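For context, one decode step with a batch-level mask typically looks like the sketch below. This is a hypothetical illustration of the pattern, not the actual `regress_lm/pytorch/model.py` code; the function name and greedy-argmax choice are assumptions.

```python
import torch

def masked_greedy_step(logits: torch.Tensor,
                       allowed_mask: torch.Tensor) -> torch.Tensor:
    """One decode step with the constraint applied to the whole batch.

    logits: (batch, vocab_size) float tensor from the model.
    allowed_mask: (batch, vocab_size) bool tensor of permitted next tokens.
    Disallowed tokens are set to -inf before argmax, so the constraint is
    enforced in two tensor ops instead of a Python loop over samples.
    """
    masked = logits.masked_fill(~allowed_mask, float("-inf"))
    return masked.argmax(dim=-1)  # (batch,) next-token ids
```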

Performance

  • Removes thread creation and context-switching overhead.
  • Shifts token mask computation to GPU/tensor operations where possible.
  • Correctness verified using start-of-sequence and mid-sequence generation tests.
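A quick way to check both claims at once is to compare a per-sample loop against the vectorized lookup on the same inputs: equality of outputs verifies correctness, and timing each with `time.perf_counter` would show the speed difference asked about below. The helper names here are hypothetical; the table shape and batch size are arbitrary.

```python
import time
import torch

def per_sample_masks(states: torch.Tensor,
                     table: torch.Tensor) -> torch.Tensor:
    # Old style: one Python iteration (and mask lookup) per sample.
    return torch.stack([table[int(s)] for s in states])

def batch_masks(states: torch.Tensor, table: torch.Tensor) -> torch.Tensor:
    # Vectorized: a single indexing op for the whole batch.
    return table[states]

def bench(fn, *args, iters: int = 50) -> float:
    """Rough wall-clock timing for comparing the two approaches."""
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    return time.perf_counter() - start

table = torch.rand(16, 1024) > 0.5   # (num_states, vocab_size)
states = torch.randint(0, 16, (512,))  # one state per sample
# Compare bench(per_sample_masks, states, table)
# against bench(batch_masks, states, table).
```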

@google-cla

google-cla bot commented Dec 1, 2025

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@xingyousong
Collaborator

Hi @CodeMaverick-143 - thanks for the PR. Just wondering, what is the diff in speed after this change?

