Add Reverso foundation model for zero-shot time series forecasting#3061

Open
shinfxh wants to merge 6 commits into unit8co:master from shinfxh:feature/reverso-foundation-model

Conversation

@shinfxh

@shinfxh shinfxh commented Apr 6, 2026

Checklist before merging this PR:

  • Mentioned all issues that this PR fixes or addresses.
  • Summarized the updates of this PR under Summary.
  • Added an entry under Unreleased in the Changelog.

Fixes #3034.

Summary

  • Add ReversoModel, a lightweight foundation model (200K-2.6M params) for zero-shot univariate time series forecasting, ported from shinfxh/reverso
  • Pure PyTorch implementation with no external dependencies beyond what Darts already requires (replaces flash-linear-attention with a parallel-scan delta rule, and FlashFFTConv with torch.fft)
  • Three pretrained variants on HuggingFace Hub: reverso-nano (200K), reverso-small (550K), reverso-base (2.6M)

Implementation details

  • ReversoModel extends FoundationModel with HuggingFaceConnector for weight loading
  • DeltaNet attention supports both a recurrent form and a chunked parallel form
  • Supports univariate point forecasting; multivariate targets are handled independently per component
  • MIT license for ported code is included in source files
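As background for the DeltaNet attention mentioned above, its recurrent form can be sketched in plain Python. This is an illustrative toy, not the PR's actual code: the delta rule updates a state matrix as S <- S + beta * (v - S k) k^T, i.e. it corrects the stored association for key k toward value v.

```python
def delta_rule_step(S, k, v, beta):
    """One recurrent delta-rule step on a d_v x d_k state matrix S.

    S is a list of rows, k a key vector, v a value vector, beta a
    scalar write strength. Returns the updated state. Toy sketch only.
    """
    # Current prediction for key k: S @ k
    pred = [sum(S[i][j] * k[j] for j in range(len(k))) for i in range(len(v))]
    # Delta update: S <- S + beta * (v - pred) k^T
    return [
        [S[i][j] + beta * (v[i] - pred[i]) * k[j] for j in range(len(k))]
        for i in range(len(v))
    ]

# With beta=1 and a unit key, one step stores v exactly along that key:
S = [[0.0, 0.0], [0.0, 0.0]]
S = delta_rule_step(S, k=[1.0, 0.0], v=[2.0, 3.0], beta=1.0)
recalled = [sum(S[i][j] * kj for j, kj in enumerate([1.0, 0.0])) for i in range(2)]
# recalled == [2.0, 3.0]
```

The chunked parallel form in the PR computes the same recurrence, just batched over chunks of time steps for throughput.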

Other Information

The CPU implementation of Reverso is still much slower than the GPU implementation at shinfxh/reverso. For production use, the GPU implementation is strongly recommended!

shinfxh added 5 commits April 6, 2026 03:19
Reverso is a lightweight (~3M params) foundation model combining long
convolutions with DeltaNet linear attention. Includes torch-native
implementation, HuggingFace integration, and unit tests for all three
variants (nano, small, full).

Closes unit8co#3034
@shinfxh shinfxh requested a review from dennisbader as a code owner April 6, 2026 22:05
@daidahao

daidahao commented Apr 7, 2026

@shinfxh
Thank you for the contribution and especially the chunked parallel optimisation in native PyTorch. The PR looks clean and in great shape already.

I will try to provide more comments and reviews in the coming days. In the meantime, could you update the documentation (index.rst, README.md, covariates.md) as I did in #2980? That way, the new model will be properly advertised to our users!


@daidahao daidahao left a comment

Thank you @shinfxh for this PR and congratulations on releasing the models.

The PR is in very good shape already, so I mostly made minor comments on wording. The main changes I am requesting are additional unit tests covering fidelity, invalid usage, fine-tuning, etc. Also, you might need to merge with the main branch to use the lazy import for ReversoModel.

Because Reverso is branded as efficient, I would suggest adding some tips to further boost its inference speed, such as using GPU/MPS and bfloat16 (both can be set via pl_trainer_kwargs), and using reverso-nano via hub_model_name.
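A sketch of what such settings might look like. This is hypothetical: the ReversoModel constructor arguments are the ones introduced in this PR, and the pl_trainer_kwargs keys ("accelerator", "precision") are standard PyTorch Lightning Trainer arguments passed through by Darts.

```python
# Hypothetical fast-inference settings (names from this PR; the
# pl_trainer_kwargs keys are standard PyTorch Lightning Trainer args).
fast_reverso_kwargs = {
    "input_chunk_length": 512,
    "output_chunk_length": 64,
    "hub_model_name": "shinfxh/reverso-nano",  # smallest (200K) variant
    "pl_trainer_kwargs": {
        "accelerator": "gpu",       # or "mps" on Apple silicon
        "precision": "bf16-mixed",  # bfloat16 inference
    },
}
# model = ReversoModel(**fast_reverso_kwargs)
```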


Reverso is a highly parameter efficient model that achieves comparable performance with models 100x its size.

A combination of long convolutions and DeltaNet sequence mixing modules are used.

Header is supposed to be short. Please consider removing this line here.

Model intro should go into ReversoModel docstring.

d_intermediate: int = 256,
output_bottleneck_dim: int = 48,
expand_v: float = 1.0,
state_weaving: int | bool = False,

Is there a reason why state_weaving, use_norm, learn_bias, and use_output_pe could be int here? Can we not assume they are all bools?

if self.use_norm:
x_min = x.min(1, keepdim=True)[0].detach()
x_max = x.max(1, keepdim=True)[0].detach()
x_range = torch.clamp(x_max - x_min, min=1e-5).detach()

Calling .detach() seems unnecessary here.

input_chunk_length: int,
output_chunk_length: int,
output_chunk_shift: int = 0,
hub_model_name: str = "shinfxh/reverso-small",

If shinfxh/reverso-base has the best performance, can we not use it as the default model? What is the trade-off here?

Comment on lines +299 to +301
Number of time steps in the past to take as a model input (per chunk). Applies to the target
series. For Reverso, ``input_chunk_length`` must be less than or equal to the model's context
length (2048 for all Reverso variants).

Suggested change
Number of time steps in the past to take as a model input (per chunk). Applies to the target
series. For Reverso, ``input_chunk_length`` must be less than or equal to the model's context
length (2048 for all Reverso variants).
Number of time steps in the past to take as a model input (per chunk). Applies to the target
series. For Reverso, maximum is 2048.

return str(dest_path)


class TestReversoModel:

All these reverso-specific tests should be in a separate file test_reverso.py.

Comment on lines +463 to +467
reverso_variant_dirs = {
"shinfxh/reverso-nano": reverso_artefacts_dir / "tiny_reverso_nano",
"shinfxh/reverso-small": reverso_artefacts_dir / "tiny_reverso_small",
"shinfxh/reverso-base": reverso_artefacts_dir / "tiny_reverso_full",
}

Do you need three mock models here? I think one "tiny" reverso-nano would be enough.


class TestReversoModel:
series = generate_series(n_variables=2, length=100, prefix="A")


I think the current test suite might be incomplete. Looking at TestTimesFM2p5Model, the following tests seem to be missing here:

  • test_fidelity: this is important to make sure the implementation is correct against the original. I recommend using the actual reverso-full for this one.
  • test_default: the default model should be deterministic.
  • test_probabilistic: Reverso cannot be probabilistic, so an error is expected here.
  • test_multivariate: Reverso should support multivariate series.
  • test_covariates: Reverso does not support covariates, so errors are expected.
  • test_multiple_series: Reverso supports forecasting on multiple time series at the same time.

Except for test_fidelity, all other tests should use a tiny mock Reverso to reduce overhead.


### For users of the library:

- Added `ReversoModel`, a new foundation model for zero-shot time series forecasting. Reverso is a highly parameter-efficient model (200K-2.6M params) that matches accuracy of models 100x its size. [#3034](https://github.com/unit8co/darts/issues/3034) by [Xinghong Fu](https://github.com/shinfxh).

Suggested change
- Added `ReversoModel`, a new foundation model for zero-shot time series forecasting. Reverso is a highly parameter-efficient model (200K-2.6M params) that matches accuracy of models 100x its size. [#3034](https://github.com/unit8co/darts/issues/3034) by [Xinghong Fu](https://github.com/shinfxh).
- Added new forecasting model `ReversoModel`: a family of highly parameter-efficient (200K-2.6M) foundation models that matches accuracy of models 100x their sizes. It supports univariate, multivariate, and multiple time series forecasting without training and can be fine-tuned on your own data. [#3034](https://github.com/unit8co/darts/issues/3034) by [Xinghong Fu](https://github.com/shinfxh).

x = x_past.permute(0, 2, 1).reshape(-1, L)

# left-pad with per-series first value to seq_len
if L < self.seq_len:

Just curious, do we have to pad the context here? I imagine removing the padding could speed up inference a lot for small input_chunk_length. But we might introduce train-test mismatch here if the model only performs well with 2048 input.
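For readers unfamiliar with the padding in question, the left-pad-with-first-value behaviour can be illustrated standalone (a toy 1-D sketch, not the PR's tensor code):

```python
def left_pad_with_first(x, seq_len):
    """Left-pad the 1-D sequence x to length seq_len by repeating x[0].

    If x is already long enough, keep only the most recent seq_len
    steps. Mirrors the idea in the snippet above, in plain Python.
    """
    if len(x) >= seq_len:
        return list(x[-seq_len:])
    return [x[0]] * (seq_len - len(x)) + list(x)

# Example: a 3-step context padded to length 6 with its first value.
# left_pad_with_first([5, 6, 7], 6) -> [5, 5, 5, 5, 6, 7]
```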

Development

Successfully merging this pull request may close these issues.

[New Model] Reverso Foundation Model
