Add Reverso foundation model for zero-shot time series forecasting #3061
shinfxh wants to merge 6 commits into unit8co:master
Conversation
Reverso is a lightweight (~3M params) foundation model combining long convolutions with DeltaNet linear attention. Includes a torch-native implementation, HuggingFace integration, and unit tests for all three variants (nano, small, full). Closes unit8co#3034
@shinfxh I will try to provide more comments and reviews in the coming days. In the meantime, could you update the documentation?
Thank you @shinfxh for this PR and congratulations on releasing the models.
The PR is in very good shape already, so I mostly made minor comments on wording. The main changes I am requesting are unit tests that cover fidelity, invalid usage, fine-tuning, etc. Also, you might need to merge with the main branch to use the lazy import for ReversoModel.
Because Reverso is branded as efficient, I would suggest adding some tips to further boost its inference speed, such as using GPU/MPS and bfloat16 (set via `pl_trainer_kwargs`), and using reverso-nano via `hub_model_name`.
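A minimal sketch of what such a tip could look like in the docs, assuming the `ReversoModel` constructor arguments from this PR and standard PyTorch Lightning trainer flags; the helper function name and chunk lengths are illustrative, not part of the PR:

```python
# Hypothetical helper illustrating the speed tips above, assuming the
# ReversoModel API proposed in this PR and PyTorch Lightning trainer flags.
def fast_reverso_kwargs():
    return {
        "input_chunk_length": 512,
        "output_chunk_length": 64,
        # smallest variant (~200K params) for the fastest inference
        "hub_model_name": "shinfxh/reverso-nano",
        # run on GPU in bfloat16; use accelerator="mps" on Apple silicon
        "pl_trainer_kwargs": {"accelerator": "gpu", "precision": "bf16-mixed"},
    }

# model = ReversoModel(**fast_reverso_kwargs())  # then model.predict(...)
```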
> Reverso is a highly parameter efficient model that achieves comparable performance with models 100x its size.
>
> A combination of long convolutions and DeltaNet sequence mixing modules are used.
Header is supposed to be short. Please consider removing this line here.
Model intro should go into ReversoModel docstring.
```python
d_intermediate: int = 256,
output_bottleneck_dim: int = 48,
expand_v: float = 1.0,
state_weaving: int | bool = False,
```
Is there a reason why state_weaving, use_norm, learn_bias, and use_output_pe could be int here? Can we not assume they are all bools?
```python
if self.use_norm:
    x_min = x.min(1, keepdim=True)[0].detach()
    x_max = x.max(1, keepdim=True)[0].detach()
    x_range = torch.clamp(x_max - x_min, min=1e-5).detach()
```
Calling .detach() seems unnecessary here.
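For reference, the quoted snippet is per-series min-max instance scaling over the time axis; a numpy sketch (function name mine, shapes assumed `(batch, time)`) shows the same arithmetic without any autograd concerns:

```python
import numpy as np

def min_max_scale(x, eps=1e-5):
    # per-series min-max scaling over the time axis, mirroring the torch
    # logic quoted above; numpy has no autograd, so nothing to detach
    x_min = x.min(axis=1, keepdims=True)
    x_max = x.max(axis=1, keepdims=True)
    x_range = np.clip(x_max - x_min, eps, None)  # guard constant series
    return (x - x_min) / x_range
```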
```python
input_chunk_length: int,
output_chunk_length: int,
output_chunk_shift: int = 0,
hub_model_name: str = "shinfxh/reverso-small",
```
If shinfxh/reverso-base has the best performance, can we not use it as the default model? What is the trade-off here?
> Number of time steps in the past to take as a model input (per chunk). Applies to the target series. For Reverso, ``input_chunk_length`` must be less than or equal to the model's context length (2048 for all Reverso variants).
Suggested change:

```diff
- Number of time steps in the past to take as a model input (per chunk). Applies to the target
- series. For Reverso, ``input_chunk_length`` must be less than or equal to the model's context
- length (2048 for all Reverso variants).
+ Number of time steps in the past to take as a model input (per chunk). Applies to the target
+ series. For Reverso, maximum is 2048.
```
```python
    return str(dest_path)

class TestReversoModel:
```
All these reverso-specific tests should be in a separate file test_reverso.py.
```python
reverso_variant_dirs = {
    "shinfxh/reverso-nano": reverso_artefacts_dir / "tiny_reverso_nano",
    "shinfxh/reverso-small": reverso_artefacts_dir / "tiny_reverso_small",
    "shinfxh/reverso-base": reverso_artefacts_dir / "tiny_reverso_full",
}
```
Do you need three mock models here? I think one "tiny" reverso-nano would be enough.
```python
class TestReversoModel:
    series = generate_series(n_variables=2, length=100, prefix="A")
```
I think the current test suite might be incomplete. Looking at TestTimesFM2p5Model, the following tests seem to be missing here:

- `test_fidelity`: this is important to make sure the implementation is correct, against the original. I recommend using the actual reverso-full for this one.
- `test_default`: the default model should be deterministic.
- `test_probabilistic`: Reverso cannot be probabilistic, so an error is expected here.
- `test_multivariate`: Reverso should support multivariate series.
- `test_covariates`: Reverso does not support covariates, so errors are expected.
- `test_multiple_series`: Reverso supports forecasting on multiple time series at the same time.

Except for `test_fidelity`, all other tests should use a tiny mock Reverso to reduce overhead.
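As a dependency-free sketch of the requested covariates test, the pattern could look like this; `predict` here is a hypothetical stand-in, and the real test would call `ReversoModel.predict` inside `pytest.raises`:

```python
# Hypothetical stand-in for ReversoModel.predict: reject any covariates,
# as the reviewer expects the real model to do.
def predict(series, past_covariates=None, future_covariates=None):
    if past_covariates is not None or future_covariates is not None:
        raise ValueError("Reverso does not support covariates")
    return series

def test_covariates():
    # in the real suite this would be `with pytest.raises(ValueError): ...`
    try:
        predict([1.0, 2.0], past_covariates=[3.0])
        raised = False
    except ValueError:
        raised = True
    assert raised
```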
### For users of the library:
- Added `ReversoModel`, a new foundation model for zero-shot time series forecasting. Reverso is a highly parameter-efficient model (200K-2.6M params) that matches accuracy of models 100x its size. [#3034](https://github.com/unit8co/darts/issues/3034) by [Xinghong Fu](https://github.com/shinfxh).
Suggested change:

```diff
- - Added `ReversoModel`, a new foundation model for zero-shot time series forecasting. Reverso is a highly parameter-efficient model (200K-2.6M params) that matches accuracy of models 100x its size. [#3034](https://github.com/unit8co/darts/issues/3034) by [Xinghong Fu](https://github.com/shinfxh).
+ - Added new forecasting model `ReversoModel`: a family of highly parameter-efficient (200K-2.6M) foundation models that matches accuracy of models 100x their sizes. It supports univariate, multivariate, and multiple time series forecasting without training and can be fine-tuned on your own data. [#3034](https://github.com/unit8co/darts/issues/3034) by [Xinghong Fu](https://github.com/shinfxh).
```
```python
x = x_past.permute(0, 2, 1).reshape(-1, L)

# left-pad with per-series first value to seq_len
if L < self.seq_len:
```
Just curious, do we have to pad the context here? I imagine removing the padding could speed up inference a lot for small input_chunk_length. But we might introduce train-test mismatch here if the model only performs well with 2048 input.
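For context, the padding being discussed amounts to the following (numpy sketch; function name and `(batch, length)` shapes are mine, not the PR's code):

```python
import numpy as np

def left_pad_context(x, seq_len):
    """Left-pad each series (rows of x, shape (B, L)) with its own first
    value up to seq_len, mirroring the padding quoted above."""
    B, L = x.shape
    if L >= seq_len:
        return x[:, -seq_len:]  # keep only the most recent seq_len steps
    pad = np.repeat(x[:, :1], seq_len - L, axis=1)  # repeat first value
    return np.concatenate([pad, x], axis=1)
```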
Checklist before merging this PR:
Fixes #3034.
Summary

- `ReversoModel`, a lightweight foundation model (200K-2.6M params) for zero-shot univariate time series forecasting, ported from shinfxh/reverso
- Torch-native implementation (replaces `flash-linear-attention` with a parallel-scan delta rule, and `FlashFFTConv` with `torch.fft`)
- Three variants: `reverso-nano` (200K), `reverso-small` (550K), `reverso-base` (2.6M)

Implementation details
- `ReversoModel` extends `FoundationModel` with `HuggingFaceConnector` for weight loading

Other Information
The CPU implementation of Reverso is still much slower than the GPU implementation at shinfxh/reverso. For production use, the GPU implementation is strongly recommended!
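To make the two torch-native substitutions mentioned in the summary concrete, here is a minimal numpy sketch of each: a naive sequential reference of the DeltaNet delta rule (the PR's parallel scan computes the same recurrence, chunked), and an FFT-based long convolution standing in for FlashFFTConv. Function names and shapes are illustrative, not the PR's actual API:

```python
import numpy as np

def delta_rule_sequential(q, k, v, beta):
    """Naive O(T*d^2) reference of the DeltaNet delta rule.
    q, k, v: (T, d); beta: (T,). The state S maps keys to values;
    each step moves the value stored under key k[t] toward v[t]."""
    T, d = q.shape
    S = np.zeros((d, d))
    out = np.zeros((T, d))
    for t in range(T):
        S = S + beta[t] * np.outer(v[t] - S @ k[t], k[t])  # delta update
        out[t] = S @ q[t]  # read out with the query
    return out

def fft_long_conv(x, h):
    """Linear convolution of signal x with a long filter h via FFT:
    the classic trick behind FFT-based long-convolution layers."""
    n = len(x) + len(h) - 1
    N = 1 << (n - 1).bit_length()  # next power of two avoids circular wrap
    y = np.fft.irfft(np.fft.rfft(x, N) * np.fft.rfft(h, N), N)
    return y[: len(x)]  # causal: keep the first len(x) outputs
```

With orthonormal keys and `beta = 1`, the delta rule stores each value exactly under its key, which is an easy way to sanity-check an implementation.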