Add Reverso foundation model for zero-shot time series forecasting #3061
shinfxh wants to merge 6 commits into unit8co:master
Conversation
Reverso is a lightweight (~3M params) foundation model combining long convolutions with DeltaNet linear attention. Includes a torch-native implementation, HuggingFace integration, and unit tests for all three variants (nano, small, full). Closes unit8co#3034
@shinfxh I will try to provide more comments and reviews in the coming days. In the meantime, could you update the documentation?
Thank you @shinfxh for this PR and congratulations on releasing the models.
The PR is in very good shape already, so I mostly made minor comments on wording. The main changes I am requesting are unit tests that cover fidelity, invalid usage, fine-tuning, etc. Also, you might need to merge with the main branch to use the lazy import for ReversoModel.
Because Reverso is branded as efficient, I would suggest adding some tips to further boost its inference speed, such as using GPU/MPS and bfloat16 (set via `pl_trainer_kwargs`), and using reverso-nano via `hub_model_name`.
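A minimal sketch of what such a tip could look like in the docs, assuming the `ReversoModel` constructor arguments from this PR and standard PyTorch Lightning trainer flags; the helper function name and chunk lengths are illustrative, not part of the PR:

```python
# Hypothetical helper illustrating the speed tips above, assuming the
# ReversoModel API proposed in this PR and PyTorch Lightning trainer flags.
def fast_reverso_kwargs():
    return {
        "input_chunk_length": 512,
        "output_chunk_length": 64,
        # smallest variant (~200K params) for the fastest inference
        "hub_model_name": "shinfxh/reverso-nano",
        # run on GPU in bfloat16; use accelerator="mps" on Apple silicon
        "pl_trainer_kwargs": {"accelerator": "gpu", "precision": "bf16-mixed"},
    }

# model = ReversoModel(**fast_reverso_kwargs())  # then model.predict(...)
```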
> Reverso is a highly parameter efficient model that achieves comparable performance with models 100x its size.
>
> A combination of long convolutions and DeltaNet sequence mixing modules are used.
Header is supposed to be short. Please consider removing this line here.
Model intro should go into ReversoModel docstring.
```python
d_intermediate: int = 256,
output_bottleneck_dim: int = 48,
expand_v: float = 1.0,
state_weaving: int | bool = False,
```
Is there a reason why state_weaving, use_norm, learn_bias, and use_output_pe could be int here? Can we not assume they are all bools?
```python
if self.use_norm:
    x_min = x.min(1, keepdim=True)[0].detach()
    x_max = x.max(1, keepdim=True)[0].detach()
    x_range = torch.clamp(x_max - x_min, min=1e-5).detach()
```
Calling .detach() seems unnecessary here.
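For reference, the quoted snippet is per-series min-max instance scaling over the time axis; a numpy sketch (function name mine, shapes assumed `(batch, time)`) shows the same arithmetic without any autograd concerns:

```python
import numpy as np

def min_max_scale(x, eps=1e-5):
    # per-series min-max scaling over the time axis, mirroring the torch
    # logic quoted above; numpy has no autograd, so nothing to detach
    x_min = x.min(axis=1, keepdims=True)
    x_max = x.max(axis=1, keepdims=True)
    x_range = np.clip(x_max - x_min, eps, None)  # guard constant series
    return (x - x_min) / x_range
```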
```python
input_chunk_length: int,
output_chunk_length: int,
output_chunk_shift: int = 0,
hub_model_name: str = "shinfxh/reverso-small",
```
If shinfxh/reverso-base has the best performance, can we not use it as the default model? What is the trade-off here?
> Number of time steps in the past to take as a model input (per chunk). Applies to the target series. For Reverso, ``input_chunk_length`` must be less than or equal to the model's context length (2048 for all Reverso variants).
Suggested change:

```diff
- Number of time steps in the past to take as a model input (per chunk). Applies to the target
- series. For Reverso, ``input_chunk_length`` must be less than or equal to the model's context
- length (2048 for all Reverso variants).
+ Number of time steps in the past to take as a model input (per chunk). Applies to the target
+ series. For Reverso, maximum is 2048.
```
```python
    return str(dest_path)

class TestReversoModel:
```
All these reverso-specific tests should be in a separate file test_reverso.py.
```python
reverso_variant_dirs = {
    "shinfxh/reverso-nano": reverso_artefacts_dir / "tiny_reverso_nano",
    "shinfxh/reverso-small": reverso_artefacts_dir / "tiny_reverso_small",
    "shinfxh/reverso-base": reverso_artefacts_dir / "tiny_reverso_full",
}
```
Do you need three mock models here? I think one "tiny" reverso-nano would be enough.
```python
class TestReversoModel:
    series = generate_series(n_variables=2, length=100, prefix="A")
```
I think the current test suite might be incomplete. Looking at TestTimesFM2p5Model, the following tests seem to be missing here:

- `test_fidelity`: this is important to make sure the implementation is correct, against the original. I recommend using the actual reverso-full for this one.
- `test_default`: the default model should be deterministic.
- `test_probabilistic`: Reverso cannot be probabilistic, so an error is expected here.
- `test_multivariate`: Reverso should support multivariate series.
- `test_covariates`: Reverso does not support covariates, so errors are expected.
- `test_multiple_series`: Reverso supports forecasting on multiple time series at the same time.

Except for `test_fidelity`, all other tests should use a tiny mock Reverso to reduce overhead.
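As a dependency-free sketch of the requested covariates test, the pattern could look like this; `predict` here is a hypothetical stand-in, and the real test would call `ReversoModel.predict` inside `pytest.raises`:

```python
# Hypothetical stand-in for ReversoModel.predict: reject any covariates,
# as the reviewer expects the real model to do.
def predict(series, past_covariates=None, future_covariates=None):
    if past_covariates is not None or future_covariates is not None:
        raise ValueError("Reverso does not support covariates")
    return series

def test_covariates():
    # in the real suite this would be `with pytest.raises(ValueError): ...`
    try:
        predict([1.0, 2.0], past_covariates=[3.0])
        raised = False
    except ValueError:
        raised = True
    assert raised
```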
### For users of the library:
- Added `ReversoModel`, a new foundation model for zero-shot time series forecasting. Reverso is a highly parameter-efficient model (200K-2.6M params) that matches accuracy of models 100x its size. [#3034](https://github.com/unit8co/darts/issues/3034) by [Xinghong Fu](https://github.com/shinfxh).
Suggested change:

```diff
- - Added `ReversoModel`, a new foundation model for zero-shot time series forecasting. Reverso is a highly parameter-efficient model (200K-2.6M params) that matches accuracy of models 100x its size. [#3034](https://github.com/unit8co/darts/issues/3034) by [Xinghong Fu](https://github.com/shinfxh).
+ - Added new forecasting model `ReversoModel`: a family of highly parameter-efficient (200K-2.6M) foundation models that matches accuracy of models 100x their sizes. It supports univariate, multivariate, and multiple time series forecasting without training and can be fine-tuned on your own data. [#3034](https://github.com/unit8co/darts/issues/3034) by [Xinghong Fu](https://github.com/shinfxh).
```
```python
x = x_past.permute(0, 2, 1).reshape(-1, L)

# left-pad with per-series first value to seq_len
if L < self.seq_len:
```
Just curious, do we have to pad the context here? I imagine removing the padding could speed up inference a lot for small input_chunk_length. But we might introduce train-test mismatch here if the model only performs well with 2048 input.
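For context, the padding being discussed amounts to the following (numpy sketch; function name and `(batch, length)` shapes are mine, not the PR's code):

```python
import numpy as np

def left_pad_context(x, seq_len):
    """Left-pad each series (rows of x, shape (B, L)) with its own first
    value up to seq_len, mirroring the padding quoted above."""
    B, L = x.shape
    if L >= seq_len:
        return x[:, -seq_len:]  # keep only the most recent seq_len steps
    pad = np.repeat(x[:, :1], seq_len - L, axis=1)  # repeat first value
    return np.concatenate([pad, x], axis=1)
```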
Checklist before merging this PR:
Fixes #3034.
Summary

- `ReversoModel`, a lightweight foundation model (200K-2.6M params) for zero-shot univariate time series forecasting, ported from shinfxh/reverso
- Torch-native implementation (replaces `flash-linear-attention` with a parallel-scan delta rule, and `FlashFFTConv` with `torch.fft`)
- Three variants: `reverso-nano` (200K), `reverso-small` (550K), `reverso-base` (2.6M)

Implementation details
- `ReversoModel` extends `FoundationModel` with `HuggingFaceConnector` for weight loading

Other Information
The CPU implementation of Reverso is still much slower than the GPU implementation at shinfxh/reverso. For production use, the GPU implementation is strongly recommended!
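To make the two torch-native substitutions mentioned in the summary concrete, here is a minimal numpy sketch of each: a naive sequential reference of the DeltaNet delta rule (the PR's parallel scan computes the same recurrence, chunked), and an FFT-based long convolution standing in for FlashFFTConv. Function names and shapes are illustrative, not the PR's actual API:

```python
import numpy as np

def delta_rule_sequential(q, k, v, beta):
    """Naive O(T*d^2) reference of the DeltaNet delta rule.
    q, k, v: (T, d); beta: (T,). The state S maps keys to values;
    each step moves the value stored under key k[t] toward v[t]."""
    T, d = q.shape
    S = np.zeros((d, d))
    out = np.zeros((T, d))
    for t in range(T):
        S = S + beta[t] * np.outer(v[t] - S @ k[t], k[t])  # delta update
        out[t] = S @ q[t]  # read out with the query
    return out

def fft_long_conv(x, h):
    """Linear convolution of signal x with a long filter h via FFT:
    the classic trick behind FFT-based long-convolution layers."""
    n = len(x) + len(h) - 1
    N = 1 << (n - 1).bit_length()  # next power of two avoids circular wrap
    y = np.fft.irfft(np.fft.rfft(x, N) * np.fft.rfft(h, N), N)
    return y[: len(x)]  # causal: keep the first len(x) outputs
```

With orthonormal keys and `beta = 1`, the delta rule stores each value exactly under its key, which is an easy way to sanity-check an implementation.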