Feature Request: partition_by support for window aggregations #587

@simonez-tuidi

Description

Following on from #551, it would be great to add support for partition_by in window aggregations.

Description

It would be useful to support SQL-like window aggregations with PARTITION BY semantics in lag/rolling transformations.

Currently, mlforecast supports groupby in rolling transformations (e.g. RollingQuantile), which aggregates values across multiple time series that share the same value in a column.

Example:

RollingQuantile(p=0.5, window_size=3, groupby=["brand"])

This aggregates values from all time series belonging to the same brand within the specified window.

However, there are many forecasting use cases where the desired behavior is different:
we want to aggregate within the same time series, but only over rows that share the same value in another column.

This corresponds to SQL window functions with PARTITION BY.

Example SQL:

SELECT
    avg(qty_sold) OVER (
        PARTITION BY article, store, is_promo
        ORDER BY CAST(date AS timestamp)
        RANGE BETWEEN INTERVAL 7 DAYS PRECEDING AND INTERVAL 1 DAY PRECEDING
    ) AS rolling_avg_by_promo_7
FROM sales

Here the aggregation happens within each (article, store) time series, but separately for is_promo=True and is_promo=False observations.


Proposed Feature

Introduce a partition_by argument for rolling transformations.

The behavior would be:

  • aggregation happens within each unique_id series

  • but only over rows where the values of partition_by columns match the current row

  • the rolling window still follows the time ordering of the series

Conceptually:

partition_by = conditional filtering inside the series
groupby = aggregation across series
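
The three rules above can be read in more than one way, so here is a minimal pure-Python sketch of one reading, for a single series already sorted by time. The function name `partitioned_rolling_mean` and its parameters are hypothetical, and the choice to count the window over all rows of the series (and then filter by key) is an assumption, not something this proposal settles.

```python
import math


def partitioned_rolling_mean(values, keys, window_size, lag=1, min_samples=1):
    """Hypothetical reference implementation for one time-ordered series.

    For row i, take the window_size rows ending lag steps before i,
    keep only those whose key equals keys[i], and average them.
    """
    out = []
    for i in range(len(values)):
        end = max(i - lag + 1, 0)           # exclusive end of the lagged window
        start = max(end - window_size, 0)   # inclusive start of the window
        matches = [values[j] for j in range(start, end) if keys[j] == keys[i]]
        if matches and len(matches) >= min_samples:
            out.append(sum(matches) / len(matches))
        else:
            out.append(math.nan)
    return out


# Toy input: the last row is a promo row, but both rows in its window are not.
print(partitioned_rolling_mean([1, 2, 3, 4], [True, False, False, True], window_size=2))
# prints [nan, nan, 2.0, nan]
```

Under the alternative reading, where the window covers the last `window_size` rows that share the key, the final value would be 1.0 instead of NaN. Which of the two behaviours `partition_by` should have is worth deciding explicitly.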

Example

Input

import pandas as pd

df = pd.DataFrame(
    {
        "unique_id": ["a", "a", "a", "a", "b", "b", "b", "b"],
        "ds": [1, 2, 3, 4, 1, 2, 3, 4],
        "y": [1, 2, 3, 4, 10, 20, 30, 40],
        "promo": [True, True, False, True, False, True, False, True],
    }
)

Transformation

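from mlforecast.core import TimeSeries
from mlforecast.lag_transforms import RollingMean

# partition_by is the proposed argument; it does not exist in mlforecast yet.
tfm = RollingMean(3, min_samples=1, partition_by=["promo"])

ts = TimeSeries(freq=1, lag_transforms={1: [tfm]})

prep = ts.fit_transform(
    df,
    id_col="unique_id",
    time_col="ds",
    target_col="y",
    dropna=False,
    static_features=[],
)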

Expected Output

expected_by_key = {
    ("a", 1): np.nan,
    ("a", 2): 1.0,
    ("a", 3): np.nan,
    ("a", 4): 1.5,
    ("b", 1): np.nan,
    ("b", 2): np.nan,
    ("b", 3): 10.0,
    ("b", 4): 20.0,
}
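
As a sanity check, the expected values can be reproduced today with plain pandas. This is a sketch of the manual workaround, not of how mlforecast would implement the feature; the column name `rolling_mean_lag1_promo` is made up. It shifts and rolls inside each (unique_id, promo) partition, so the window counts matching rows only, which gives the same result as a time-based window on this small example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "unique_id": ["a", "a", "a", "a", "b", "b", "b", "b"],
        "ds": [1, 2, 3, 4, 1, 2, 3, 4],
        "y": [1, 2, 3, 4, 10, 20, 30, 40],
        "promo": [True, True, False, True, False, True, False, True],
    }
)
df = df.sort_values(["unique_id", "ds"])

# Inside each (series, promo) partition: shift by one row (lag 1), then
# average up to the 3 most recent shifted values.
df["rolling_mean_lag1_promo"] = df.groupby(["unique_id", "promo"])["y"].transform(
    lambda s: s.shift(1).rolling(3, min_periods=1).mean()
)

result_by_key = {
    (uid, ds): value
    for uid, ds, value in zip(df["unique_id"], df["ds"], df["rolling_mean_lag1_promo"])
}

expected_by_key = {
    ("a", 1): np.nan, ("a", 2): 1.0, ("a", 3): np.nan, ("a", 4): 1.5,
    ("b", 1): np.nan, ("b", 2): np.nan, ("b", 3): 10.0, ("b", 4): 20.0,
}
for key, expected in expected_by_key.items():
    # assert_equal treats NaN as equal to NaN.
    np.testing.assert_equal(result_by_key[key], expected)
```

Note that with a lag larger than 1 the two readings diverge: shifting inside the partition skips matching rows, whereas a lag applied to the whole series skips time steps.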

Motivation

This functionality is very useful for many forecasting scenarios:

  • promotion-aware features

  • price regime segmentation

  • holiday vs non-holiday patterns

  • weather condition segmentation

  • state-dependent rolling statistics

Currently these features require manual preprocessing with pandas or SQL, which:

  • duplicates logic outside the feature pipeline

  • prevents reuse of mlforecast transformations

  • complicates reproducibility and backtesting

Supporting partition_by natively would allow users to define these features directly in lag transformations, consistent with the current API.


Suggested API

Example:

RollingMean(
    window_size=7,
    min_samples=1,
    partition_by=["promo"],
)

Possible interaction rules:

  • partition_by operates within unique_id

  • groupby aggregates across unique_ids

  • both could potentially coexist

Example:

RollingMean(7, partition_by=["promo"], groupby=["brand"])

(though this interaction may require further design discussion).


Summary

Adding partition_by would enable SQL-style window semantics inside a single time series, which is a common requirement in retail and demand forecasting use cases.

This would significantly expand the feature engineering capabilities of mlforecast while keeping the API consistent with the existing lag transformation design.
