Description
Following in the footsteps of 551, it would be great to add support for partition_by in window aggregations.
It would be useful to support SQL-like window aggregations with PARTITION BY semantics in lag/rolling transformations.
Currently, mlforecast supports groupby in rolling transformations (e.g. RollingQuantile), which aggregates values across multiple time series that share the same value in a column.
Example:
RollingQuantile(p=0.5, window_size=3, groupby=["brand"])
This aggregates values from all time series belonging to the same brand within the specified window.
However, many forecasting use cases need a different behavior: aggregating within the same time series, but only over rows that share the same value in another column.
This corresponds to SQL window functions with PARTITION BY.
Example SQL:
SELECT
    avg(qty_sold) OVER (
        PARTITION BY article, store, is_promo
        ORDER BY CAST(date AS timestamp)
        RANGE BETWEEN INTERVAL 7 DAYS PRECEDING AND INTERVAL 1 DAY PRECEDING
    ) AS rolling_avg_by_promo_7
FROM sales
Here the aggregation happens within each (article, store) time series, but separately for is_promo=True and is_promo=False observations.
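For comparison, the same window can be approximated in pandas today. This is a minimal sketch, assuming daily data with at most one row per day in each series; the sample values are made up for illustration, and only the column names come from the query above.

```python
import pandas as pd

# Hypothetical sample shaped like the `sales` table in the query above.
sales = pd.DataFrame(
    {
        "article": ["x"] * 7,
        "store": ["s1"] * 7,
        "date": pd.to_datetime(
            [
                "2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04",
                "2024-01-05", "2024-01-06", "2024-01-09",
            ]
        ),
        "is_promo": [True, False, True, True, False, True, True],
        "qty_sold": [10, 4, 20, 30, 6, 40, 50],
    }
)

# Same keys as the PARTITION BY clause.
keys = ["article", "store", "is_promo"]
indexed = sales.sort_values(["article", "store", "date"]).set_index("date")

# A "7D" window with closed="left" covers [t - 7 days, t), so the current
# day is excluded. For daily data this matches
# RANGE BETWEEN 7 DAYS PRECEDING AND 1 DAY PRECEDING.
indexed["rolling_avg_by_promo_7"] = indexed.groupby(keys)["qty_sold"].transform(
    lambda s: s.rolling("7D", closed="left").mean()
)

result = indexed.reset_index()
print(result[["date", "is_promo", "qty_sold", "rolling_avg_by_promo_7"]])
```

Rows with no earlier observation of the same is_promo value inside the window come out as NaN, which mirrors the NULL that the SQL query would return.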
Proposed Feature
Introduce a partition_by argument for rolling transformations.
The behavior would be:
- aggregation happens within each unique_id series
- but only over rows where the values of partition_by columns match the current row
- the rolling window still follows the time ordering of the series
Conceptually:
partition_by = conditional filtering inside the series
groupby = aggregation across series
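Read literally, the partition_by rules describe a masked window over a single series. The sketch below is one possible interpretation, not mlforecast code: the helper name masked_rolling_mean is hypothetical, the window is assumed to count time steps of the whole series (not only matching rows), and a lag of 1 is assumed to exclude the current row.

```python
import numpy as np


def masked_rolling_mean(y, part, window_size, min_samples=1, lag=1):
    """Rolling mean over one time-ordered series, restricted to rows whose
    partition value equals the partition value of the current row."""
    y = np.asarray(y, dtype=float)
    part = np.asarray(part)
    out = np.full(len(y), np.nan)
    for i in range(len(y)):
        stop = i - lag + 1                  # exclusive end of the window
        start = max(stop - window_size, 0)  # window spans window_size steps
        if stop <= 0:
            continue                        # no history before this row
        same = part[start:stop] == part[i]  # keep rows in the same partition
        if same.sum() >= max(min_samples, 1):
            out[i] = y[start:stop][same].mean()
    return out


# One series, partitioned by a promo flag.
y = [1, 2, 3, 4, 5, 6]
promo = [True, False, True, True, False, True]
result = masked_rolling_mean(y, promo, window_size=3)
print(result)  # [nan nan 1.  2.  2.  3.5]
```

A groupby-style aggregation would instead pool rows from several series before computing the statistic, so the two options differ in which rows are eligible rather than in how the window moves.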
Example
Input
import pandas as pd

df = pd.DataFrame(
    {
        "unique_id": ["a", "a", "a", "a", "b", "b", "b", "b"],
        "ds": [1, 2, 3, 4, 1, 2, 3, 4],
        "y": [1, 2, 3, 4, 10, 20, 30, 40],
        "promo": [True, True, False, True, False, True, False, True],
    }
)
Transformation
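from mlforecast.core import TimeSeries
from mlforecast.lag_transforms import RollingMean

# partition_by is the proposed argument; it does not exist yet
tfm = RollingMean(3, min_samples=1, partition_by=["promo"])
ts = TimeSeries(freq=1, lag_transforms={1: [tfm]})
prep = ts.fit_transform(
    df,
    id_col="unique_id",
    time_col="ds",
    target_col="y",
    dropna=False,
    static_features=[],
)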
Expected Output
import numpy as np

# (unique_id, ds) -> value of the lag-1 partitioned rolling mean
expected_by_key = {
    ("a", 1): np.nan,
    ("a", 2): 1.0,
    ("a", 3): np.nan,
    ("a", 4): 1.5,
    ("b", 1): np.nan,
    ("b", 2): np.nan,
    ("b", 3): 10.0,
    ("b", 4): 20.0,
}
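These values can be reproduced with plain pandas, which is roughly the manual preprocessing this feature would replace. This is a minimal sketch, assuming the window counts prior rows within the same partition; the helper name partitioned_rolling_mean is hypothetical. On this input, a window counted in time steps of the full series would give the same numbers.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "unique_id": ["a", "a", "a", "a", "b", "b", "b", "b"],
        "ds": [1, 2, 3, 4, 1, 2, 3, 4],
        "y": [1, 2, 3, 4, 10, 20, 30, 40],
        "promo": [True, True, False, True, False, True, False, True],
    }
)


def partitioned_rolling_mean(df, window_size, min_samples, partition_by, lag=1):
    """Lagged rolling mean of y per (unique_id, *partition_by) group."""
    ordered = df.sort_values(["unique_id", "ds"])
    keys = ["unique_id", *partition_by]
    # shift(lag) inside each group keeps the current row out of its own window
    return ordered.groupby(keys)["y"].transform(
        lambda s: s.shift(lag).rolling(window_size, min_periods=min_samples).mean()
    )


feature = partitioned_rolling_mean(
    df, window_size=3, min_samples=1, partition_by=["promo"]
).reindex(df.index)

# Same (unique_id, ds) keying as expected_by_key
got = {
    (uid, int(ds)): float(val)
    for uid, ds, val in zip(df["unique_id"], df["ds"], feature)
}
print(got)
```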
Motivation
This functionality is very useful for many forecasting scenarios:
- promotion-aware features
- price regime segmentation
- holiday vs non-holiday patterns
- weather condition segmentation
- state-dependent rolling statistics
Currently these features require manual preprocessing with pandas or SQL, which:
- duplicates logic outside the feature pipeline
- prevents reuse of mlforecast transformations
- complicates reproducibility and backtesting
Supporting partition_by natively would allow users to define these features directly in lag transformations, consistent with the current API.
Suggested API
Example:
RollingMean(
    window_size=7,
    min_samples=1,
    partition_by=["promo"],
)
Possible interaction rules:
- partition_by operates within unique_id
- groupby aggregates across unique_ids
- both could potentially coexist
Example:
RollingMean(7, partition_by=["promo"], groupby=["brand"])
(though this interaction may require further design discussion).
Summary
Adding partition_by would enable SQL-style window semantics inside a single time series, which is a common requirement in retail and demand forecasting use cases.
This would significantly expand the feature engineering capabilities of mlforecast while keeping the API consistent with the existing lag transformation design.
Use case
No response