-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Description
Describe the bug
Originally from this discussion: #19278 (comment)
It turns out aggregate functions can still be used with window frames even if they don't implement retract_batch, if the window frame is ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. And if the accumulator consumes the internal state as part of evaluate, it can return incorrect results.
To Reproduce
In aggregate.slt, try this:
query ITRRR
SELECT
timestamp,
tags,
value,
median(value) OVER (
PARTITION BY tags
ORDER BY timestamp
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
),
percentile_cont(value, 0.5) OVER (
PARTITION BY tags
ORDER BY timestamp
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
)
FROM median_window_test
ORDER BY tags, timestamp;
----
1 tag1 10 10 10
2 tag1 20 15 15
3 tag1 30 20 20
4 tag1 40 25 25
5 tag1 50 30 30
1 tag2 60 60 60
2 tag2 70 65 65
3 tag2 80 70 70
4 tag2 90 75 75
5 tag2 100 80 80We would expect those last two columns to be the same. However the actual results are:
query ITRRR
SELECT
timestamp,
tags,
value,
median(value) OVER (
PARTITION BY tags
ORDER BY timestamp
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
),
percentile_cont(value, 0.5) OVER (
PARTITION BY tags
ORDER BY timestamp
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
)
FROM median_window_test
ORDER BY tags, timestamp;
----
1 tag1 10 10 10
2 tag1 20 15 20
3 tag1 30 20 30
4 tag1 40 25 40
5 tag1 50 30 50
1 tag2 60 60 60
2 tag2 70 65 70
3 tag2 80 70 80
4 tag2 90 75 90
5 tag2 100 80 100- See last column; it exactly follows the input data because it keeps discarding its state
Expected behavior
I've identified the following aggregate functions which likely exhibit this behaviour:
- percentile_cont
- string_agg
- distinct median
median was fixed by #19278
We need to fix these accumulators to not consume the internal state on evaluate. We also would want to update the documentation around accumulators to make this behaviour obvious so we don't fall into the same trap later.
Additional context
Comment on why consuming internal state is bad for windows: #19278 (comment)