[Enhancement] Iceberg: skip files before open() using runtime filter min/max vs manifest column stats #71005
Description
Feature request
Is your feature request related to a problem? Please describe.
When executing star-schema queries that join a large Iceberg fact table with a small
filtered dimension table, StarRocks waits for runtime filters (RF) to arrive before
scanning (PRECONDITION_BLOCK). However, the RF is then only applied as a row-level
predicate after each file is already opened and data pages are read from remote storage
(S3/HDFS).
The Iceberg manifest already contains per-file column statistics (lower_bounds /
upper_bounds) that are available on the BE before any file is opened. Files whose
entire column range falls outside the RF's [min, max] are nevertheless opened and
read, wasting remote I/O that the manifest stats could have avoided.
Describe the solution you'd like
After the runtime filter arrives and before open() is called on each file, compare
the RF's [min, max] against the corresponding column's per-file stats from the Iceberg
manifest. If the file's range and the RF range do not overlap, skip the file entirely —
no file open, no remote read. This eliminates I/O at the file level rather than
discarding rows after they are already read.
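The non-overlap test is a plain interval-disjointness check, with the important caveat that missing bounds must keep the file. A minimal sketch (the `Range`/`can_skip_file` names are illustrative, not StarRocks internals):

```cpp
#include <cassert>
#include <optional>

// Hypothetical sketch: decide whether a file can be skipped given the
// runtime filter's [min, max] and the file's per-column bounds taken from
// the Iceberg manifest (lower_bounds / upper_bounds).
struct Range {
    std::optional<long> min;  // absent => bound not recorded in the manifest
    std::optional<long> max;
};

// A file may be skipped only when its column range and the RF range
// provably do not overlap. Any missing bound must be treated
// conservatively (keep the file); skipping on incomplete stats would
// silently drop rows.
bool can_skip_file(const Range& file, const Range& rf) {
    if (!file.min || !file.max || !rf.min || !rf.max) return false;
    return *file.max < *rf.min || *file.min > *rf.max;
}
```

The same check generalizes to any totally ordered type; for string or binary columns the comparison would use the manifest's truncated bounds, which remain safe for a disjointness test.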
Describe alternatives you've considered
Coordinator-side pruning: prune the scan range list at the FE before dispatching to
BEs. This saves split metadata transfer in addition to I/O, but requires protocol
changes (new Thrift fields + RPC). The BE-side file-skip described above delivers the
primary I/O benefit with no protocol changes and is a simpler first step.
Additional context
Trino benchmarks on TPC-DS show 50–66% reduction in data read on queries where dynamic
partition pruning is effective. The per-file min/max statistics are already populated in
THdfsScanRange.min_max_values from Iceberg manifest metadata — no new metadata
infrastructure is needed.
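To make the intended control flow concrete, here is a hedged sketch of the BE-side loop: once the runtime filters have arrived, walk the scan range's files and drop those whose manifest stats cannot overlap any RF column, before any `open()` call. All names (`Bound`, `FileStats`, `prune_files`) are hypothetical stand-ins, not the actual StarRocks structures carried in `THdfsScanRange.min_max_values`:

```cpp
#include <map>
#include <optional>
#include <string>
#include <vector>

// Illustrative per-column bounds as recorded in the Iceberg manifest.
struct Bound {
    std::optional<long> min, max;  // absent => stats missing for this column
};
// Per-file column stats keyed by column name (hypothetical shape).
using FileStats = std::map<std::string, Bound>;

// Returns the indices of files that must still be opened. A file is
// dropped only if some RF column *proves* disjointness; a column with no
// stats, or an RF with unknown bounds, keeps the file (conservative).
std::vector<size_t> prune_files(const std::vector<FileStats>& files,
                                const std::map<std::string, Bound>& rfs) {
    std::vector<size_t> kept;
    for (size_t i = 0; i < files.size(); ++i) {
        bool skip = false;
        for (const auto& [col, rf] : rfs) {
            auto it = files[i].find(col);
            if (it == files[i].end()) continue;  // no stats for col: keep
            const Bound& f = it->second;
            if (!f.min || !f.max || !rf.min || !rf.max) continue;
            if (*f.max < *rf.min || *f.min > *rf.max) {
                skip = true;  // disjoint: file is never open()ed
                break;
            }
        }
        if (!skip) kept.push_back(i);
    }
    return kept;
}
```

In this shape the check is purely local to the scanner and needs no protocol changes, matching the "simpler first step" framing above; FE-side pruning could later reuse the same disjointness rule.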