[Enhancement] Iceberg: skip files before open() using runtime filter min/max vs manifest column stats #71005
Description
Feature request
Is your feature request related to a problem? Please describe.
When executing star-schema queries that join a large Iceberg fact table with a small
filtered dimension table, StarRocks waits for runtime filters (RF) to arrive before
scanning (PRECONDITION_BLOCK). However, the RF is then only applied as a row-level
predicate after each file is already opened and data pages are read from remote storage
(S3/HDFS).
The Iceberg manifest already contains per-file column statistics (lower_bounds /
upper_bounds) that are available on the BE before any file is opened. Files whose
entire column range falls outside the RF's [min, max] are nevertheless opened and
read, wasting remote I/O that the manifest stats could have avoided.
Describe the solution you'd like
After the runtime filter arrives and before open() is called on each file, compare
the RF's [min, max] against the corresponding column's per-file stats from the Iceberg
manifest. If the file's range and the RF range do not overlap, skip the file entirely —
no file open, no remote read. This eliminates I/O at the file level rather than
discarding rows after they are already read.
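The non-overlap test is a plain interval-disjointness check, with the important caveat that missing bounds must keep the file. A minimal sketch (the `Range`/`can_skip_file` names are illustrative, not StarRocks internals):

```cpp
#include <cassert>
#include <optional>

// Hypothetical sketch: decide whether a file can be skipped given the
// runtime filter's [min, max] and the file's per-column bounds taken from
// the Iceberg manifest (lower_bounds / upper_bounds).
struct Range {
    std::optional<long> min;  // absent => bound not recorded in the manifest
    std::optional<long> max;
};

// A file may be skipped only when its column range and the RF range
// provably do not overlap. Any missing bound must be treated
// conservatively (keep the file); skipping on incomplete stats would
// silently drop rows.
bool can_skip_file(const Range& file, const Range& rf) {
    if (!file.min || !file.max || !rf.min || !rf.max) return false;
    return *file.max < *rf.min || *file.min > *rf.max;
}
```

The same check generalizes to any totally ordered type; for string or binary columns the comparison would use the manifest's truncated bounds, which remain safe for a disjointness test.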
Describe alternatives you've considered
Coordinator-side pruning: prune the scan range list at the FE before dispatching to
BEs. This saves split metadata transfer in addition to I/O, but requires protocol
changes (new Thrift fields + RPC). The BE-side file-skip described above delivers the
primary I/O benefit with no protocol changes and is a simpler first step.
Additional context
Trino benchmarks on TPC-DS show 50–66% reduction in data read on queries where dynamic
partition pruning is effective. The per-file min/max statistics are already populated in
THdfsScanRange.min_max_values from Iceberg manifest metadata — no new metadata
infrastructure is needed.
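To make the intended control flow concrete, here is a hedged sketch of the BE-side loop: once the runtime filters have arrived, walk the scan range's files and drop those whose manifest stats cannot overlap any RF column, before any `open()` call. All names (`Bound`, `FileStats`, `prune_files`) are hypothetical stand-ins, not the actual StarRocks structures carried in `THdfsScanRange.min_max_values`:

```cpp
#include <map>
#include <optional>
#include <string>
#include <vector>

// Illustrative per-column bounds as recorded in the Iceberg manifest.
struct Bound {
    std::optional<long> min, max;  // absent => stats missing for this column
};
// Per-file column stats keyed by column name (hypothetical shape).
using FileStats = std::map<std::string, Bound>;

// Returns the indices of files that must still be opened. A file is
// dropped only if some RF column *proves* disjointness; a column with no
// stats, or an RF with unknown bounds, keeps the file (conservative).
std::vector<size_t> prune_files(const std::vector<FileStats>& files,
                                const std::map<std::string, Bound>& rfs) {
    std::vector<size_t> kept;
    for (size_t i = 0; i < files.size(); ++i) {
        bool skip = false;
        for (const auto& [col, rf] : rfs) {
            auto it = files[i].find(col);
            if (it == files[i].end()) continue;  // no stats for col: keep
            const Bound& f = it->second;
            if (!f.min || !f.max || !rf.min || !rf.max) continue;
            if (*f.max < *rf.min || *f.min > *rf.max) {
                skip = true;  // disjoint: file is never open()ed
                break;
            }
        }
        if (!skip) kept.push_back(i);
    }
    return kept;
}
```

In this shape the check is purely local to the scanner and needs no protocol changes, matching the "simpler first step" framing above; FE-side pruning could later reuse the same disjointness rule.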