feat: Add support for rest scan planning #2864
| return FileScanTask( | ||
| data_file=data_file, | ||
| delete_files=resolved_deletes, | ||
| residual=rest_task.residual_filter if rest_task.residual_filter else ALWAYS_TRUE, |
Are residual filters bound in the fs task?
Thanks for the review @singhpk234! The residual filters from REST are not bound in the normal sense. Currently the residual is only used for the optimization check in count().
The actual row filtering still uses the full row_filter, not the residual. This works correctly but is slightly inefficient.
| Returns: | ||
| PlanningResponse the result of the scan plan request representing the status | ||
| Raises: |
| Raises: | |
| Raises: |
| def _should_use_rest_planning(self) -> bool: | ||
| """Check if REST scan planning should be used for this scan.""" | ||
| from pyiceberg.catalog.rest import RestCatalog | ||
| | ||
| if not isinstance(self.catalog, RestCatalog): | ||
| return False | ||
| return self.catalog.is_rest_scan_planning_enabled() |
I would be inclined to create a method on the Catalog, e.g.:
@property
@abstractmethod
def use_server_side_planning(self, identifier: str | Identifier) -> bool:
"""Support for Server Side Planning"""
Have the MetastoreCatalog implement it and return False, and rename is_rest_scan_planning_enabled to support_server_side_planning. Right now we have to go through multiple jumps.
This would also clean up _plan_files_rest below.
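A minimal sketch of the shape this suggestion could take, under stated assumptions: the classes below are simplified stand-ins rather than the real pyiceberg classes, the `scan_planning_enabled` constructor argument and the `should_plan_server_side` helper are hypothetical, and the hook is written as a regular abstract method because a `@property` cannot accept an `identifier` argument.

```python
from abc import ABC, abstractmethod
from typing import Tuple, Union

# Simplified stand-in for pyiceberg's Identifier type (assumption for this sketch).
Identifier = Tuple[str, ...]


class Catalog(ABC):
    """Stand-in for the catalog base class."""

    @abstractmethod
    def use_server_side_planning(self, identifier: Union[str, Identifier]) -> bool:
        """Return True if scans of this table should be planned by the server."""


class MetastoreCatalog(Catalog):
    """Stand-in for catalogs that never plan on the server."""

    def use_server_side_planning(self, identifier: Union[str, Identifier]) -> bool:
        return False


class RestCatalog(Catalog):
    """Stand-in for the REST catalog; the flag argument is hypothetical."""

    def __init__(self, scan_planning_enabled: bool) -> None:
        self._scan_planning_enabled = scan_planning_enabled

    def use_server_side_planning(self, identifier: Union[str, Identifier]) -> bool:
        return self._scan_planning_enabled


def should_plan_server_side(catalog: Catalog, identifier: Union[str, Identifier]) -> bool:
    # The scan asks the catalog directly: no isinstance check, no local import.
    return catalog.use_server_side_planning(identifier)
```

With a hook like this on the base class, the scan no longer needs to know which catalog implementation it is talking to.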
| # REST content-type to DataFileContent | ||
| CONTENT_TYPE_MAP: dict[str, DataFileContent] = { | ||
| "data": DataFileContent.DATA, | ||
| "position-deletes": DataFileContent.POSITION_DELETES, | ||
| "equality-deletes": DataFileContent.EQUALITY_DELETES, | ||
| } |
Should we move this to a static method on DataFileContent?
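One way this could look, as a sketch only: the enum below is a simplified stand-in for `DataFileContent`, and the method name `from_rest_type` is hypothetical. The mapping lives inside the method because a dict assigned in an `Enum` class body would itself be turned into an enum member.

```python
from enum import Enum


class DataFileContent(int, Enum):
    """Simplified stand-in for the real enum (assumption for this sketch)."""

    DATA = 0
    POSITION_DELETES = 1
    EQUALITY_DELETES = 2

    @staticmethod
    def from_rest_type(content_type: str) -> "DataFileContent":
        """Map a REST content-type string to its enum member (hypothetical name)."""
        mapping = {
            "data": DataFileContent.DATA,
            "position-deletes": DataFileContent.POSITION_DELETES,
            "equality-deletes": DataFileContent.EQUALITY_DELETES,
        }
        try:
            return mapping[content_type]
        except KeyError:
            # Fail loudly on content types the client does not know about.
            raise ValueError(f"Unknown REST content type: {content_type!r}") from None
```

Callers would then write `DataFileContent.from_rest_type(...)` instead of importing a module-level map.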
| return False | ||
| return self.catalog.is_rest_scan_planning_enabled() | ||
| | ||
| def _plan_files_rest(self) -> Iterable[FileScanTask]: |
nit: to keep the same language:
| def _plan_files_rest(self) -> Iterable[FileScanTask]: | |
| def _plan_files_server_side(self) -> Iterable[FileScanTask]: |
Fokko
left a comment
Left a few comments, would be good to get those cleaned up. Apart from that, this looks great to me! Thanks @geruh for working on this, very exciting to see this being added 👍
related to #2775
Rationale for this change
Adds synchronous client-side support for REST server-side scan planning, allowing for scanning if the REST catalog supports it.
This PR cherry-picks and builds on two WIP PRs:
Currently, scanning is enabled with rest-scan-planning-enabled=true in the catalog properties.
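As a rough illustration of the flag, assuming the catalog properties are a plain string-to-string mapping: the helper `rest_scan_planning_enabled` and the placeholder `uri` below are hypothetical and are not taken from this PR; only the property name is.

```python
from typing import Mapping

# Property name as given in the PR description.
REST_SCAN_PLANNING_ENABLED = "rest-scan-planning-enabled"


def rest_scan_planning_enabled(properties: Mapping[str, str]) -> bool:
    """Hypothetical helper: the flag is off unless it is set to the string 'true'."""
    return properties.get(REST_SCAN_PLANNING_ENABLED, "false").strip().lower() == "true"


catalog_properties = {
    "type": "rest",
    "uri": "http://localhost:8181",  # placeholder endpoint, not from the PR
    REST_SCAN_PLANNING_ENABLED: "true",
}
```

Leaving the property out keeps the existing client-side planning path in this sketch.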
TODO: spec handling
Are these changes tested?
Integration tests were added, along with manual testing.
Are there any user-facing changes?
yes