
Remove redundant getSnapshotAt calls per commit #791

Open
brishi19791 wants to merge 8 commits into apache:main from brishi19791:users/rbokka/removeRedundantDeltaCalls

@brishi19791
Contributor

This PR removes redundant DeltaLog.getSnapshotAt(version) calls in the Delta source conversion path that were happening for every commit. getSnapshotAt can internally trigger an expensive Spark job and associated network I/O (e.g., listing/reading Delta log metadata from remote storage) to resolve the snapshot for a given version. We now fetch the snapshot once per commit/version and reuse it to construct the InternalTable (via a DeltaTableExtractor.table(Snapshot, tableName) overload), instead of re-resolving the same snapshot multiple times.
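The before/after shape of this change can be sketched as follows. Everything in it is hypothetical: `FakeDeltaLog`, `get_snapshot_at`, `table_from_snapshot`, and the two `convert_commit_*` functions are stand-ins for `DeltaLog.getSnapshotAt(version)` and the `DeltaTableExtractor.table(Snapshot, tableName)` overload, not the project's actual code. The stand-in log only counts resolutions instead of launching a Spark job.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical stand-in for a resolved Delta snapshot at one version."""
    version: int
    schema: str


@dataclass
class FakeDeltaLog:
    """Hypothetical stand-in for DeltaLog that counts snapshot resolutions.

    In the real path each resolution may run a Spark job and read log
    files from remote storage; here we only increment a counter.
    """
    schemas: dict  # version -> schema string
    calls: int = 0

    def get_snapshot_at(self, version: int) -> Snapshot:
        self.calls += 1
        return Snapshot(version, self.schemas[version])


def table_from_snapshot(snapshot: Snapshot, table_name: str) -> dict:
    """Hypothetical analogue of a table(Snapshot, tableName) overload:
    builds the table description from an already-resolved snapshot."""
    return {"name": table_name, "version": snapshot.version, "schema": snapshot.schema}


def convert_commit_before(log: FakeDeltaLog, version: int, table_name: str) -> dict:
    # Old shape: table construction and commit handling each resolve
    # the same version independently (two resolutions per commit).
    table = table_from_snapshot(log.get_snapshot_at(version), table_name)
    snapshot = log.get_snapshot_at(version)
    return {"table": table, "commit_version": snapshot.version}


def convert_commit_after(log: FakeDeltaLog, version: int, table_name: str) -> dict:
    # New shape: resolve once, then reuse the same snapshot everywhere.
    snapshot = log.get_snapshot_at(version)
    table = table_from_snapshot(snapshot, table_name)
    return {"table": table, "commit_version": snapshot.version}


schemas = {v: f"schema_v{v}" for v in range(1, 4)}
before, after = FakeDeltaLog(schemas), FakeDeltaLog(schemas)
for v in schemas:
    assert convert_commit_before(before, v, "t") == convert_commit_after(after, v, "t")
print(before.calls, after.calls)  # prints: 6 3
```

With three commits the old shape resolves six snapshots and the new shape three, while both produce identical output; in the real conversion path each avoided resolution is a potential Spark job plus remote log reads.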

Impact

  • Avoids redundant snapshot resolution work per commit/version (and the Spark job + network calls it may trigger).
  • Reduces end-to-end conversion latency, especially for large commit backlogs.
  • No intended functional behavior change; performance optimization only.

@kevinjqliu
Contributor

> We now fetch the snapshot once per commit/version and reuse it to construct the InternalTable (via a DeltaTableExtractor.table(Snapshot, tableName) overload), instead of re-resolving the same snapshot multiple times.

That's awesome, thanks for adding the perf improvement. It might be a good idea to add a test (either here or in a follow-up PR) to verify that the entire conversion process reads the DeltaLog only once.
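A test along the lines suggested above could wrap the log in a spy and assert one resolution per version. This is a minimal sketch using Python's `unittest.mock`; `DeltaLogStub` and `convert_commit` are hypothetical, and a real test in the project would instead wrap whatever `DeltaLog` instance the conversion source uses (on the JVM side, for example, with a Mockito `spy`, which is an assumption about how it might be done there).

```python
import unittest
from unittest import mock


class DeltaLogStub:
    """Hypothetical stand-in for DeltaLog; a real test would wrap the
    actual log object used by the conversion source."""

    def get_snapshot_at(self, version):
        return {"version": version, "schema": f"schema_v{version}"}


def convert_commit(log, version, table_name):
    """Hypothetical conversion step: resolves the snapshot once and
    derives both the table and the commit metadata from it."""
    snapshot = log.get_snapshot_at(version)
    table = {"name": table_name, "schema": snapshot["schema"]}
    return {"table": table, "version": snapshot["version"]}


class SnapshotResolutionTest(unittest.TestCase):
    def test_each_version_resolved_exactly_once(self):
        # Mock(wraps=...) forwards calls to the stub while recording them.
        log = mock.Mock(wraps=DeltaLogStub())
        for version in (1, 2, 3):
            convert_commit(log, version, "t")
        # Three commits should mean exactly three resolutions, in order.
        self.assertEqual(log.get_snapshot_at.call_count, 3)
        log.get_snapshot_at.assert_has_calls(
            [mock.call(1), mock.call(2), mock.call(3)]
        )
```

Run it with `python -m unittest <file>.py`. Because `Mock(wraps=...)` forwards each call to the stub, the conversion still receives real return values while the mock records every call, so the assertion fails as soon as any code path resolves a version twice.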

Contributor

@ashvina left a comment


Thanks for the improvement @brishi19791
LGTM with minor suggestions. IMO, this change doesn’t just remove redundant DeltaLog/snapshot lookups; it anchors all derived state to a single, known Snapshot, which eliminates subtle race windows.
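The race-window point can be illustrated with a deliberately simplified model. The sketch below is hypothetical: `MovingLog`, its `latest()` method, and the simulated concurrent `writer` are assumptions chosen to make the hazard visible, not a description of where a race existed in this PR's code path. Deriving two fields from two separate lookups can mix versions if a commit lands in between; deriving both from one captured, immutable snapshot cannot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical immutable view of the table at one version."""
    version: int
    schema: str
    files: tuple


class MovingLog:
    """Hypothetical log whose head advances when a concurrent writer
    commits; latest() always returns the current head."""

    def __init__(self):
        self._history = [Snapshot(0, "a:int", ("f0",))]

    def commit(self, schema, files):
        self._history.append(Snapshot(len(self._history), schema, files))

    def latest(self):
        return self._history[-1]


def describe_two_lookups(log, concurrent_writer):
    # Schema and files come from separate lookups; a commit can land in between.
    schema = log.latest().schema
    concurrent_writer(log)
    files = log.latest().files
    return schema, files


def describe_one_snapshot(log, concurrent_writer):
    # Both fields are read off the same captured snapshot, so they share a version.
    snapshot = log.latest()
    concurrent_writer(log)
    return snapshot.schema, snapshot.files


def writer(log):
    # Simulated concurrent commit that adds a column and a file.
    log.commit("a:int,b:string", ("f0", "f1"))


print(describe_two_lookups(MovingLog(), writer))   # ('a:int', ('f0', 'f1')): mixed versions
print(describe_one_snapshot(MovingLog(), writer))  # ('a:int', ('f0',)): consistent
```

The design choice that makes the second function safe is that `Snapshot` is frozen: once captured it cannot change underneath the caller, so consistency follows from reading every derived value off that one object.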


3 participants