-
Notifications
You must be signed in to change notification settings - Fork 2k
[Spark] Add UCDeltaTableReadTest to cover time travel READ, CDF READ #5792
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Signed-off-by: openinx <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR ports the Python UnityCatalogManagedTableReadSuite integration tests to the Spark+UC Java integration test suite. It adds comprehensive test coverage for read operations on Unity Catalog Delta tables, including time travel queries and Change Data Feed (CDF) functionality.
Key changes:
- Adds
UCDeltaTableReadTestwith parameterized tests for both EXTERNAL and MANAGED table types - Tests time travel READ operations using VERSION AS OF and TIMESTAMP AS OF syntax
- Tests CDF READ operations with both version and timestamp parameters, with proper handling of catalog-managed vs external table differences
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| List<List<String>> versionResult1 = | ||
| sql("SELECT * FROM %s VERSION AS OF %d ORDER BY id", tableName, currentVersion); | ||
| check(versionResult1, List.of(List.of("1"), List.of("2"), List.of("3"))); | ||
|
|
||
| // Test VERSION AS OF with SQL syntax | ||
| List<List<String>> versionResult2 = | ||
| sql("SELECT * FROM %s VERSION AS OF %d ORDER BY id", tableName, currentVersion); | ||
| check(versionResult2, List.of(List.of("1"), List.of("2"), List.of("3"))); |
Copilot
AI
Jan 7, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lines 56-58 and 61-63 contain duplicate code that performs the same operation. Both blocks execute identical VERSION AS OF queries and checks. The comment on line 55 says "Test VERSION AS OF with DataFrameReader API" but the code uses SQL syntax, and line 60 says "Test VERSION AS OF with SQL syntax" which is the same as the previous block. One of these duplicate blocks should be removed, or if testing different APIs is intended, the DataFrameReader API test should actually use the DataFrame API instead of SQL.
| // Test VERSION AS OF with DataFrameReader API | ||
| List<List<String>> versionResult1 = | ||
| sql("SELECT * FROM %s VERSION AS OF %d ORDER BY id", tableName, currentVersion); | ||
| check(versionResult1, List.of(List.of("1"), List.of("2"), List.of("3"))); | ||
|
|
||
| // Test VERSION AS OF with SQL syntax | ||
| List<List<String>> versionResult2 = | ||
| sql("SELECT * FROM %s VERSION AS OF %d ORDER BY id", tableName, currentVersion); | ||
| check(versionResult2, List.of(List.of("1"), List.of("2"), List.of("3"))); | ||
|
|
Copilot
AI
Jan 7, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable name 'versionResult1' followed by 'versionResult2' suggests these are meant to test different scenarios or produce different results, but both variables store identical query results. Consider using more descriptive names that reflect what each test is validating, or removing the duplicate test if only one is needed.
| // Test VERSION AS OF with DataFrameReader API | |
| List<List<String>> versionResult1 = | |
| sql("SELECT * FROM %s VERSION AS OF %d ORDER BY id", tableName, currentVersion); | |
| check(versionResult1, List.of(List.of("1"), List.of("2"), List.of("3"))); | |
| // Test VERSION AS OF with SQL syntax | |
| List<List<String>> versionResult2 = | |
| sql("SELECT * FROM %s VERSION AS OF %d ORDER BY id", tableName, currentVersion); | |
| check(versionResult2, List.of(List.of("1"), List.of("2"), List.of("3"))); | |
| // Test VERSION AS OF | |
| List<List<String>> versionAsOfResult = | |
| sql("SELECT * FROM %s VERSION AS OF %d ORDER BY id", tableName, currentVersion); | |
| check(versionAsOfResult, List.of(List.of("1"), List.of("2"), List.of("3"))); |
| message.toLowerCase().contains("path based access") | ||
| || message.toLowerCase().contains("catalog-managed"), |
Copilot
AI
Jan 7, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message validation uses 'message.toLowerCase().contains(...)' for some checks but 'message.contains(...)' for others without converting to lowercase. This inconsistency could lead to fragile tests if the error message case changes. Consider either consistently using toLowerCase() for all case-insensitive checks, or using case-sensitive exact matches if the message format is guaranteed.
| message.toLowerCase().contains("path based access") | |
| || message.toLowerCase().contains("catalog-managed"), | |
| message.contains("path based access") | |
| || message.contains("catalog-managed"), |
| Assertions.assertTrue( | ||
| message.contains("UPDATE_DELTA_METADATA") | ||
| || message.toLowerCase().contains("catalog-managed"), |
Copilot
AI
Jan 7, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message validation pattern is inconsistent with the one used in testChangeDataFeedWithTimestamp. Here, 'UPDATE_DELTA_METADATA' is checked with case-sensitive contains() while 'catalog-managed' uses case-insensitive toLowerCase().contains(). Consider using a consistent approach across both methods for better maintainability.
| Assertions.assertTrue( | |
| message.contains("UPDATE_DELTA_METADATA") | |
| || message.toLowerCase().contains("catalog-managed"), | |
| String lowerMessage = message == null ? "" : message.toLowerCase(); | |
| Assertions.assertTrue( | |
| lowerMessage.contains("update_delta_metadata") | |
| || lowerMessage.contains("catalog-managed"), |
| Assertions.assertTrue( | ||
| message.contains("AccessDeniedException") | ||
| || message.toLowerCase().contains("access denied") | ||
| || message.toLowerCase().contains("catalog-managed"), |
Copilot
AI
Jan 7, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message validation pattern is inconsistent across the test methods. This check uses 'AccessDeniedException' (case-sensitive), 'access denied' (case-insensitive), and 'catalog-managed' (case-insensitive). Consider standardizing the approach to either always use case-sensitive or case-insensitive matching for better code consistency.
| Assertions.assertTrue( | |
| message.contains("AccessDeniedException") | |
| || message.toLowerCase().contains("access denied") | |
| || message.toLowerCase().contains("catalog-managed"), | |
| String messageLower = message.toLowerCase(); | |
| Assertions.assertTrue( | |
| messageLower.contains("accessdeniedexception") | |
| || messageLower.contains("access denied") | |
| || messageLower.contains("catalog-managed"), |
| /** | ||
| * Read operation test suite for Delta Table operations through Unity Catalog. | ||
| * | ||
| * <p>Covers time travel, change data feed, streaming, and path-based access scenarios. Tests are |
Copilot
AI
Jan 7, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The class documentation mentions 'streaming' as one of the covered scenarios, but there are no streaming test methods in this class. According to the PR description, streaming tests will be covered separately in UCDeltaStreamingTest. The documentation should be updated to remove the reference to streaming to accurately reflect what this test class covers.
| * <p>Covers time travel, change data feed, streaming, and path-based access scenarios. Tests are | |
| * <p>Covers time travel, change data feed, and path-based access scenarios. Tests are |
|
The failures are : |
Which Delta project/connector is this regarding?
Description
Port the python UnityCatalogManagedTableReadSuite into the spark+uc integration tests.
About the test_streaming_read, it will be covered separately in the
UCDeltaStreamingTest( https://github.com/delta-io/delta/pull/5719/files ) . So let's just ignore it here.How was this patch tested?
build/sbt "sparkUnityCatalog/testOnly io.sparkuctest.UCDeltaTableDMLTest"Does this PR introduce any user-facing changes?
No