Added unit tests for parquet-dataformat plugin #20180
base: feature/datafusion
Conversation
Important: Review skipped. Auto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI.
❌ Gradle check result for 04b25fa: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

❌ Gradle check result for d5718c8: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
 * Unit tests for ParquetExecutionEngine covering engine lifecycle, writer creation,
 * file loading and tracking, refresh operations, and resource management.
 */
public class ParquetExecutionEngineTest extends OpenSearchTestCase {
As discussed, let's rename these classes to end with Tests.java
// ===== Basic Engine Functionality Tests =====

public void testEngineCreationAndInitialization() throws IOException {
I don't see any complex tests with interleaved flows involving document indexing, refreshes, and flushes. There should be validations on WriteResult, RefreshResult, WriterFileSet, and CatalogSnapshot.
I will cover the scenarios in the next commit
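For reference, a rough sketch of what such an interleaved test could look like inside the existing test class - the exact shapes of WriteResult/RefreshResult and accessors like engine.refresh(), writer.flush(), and engine.getCatalogSnapshot() are assumptions here, not confirmed plugin API:

    public void testInterleavedIndexingRefreshAndFlush() throws Exception {
        long generation = 1L;
        Writer<ParquetDocumentInput> writer = engine.createWriter(generation);

        // Index a first document and validate the WriteResult.
        ParquetDocumentInput first = writer.newDocumentInput();
        // ... populate fields on `first` here ...
        WriteResult firstResult = first.addToWriter();
        assertNotNull("Every add should produce a WriteResult", firstResult);

        // Refresh mid-flow and validate the RefreshResult.
        RefreshResult refreshResult = engine.refresh();          // assumed accessor
        assertNotNull("Refresh should report what became searchable", refreshResult);

        // Index a second document, then flush and validate the WriterFileSet.
        ParquetDocumentInput second = writer.newDocumentInput();
        second.addToWriter();
        WriterFileSet fileSet = writer.flush();                  // assumed to return the produced files
        assertFalse("Flush should report at least one file", fileSet.getFiles().isEmpty());

        // Finally, validate that the CatalogSnapshot reflects the flushed generation.
        CatalogSnapshot snapshot = engine.getCatalogSnapshot();  // assumed accessor
        assertNotNull(snapshot);
    }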
❌ Gradle check result for 72f8c7b: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

❌ Gradle check result for eaa92a7: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

❌ Gradle check result for c461315: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
rayshrey left a comment:
Can we add all the new ParquetDataFormat UTs to the E2E GitHub action?
import java.nio.file.*;
import java.util.*;
Let's avoid * imports
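For example, the wildcard imports above could be narrowed to only the classes the test actually uses (the exact list depends on the test body):

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.List;
    import java.util.Map;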
    super.tearDown();
}

private Schema createRealSchema() {
Nit - DummySchema instead of RealSchema wherever used
    engine.close();
}

public void testWriterLifecycleAndBasicOperations() throws IOException {
Can we verify end-to-end Parquet file creation as well?
    assertEquals(PARQUET_DATA_FORMAT.name(), dataFormat.name());
}

public void testRefreshReturnsEmptyResult() {
Can be skipped for now - serves no purpose.
Writer<ParquetDocumentInput> writer = engine.createWriter(generation);
assertNotNull(writer);

ParquetDocumentInput input = writer.newDocumentInput();
Let's create multiple DocumentInputs, add some fields to each of them, and then call documentInput.addToWriter for all of them. Then flush the writer and verify the FileInfos returned from flush, and also check whether the Parquet file got created and contains all the data we added - this would really test the entire end-to-end flow.
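As a sketch of that flow (field-population and flush signatures are assumptions, and reading the file back would use whatever Parquet reader the plugin already depends on):

    public void testFlushWritesAllDocumentsToParquetFile() throws Exception {
        long generation = 1L;
        Writer<ParquetDocumentInput> writer = engine.createWriter(generation);

        // Create several DocumentInputs, populate them, and add each one to the writer.
        for (int i = 0; i < 3; i++) {
            ParquetDocumentInput doc = writer.newDocumentInput();
            // ... add fields to `doc` here, e.g. an id and a text field ...
            doc.addToWriter();
        }

        // Flush and verify the FileInfos returned by the writer.
        WriterFileSet fileSet = writer.flush();                  // assumed to return the produced files
        assertEquals("One Parquet file expected for this flush", 1, fileSet.getFiles().size());

        // Also verify the physical Parquet file exists, then read it back to confirm
        // all three documents and their field values are present.
        // ... open the file with the plugin's Parquet reader and assert row count / values ...
    }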
mockManagedVSR = mock(ManagedVSR.class);
mockMappedFieldType = mock(MappedFieldType.class);
mockParquetField = mock(ParquetField.class);
Let's create concrete objects instead of mocks here as well.
assertSame("AddDoc should return WriteResult from VSRManager", expectedResult, actualResult);
verify(vsrManager).addToManagedVSR(input);
verifyNoMoreInteractions(vsrManager);
When we create a new writer object, it instantiates its own VSRManager, so why are we asserting anything on a mock VSRManager?
The same comment applies to all other places where we assert on the mock VSRManager.
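A sketch of the alternative - asserting on what the real writer returns instead of verifying a mock (assuming addToWriter surfaces the result of the writer's internal VSRManager):

    Writer<ParquetDocumentInput> writer = engine.createWriter(1L);
    ParquetDocumentInput input = writer.newDocumentInput();
    // ... populate fields on `input` here ...
    WriteResult result = input.addToWriter();    // goes through the writer's own VSRManager
    assertNotNull(result);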
❌ Gradle check result for 77be482: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
    assertNotNull(ex.getCause());
}

public void testSupportedFieldTypesReturnsEmptyList() {
Do we aim to utilize this method in the future (validation of index creation)?
If yes, we should skip this test and add a corresponding test later, when a concrete implementation of supportedFieldTypes is done.
We can remove it for now.
    assertTrue(supportedTypes.isEmpty());
}

public void testGetMergerReturnsParquetMergeExecutorInstance() {
Let's skip this test as well and take it up in a later PR where we will be adding UTs related to Merges.
Sure we can remove it for now
}

public void testCompleteWriterWorkflowWithMultipleDocuments() throws Exception {
    long generation = 42L;
Let's move it to a constant and use it
Yeah sure
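For example (the constant name is just a suggestion):

    private static final long TEST_GENERATION = 42L;

    // and in the test:
    Writer<ParquetDocumentInput> writer = engine.createWriter(TEST_GENERATION);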
boolean hasParquetFile = parquetFileSet.getFiles().stream()
    .anyMatch(f -> f.endsWith(".parquet") && f.contains("_parquet_file_generation_" + generation));
assertTrue("Should create Parquet file with correct naming pattern", hasParquetFile);
Let's not just rely on FileInfos but also add a check to ensure the physical file exists.
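For example (assuming getFiles() returns paths that can be resolved directly; adjust if they are relative to the shard data directory):

    for (String file : parquetFileSet.getFiles()) {
        java.nio.file.Path path = java.nio.file.Paths.get(file);
        assertTrue("Flushed file should exist on disk: " + file, java.nio.file.Files.exists(path));
        assertTrue("Flushed file should not be empty", java.nio.file.Files.size(path) > 0);
    }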
This PR is stalled because it has been open for 30 days with no activity.
Signed-off-by: Tanvir Alam <[email protected]>
❌ Gradle check result for 3b9c61e: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

❌ Gradle check result for 23df254: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

❌ Gradle check result for d3bd66b: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

❌ Gradle check result for 0f201aa: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Signed-off-by: Tanvir Alam <[email protected]>
❌ Gradle check result for 998010c: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Description
We are adding unit tests for the parquet-dataformat plugin classes. In this PR, we are covering: