Skip to content

Conversation

@cocosz
Copy link

@cocosz cocosz commented Dec 7, 2025

Description

We are adding unit tests for parquet-dataformat plugin classes. In this PR, we are covering :

  • ParquetWriter
  • ParquetExecutionEngine
  • ParquetDocumentInput
  • ArrowBufferPool

@cocosz cocosz requested a review from a team as a code owner December 7, 2025 22:09
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 7, 2025

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Contributor

github-actions bot commented Dec 7, 2025

❌ Gradle check result for 04b25fa: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

github-actions bot commented Dec 8, 2025

❌ Gradle check result for d5718c8: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

* Unit tests for ParquetExecutionEngine covering engine lifecycle, writer creation,
* file loading and tracking, refresh operations, and resource management.
*/
public class ParquetExecutionEngineTest extends OpenSearchTestCase {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, let's rename these classes to end with Tests.java


// ===== Basic Engine Functionality Tests =====

public void testEngineCreationAndInitialization() throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any complex tests with interleaved flows involving document indexing, refreshes, flushes. There should be validations on WriteResult, RefreshResult, WriterFileSet, CatalogSnapshot.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will cover the scenarios in the next commit

@cocosz cocosz changed the title Added unit tests for parquetwriter and parquetexecutionengine classes Added unit tests for parquet-dataformat plugin Dec 11, 2025
@github-actions
Copy link
Contributor

❌ Gradle check result for 72f8c7b: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

❌ Gradle check result for eaa92a7: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

❌ Gradle check result for c461315: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

@rayshrey rayshrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add all new ParquetDataFormat UTs in the E2E github action ?

Comment on lines 21 to 22
import java.nio.file.*;
import java.util.*;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's avoid * imports

super.tearDown();
}

private Schema createRealSchema() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit - DummySchema instead of RealSchema wherever used

engine.close();
}

public void testWriterLifecycleAndBasicOperations() throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we verify end to end parquet file creation as well ?

assertEquals(PARQUET_DATA_FORMAT.name(), dataFormat.name());
}

public void testRefreshReturnsEmptyResult() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be skipped for now - serves no purpose.

Writer<ParquetDocumentInput> writer = engine.createWriter(generation);
assertNotNull(writer);

ParquetDocumentInput input = writer.newDocumentInput();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's create multiple DocumentInputs, add some fields to those document input and then do documentInput.addToWriter for all. Then flush the writer and verify FileInfos returned from flush and also whether the parquet file got created and has all the necessary data that we added - this would really test the entire end to end flow.

Comment on lines 44 to 46
mockManagedVSR = mock(ManagedVSR.class);
mockMappedFieldType = mock(MappedFieldType.class);
mockParquetField = mock(ParquetField.class);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's create concrete objects instead of mocks here as well.

Comment on lines 89 to 91
assertSame("AddDoc should return WriteResult from VSRManager", expectedResult, actualResult);
verify(vsrManager).addToManagedVSR(input);
verifyNoMoreInteractions(vsrManager);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we are creating a new writer object it will instantiate it's own VSRManager, why are we asserting something on a mock VSRManager ?

Same comment applies to all other places where we are asserting on the mock VSRManager

@github-actions
Copy link
Contributor

❌ Gradle check result for 77be482: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

assertNotNull(ex.getCause());
}

public void testSupportedFieldTypesReturnsEmptyList() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we aim to remove to utilize this method in future (validation of index creation ?)
If yes, we should skip this test and add a corresponding test later when concrete implementation of supportedFieldTypes is done.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove it for now.

assertTrue(supportedTypes.isEmpty());
}

public void testGetMergerReturnsParquetMergeExecutorInstance() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's skip this test as well and take it up in a later PR where we will be adding UTs related to Merges.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure we can remove it for now

}

public void testCompleteWriterWorkflowWithMultipleDocuments() throws Exception {
long generation = 42L;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move it to a constant and use it

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah sure


boolean hasParquetFile = parquetFileSet.getFiles().stream()
.anyMatch(f -> f.endsWith(".parquet") && f.contains("_parquet_file_generation_" + generation));
assertTrue("Should create Parquet file with correct naming pattern", hasParquetFile);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not just rely on FileInfos but also add a check to ensure physical file exists.

@opensearch-trigger-bot
Copy link
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot opensearch-trigger-bot bot added the stalled Issues that have stalled label Jan 18, 2026
@github-actions
Copy link
Contributor

❌ Gradle check result for 3b9c61e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

❌ Gradle check result for 23df254: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

❌ Gradle check result for d3bd66b: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

❌ Gradle check result for 0f201aa: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

❌ Gradle check result for 998010c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

stalled Issues that have stalled

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants