Skip to content

Paimon Source and Hudi 1.1 for unpartitioned table data validation fails. #777

@vinishjail97

Description

@vinishjail97

Search before asking

  • I had searched in the issues and found no similar issues.

Please describe the bug 🐞

Error:    ITConversionController.testVariousOperations:266->checkDatasetEquivalence:955->checkDatasetEquivalence:1029->lambda$checkDatasetEquivalence$10:1036 Datasets have different row counts when reading from Spark. Source: PAIMON, Target: HUDI ==> expected: <100> but was: <0>

In Hudi 1.x, all the partition paths from MDT are coming in as empty causing the failures as compared to 0.x.

  protected List<PartitionPath> listPartitionPaths(List<String> relativePartitionPaths) {
    List<String> matchedPartitionPaths;
    try {
      if (isPartitionedTable()) {
        if (queryType == HoodieTableQueryType.INCREMENTAL && incrementalQueryStartTime.isPresent() && !isBeforeTimelineStarts()) {
          HoodieTimeline timelineToQuery = findInstantsInRange();
          matchedPartitionPaths = TimelineUtils.getWrittenPartitions(timelineToQuery);
        } else {
          matchedPartitionPaths = tableMetadata.getPartitionPathWithPathPrefixes(relativePartitionPaths);
        }
      } else {
        matchedPartitionPaths = Collections.singletonList(StringUtils.EMPTY_STRING);
      }
    } catch (IOException e) {
      throw new HoodieIOException("Error fetching partition paths", e);
    }

https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/BaseHoodieTableFileIndex.java#L346

Are you willing to submit PR?

  • I am willing to submit a PR!
  • I am willing to submit a PR but need help getting started!

Code of Conduct

Metadata

Metadata

Assignees

Labels

bugSomething isn't working

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions