
Hudi 1.1 and ICEBERG nested partitioned filter data validation fails. #775

@vinishjail97

Description


Search before asking

  • I had searched in the issues and found no similar issues.

Please describe the bug 🐞

➜  test_table_4da7fea6_f8d5_4571_bc53_1cfecdabebfb_v1 ls -ltr
total 0
drwxr-xr-x@  8 vinishreddy  staff  256 22 Dec 17:36 WARN
drwxr-xr-x@  8 vinishreddy  staff  256 22 Dec 17:36 INFO
drwxr-xr-x@  8 vinishreddy  staff  256 22 Dec 17:36 ERROR
drwxr-xr-x@  8 vinishreddy  staff  256 22 Dec 17:36 __HIVE_DEFAULT_PARTITION__
drwxr-xr-x@ 20 vinishreddy  staff  640 22 Dec 17:36 metadata
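The listing suggests that the table is partitioned on `level` without Hive-style `level=` prefixes, and that rows with a null `level` are written under `__HIVE_DEFAULT_PARTITION__`. Under that assumption, the mapping from partition value to directory can be sketched as follows. `LevelPartitionPath` is a hypothetical helper for illustration only. It is not a Hudi class and is not part of the test harness. Treating an empty string like null is also an assumption.

```java
// Hypothetical helper (not Hudi code): maps a value of the assumed partition
// column `level` to the directory name seen in the table listing.
final class LevelPartitionPath {

    // Directory name the listing shows for rows without a partition value.
    static final String DEFAULT_PARTITION = "__HIVE_DEFAULT_PARTITION__";

    // Assumes non-Hive-style paths (no "level=" prefix), matching the listing.
    // Null and (assumed) empty values fall back to the default partition.
    static String directoryFor(String level) {
        return (level == null || level.isEmpty()) ? DEFAULT_PARTITION : level;
    }

    public static void main(String[] args) {
        String[] levels = {"WARN", "INFO", "ERROR", null};
        for (String level : levels) {
            System.out.println(level + " -> " + directoryFor(level));
        }
    }
}
```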

...

avro-tools tojson .hoodie/timeline/20251223013640337_20251223013644463.commit | jq -r '.partitionToWriteStats.map[][].path.string'

25/12/22 17:42:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
__HIVE_DEFAULT_PARTITION__/ffb2f34c-47b9-4579-9698-c43fdc33dc68-3_0-36-40_20251223013640337.parquet
ERROR/ffb2f34c-47b9-4579-9698-c43fdc33dc68-2_0-36-40_20251223013640337.parquet
INFO/ffb2f34c-47b9-4579-9698-c43fdc33dc68-1_0-36-40_20251223013640337.parquet
WARN/ffb2f34c-47b9-4579-9698-c43fdc33dc68-0_0-36-40_20251223013640337.parquet
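The commit metadata shows exactly one base file per partition directory. The instant time embedded in each file name (`20251223013640337`) matches the first timestamp in the commit file name. The sketch below splits those paths into their parts and groups file IDs by partition. This can help when comparing against the partition values that the ICEBERG target records for each data file. It assumes Hudi's base-file naming pattern `<partition>/<fileId>_<writeToken>_<instantTime>.parquet`. `CommitPaths` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: parse base-file paths taken from the commit metadata,
// assuming the pattern <partition>/<fileId>_<writeToken>_<instantTime>.parquet.
final class CommitPaths {

    record BaseFilePath(String partition, String fileId, String writeToken, String instantTime) {}

    static BaseFilePath parse(String path) {
        int slash = path.lastIndexOf('/');
        String partition = slash < 0 ? "" : path.substring(0, slash);
        String fileName = path.substring(slash + 1);
        // Strip the extension (".parquet").
        int dot = fileName.lastIndexOf('.');
        String stem = dot < 0 ? fileName : fileName.substring(0, dot);
        // The file ID contains '-' but no '_', so '_' separates exactly three parts.
        String[] parts = stem.split("_");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Unexpected base file name: " + fileName);
        }
        return new BaseFilePath(partition, parts[0], parts[1], parts[2]);
    }

    // Groups file IDs by the partition directory they were written to.
    static Map<String, List<String>> fileGroupsByPartition(List<String> paths) {
        Map<String, List<String>> result = new TreeMap<>();
        for (String path : paths) {
            BaseFilePath f = parse(path);
            result.computeIfAbsent(f.partition(), k -> new ArrayList<>()).add(f.fileId());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> paths = List.of(
            "__HIVE_DEFAULT_PARTITION__/ffb2f34c-47b9-4579-9698-c43fdc33dc68-3_0-36-40_20251223013640337.parquet",
            "ERROR/ffb2f34c-47b9-4579-9698-c43fdc33dc68-2_0-36-40_20251223013640337.parquet",
            "INFO/ffb2f34c-47b9-4579-9698-c43fdc33dc68-1_0-36-40_20251223013640337.parquet",
            "WARN/ffb2f34c-47b9-4579-9698-c43fdc33dc68-0_0-36-40_20251223013640337.parquet");
        fileGroupsByPartition(paths).forEach((partition, fileIds) ->
            System.out.println(partition + " -> " + fileIds));
    }
}
```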

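With the filter level=INFO, the rows read from the table with HUDI as the source format do not match the rows read from the ICEBERG target. The record with key 073d4b89-b136-4764-ab85-f076e1bac6ec has level INFO in the expected (HUDI) rows but level WARN in the actual (ICEBERG) rows. The other fields shown are identical.

Assertion failure with the filter level=INFO: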
org.opentest4j.AssertionFailedError: Datasets are not equivalent when reading from Spark. Source: HUDI, Target: ICEBERG ==> 
Expected :[{"key":"073d4b89-b136-4764-ab85-f076e1bac6ec","ts":1766453800328,"level":"INFO","double_field":0.8905820524458916,"float_field":0.28693622,"int_field":-721922711,"long_field":-1510732180141577976,"boolean_field":true,"string_field":"PRNXgshrMN","byt ...

Actual   :[{"key":"073d4b89-b136-4764-ab85-f076e1bac6ec","ts":1766453800328,"level":"WARN","double_field":0.8905820524458916,"float_field":0.28693622,"int_field":-721922711,"long_field":-1510732180141577976,"boolean_field":true,"string_field":"PRNXgshrMN","byt ...
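When a dataset comparison like this fails, it can help to diff the expected and actual rows field by field for the same key. The sketch below does this for one pair of rows. It uses only the leading fields visible in the truncated assertion output above. `RowDiff` is a hypothetical helper and is not the comparison the test actually performs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeSet;

// Hypothetical helper: reports every field whose value differs between an
// expected row and an actual row, in field-name order.
final class RowDiff {

    static List<String> diff(Map<String, Object> expected, Map<String, Object> actual) {
        List<String> out = new ArrayList<>();
        TreeSet<String> fields = new TreeSet<>(expected.keySet());
        fields.addAll(actual.keySet());
        for (String field : fields) {
            Object e = expected.get(field);
            Object a = actual.get(field);
            if (!Objects.equals(e, a)) {
                out.add(field + ": expected=" + e + " actual=" + a);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Leading fields from the (truncated) expected HUDI row.
        Map<String, Object> expected = Map.of(
            "key", "073d4b89-b136-4764-ab85-f076e1bac6ec",
            "ts", 1766453800328L,
            "level", "INFO",
            "double_field", 0.8905820524458916,
            "int_field", -721922711,
            "string_field", "PRNXgshrMN");
        // Same fields from the (truncated) actual ICEBERG row.
        Map<String, Object> actual = Map.of(
            "key", "073d4b89-b136-4764-ab85-f076e1bac6ec",
            "ts", 1766453800328L,
            "level", "WARN",
            "double_field", 0.8905820524458916,
            "int_field", -721922711,
            "string_field", "PRNXgshrMN");
        diff(expected, actual).forEach(System.out::println);
    }
}
```

For the two rows above, only the partition column `level` differs.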

Are you willing to submit a PR?

  • I am willing to submit a PR!
  • I am willing to submit a PR but need help getting started!

Code of Conduct

Labels: bug (Something isn't working)
