Skip to content

HIVE-29484: HMS.getFields() fails with 'SchemaReader not supported' for Avro and Hbase tables#6350

Open
rtrivedi12 wants to merge 2 commits intoapache:masterfrom
rtrivedi12:HIVE-29484
Open

HIVE-29484: HMS.getFields() fails with 'SchemaReader not supported' for Avro and Hbase tables#6350
rtrivedi12 wants to merge 2 commits intoapache:masterfrom
rtrivedi12:HIVE-29484

Conversation

@rtrivedi12
Copy link
Contributor

What changes were proposed in this pull request?

org.apache.hadoop.hive.serde2.avro.AvroSerDe and org.apache.hadoop.hive.hbase.HBaseSerDe are added to the default value of MetastoreConf.ConfVars.SERDES_USING_METASTORE_FOR_SCHEMA.

Why are the changes needed?

HiveMetaStore.get_fields_with_environment_context() checks whether a table's SerDe is listed in metastore.serdes.using.metastore.for.schema. If it is, columns are returned directly from tbl.getSd().getCols(). If not, HMS delegates to StorageSchemaReader, whose default implementation (DefaultStorageSchemaReader) unconditionally throws:

MetaException: java.lang.UnsupportedOperationException: Storage schema reading not supported

Does this PR introduce any user-facing change?

Yes. Previously, calling HMS.getFields() on a table using AvroSerDe or HBaseSerDe would fail with MetaException: Storage schema reading not supported. After this change, the call succeeds and returns the columns stored in the metastore, consistent with the behavior for all other built-in SerDes

How was this patch tested?

A new test testGetFieldsForStorageSerDes() is added to TestHiveMetaStore

mvn test -Dtest.groups= \
  -Dtest="TestEmbeddedHiveMetaStore#testGetFieldsForStorageSerDes+TestRemoteHiveMetaStore#testGetFieldsForStorageSerDes" \
  -pl standalone-metastore/metastore-server

Copy link
Contributor

@soumyakanti3578 soumyakanti3578 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please divide the test for each SerDe, otherwise the PR looks good!

* "Storage schema reading not supported".
*/
@Test
public void testGetFieldsForStorageSerDes() throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably divide this into two tests for the two SerDes as they are completely different and not related to each other. Also, the comment above should be edited accordingly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review @soumyakanti3578 !
I have separated the tests for AvroSerDe and HBaseSerDe.

For the AvroSerDe fix, I've modified and kept it out of SERDES_USING_METASTORE_FOR_SCHEMA and handled it at the HMSHandler level instead. Adding AvroSerDe to that config list has two correctness problems:

Breaks schema evolution in HiveServer2 -Table.getColsInternal() uses hasMetastoreBasedSchema() which checks SERDES_USING_METASTORE_FOR_SCHEMA. If AvroSerDe is in the list, it short-circuits directly to SD columns for all Avro tables . This would cause HS2 to silently serve stale schema.

Permanently blocks ALTER TABLE <avro_tbl> UPDATE COLUMNS -AlterTableUpdateColumnsOperation throws UnsupportedOperationException for any SerDe in SERDES_USING_METASTORE_FOR_SCHEMA. This would prevent users from ever syncing an Avro table's SD columns from the live SerDe schema.

@sonarqubecloud
Copy link

Comment on lines +1996 to 1998
"org.apache.hadoop.hive.serde2.OpenCSVSerde," +
"org.apache.hadoop.hive.hbase.HBaseSerDe",
"SerDes retrieving schema from metastore. This is an internal parameter."),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This config is deprecated, We don't need this change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review @saihemanth-cloudera ! One of the test was failing which is using HiveConf, hence I updated this config.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, it is better to change the test to use metastoreconf instead of hive conf

"org.apache.hadoop.hive.serde2.OpenCSVSerde," +
"org.apache.iceberg.mr.hive.HiveIcebergSerDe",
"org.apache.iceberg.mr.hive.HiveIcebergSerDe," +
"org.apache.hadoop.hive.hbase.HBaseSerDe",
Copy link
Contributor

@saihemanth-cloudera saihemanth-cloudera Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hbase schema can evolve, so why do want to insist on using metastore schema?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HbaseSerDe unlike Avroserde is not configured to adopt to external schema changes. If the schema evolves in HBase, the getFieldsFromDeserializer() method in Hive still returns schema from metastore SD . So, schema is always derived from 'hbase.column.mappings' property in metastore. In case of AvroSerDe, Hive reads the live schema from external url. So, it seems safe to have HbaseSerDe in Metastore Schema config list.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants