HIVE-29484: HMS.getFields() fails with 'SchemaReader not supported' for Avro and Hbase tables#6350
HIVE-29484: HMS.getFields() fails with 'SchemaReader not supported' for Avro and Hbase tables#6350rtrivedi12 wants to merge 2 commits intoapache:masterfrom
Conversation
soumyakanti3578
left a comment
There was a problem hiding this comment.
Please divide the test for each SerDe, otherwise the PR looks good!
| * "Storage schema reading not supported". | ||
| */ | ||
| @Test | ||
| public void testGetFieldsForStorageSerDes() throws Exception { |
There was a problem hiding this comment.
We should probably divide this into two tests for the two SerDes as they are completely different and not related to each other. Also, the comment above should be edited accordingly.
There was a problem hiding this comment.
Thanks for the review @soumyakanti3578 !
I have separated the tests for AvroSerDe and HBaseSerDe.
For the AvroSerDe fix, I've modified and kept it out of SERDES_USING_METASTORE_FOR_SCHEMA and handled it at the HMSHandler level instead. Adding AvroSerDe to that config list has two correctness problems:
Breaks schema evolution in HiveServer2 -Table.getColsInternal() uses hasMetastoreBasedSchema() which checks SERDES_USING_METASTORE_FOR_SCHEMA. If AvroSerDe is in the list, it short-circuits directly to SD columns for all Avro tables . This would cause HS2 to silently serve stale schema.
Permanently blocks ALTER TABLE <avro_tbl> UPDATE COLUMNS -AlterTableUpdateColumnsOperation throws UnsupportedOperationException for any SerDe in SERDES_USING_METASTORE_FOR_SCHEMA. This would prevent users from ever syncing an Avro table's SD columns from the live SerDe schema.
…eserve Avro schema evolution in HS2
|
| "org.apache.hadoop.hive.serde2.OpenCSVSerde," + | ||
| "org.apache.hadoop.hive.hbase.HBaseSerDe", | ||
| "SerDes retrieving schema from metastore. This is an internal parameter."), |
There was a problem hiding this comment.
This config is deprecated, We don't need this change.
There was a problem hiding this comment.
Thanks for the review @saihemanth-cloudera ! One of the test was failing which is using HiveConf, hence I updated this config.
There was a problem hiding this comment.
IMO, it is better to change the test to use metastoreconf instead of hive conf
| "org.apache.hadoop.hive.serde2.OpenCSVSerde," + | ||
| "org.apache.iceberg.mr.hive.HiveIcebergSerDe", | ||
| "org.apache.iceberg.mr.hive.HiveIcebergSerDe," + | ||
| "org.apache.hadoop.hive.hbase.HBaseSerDe", |
There was a problem hiding this comment.
Hbase schema can evolve, so why do want to insist on using metastore schema?
There was a problem hiding this comment.
HbaseSerDe unlike Avroserde is not configured to adopt to external schema changes. If the schema evolves in HBase, the getFieldsFromDeserializer() method in Hive still returns schema from metastore SD . So, schema is always derived from 'hbase.column.mappings' property in metastore. In case of AvroSerDe, Hive reads the live schema from external url. So, it seems safe to have HbaseSerDe in Metastore Schema config list.



What changes were proposed in this pull request?
org.apache.hadoop.hive.serde2.avro.AvroSerDe and org.apache.hadoop.hive.hbase.HBaseSerDe are added to the default value of MetastoreConf.ConfVars.SERDES_USING_METASTORE_FOR_SCHEMA.
Why are the changes needed?
HiveMetaStore.get_fields_with_environment_context()checks whether a table's SerDe is listed inmetastore.serdes.using.metastore.for.schema. If it is, columns are returned directly from tbl.getSd().getCols(). If not, HMS delegates to StorageSchemaReader, whose default implementation (DefaultStorageSchemaReader) unconditionally throws:Does this PR introduce any user-facing change?
Yes. Previously, calling HMS.getFields() on a table using AvroSerDe or HBaseSerDe would fail with MetaException: Storage schema reading not supported. After this change, the call succeeds and returns the columns stored in the metastore, consistent with the behavior for all other built-in SerDes
How was this patch tested?
A new test testGetFieldsForStorageSerDes() is added to TestHiveMetaStore