Hive Catalog fails to read Spark DataSource tables - incorrectly treats Parquet as SequenceFile #70956
Description
Steps to reproduce the behavior (Required)
1. Create a Spark DataSource table in Spark SQL:

```sql
CREATE TABLE test_spark_datasource_table (
  id INT,
  name STRING
)
USING PARQUET
LOCATION 'hdfs://lan/user/hive/warehouse/test.db/test_table';
```

2. Insert test data into the table in Spark SQL:

```sql
INSERT INTO test_spark_datasource_table VALUES
  (1, 'Alice'),
  (2, 'Bob');
```

3. Query the table through the Hive Catalog in StarRocks:

```sql
SELECT * FROM hive_catalog.test_db.test_spark_datasource_table;
```
Expected behavior (Required)
StarRocks should correctly identify the table as a Parquet table and read the actual schema and data from the Parquet files.
Real behavior (Required)
The query fails with the following error:

```
Failed to open the off-heap table scanner. java exception details: java.io.IOException: Failed to open the hive reader.
	at com.starrocks.hive.reader.HiveScanner.open(HiveScanner.java:218)
Caused by: java.io.IOException: hdfs:/user/hive/warehouse/test.db/test_table/part-00004-b0139440-a3b4-4755-864c-0c1c386da2f8-c000.snappy.parquet not a SequenceFile
	at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:2036)
	at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1982)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1931)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1945)
	at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:49)
	at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:64)
	at com.starrocks.hive.reader.HiveScanner.initReader(HiveScanner.java:194)
	at com.starrocks.hive.reader.HiveScanner.open(HiveScanner.java:214)
```
Root Cause:
Spark DataSource tables store their metadata differently from traditional Hive tables. When a table is created with `USING PARQUET`/`ORC`/etc., Spark stores:
- a placeholder SerDe (SequenceFile) in the standard Hive metastore fields
- a placeholder column schema (`cols`)
- the actual file format and schema in `TABLE_PARAMS`, under keys such as `spark.sql.sources.provider` and `spark.sql.sources.schema`

StarRocks currently reads only the standard Hive metastore fields, so it picks up the placeholder SequenceFile input format and fails when the underlying files are Parquet.
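The expected resolution logic can be sketched as follows. This is a minimal illustration, not StarRocks' actual code: the class `SparkTableFormatResolver`, its method names, and the fallback behavior are all hypothetical, but the metastore key `spark.sql.sources.provider` is the real property Spark writes.

```java
import java.util.Map;

// Hypothetical sketch: resolve the effective file format of a table
// whose metastore entry may belong to a Spark DataSource table.
public final class SparkTableFormatResolver {
    // Real key written by Spark into TABLE_PARAMS for DataSource tables.
    static final String PROVIDER_KEY = "spark.sql.sources.provider";

    /**
     * Returns the Spark provider (e.g. "parquet") when the table is a
     * Spark DataSource table; otherwise falls back to the input format
     * recorded in the standard Hive metastore fields.
     */
    public static String resolveFormat(Map<String, String> tableParams,
                                       String metastoreInputFormat) {
        String provider = tableParams.get(PROVIDER_KEY);
        if (provider != null && !provider.isEmpty()) {
            // The metastore SerDe/input format (often SequenceFile) is
            // only a placeholder here; trust the Spark provider instead.
            return provider.toLowerCase();
        }
        return metastoreInputFormat;
    }

    public static void main(String[] args) {
        Map<String, String> sparkTable = Map.of(PROVIDER_KEY, "PARQUET");
        System.out.println(resolveFormat(sparkTable,
            "org.apache.hadoop.mapred.SequenceFileInputFormat"));
        // prints "parquet"
    }
}
```

With logic along these lines, the scanner would select a Parquet reader for the table above instead of instantiating `SequenceFileRecordReader`.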
StarRocks version (Required)
- 3.3.11