[SPARK-57459][SQL] Support nanosecond-precision timestamp types in the Avro datasource (v1 and v2) by MaxGekk · Pull Request #56825 · apache/spark

MaxGekk · 2026-06-26T20:34:19Z

What changes were proposed in this pull request?

Umbrella: SPARK-56822 (Timestamps with nanosecond precision).

This PR adds read and write support for the nanosecond-capable timestamp types TIMESTAMP_NTZ(p) and TIMESTAMP_LTZ(p) (p in 7-9) to the Avro datasource (v1 AvroFileFormat and v2 AvroTable), reaching parity with the microsecond TimestampType / TimestampNTZType, and removes the SPARK-57166 rejection guardrail.

- SchemaConverters: map TimestampLTZNanosType / TimestampNTZNanosType to the Avro timestamp-nanos / local-timestamp-nanos logical types (available in the bundled Avro 1.12.1 and backed by a long holding nanoseconds since the epoch), carrying the fractional-second precision in the spark.sql.catalyst.type property. The reverse direction maps these logical types back, defaulting to nanosecond precision (9) for files written by external tools that lack the property.
- AvroSerializer: pack the internal (epochMicros, nanosWithinMicro) value into a single epoch-nanoseconds Long (DateTimeUtils.timestampNanosToEpochNanos), reporting values outside the signed-int64 epoch-nanos range (~1677-09-21 .. 2262-04-11) as a DATETIME_OVERFLOW error.
- AvroDeserializer: unpack epoch-nanoseconds via floorDiv / floorMod and truncate any sub-microsecond digits beyond the declared precision.
- AvroUtils.supportsDataType: drop the AnyTimestampNanoType rejection so the types are accepted on both the v1 and v2 read/write paths.
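The packing described in the AvroSerializer and AvroDeserializer bullets can be sketched in plain Java as a stand-in for Spark's Scala code. This is only an illustration under stated assumptions, not the PR's implementation. The record `MicrosAndNanos` and the methods `pack` / `unpack` are hypothetical names. The value is assumed to be whole microseconds since the epoch plus 0..999 extra nanoseconds. Records require Java 16+.

```java
// Hypothetical sketch (plain Java, not Spark's Scala code): packing a timestamp
// held as (epochMicros, nanosWithinMicro) into one epoch-nanoseconds long and back.
public final class EpochNanosPacking {

    /** Whole microseconds since 1970-01-01T00:00:00Z plus 0..999 extra nanoseconds. */
    record MicrosAndNanos(long epochMicros, int nanosWithinMicro) {}

    /** Encode: epochMicros * 1000 + nanosWithinMicro, throwing ArithmeticException on int64 overflow. */
    static long pack(MicrosAndNanos ts) {
        return Math.addExact(Math.multiplyExact(ts.epochMicros(), 1000L), ts.nanosWithinMicro());
    }

    /** Decode: floorDiv/floorMod keep nanosWithinMicro in 0..999 even for pre-1970 values. */
    static MicrosAndNanos unpack(long epochNanos) {
        return new MicrosAndNanos(
            Math.floorDiv(epochNanos, 1000L),
            (int) Math.floorMod(epochNanos, 1000L));
    }

    public static void main(String[] args) {
        // 1 ns before the epoch is (-1 us, 999 ns); plain / and % would yield (0, -1).
        MicrosAndNanos beforeEpoch = unpack(-1L);
        System.out.println(beforeEpoch);            // MicrosAndNanos[epochMicros=-1, nanosWithinMicro=999]
        System.out.println(pack(beforeEpoch));      // -1
        System.out.println(unpack(1_234_567_891L)); // MicrosAndNanos[epochMicros=1234567, nanosWithinMicro=891]
    }
}
```

Using floorDiv / floorMod rather than `/` and `%` is what keeps pre-1970 values correct. The nanosecond remainder stays non-negative, so the pair round-trips exactly.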

Like the Parquet path, nanosecond timestamps are always proleptic Gregorian and are therefore exempt from datetime rebasing.

Why are the changes needed?

To extend nanosecond-precision timestamp support (umbrella SPARK-56822) to the Avro datasource so it can read and write TIMESTAMP_NTZ(p) / TIMESTAMP_LTZ(p) with p in 7-9, matching the existing microsecond timestamp behavior and the Parquet/ORC nanosecond support.

Does this PR introduce any user-facing change?

Yes. With spark.sql.timestampNanosTypes.enabled=true, columns of type TIMESTAMP_NTZ(7-9) / TIMESTAMP_LTZ(7-9) can now be written to and read from Avro files. Previously such columns were rejected with UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE. The change affects only the unreleased master branch.

How was this patch tested?

Added tests in AvroSuite:

- round-trip for precisions 7-9 for both NTZ and LTZ, across the v1 and v2 sources, including nulls and inferred-schema precision preservation;
- external-reader unit-correctness: decode the written file with a plain Avro GenericDatumReader and assert the stored epoch-nanoseconds and the logical-type name;
- reading a plain Avro file produced without the spark.sql.catalyst.type property (defaults to nanosecond precision);
- writing an out-of-range value fails loudly with DATETIME_OVERFLOW.
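The approximate range quoted for AvroSerializer (~1677-09-21 .. 2262-04-11), which the overflow test relies on, follows from the signed int64 limits. The small Java check below uses only `java.time.Instant` to compute the exact bounds and to show overflow one nanosecond past either end. The class and method names are hypothetical and not part of the PR.

```java
import java.time.Instant;

// Sketch: which instants fit in a signed 64-bit count of nanoseconds since 1970-01-01T00:00:00Z?
public final class EpochNanosRange {
    private static final long NANOS_PER_SECOND = 1_000_000_000L;

    /** Converts an instant to epoch nanoseconds, throwing ArithmeticException on int64 overflow. */
    static long toEpochNanos(Instant t) {
        long secs = t.getEpochSecond();
        long nanos = t.getNano();
        // For negative instants, borrow one second so that secs * 1e9 does not
        // overflow before the (now negative) nanosecond part is added back.
        if (secs < 0 && nanos > 0) {
            secs += 1;
            nanos -= NANOS_PER_SECOND;
        }
        return Math.addExact(Math.multiplyExact(secs, NANOS_PER_SECOND), nanos);
    }

    /** True if the instant is representable; an out-of-range value must be rejected on write. */
    static boolean fitsInEpochNanos(Instant t) {
        try {
            toEpochNanos(t);
            return true;
        } catch (ArithmeticException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Instant min = Instant.ofEpochSecond(0L, Long.MIN_VALUE);
        Instant max = Instant.ofEpochSecond(0L, Long.MAX_VALUE);
        System.out.println(min);                                 // 1677-09-21T00:12:43.145224192Z
        System.out.println(max);                                 // 2262-04-11T23:47:16.854775807Z
        System.out.println(fitsInEpochNanos(max));               // true
        System.out.println(fitsInEpochNanos(max.plusNanos(1)));  // false
        System.out.println(fitsInEpochNanos(min.minusNanos(1))); // false
    }
}
```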

Ran AvroV1Suite / AvroV2Suite (the new tests pass on both), AvroSerdeSuite, AvroV1/V2LogicalTypeSuite, and AvroCatalystDataConversionSuite (no regressions), plus scalastyle for the sql and avro modules.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor 2.1, Claude Opus 4.8


Introduce `DateTimeUtils.epochNanosToTimestampNanos(epochNanos, precision)` as the inverse of `timestampNanosToEpochNanos`, and use it from both the Avro deserializer and the Parquet `TimestampNanosParquetOps` converter, removing the duplicated floorDiv/floorMod + precision-truncation logic. Add unit tests for the new helper in `DateTimeUtilsSuite`.
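The precision-truncation step that the new helper centralizes can be illustrated as follows. This is a hypothetical sketch, not the actual body of `epochNanosToTimestampNanos`. It assumes the sub-microsecond part is held as 0..999 nanoseconds and that truncating to precision p (7..9) means zeroing the last 9 - p decimal digits of the fraction. `NanosPrecision`, `truncateToPrecision`, and `decode` are illustrative names.

```java
// Hypothetical sketch: decoding epoch nanoseconds and truncating the 0..999
// sub-microsecond nanoseconds to a declared fractional-second precision p in 7..9.
public final class NanosPrecision {

    record MicrosAndNanos(long epochMicros, int nanosWithinMicro) {}

    /** Keeps only the first p fractional-second digits by zeroing the last 9 - p digits. */
    static int truncateToPrecision(int nanosWithinMicro, int precision) {
        if (precision < 7 || precision > 9) {
            throw new IllegalArgumentException("precision must be in 7..9, got " + precision);
        }
        if (nanosWithinMicro < 0 || nanosWithinMicro > 999) {
            throw new IllegalArgumentException("nanosWithinMicro must be in 0..999");
        }
        int unit = 1; // smallest kept step: 100 ns (p=7), 10 ns (p=8), 1 ns (p=9)
        for (int p = precision; p < 9; p++) {
            unit *= 10;
        }
        return nanosWithinMicro - nanosWithinMicro % unit;
    }

    /** Decode and truncate in one place, in the spirit of epochNanosToTimestampNanos(epochNanos, precision). */
    static MicrosAndNanos decode(long epochNanos, int precision) {
        return new MicrosAndNanos(
            Math.floorDiv(epochNanos, 1000L),
            truncateToPrecision((int) Math.floorMod(epochNanos, 1000L), precision));
    }

    public static void main(String[] args) {
        for (int p = 7; p <= 9; p++) {
            System.out.println("p=" + p + " -> " + truncateToPrecision(891, p)); // 800, 890, 891
        }
        System.out.println(decode(1_234_567_891L, 7)); // MicrosAndNanos[epochMicros=1234567, nanosWithinMicro=800]
    }
}
```

Keeping this logic in one helper means the Avro and Parquet readers cannot drift apart on which digits a given precision retains.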

…ared DateTimeUtils helper

Consolidate the encode-with-overflow wrapper (try timestampNanosToEpochNanos, catch ArithmeticException -> DATETIME_OVERFLOW naming the sink) that was duplicated in AvroSerializer (sink="Avro") and TimestampNanosParquetOps (sink="Parquet INT64") into a single DateTimeUtils.timestampNanosToEpochNanos(value, isNtz, sink) overload, mirroring the decode-path consolidation (epochNanosToTimestampNanos). Behavior is unchanged.

Co-authored-by: Isaac
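The consolidation in this commit follows a common pattern. A single overload wraps the exact arithmetic and turns `ArithmeticException` into a datasource-specific error, so each caller passes only a sink label. The sketch below is hypothetical Java. `SinkAwareEncode`, `encode`, and `DatetimeOverflowException` are illustrative stand-ins for the Scala `DateTimeUtils.timestampNanosToEpochNanos(value, isNtz, sink)`. The `isNtz` parameter is omitted, and the message text is assumed rather than Spark's actual `DATETIME_OVERFLOW` wording.

```java
// Hypothetical sketch: one shared encode helper that converts int64 overflow into an
// error naming the sink, instead of a copy of the try/catch in each datasource.
public final class SinkAwareEncode {

    /** Stand-in for Spark's DATETIME_OVERFLOW error; not Spark's actual exception class. */
    static final class DatetimeOverflowException extends RuntimeException {
        DatetimeOverflowException(String sink, Throwable cause) {
            super("DATETIME_OVERFLOW: timestamp does not fit in " + sink, cause);
        }
    }

    /** Raw packing: epochMicros * 1000 + nanosWithinMicro, throwing ArithmeticException on overflow. */
    static long encode(long epochMicros, int nanosWithinMicro) {
        return Math.addExact(Math.multiplyExact(epochMicros, 1000L), nanosWithinMicro);
    }

    /** Shared overload: callers pass only a sink label such as "Avro" or "Parquet INT64". */
    static long encode(long epochMicros, int nanosWithinMicro, String sink) {
        try {
            return encode(epochMicros, nanosWithinMicro);
        } catch (ArithmeticException e) {
            throw new DatetimeOverflowException(sink, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encode(1L, 5, "Avro")); // 1005
        try {
            encode(Long.MAX_VALUE / 1000 + 1, 0, "Parquet INT64");
        } catch (DatetimeOverflowException e) {
            System.out.println(e.getMessage()); // DATETIME_OVERFLOW: timestamp does not fit in Parquet INT64
        }
    }
}
```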

…d overflow helper

The prior commit hoisted the encode-with-overflow wrapper out of TimestampNanosParquetOps into DateTimeUtils.timestampNanosToEpochNanos(value, isNtz, sink); update the suite's two packing/overflow tests to call the shared helper with sink="Parquet INT64". The asserted behavior is unchanged (combine result and DATETIME_OVERFLOW condition).

Co-authored-by: Isaac

MaxGekk · 2026-06-27T06:01:33Z

@uros-b @stevomitric Could you please review this PR?

MaxGekk added 4 commits June 26, 2026 22:32

MaxGekk wants to merge 4 commits into apache:master from MaxGekk:nanos-avro