[SPARK-57457][SQL] Support nanosecond-precision timestamp types in the CSV datasource (v1 and v2) by vinodkc · Pull Request #56818 · apache/spark

vinodkc · 2026-06-26T17:26:40Z

What changes were proposed in this pull request?

This PR adds nanosecond-precision timestamp support (TIMESTAMP_NTZ(p) and TIMESTAMP_LTZ(p)) to the CSV datasource, for both the v1 (CSVFileFormat) and v2 (CSVTable) paths.

Specifically:

- Parser (UnivocityParser): adds TimestampNTZNanosType and TimestampLTZNanosType cases that delegate to the existing parseWithoutTimeZoneNanos / parseNanos formatter methods.
- Generator (UnivocityGenerator): adds the corresponding write-path cases that delegate to formatWithoutTimeZoneNanos / formatNanos.
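The parser and generator cases pair up symmetrically: text parsed with a given pattern should format back to the same text. The following is a minimal round-trip sketch using plain java.time, not Spark's own formatter classes or TimestampNanosVal. The class name NanosRoundTripSketch, its methods, and the explicit 9-digit pattern are illustrative assumptions. Only the NTZ (no time zone) case is shown; an LTZ analogue would also need a session time zone.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch: not the code in UnivocityParser / UnivocityGenerator.
final class NanosRoundTripSketch {
    // Assumed user-supplied format with nine fractional-second digits.
    static final DateTimeFormatter NANOS_FORMAT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSSSSSSS");

    // Read-path analogue: CSV field text -> timestamp without time zone.
    static LocalDateTime parseNtz(String text) {
        return LocalDateTime.parse(text, NANOS_FORMAT);
    }

    // Write-path analogue: timestamp -> CSV field text, using the same pattern.
    static String formatNtz(LocalDateTime ts) {
        return NANOS_FORMAT.format(ts);
    }

    public static void main(String[] args) {
        String csvField = "2026-06-26T17:26:40.123456789";
        LocalDateTime ts = parseNtz(csvField);
        System.out.println(ts.getNano());   // 123456789
        System.out.println(formatNtz(ts));  // 2026-06-26T17:26:40.123456789
    }
}
```

Using the same formatter object for both directions keeps the read and write paths in agreement, which is the property the parse/format pairing above relies on.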

Why are the changes needed?

CSV rejected nanos timestamp types in its datasource capability checks and lacked the conversions to round-trip them, so these columns could not be written or read through CSV.

Does this PR introduce any user-facing change?

Yes. Users can now write and read TimestampNTZNanosType(p) / TimestampLTZNanosType(p) columns (p in 7..9) with the CSV datasource.

How was this patch tested?

- CsvFunctionsSuite: updated the existing from_csv nanosecond timestamp test, which now asserts successful parsing and the correct truncated value instead of expecting an UNSUPPORTED_DATATYPE exception.
- FileBasedDataSourceSuite: a new end-to-end round-trip test covering both the v1 and v2 source paths, precisions 7–9, and both TimestampNTZNanosType and TimestampLTZNanosType. It verifies that a DataFrame written to CSV and read back with a matching schema produces identical results.
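For precision p in 7..9, a value keeps only the first p fractional-second digits, which is why the from_csv test checks a truncated value. The following is a minimal sketch of that arithmetic on a nanosecond-of-second field. The helper truncateToPrecision is hypothetical. The sketch assumes truncation (dropping digits) rather than rounding, as the test description states. It makes no claim about how Spark stores nanos values internally.

```java
// Hypothetical helper: keeps the first p of the 9 fractional-second digits.
final class NanosPrecisionSketch {
    static int truncateToPrecision(int nanoOfSecond, int p) {
        if (p < 7 || p > 9) {
            throw new IllegalArgumentException("precision must be in 7..9, got " + p);
        }
        if (nanoOfSecond < 0 || nanoOfSecond > 999_999_999) {
            throw new IllegalArgumentException("nano-of-second out of range: " + nanoOfSecond);
        }
        // unit = 10^(9 - p): 100 for p=7, 10 for p=8, 1 for p=9
        int unit = 1;
        for (int i = p; i < 9; i++) {
            unit *= 10;
        }
        // Drop the digits below the requested precision (no rounding).
        return nanoOfSecond - nanoOfSecond % unit;
    }

    public static void main(String[] args) {
        int nanos = 123_456_789;
        for (int p = 7; p <= 9; p++) {
            System.out.println("p=" + p + " -> " + truncateToPrecision(nanos, p));
        }
        // p=7 -> 123456700
        // p=8 -> 123456780
        // p=9 -> 123456789
    }
}
```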

Was this patch authored or co-authored using generative AI tooling?

Yes, Generated-by: Claude Code (Sonnet 4.6) was used to assist with this patch.


MaxGekk

0 blocking, 0 non-blocking, 0 nits. LGTM — a minimal, faithful extension of the CSV datasource to nanosecond TIMESTAMP_NTZ(p) / TIMESTAMP_LTZ(p) (p ∈ 7..9), mirroring the micro-precision converters and dropping the capability gate on both v1 (CSVFileFormat) and v2 (CSVTable).

Verification

The read/write paths are symmetric (parseWithoutTimeZoneNanos/parseNanos ↔ formatWithoutTimeZoneNanos/formatNanos), all four formatter methods exist with matching signatures over TimestampNanosVal, and both capability gates are updated (the prior "not supported" test correctly drops CSV). Two deliberate behaviors I checked and cleared: the nanos LTZ case omits the micro TimestampType legacy-parse fallback (appropriate — nanos has no Spark 1.x/2.0 legacy data, and micro TimestampNTZType has none either), and the 3-digit default timestamp format truncates sub-millisecond digits for all timestamp types (pre-existing, not a nanos regression; tests use explicit formats). Coverage is solid: FileBasedDataSourceSuite round-trips v1+v2 × NTZ/LTZ × precisions 7–9 (write → read-back → checkAnswer), and the updated from_csv test validates the read path against an independently-authored string with precision truncation.
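The note about the 3-digit default format can be reproduced with plain java.time. A millisecond-width fractional pattern drops the sub-millisecond digits on write, so they cannot be recovered on read. An explicit 9-digit pattern keeps them. This is only a sketch. The two patterns approximate Spark's defaults but are not identical to them: Spark's default timestampFormat / timestampNTZFormat use optional sections and Spark's own formatter. DefaultFormatSketch is a hypothetical name.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

final class DefaultFormatSketch {
    // Three fractional digits, approximating a millisecond-precision default (assumption).
    static final DateTimeFormatter MILLIS =
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS");
    // Nine fractional digits, as an explicit nanosecond format would use.
    static final DateTimeFormatter NANOS =
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSSSSSSS");

    public static void main(String[] args) {
        LocalDateTime ts = LocalDateTime.of(2026, 6, 27, 9, 54, 47, 123_456_789);
        String withMillis = MILLIS.format(ts); // fraction truncated to 3 digits
        String withNanos = NANOS.format(ts);
        System.out.println(withMillis); // 2026-06-27T09:54:47.123
        System.out.println(withNanos);  // 2026-06-27T09:54:47.123456789
        // Reading the 3-digit text back cannot restore the dropped digits.
        System.out.println(LocalDateTime.parse(withMillis, MILLIS).getNano()); // 123000000
    }
}
```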

@HyukjinKwon — could you take a look as well, since you've been close to the CSV/Univocity datasource paths?

MaxGekk · 2026-06-27T09:54:47Z

+1, LGTM. Merging to master/4.x.
Thank you, @vinodkc, and thanks to @dongjoon-hyun and @HyukjinKwon for the review.

Closes #56818 from vinodkc/spark-57457-nanosecond-csv.

Authored-by: Vinod KC <vinod.kc.in@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit ab78cb5)
Signed-off-by: Max Gekk <max.gekk@gmail.com>

Support nanosecond-precision timestamp types in the CSV datasource (v1 and v2) · 504b061

MaxGekk approved these changes Jun 26, 2026

dongjoon-hyun approved these changes Jun 26, 2026

HyukjinKwon approved these changes Jun 27, 2026

MaxGekk closed this in ab78cb5 Jun 27, 2026

vinodkc wants to merge 1 commit into apache:master from vinodkc:spark-57457-nanosecond-csv