Skip to content

Comments

[SPARK-54759][SQL] SQL scripting Cursor support#53530

Closed
srielau wants to merge 88 commits intoapache:masterfrom
srielau:cursors
Closed

[SPARK-54759][SQL] SQL scripting Cursor support#53530
srielau wants to merge 88 commits intoapache:masterfrom
srielau:cursors

Conversation

@srielau
Copy link
Contributor

@srielau srielau commented Dec 18, 2025

What changes were proposed in this pull request?

in this PR we propose to add SQL Standard CURSOR support to SQL Scripting.
This includes:

  • DECLARE cursor AS query
  • OPEN cursor USING expressions
  • FETCH cursor INTO variables
  • CLOSE cursor

Here is the draft spec:
https://docs.google.com/document/d/1EbNtBXmC3p-myidiwHYkwCBTy4swiksxN6NZ1iX3S-I/edit?usp=sharing

Why are the changes needed?

Cursors are commonly used in traditional DBMS. Sometimes teh lofic can be replaces with relational queries or a FOR loop.
But often it cannot. This makes migration to Spark hard.
Note that cursors are just teh SQL equivalent of iterating over a dataframe. A common use case in other languages.

Does this PR introduce any user-facing change?

Yes: https://docs.google.com/document/d/1EbNtBXmC3p-myidiwHYkwCBTy4swiksxN6NZ1iX3S-I/edit?usp=sharing

How was this patch tested?

New tests have been written in cursors.sql

Was this patch authored or co-authored using generative AI tooling?

Generated by Claude Sonnet using Cursor.

Implements DECLARE CURSOR, OPEN, FETCH, and CLOSE statements for SQL scripting.

Major changes:
- Added grammar rules for cursor statements (DECLARE CURSOR, OPEN, FETCH, CLOSE)
- Implemented logical plan nodes (DeclareCursor, OpenCursor, FetchCursor, CloseCursor)
- Created physical execution nodes (*CursorExec) for cursor operations
- Extended SqlScriptingExecutionContext to manage cursor state (declaration, open/close status, result sets)
- Added ResolveFetchCursor analyzer rule to resolve variable references in FETCH statements
- Updated ResolveReferences to skip FetchCursor nodes (let ResolveFetchCursor handle them)
- Added cursor declaration order validation to CompoundBodyParsingContext
- Enabled CONTINUE exception handlers by default (required for cursor iteration)
- Added error conditions: CURSOR_NO_MORE_ROWS (SQLSTATE 02000) for NOT FOUND handling
- Cursors are automatically closed when exiting their declaring scope

Known limitations:
- CONTINUE handler interaction with FETCH in iterative contexts (REPEAT loops) needs further investigation
- Basic cursor operations (single FETCH without exception handling) work correctly

Testing:
- Compiles cleanly (catalyst and sql modules)
- Grammar changes validated
- Simple cursor test case works (DECLARE, OPEN, FETCH, CLOSE, VALUES)
@srielau srielau marked this pull request as draft December 18, 2025 16:08
@github-actions github-actions bot added the SQL label Dec 18, 2025
@github-actions github-actions bot added the DOCS label Dec 20, 2025
@srielau srielau changed the title [SPARK-54759][SQL][WIP] Cursor support [SPARK-54759][SQL] SQL scripting Cursor support Dec 20, 2025
@srielau srielau marked this pull request as ready for review December 20, 2025 19:43
…all passing)

- Created SqlScriptingCursorSuite.scala with 16 comprehensive cursor tests
- All tests passing, covering core functionality:
  * Feature toggle (enabled/disabled)
  * Cursor lifecycle (DECLARE, OPEN, FETCH, CLOSE)
  * State transitions and validation
  * Parameterized cursors with USING clause
  * FETCH INTO STRUCT variables
  * Exception handling (NOT FOUND, EXIT HANDLER, CONTINUE HANDLER)
  * Cross-frame cursor access (handler accessing outer cursor)
  * Cursor iteration with REPEAT loops
  * Declaration order validation

- Remaining: 70 tests to migrate from cursors.sql (86 total)
- See CURSOR_TEST_MIGRATION_TODO.md for detailed migration plan
…, fix test structure

- Tests 27-31 were failing with INVALID_SET_SYNTAX because SET VAR statements
  were inside QUERY-DELIMITER blocks
- Fixed test structure: moved variable declarations outside delimiters
- Changed from 'SET VAR' to 'DECLARE' for session variable initialization
- Tests should now properly test FETCH INTO with session variables
…e fix

Tests 27-31 now show UNRESOLVED_VARIABLE errors, revealing that
accessing session-level DECLARE variables from within scripts is
not currently supported. These tests document the current limitation.
…olden file

Summary:
- 81/86 tests passing correctly
- 5 tests (27-31) document known limitation: session variable access from scripts
- No INTERNAL_ERROR messages found
- All cursor behaviors validated:
  * CURSOR_NO_MORE_ROWS as completion condition (SQL standard compliant)
  * Cross-frame cursor access working
  * Cursor shadowing working
  * Declaration order enforcement working
  * All error messages using toSQLId() for identifier quoting

Golden file is validated and ready for use as source of truth for Scala test suite.
@srielau
Copy link
Contributor Author

srielau commented Jan 16, 2026

should we add an overload of def getCursorState that accepts a CursorReference? This code snippet repeats 3 times.
The debate between sql scripts with goldenfiles and scala unittests is very much philosophical.
Personbally I find a sql.out file much more readable than a scala file.
More importantly though I have very much hostile to unit-tests which only tests certain phases (e.g. parsing).
They break due to design changes, and they do not test outcome. And outcomes are what matters.

Golden files, of course have teh downside risk, that verytime they are re-generated teh user has to actually look at the diff.

If there are strong feelings to convert the SQL scripts to Scala, I can certainly do that. Curiously, in another PR I was asked to do the inverse....

Test suite has been turned into Scala.

Some further reflection: Turns out Cursor is terribly bad (or my prompts) about preventing drift when regenerating golden files.
So this exercise did in fact flush out bugs.

@miland-db
Copy link
Contributor

Change that should unblock failing tests after removing returnCurrentWithoutAdvancing is implemented in this PR #53894. After it is merged, please rebase this changes onto it.

cc @srielau @cloud-fan

@miland-db
Copy link
Contributor

miland-db commented Jan 27, 2026

Thanks for incorporating the change from #53894, but let's please merge that PR independently from this one. They clearly do 2 different things, and the "fix" PR is self contained and has its purpose. It will be easier to debug and understand changes later if we merge it one by one.

@srielau
Copy link
Contributor Author

srielau commented Jan 27, 2026

Thanks for incorporating the change from #53894, but let's please merge that PR independently from this one. They clearly do 2 different things, and the "fix" PR is self contained and has its purpose. It will be easier to debug and understand changes later if we merge it one by one.

Yes, I'm awaiting this PR's merge. At that point the next merge to master shoudl cancel out teh changes.

Copy link
Contributor

@miland-db miland-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for integration and changes in SQL Scripting.

@gengliangwang
Copy link
Member

Thanks, merging to master

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants