Add UNION ALL support for gpu_processing (legacy Sirius)#516

Open
ducndh wants to merge 3 commits into sirius-db:dev from ducndh:feature/union-all-gpu-processing

@ducndh ducndh commented Mar 26, 2026

Currently, gpu_processing queries that contain UNION ALL fall back to the CPU. This PR adds UNION ALL support to the GPU path.

Implementation:

  • Added GPUPhysicalUnion operator (src/legacy/operator/gpu_physical_union.cpp) following the same pipeline construction pattern as DuckDB's PhysicalUnion — creates a union pipeline that shares the same sink as the current pipeline, builds each child into its respective pipeline, and respects order-dependent sinks.
  • Added gpu_plan_set_operation.cpp to map LogicalSetOperation → GPUPhysicalUnion in the plan generator. Only UNION ALL is supported; EXCEPT and INTERSECT throw NotImplementedException for now, and contributions that implement them are welcome.
  • Wired up CreatePlan(LogicalSetOperation&) in GPUPhysicalPlanGenerator to route LOGICAL_UNION through the new operator.
  • Refactored GPUResultCollection from a raw DataChunk* array with manual new[]/delete[] to vector<unique_ptr<DataChunk>>, fixing potential heap corruption when multiple union pipelines append results through repeated Sink calls.
  • Fixed pipeline dependency scheduling in gpu_executor.cpp to check dependencies across all pipelines in a meta-pipeline, not just the base pipeline — without this, UNION ALL pipelines could be scheduled before their build-side dependencies completed.
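
To illustrate the set-operation routing described above, here is a minimal standalone sketch. It does not use the real DuckDB or Sirius classes: `SetOpKind`, `PhysicalOp`, `ScanOp`, `UnionOp`, and `PlanSetOperation` are hypothetical stand-ins, and `std::logic_error` stands in for NotImplementedException.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-ins; these are NOT the real DuckDB or Sirius types.
enum class SetOpKind { UnionAll, Except, Intersect };

struct PhysicalOp {
    virtual ~PhysicalOp() = default;
    virtual std::string Name() const = 0;
};

struct ScanOp : PhysicalOp {
    std::string table;
    explicit ScanOp(std::string t) : table(std::move(t)) {}
    std::string Name() const override { return "SCAN(" + table + ")"; }
};

// Plays the role of the union operator: it owns both children.
struct UnionOp : PhysicalOp {
    std::unique_ptr<PhysicalOp> left;
    std::unique_ptr<PhysicalOp> right;
    UnionOp(std::unique_ptr<PhysicalOp> l, std::unique_ptr<PhysicalOp> r)
        : left(std::move(l)), right(std::move(r)) {}
    std::string Name() const override {
        return "UNION_ALL(" + left->Name() + ", " + right->Name() + ")";
    }
};

// Route a logical set operation to a physical operator.
// Only UNION ALL is planned; the other kinds are rejected explicitly.
std::unique_ptr<PhysicalOp> PlanSetOperation(SetOpKind kind,
                                             std::unique_ptr<PhysicalOp> left,
                                             std::unique_ptr<PhysicalOp> right) {
    switch (kind) {
    case SetOpKind::UnionAll:
        return std::make_unique<UnionOp>(std::move(left), std::move(right));
    case SetOpKind::Except:
        throw std::logic_error("EXCEPT is not implemented on the GPU path");
    case SetOpKind::Intersect:
        throw std::logic_error("INTERSECT is not implemented on the GPU path");
    }
    throw std::logic_error("unknown set operation");
}
```

Rejecting the unsupported kinds explicitly keeps the failure visible at plan time rather than producing a wrong plan.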

ducndh and others added 2 commits March 26, 2026 02:24
Implement GPUPhysicalUnion operator that follows DuckDB's PhysicalUnion
pipeline-splitter pattern: both children feed into the same downstream
sink via CreateUnionPipeline. Also fix GPUResultCollection to use
vector<unique_ptr<DataChunk>> instead of raw array to handle multiple
Sink calls from union pipelines without heap corruption.

Repro: CALL gpu_processing('SELECT * FROM t1 UNION ALL SELECT * FROM t2')

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
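
The ownership change in GPUResultCollection can be sketched roughly as follows. This is an assumption-laden illustration, not the code in this PR: `Chunk`, `ResultCollection`, and `MakeChunk` are hypothetical names, and `Chunk` only stands in for DataChunk. The sketch is not thread-safe; it only shows that a growable vector of owning pointers accepts any number of appends without manual new[]/delete[].

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for DataChunk.
struct Chunk {
    std::vector<int> rows;
};

// Hypothetical helper that builds an owned chunk from a list of rows.
std::unique_ptr<Chunk> MakeChunk(std::vector<int> rows) {
    auto chunk = std::make_unique<Chunk>();
    chunk->rows = std::move(rows);
    return chunk;
}

// Owns every appended chunk; the vector grows as needed and frees
// all chunks automatically when the collection is destroyed.
class ResultCollection {
public:
    void Append(std::unique_ptr<Chunk> chunk) {
        chunks_.push_back(std::move(chunk));
    }

    std::size_t ChunkCount() const { return chunks_.size(); }

    std::size_t RowCount() const {
        std::size_t total = 0;
        for (const auto &chunk : chunks_) {
            total += chunk->rows.size();
        }
        return total;
    }

    // Bounds-checked access; throws std::out_of_range on a bad index.
    const Chunk &Get(std::size_t index) const { return *chunks_.at(index); }

private:
    std::vector<std::unique_ptr<Chunk>> chunks_;
};
```

With a fixed-size raw array, a second or third append from another union branch has to be sized for in advance; here each `Append` simply extends the vector.
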
… meta-pipeline

The scheduler only checked the base pipeline's dependencies when deciding
scheduling order. When a hash join build-side dependency was added to a
non-base pipeline (e.g. the probe pipeline in a UNION ALL meta-pipeline),
the dependency was missed and the meta-pipeline was scheduled before its
build side completed, causing a NULL pointer dereference.

Fix: check dependencies of ALL pipelines within the meta-pipeline, not
just the base pipeline.

Repro: CALL gpu_processing('WITH cte AS (...) SELECT ... FROM cte UNION ALL SELECT ... JOIN cte ...')

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
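
The scheduling fix described in this commit can be sketched with simplified types. Everything here is hypothetical (`Pipeline`, `MetaPipeline`, `BaseOnlyReady`, `AllPipelinesReady`, and integer pipeline ids); the real check lives in gpu_executor.cpp and may look different. The sketch contrasts checking only the base pipeline with checking every pipeline in the meta-pipeline.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Hypothetical simplified model: a pipeline lists the ids it depends on.
struct Pipeline {
    int id;
    std::vector<int> dependencies;
};

// Hypothetical meta-pipeline; pipelines[0] plays the role of the base pipeline.
struct MetaPipeline {
    std::vector<Pipeline> pipelines;
};

// True when every dependency of one pipeline has finished.
bool DependenciesDone(const Pipeline &pipeline, const std::set<int> &finished) {
    return std::all_of(pipeline.dependencies.begin(), pipeline.dependencies.end(),
                       [&](int dep) { return finished.count(dep) > 0; });
}

// Old behavior: looks only at the base pipeline, so a dependency attached
// to any other pipeline in the meta-pipeline goes unnoticed.
bool BaseOnlyReady(const MetaPipeline &meta, const std::set<int> &finished) {
    if (meta.pipelines.empty()) {
        return true;
    }
    return DependenciesDone(meta.pipelines.front(), finished);
}

// Fixed behavior: the meta-pipeline is ready only when all of its
// pipelines have their dependencies satisfied.
bool AllPipelinesReady(const MetaPipeline &meta, const std::set<int> &finished) {
    return std::all_of(meta.pipelines.begin(), meta.pipelines.end(),
                       [&](const Pipeline &p) { return DependenciesDone(p, finished); });
}
```

For a meta-pipeline whose base pipeline has no dependencies but whose probe pipeline depends on a hash join build with id 0, `BaseOnlyReady` reports ready before the build finishes, while `AllPipelinesReady` waits until id 0 is in the finished set.
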
@ducndh ducndh requested a review from wmalpica March 26, 2026 03:43