feat: Transparent GPU execution via optimizer extension #518

Open
mbrobbel wants to merge 17 commits into sirius-db:dev from mbrobbel:sirius-optimizer
Conversation

@mbrobbel
Member

Summary

  • Adds transparent GPU execution so users can run plain SQL without wrapping queries in CALL gpu_execution('...')
  • Uses two DuckDB extension hooks (no DuckDB core modifications):
    • OptimizerExtension (post-optimization): captures optimized logical plan via Copy() when the query uses only GPU-supported operators
    • OnFinalizePrepare: generates Sirius physical plan from the captured copy and replaces DuckDB's CPU physical plan with a custom PhysicalSiriusExecution source operator
  • Queries with unsupported operators (WINDOW, UNNEST, etc.) silently fall back to CPU
  • Controlled via SET sirius_transparent_execution = true/false (default: true)

Usage

LOAD 'sirius.duckdb_extension';
-- With ~/.sirius/sirius.cfg present, GPU execution is automatic:
SELECT * FROM lineitem WHERE l_quantity > 10 ORDER BY l_extendedprice LIMIT 100;

-- Disable per-session:
SET sirius_transparent_execution = false;

New files

| File | Purpose |
| --- | --- |
| src/include/transparent/sirius_optimizer_extension.hpp + .cpp | Post-optimization hook + is_acceleratable_query() |
| src/include/transparent/physical_sirius_execution.hpp + .cpp | Custom PhysicalOperator (EXTENSION type) wrapping Sirius GPU engine |
| test/cpp/integration/test_transparent_execution.cpp | Integration tests: filter, aggregation, join, top-N, fallback, disable |

Test plan

  • Run integration tests on GPU machine: sirius_unittest "[transparent]"
  • Run TPC-H queries with transparent execution enabled (plain SQL, no gpu_execution() wrapper)
  • Verify fallback: queries with WINDOW functions execute on CPU without error
  • Verify SET sirius_transparent_execution = false disables GPU interception
  • Regression: existing gpu_execution() tests still pass

🤖 Generated with Claude Code

mbrobbel and others added 5 commits March 30, 2026 14:19
Users can now run plain SQL that automatically executes on the GPU,
eliminating the need to wrap queries in CALL gpu_execution('...').

The implementation uses two DuckDB extension hooks:
- OptimizerExtension (post-optimization): captures a copy of the
  optimized logical plan when the query uses only GPU-supported operators
- OnFinalizePrepare: generates a Sirius physical plan from the captured
  copy and replaces DuckDB's CPU physical plan with a custom
  PhysicalSiriusExecution source operator

Queries with unsupported operators (WINDOW, UNNEST, etc.) silently fall
back to CPU execution. Controlled by SET sirius_transparent_execution.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three key fixes to make transparent execution work end-to-end:

1. Override CanRequestRebind() to return true — DuckDB only calls
   OnFinalizePrepare when at least one registered state can request
   rebind. Without this, our hook was never invoked.

2. Add pre_optimize_function to disable IN_CLAUSE and
   COMPRESSED_MATERIALIZATION optimizers before DuckDB's built-in
   optimizers run. These produce internal functions Sirius can't handle.
   The post-optimize hook re-enables them to avoid leaking state.

3. Return a real LocalSourceState instead of nullptr to avoid null
   pointer dereference in DuckDB's PipelineExecutor.

Also updates tests/scripts to use transparent execution:
- test_gpu_execution_tpch.cpp: uses SET sirius_transparent_execution
  instead of CALL gpu_execution() wrapper
- run_tpch_parquet.sh: both engines use orig/ query directory
- run_tpcds_super.sh: plain SQL instead of gpu_execution() wrapper
- performance_test.py: SET-based toggle instead of wrapping function

Validated on RTX PRO 6000 — filter, projection, aggregation,
ORDER BY queries all execute on GPU transparently.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The sorted comparison pass was re-running the query wrapped in a
subquery (SELECT * FROM (...) t ORDER BY), which itself went through
transparent GPU execution and could fail for complex plans. Instead,
collect rows from the already-materialized GPU and CPU results, sort
them in C++, and compare directly. This avoids re-running queries
entirely and is also simpler.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
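
The order-insensitive comparison described in this commit can be sketched as follows. This is a minimal illustration, not the test helper from the PR: the `Row`/`Rows` aliases and the function name `same_rows_ignoring_order` are assumptions, and each cell is assumed to be already rendered as a string.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical row representation: one string per column value.
using Row  = std::vector<std::string>;
using Rows = std::vector<Row>;

// Returns true when both results hold the same multiset of rows,
// regardless of the order in which each engine produced them.
// Both arguments are taken by value so the caller's results stay untouched.
bool same_rows_ignoring_order(Rows gpu_rows, Rows cpu_rows) {
  if (gpu_rows.size() != cpu_rows.size()) { return false; }
  // std::vector<std::string> compares lexicographically, so sorting
  // brings both copies into the same canonical order.
  std::sort(gpu_rows.begin(), gpu_rows.end());
  std::sort(cpu_rows.begin(), cpu_rows.end());
  return gpu_rows == cpu_rows;
}
```

Because the rows are sorted in memory, no second query has to pass through the transparent execution path.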
Convert all remaining inline CALL gpu_execution() and
SELECT * FROM gpu_execution() usages to use compare_gpu_vs_cpu()
which uses transparent execution. This includes:
- order by multipartition parquet
- order by with decimal column (duckdb + parquet)
- order by with varchar column (duckdb + parquet)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The pre-optimization plan shape can differ from the post-optimization
shape (e.g. subqueries get flattened), so is_acceleratable_query() may
not match the same queries in both hooks. Unconditionally disable
IN_CLAUSE and COMPRESSED_MATERIALIZATION in the pre-hook; the post-hook
re-enables them regardless. This fixes TPC-H Q2 which has subqueries
that transform during optimization.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
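
A rough model of the pre-hook/post-hook pairing is shown below. `OptimizerKind`, `DisabledSet`, and `OptimizerToggle` are hypothetical stand-ins, not DuckDB or Sirius types. The sketch also deviates from the commit in one respect, which is an assumption rather than the PR's behavior: it records which entries the pre-hook actually added, so the post-hook does not re-enable an optimizer that the user had already disabled.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical stand-in for DuckDB's OptimizerType; not the real enum.
enum class OptimizerKind { InClause, CompressedMaterialization, FilterPushdown };

// Hypothetical stand-in for the disabled-optimizer set in the database config.
using DisabledSet = std::set<OptimizerKind>;

// Remembers which optimizers the pre-hook disabled so that the post-hook
// restores exactly those and leaves user-disabled optimizers alone.
class OptimizerToggle {
 public:
  // Pre-optimization hook: unconditionally disable the listed optimizers.
  void pre_hook(DisabledSet& disabled, const std::vector<OptimizerKind>& to_disable) {
    added_.clear();
    for (OptimizerKind kind : to_disable) {
      // insert().second is true only if the entry was not present before.
      if (disabled.insert(kind).second) { added_.push_back(kind); }
    }
  }

  // Post-optimization hook: re-enable only what pre_hook() added.
  void post_hook(DisabledSet& disabled) {
    for (OptimizerKind kind : added_) { disabled.erase(kind); }
    added_.clear();
  }

 private:
  std::vector<OptimizerKind> added_;
};
```

Running the pre-hook unconditionally matches the commit's reasoning: the plan shape before optimization is not a reliable predictor of GPU support.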
@mbrobbel mbrobbel marked this pull request as ready for review March 31, 2026 20:06
mbrobbel and others added 2 commits March 31, 2026 22:18
…urce of truth

The duplicated operator allow-list in is_acceleratable_query() was
fragile — it could easily get out of sync with create_plan(). Instead,
always Copy() the plan in the optimizer hook and let create_plan() in
OnFinalizePrepare determine GPU support. If create_plan() throws
NotImplementedException, we silently fall back to CPU.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
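
The "let the planner decide" approach can be illustrated with a small sketch. All names here are hypothetical: `not_implemented_error` stands in for DuckDB's NotImplementedException, the logical plan is reduced to a string, and `GpuPlanner` is an assumed callback shape rather than the signature of create_plan().

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for DuckDB's NotImplementedException.
struct not_implemented_error : std::runtime_error {
  using std::runtime_error::runtime_error;
};

enum class Engine { Gpu, Cpu };

// Assumed planner shape: throws not_implemented_error when the plan
// contains an operator the GPU engine does not support.
using GpuPlanner = std::function<void(const std::string& logical_plan)>;

// Tries GPU planning first and falls back to CPU only for unsupported plans.
Engine choose_engine(const std::string& logical_plan, const GpuPlanner& create_plan) {
  try {
    create_plan(logical_plan);
    return Engine::Gpu;
  } catch (const not_implemented_error&) {
    // Only "unsupported" is swallowed; any other error still propagates.
    return Engine::Cpu;
  }
}
```

With this shape the planner is the only place that encodes operator support, so there is no second allow-list to keep in sync.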
@mbrobbel mbrobbel marked this pull request as draft March 31, 2026 20:24
- README: Update intro to describe transparent execution as the primary
  usage mode, add usage example with plain SQL
- execution-flow: Add Step 1 (optimizer extension hooks + OnFinalizePrepare)
  and Step 2 (PhysicalSiriusExecution), move explicit gpu_execution()
  path to Step 1b (legacy)
- configuration: Add sirius_transparent_execution SET variable

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@mbrobbel mbrobbel marked this pull request as ready for review April 1, 2026 13:13
@mbrobbel mbrobbel added the duckdb (Work related to DuckDB) label Apr 2, 2026
@mbrobbel mbrobbel requested a review from bwyogatama April 2, 2026 12:34
// Disable optimizers that produce DuckDB-internal functions Sirius can't handle.
// The post-hook re-enables them so non-GPU queries aren't affected.
auto& disabled = duckdb::DBConfig::GetConfig(context).options.disabled_optimizers;
disabled.insert(duckdb::OptimizerType::IN_CLAUSE);
Collaborator

Why are IN_CLAUSE and COMPRESSED_MATERIALIZATION treated separately from the ones in the config? Or are those disabled optimizers separate from the ones we are disabling for DuckDB?

Collaborator

These are DuckDB-specific optimizations that don't apply to Sirius. For example, compressed materialization stores intermediate results in a compressed DuckDB format.

// Disable fallback so GPU errors are not silently hidden
con->Query("SET enable_duckdb_fallback = false;");
// Enable transparent GPU execution
con->Query("SET sirius_transparent_execution = true;");
Collaborator

Do we not still need to disable the DuckDB fallback so it doesn't fall back to DuckDB on failure?

Collaborator

I think we can enable the fallback again now that GTC has passed?

if (float_tolerance.has_value()) {
// Check if this looks like a float value for tolerance comparison
try {
double gpu_d = std::stod(gpu_rows[r][c]);
Collaborator

What if it's a decimal value? We want to compare decimals exactly. Could we not use the DuckDB-to-cuDF converter and then compare tables?

Collaborator

It should be checking the types.

@bwyogatama
Collaborator

Okay, this looks good to me based on what I see. One thing: let's name it SET gpu_execution instead of sirius_transparent_execution.

@bwyogatama
Collaborator

One other thing we should start taking note of: I believe the config we modify here is global and affects all connections. For example, if I set sirius_transparent_execution to true, it will be true in all connections. I guess that's fine because we assume there is only one connection right now, but it's still worth noting.

mbrobbel and others added 6 commits April 8, 2026 09:12
Addresses PR sirius-db#518 review feedback to use a shorter, clearer name.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ison

Check the result's column LogicalType to determine if float tolerance
should be applied, rather than attempting stod() on every value.
Decimals and other numeric types are now compared exactly via string
equality.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
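
A type-driven comparison along these lines might look like the sketch below. `ColumnKind` is a hypothetical, simplified tag (the commit checks the result column's LogicalType), the function name `values_match` is invented for illustration, and the use of an absolute tolerance is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <exception>
#include <optional>
#include <string>

// Hypothetical, simplified column tag; the PR inspects DuckDB's LogicalType.
enum class ColumnKind { Float, Decimal, Other };

// Compares one cell from each engine. Tolerance applies only to float
// columns; decimals and everything else must match as strings.
bool values_match(const std::string& gpu, const std::string& cpu, ColumnKind kind,
                  std::optional<double> float_tolerance) {
  if (gpu == cpu) { return true; }  // also covers identical "NULL"/"nan" text
  if (kind == ColumnKind::Float && float_tolerance.has_value()) {
    try {
      const double gpu_d = std::stod(gpu);
      const double cpu_d = std::stod(cpu);
      return std::fabs(gpu_d - cpu_d) <= *float_tolerance;
    } catch (const std::exception&) {
      // Unparsable text in a float column: treat as a mismatch below.
    }
  }
  return false;
}
```

Deciding by column type rather than by whether `std::stod` happens to succeed keeps a decimal such as `1.10` from being accepted as equal to `1.1`.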
- Override GetDataInternal instead of GetData (GetData is no longer
  virtual in PhysicalOperator)
- Use OptimizerExtension::Register() instead of directly pushing to
  config.optimizer_extensions (field moved to ExtensionCallbackManager)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
mbrobbel added a commit to mbrobbel/sirius that referenced this pull request Apr 9, 2026
Decouple sirius_pipeline, sirius_meta_pipeline, and
sirius_pipeline_converter from sirius_engine so that pipeline
construction can happen at plan/bind time without an engine instance.

Introduce pipeline_build_context — a lightweight struct carrying only
the plan-time parameters that pipeline construction needs (currently
just preserve_insertion_order). This replaces the sirius_engine&
reference that was previously threaded through the entire pipeline
build chain.

Key changes:
- sirius_pipeline: constructor takes pipeline_build_context& instead
  of sirius_engine&. Runtime ClientContext set via set_client_context()
  before execution.
- sirius_meta_pipeline: takes pipeline_build_context& instead of
  sirius_engine&. get_engine() replaced with get_build_context().
- sirius_pipeline_converter: takes pipeline_build_context& for plan-time
  work. construct_sirius_specific_operator extracted as a free function.
  wire_data_repositories takes sirius_engine& as parameter (only
  remaining runtime dependency).
- sirius_engine::create_child_pipeline removed (logic inlined into
  sirius_pipeline_build_state::create_child_pipeline).
- sirius_engine::initialize_internal creates the pipeline_build_context
  and sets client_context on all pipelines after conversion.

This is Phase 1 of moving planning from initialize_internal() to
the optimizer stage (issue sirius-db#545, PRs sirius-db#518, sirius-db#529).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
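
The split between plan-time and run-time state can be pictured with the sketch below. Only the names pipeline_build_context, preserve_insertion_order, and set_client_context() come from the commit message; `client_context_stub` and `pipeline_sketch` are hypothetical, and the real sirius_pipeline certainly carries more state than this.

```cpp
#include <cassert>

// Plan-time parameters only; per the commit message this currently
// carries just preserve_insertion_order.
struct pipeline_build_context {
  bool preserve_insertion_order = false;
};

// Hypothetical stand-in for DuckDB's ClientContext.
struct client_context_stub {
  int connection_id = 0;
};

// Simplified pipeline: constructible at plan time without an engine,
// with the runtime context attached later, before execution.
class pipeline_sketch {
 public:
  explicit pipeline_sketch(const pipeline_build_context& build_ctx)
      : preserve_insertion_order_(build_ctx.preserve_insertion_order) {}

  // Called once a client context exists, i.e. shortly before execution.
  void set_client_context(client_context_stub& ctx) { client_context_ = &ctx; }

  bool preserves_insertion_order() const { return preserve_insertion_order_; }
  bool ready_to_execute() const { return client_context_ != nullptr; }

 private:
  bool preserve_insertion_order_;
  client_context_stub* client_context_ = nullptr;  // non-owning
};
```

The point of the shape is that construction needs nothing from the engine, which is what allows planning to move earlier in the query lifecycle.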
mbrobbel and others added 2 commits April 9, 2026 21:49
…te multi-format tests

The transparent execution optimizer pre-hook was missing
STATISTICS_PROPAGATION in the disabled optimizer set. This optimizer can
fold ungrouped aggregates into EXPRESSION_GET + DUMMY_SCAN plans with
COLUMN_DATA_SCAN sources that the GPU pipeline cannot schedule, causing
the test suite to hang.

Also update test_gpu_execution_multi_format.cpp to use the transparent
execution pattern (SET gpu_execution = true/false, in-memory result
comparison) instead of the old CALL gpu_execution() + re-query approach.
With ENABLE_GPU_EXECUTION defaulting to true, the old "CPU baseline"
queries were silently routing through the transparent GPU path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@mbrobbel mbrobbel requested a review from felipeblazing April 14, 2026 20:31
Labels

duckdb Work related to DuckDB

3 participants