Skip to content

Conversation

@mattcuento
Copy link
Contributor

@mattcuento mattcuento commented Jan 2, 2026

Which issue does this PR close?

Closes #1358.
Part of #32.

Rationale for this change

As part of this epic, adding server-side support for Substrait plans (to the Scheduler ExecuteQuery API). Substrait support is gated by the substrait feature with conditional depdencies for datafusion-substrait and therefore substrait. This feature is on by default.

The sql field was marked as deprecated from the Ballista protocol two years back here. This is clean up of the field. The field ID and name have been marked as reserved to prevent corrupted requests if client/server versions drift across this change.

What changes are included in this PR?

  • Bumped base image to rust:1.92-trixie in order to pull an updated protobuf-compiler version that supports optional fields for proto3 by default.

Are there any user-facing changes?

@milenkovicm
Copy link
Contributor

thanks @mattcuento, would you be interested in adding substrait as well as part of this PR?

@mattcuento
Copy link
Contributor Author

thanks @mattcuento, would you be interested in adding substrait as well as part of this PR?

Sure thing!

@mattcuento mattcuento force-pushed the remove-query-type-sql-1358 branch 3 times, most recently from 7119f99 to 887096b Compare January 4, 2026 02:58
@mattcuento mattcuento changed the title chore: Remove deprecated Sql field from ExecuteQueryParams.Query feat: support substrait_plan and remove deprecated sql field from ExecuteQueryParams.Query Jan 4, 2026
@mattcuento
Copy link
Contributor Author

Hey @milenkovicm, I added support for Substrait on ExecuteQuery. My assumption from the epic is that this is strictly intended to be a server-side change for use with different front-ends, and that we won't be changing the client to serialize into Substrait (sounds like that would be an unnecessary extra serialization step on top of LogicalPlan).

Assuming that's true, it was a little tricky to test this well aside from calling the scheduler RPC service directly (test_substrait_compatibility). I'm brand new to the project, so if you have any opinions/advice on where or how to better test this do let me know!

@mattcuento mattcuento mentioned this pull request Jan 4, 2026
3 tasks
Copy link
Contributor

@milenkovicm milenkovicm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think change make sense, and test as well, thanks @mattcuento

there is one issue with docker build and looks like issue with disk space failing other (not sure how to fix)

@milenkovicm
Copy link
Contributor

Maybe as a follow up we should put a bit more documentation around this and example(s)

@milenkovicm
Copy link
Contributor

Also, could we gate substrait with config option, which could be on by default?
Users not needing it could disable it at compile time.

@mattcuento mattcuento force-pushed the remove-query-type-sql-1358 branch 2 times, most recently from 1816b45 to f75d42e Compare January 5, 2026 01:55
@mattcuento
Copy link
Contributor Author

mattcuento commented Jan 5, 2026

there is one issue with docker build and looks like issue with disk space failing other (not sure how to fix)

Thanks, looks like substrait doesn't run a high enough version of protoc to to support optional fields by default.

error: failed to run custom build command for `substrait v0.62.2`
note: To improve backtraces for build dependencies, set the CARGO_PROFILE_RELEASE_BUILD_OVERRIDE_DEBUG=true environment variable to enable debug information generation.

Caused by:
  process didn't exit successfully: `/home/builder/workspace/target/release/build/substrait-339db7bdcba362ae/build-script-build` (exit status: 1)
  --- stdout
  cargo:rerun-if-env-changed=FORCE_REBUILD
  cargo:rerun-if-changed=substrait
  cargo:rerun-if-changed=substrait/text/dialect_schema.yaml
  cargo:rerun-if-changed=substrait/text/simple_extensions_schema.yaml
  cargo:rerun-if-changed=substrait/proto/substrait/plan.proto
  cargo:rerun-if-changed=substrait/proto/substrait/extensions/extensions.proto
  cargo:rerun-if-changed=substrait/proto/substrait/type.proto
  cargo:rerun-if-changed=substrait/proto/substrait/parameterized_types.proto
  cargo:rerun-if-changed=substrait/proto/substrait/algebra.proto
  cargo:rerun-if-changed=substrait/proto/substrait/extended_expression.proto
  cargo:rerun-if-changed=substrait/proto/substrait/capabilities.proto
  cargo:rerun-if-changed=substrait/proto/substrait/function.proto
  cargo:rerun-if-changed=substrait/proto/substrait/type_expressions.proto

  --- stderr
  Error: Custom { kind: Other, error: "protoc failed: substrait/type.proto: This file contains proto3 optional fields, but --experimental_allow_proto3_optional was not set.\n" }

From this issue I gathered that we could compile protoc ourselves for the build, and that's what I have now for the latest commit. However, looks like a few distributions don't ship with or download cmake for compilation. I'm curious, @milenkovicm, have any advice here? Should I just add cmake in the necessary Dockerfiles?

Maybe as a follow up we should put a bit more documentation around this and example(s)

Agreed, happy to file an issue to track it. I'm still kind of curious if more changes would be desired for user ergonomics. Do we have existing examples of connecting to a scheduler besides the client? I'd imagine it might be useful to add some convenience/wrapper methods to create scheduler gRPC clients, such as combining create_grpc_client_connection + SchedulerGrpcClient::new(connection) like in distributed_query.rs:

    info!("Connecting to Ballista scheduler at {scheduler_url}");
    // TODO reuse the scheduler to avoid connecting to the Ballista scheduler again and again
    let connection = create_grpc_client_connection(scheduler_url, &grpc_config)
        .await
        .map_err(|e| DataFusionError::Execution(format!("{e:?}")))?;

    let mut scheduler = SchedulerGrpcClient::new(connection)
        .max_encoding_message_size(max_message_size)
        .max_decoding_message_size(max_message_size);

I don't know much about Ibis, but will take a look to see how it would integrate with this.

Also, could we gate substrait with config option, which could be on by default?
Users not needing it could disable it at compile time.

Done! Updated the PR description, added support/conditional dependencies under substrait

Copy link
Contributor

@milenkovicm milenkovicm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @mattcuento changes make sense, we just need to fix build issue and we can merge this

@mattcuento mattcuento force-pushed the remove-query-type-sql-1358 branch 2 times, most recently from c68a693 to ed838e9 Compare January 5, 2026 16:46
@milenkovicm
Copy link
Contributor

not sure why it does not start github actions, maybe you should rebase

@mattcuento mattcuento force-pushed the remove-query-type-sql-1358 branch 2 times, most recently from 1697215 to b04b37b Compare January 5, 2026 17:01
@mattcuento mattcuento marked this pull request as ready for review January 5, 2026 17:07
@mattcuento
Copy link
Contributor Author

Will review the latest test linux balista/crates failures this evening

@milenkovicm
Copy link
Contributor

perhaps, action getting out of disk space ? looks like linker freaking out, which should not be related to your change

@mattcuento mattcuento force-pushed the remove-query-type-sql-1358 branch from b04b37b to 12c2c61 Compare January 6, 2026 02:19
@mattcuento
Copy link
Contributor Author

mattcuento commented Jan 6, 2026

Ok it does seem that the issues for test linux crates and test linux balista are to do with runner disk space being full.

You can monitor disk space with df -h in a workflow step.

I've tried a bunch of different tricks to free space (actions/runner-images#2840 for example) but no luck. We run tests in a container so we don't get access to the root file system to clean much up. We could alternatively run tests outside of a container which allows us more freedom to delete things.

This is what space looks like prior to building/running tests:

Filesystem      Size  Used Avail Use% Mounted on
overlay          72G   56G   16G  78% /
tmpfs            64M     0   64M   0% /dev
shm              64M     0   64M   0% /dev/shm
/dev/root        72G   56G   16G  78% /__t
tmpfs           3.2G  1.2M  3.2G   1% /run/docker.sock
tmpfs           7.9G     0  7.9G   0% /proc/acpi
tmpfs           7.9G     0  7.9G   0% /proc/scsi
tmpfs           7.9G     0  7.9G   0% /sys/firmware

I also don't know what kind of plan apache repos are on. Could we potentially just use a larger runner class?

@mattcuento mattcuento force-pushed the remove-query-type-sql-1358 branch from 12c2c61 to c6b02f8 Compare January 6, 2026 03:33
@mattcuento mattcuento force-pushed the remove-query-type-sql-1358 branch 6 times, most recently from 695ff88 to 3879489 Compare January 6, 2026 05:44
@mattcuento
Copy link
Contributor Author

Looks like you can reclaim quite a lot of space when running outside containers
image
this was just for test linux balista and test linux crates

@milenkovicm
Copy link
Contributor

would cargo clean before action make any difference?

@milenkovicm
Copy link
Contributor

maybe @Huy1Ng has idea how to fix issue with build, he did a great job reorganising github actions in #1289

@milenkovicm
Copy link
Contributor

milenkovicm@b60d245 adding cargo clean may help

@mattcuento mattcuento force-pushed the remove-query-type-sql-1358 branch 2 times, most recently from c6b02f8 to 8706cd3 Compare January 6, 2026 15:04
@Huy1Ng
Copy link
Contributor

Huy1Ng commented Jan 6, 2026

I couldn't replicate the error on my fork, but add this to the CI section in Cargo.toml

[profile.ci]
inherits = "dev"
incremental = false
debug = false
debug-assertions = false
strip = "debuginfo"

reduce the ./target directory size by ~10%. I'm not sure how cargo clean would help as we should start with empty ./target for each job (or do we?!)

@milenkovicm
Copy link
Contributor

I have tested with cargo clean in https://github.com/milenkovicm/datafusion-ballista/tree/check__remove-query-type-sql-1358 it worked there, i'm not sure if it is accident.

@mattcuento mattcuento force-pushed the remove-query-type-sql-1358 branch from 72a8639 to 72b3db6 Compare January 6, 2026 15:44
@mattcuento
Copy link
Contributor Author

Interesting, @milenkovicm, I tried cargo clean prior to tests yesterday while checking disk usage and we didn't seem to reclaim anything.

However, I did just try @Huy1Ng suggestion for the CI profile and that seemed to work. I've cleaned up some commit history now, hopefully should be ready to merge.

Thank you both for the suggestions here 🙂

Copy link
Contributor

@milenkovicm milenkovicm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @mattcuento
one minor comment

.sql(&sql)
.await
.and_then(|df| df.into_optimized_plan())
Query::SubstraitPlan(bytes) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: unused variable: `bytes`
   --> ballista/scheduler/src/scheduler_server/grpc.rs:401:38
    |
401 |                 Query::SubstraitPlan(bytes) => {
    |                                      ^^^^^ help: if this is intentional, prefix it with an underscore: `_bytes`
    |
    = note: `#[warn(unused_variables)]` (part of `#[warn(unused)]`) on by default

if substrait disabled

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed!

Linting + using non-reserved field id

Adding test for substrait compatibility with local scheduler instance

add inline expressio to substrait test

protect substrait support under feature

Adding protoc feature to datafusion-substrait package

Removing protoc feature

Bump rust base version

Rebuild Cargo.lock

spacing

spacing

Add cargo clean before test

remove cargo clean

Fix linting for feature off
@mattcuento mattcuento force-pushed the remove-query-type-sql-1358 branch from 72b3db6 to c66b160 Compare January 6, 2026 18:45
@milenkovicm milenkovicm changed the title feat: support substrait_plan and remove deprecated sql field from ExecuteQueryParams.Query feat: scheduler supports substrait logical and remove deprecated sql support Jan 6, 2026
@milenkovicm milenkovicm changed the title feat: scheduler supports substrait logical and remove deprecated sql support feat: Scheduler supports substrait logical and remove deprecated sql support Jan 6, 2026
@milenkovicm milenkovicm changed the title feat: Scheduler supports substrait logical and remove deprecated sql support feat: Scheduler supports substrait logical plan and remove deprecated sql support Jan 6, 2026
@milenkovicm
Copy link
Contributor

great effort, thanks a lot @mattcuento
I hope you will follow up with docs and examples

@milenkovicm milenkovicm merged commit 60182c8 into apache:main Jan 6, 2026
31 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Remove query type sql from ballista protocol

3 participants