
[Bug]: Beam's boot.go has restriction on Beam SDK whl format, which breaks precommit test #37670

@tvalentyn

Description


What happened?

Some test suites are currently failing with errors like:

RuntimeError: Pipeline construction environment and pipeline runtime environment are not compatible. If you use a custom container image, check that the Python interpreter minor version and the Apache Beam version in your image match the versions used at pipeline construction time. Submission environment: beam:version:sdk_base:apache/beam_python3.13_sdk:2.73.0.dev. Runtime environment: beam:version:sdk_base:apache/beam_python3.13_sdk:2.72.0.dev.

#37665 (comment)

The root cause is that the py3.13 wheel filename apache_beam-2.73.0.dev0-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl doesn't match the format expected by boot.go:

wheelName := fmt.Sprintf("cp%s-cp%s-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", pyVersion, pyVersion)

so the staged wheel gets ignored, which produces the mismatch between the SDK version at submission (2.73.0.dev) and at runtime (2.72.0.dev).
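
The mismatch can be reproduced in isolation. The following is a minimal sketch, assuming the check in boot.go amounts to a suffix comparison against the formatted string quoted above; `expectedSuffix` and `matchesExpected` are hypothetical helper names for illustration, not functions from boot.go.

```go
package main

import (
	"fmt"
	"strings"
)

// expectedSuffix reproduces the format string quoted in this issue.
func expectedSuffix(pyVersion string) string {
	return fmt.Sprintf("cp%s-cp%s-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", pyVersion, pyVersion)
}

// matchesExpected is a hypothetical stand-in for the wheel check.
// Assumption: the staged file is accepted only if its name ends with
// the expected suffix.
func matchesExpected(file, pyVersion string) bool {
	return strings.HasSuffix(file, expectedSuffix(pyVersion))
}

func main() {
	// Wheel filename reported in this issue: the platform tags appear in a
	// different order and include an extra manylinux_2_28 tag.
	staged := "apache_beam-2.73.0.dev0-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl"
	fmt.Println(matchesExpected(staged, "313")) // prints false
}
```

Under that assumption, any change in the order or number of platform tags in the wheel filename is enough for the staged wheel to be skipped.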

The logic in boot.go was influenced by the old assumptions that:

  • The SDK is always staged to the Dataflow worker (still the case for tests, no longer the case for prod)
  • When an SDK package is staged, its name must be predetermined, like sdkSrcFile = "dataflow_python_sdk.tar".

To fix, we should not assume that the package name is predetermined or do any sort of wheel name validation: we should trust that the wheel name passed via --sdk_location will be installable.
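
One way to picture this is a lookup that takes the artifact name from --sdk_location and performs no wheel name validation. This is only a sketch under stated assumptions, not the actual fix: `pickStagedSdk` is a hypothetical helper, the staged artifacts are assumed to be available as a list of filenames, and the bucket path in `main` is made up for illustration.

```go
package main

import (
	"fmt"
	"path"
)

// pickStagedSdk is a hypothetical helper. It returns the staged artifact
// whose name equals the base name of the --sdk_location value, without
// checking the filename against any wheel naming pattern.
func pickStagedSdk(artifacts []string, sdkLocation string) (string, bool) {
	want := path.Base(sdkLocation)
	for _, name := range artifacts {
		if name == want {
			return name, true
		}
	}
	return "", false
}

func main() {
	wheel := "apache_beam-2.73.0.dev0-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl"
	artifacts := []string{"requirements.txt", wheel}

	// Hypothetical staging location, for illustration only.
	name, ok := pickStagedSdk(artifacts, "gs://example-bucket/staging/"+wheel)
	fmt.Println(name, ok)
}
```

A lookup like this would only work when the staged artifact keeps the filename given in --sdk_location; an artifact renamed during staging would not be found.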

We need to fix the Go code in /sdks/python/container, then release a new beam-master container with the fix.

Note that internal Dataflow tests still pass the SDK as a source tarball, and the source artifact will still be renamed by the stager:

setup_options.sdk_location, names.STAGED_SDK_SOURCES_FILENAME)

We can either leave the logic for staging sources as is, or also get rid of the renaming in the SDK and then always use the --sdk_location artifact filename in boot.go.

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
