Three issues found in infra/aws/aws-ecr-s3-sagemaker.yaml CloudFormation template #4670

@adarshsingh7470

Description

Contact Details [Optional]

Available on GitHub for follow-up comments

System Information

zenml info -a -s output is not applicable: this bug is in the CloudFormation
infrastructure template, not in the ZenML Python package itself.

File affected: infra/aws/aws-ecr-s3-sagemaker.yaml
Branch: main

What happened?

While reviewing the CloudFormation template at infra/aws/aws-ecr-s3-sagemaker.yaml,
I found 3 bugs that cause silent failures or broken deployments.


Bug 1 — ECR Repository tag uses !Sub instead of !Ref

Current code (wrong):

ECRRepository:
  Tags:
    - Key: !Ref TagName
      Value: !Sub TagValue    # outputs literal string "TagValue"

Expected code:

ECRRepository:
  Tags:
    - Key: !Ref TagName
      Value: !Ref TagValue    # correctly resolves the parameter

!Sub TagValue does not resolve the TagValue parameter. !Sub only
substitutes ${...} placeholders, so a bare name passes through as the
literal string "TagValue". Every other resource in this file correctly
uses !Ref TagValue. As a result, ECR tags always show "TagValue" instead
of the actual user-provided value, such as "zenml".
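
To make the difference concrete, here is a simplified Python model of how Fn::Sub treats its input. It is a sketch, not CloudFormation's implementation: the helper name `fn_sub` is hypothetical, and it only handles plain `${Name}` placeholders (the real function also supports `${!Literal}` escapes and `${Resource.Attribute}` lookups).

```python
import re


def fn_sub(template: str, params: dict) -> str:
    """Simplified stand-in for Fn::Sub: replace only ${Name} placeholders."""
    # Text without a ${...} placeholder is returned untouched, which is
    # why "!Sub TagValue" yields the literal string "TagValue".
    return re.sub(r"\$\{(\w+)\}", lambda m: str(params[m.group(1)]), template)


params = {"TagValue": "zenml"}
print(fn_sub("TagValue", params))     # -> TagValue (bare name, no placeholder)
print(fn_sub("${TagValue}", params))  # -> zenml (placeholder form)
```

Under this model, either `!Ref TagValue` or `!Sub "${TagValue}"` would resolve the parameter, while the bare `!Sub TagValue` would not.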


Bug 2 — CodeBuild PrivilegedMode: false breaks Docker-in-Docker builds

Current code (wrong):

Environment:
  Image: bentolor/docker-dind-awscli
  PrivilegedMode: false

Expected code:

Environment:
  Image: bentolor/docker-dind-awscli
  PrivilegedMode: true

The image bentolor/docker-dind-awscli is a Docker-in-Docker (dind) image
that runs a Docker daemon inside the build container. AWS CodeBuild
requires PrivilegedMode: true for this. With false, the daemon cannot
start, so every docker build command fails with a permission error and
the CodeBuild=true option is unusable.

Reference: https://docs.aws.amazon.com/codebuild/latest/userguide/build-env-ref-compute-types.html
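
A check like the following could catch this combination before deployment. It is a minimal sketch under stated assumptions: the function name `dind_without_privileged` is hypothetical, it scans the template as plain text rather than parsing YAML, it assumes a single CodeBuild Environment block, and it treats any image name containing "dind" as needing privileged mode.

```python
import re


def dind_without_privileged(template_text: str) -> bool:
    """Return True if a dind image is configured without PrivilegedMode: true."""
    image = re.search(r"^\s*Image:\s*(\S+)", template_text, re.MULTILINE)
    mode = re.search(r"^\s*PrivilegedMode:\s*(\S+)", template_text, re.MULTILINE)
    # Heuristic (assumption): an image whose name contains "dind" needs to
    # run a Docker daemon inside the build container.
    uses_dind = image is not None and "dind" in image.group(1)
    # A missing PrivilegedMode key counts as false, the CodeBuild default.
    privileged = mode is not None and mode.group(1).lower() == "true"
    return uses_dind and not privileged


current = """Environment:
  Image: bentolor/docker-dind-awscli
  PrivilegedMode: false
"""
print(dind_without_privileged(current))                            # -> True
print(dind_without_privileged(current.replace("false", "true")))   # -> False
```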


Bug 3 — Lambda runtime python3.8 is deprecated by AWS

Current code (wrong):

InvokeZenMLAPIFunction:
  Type: AWS::Serverless::Function
  Properties:
    Runtime: python3.8

Expected code:

InvokeZenMLAPIFunction:
  Type: AWS::Serverless::Function
  Properties:
    Runtime: python3.12

AWS officially deprecated the Python 3.8 Lambda runtime. New Lambda
functions using python3.8 will fail to deploy. Since this Lambda
auto-registers the ZenML stack during CloudFormation deployment,
the entire RegisterZenMLStack flow breaks silently for new deployments.

Reference: https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtimes.html
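
A small text scan can flag deprecated runtimes in a template before it is deployed. This is a sketch, not an official tool: the function name `find_deprecated_runtimes` is hypothetical, and the deprecated set below contains only the runtime named in this issue, so it would need to be extended from the Lambda runtimes page linked above.

```python
import re

# Assumption: only the runtime named in this issue is listed here.
DEPRECATED_RUNTIMES = frozenset({"python3.8"})


def find_deprecated_runtimes(template_text, deprecated=DEPRECATED_RUNTIMES):
    """Return (line_number, runtime) pairs for deprecated Runtime values."""
    hits = []
    for number, line in enumerate(template_text.splitlines(), start=1):
        # Match lines such as "Runtime: python3.8", with optional quotes.
        match = re.match(r"\s*Runtime:\s*['\"]?([\w.]+)", line)
        if match and match.group(1) in deprecated:
            hits.append((number, match.group(1)))
    return hits


snippet = """InvokeZenMLAPIFunction:
  Type: AWS::Serverless::Function
  Properties:
    Runtime: python3.8
"""
print(find_deprecated_runtimes(snippet))  # -> [(4, 'python3.8')]
```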

Reproduction steps

Bug 1 — ECR Tag

  1. Open infra/aws/aws-ecr-s3-sagemaker.yaml
  2. Find the ECRRepository resource (~line 60)
  3. Check the Tags section — Value: !Sub TagValue
  4. Deploy the CloudFormation stack with TagValue = "zenml"
  5. Go to AWS Console → ECR → check the repository tags
  6. Tag value shows literal "TagValue" instead of "zenml"

Bug 2 — CodeBuild PrivilegedMode

  1. Deploy the CloudFormation stack with parameter CodeBuild=true
  2. Trigger a ZenML pipeline that uses CodeBuild as image builder
  3. Go to AWS Console → CodeBuild → check build logs
  4. Build fails with a Docker daemon permission error because
    PrivilegedMode: false prevents Docker from running inside the container

Bug 3 — Lambda Runtime

  1. Open infra/aws/aws-ecr-s3-sagemaker.yaml
  2. Find InvokeZenMLAPIFunction resource
  3. Note Runtime: python3.8
  4. Deploy the CloudFormation stack with ZenMLServerURL and
    ZenMLServerAPIToken filled in
  5. Lambda fails to deploy or throws a deprecation warning
  6. ZenML stack auto-registration silently fails

...

Relevant log output

Bug 2 — Expected CodeBuild error with PrivilegedMode: false:

Error response from daemon: cannot start a stopped process: unknown
exec /usr/local/bin/dockerd-entrypoint.sh: operation not permitted

Bug 3 — Expected Lambda deployment warning:

The runtime parameter of python3.8 is no longer supported for 
creating or updating AWS Lambda functions. We recommend you use 
the new runtime (python3.12) while creating or updating functions.
