Three issues found in infra/aws/aws-ecr-s3-sagemaker.yaml CloudFormation template #4670
Description
Contact Details [Optional]
Available on GitHub for follow-up comments
System Information
zenml info -a -s not applicable — this bug is in the CloudFormation
infrastructure template, not in the ZenML Python package itself.
File affected: infra/aws/aws-ecr-s3-sagemaker.yaml
Branch: main
What happened?
While reviewing the CloudFormation template at infra/aws/aws-ecr-s3-sagemaker.yaml,
I found 3 bugs that cause silent failures or broken deployments.
Bug 1 — ECR Repository tag uses !Sub instead of !Ref
Current code (wrong):
    ECRRepository:
      Tags:
        - Key: !Ref TagName
          Value: !Sub TagValue  # outputs literal string "TagValue"

Expected code:

    ECRRepository:
      Tags:
        - Key: !Ref TagName
          Value: !Ref TagValue  # correctly resolves the parameter

!Sub only substitutes ${...} placeholders, so !Sub TagValue does not resolve
the TagValue parameter; it outputs the literal string "TagValue". Every other
resource in this file correctly uses !Ref TagValue. As a result, ECR tags
always show "TagValue" instead of the user-provided value, such as "zenml".
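As a hedged illustration of the fix, the minimal template below declares the two parameters and tags a repository with them. The parameter defaults and the lack of any other repository properties are assumptions made for the example and are not taken from the actual template. The commented-out line shows the !Sub form that would also resolve the parameter.

```yaml
AWSTemplateFormatVersion: "2010-09-09"
Parameters:
  TagName:
    Type: String
    Default: project        # assumed default, for illustration only
  TagValue:
    Type: String
    Default: zenml          # assumed default, for illustration only
Resources:
  ECRRepository:
    Type: AWS::ECR::Repository
    Properties:
      Tags:
        - Key: !Ref TagName
          Value: !Ref TagValue          # resolves to the parameter value
          # Value: !Sub "${TagValue}"   # equivalent: !Sub needs the ${...} placeholder
```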
Bug 2 — CodeBuild PrivilegedMode: false breaks Docker-in-Docker builds
Current code (wrong):
    Environment:
      Image: bentolor/docker-dind-awscli
      PrivilegedMode: false

Expected code:

    Environment:
      Image: bentolor/docker-dind-awscli
      PrivilegedMode: true

The image bentolor/docker-dind-awscli is a Docker-in-Docker (dind) image
that runs a Docker daemon inside the build container. AWS CodeBuild
requires PrivilegedMode: true for this. With false, every docker build
command fails with a permission error, so the CodeBuild=true option
is completely broken.
Reference: https://docs.aws.amazon.com/codebuild/latest/userguide/build-env-ref-compute-types.html
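For context, here is a sketch of where the flag sits inside an AWS::CodeBuild::Project resource. Only Image and PrivilegedMode come from the template excerpt above. The logical ID CodeBuildProject and the Type and ComputeType values are assumptions, and the other required properties (ServiceRole, Source, Artifacts) are deliberately left out.

```yaml
# Fragment of a Resources section, not a complete template:
# ServiceRole, Source, and Artifacts are omitted.
CodeBuildProject:                         # hypothetical logical ID
  Type: AWS::CodeBuild::Project
  Properties:
    Environment:
      Type: LINUX_CONTAINER               # assumed environment type
      ComputeType: BUILD_GENERAL1_SMALL   # assumed compute type
      Image: bentolor/docker-dind-awscli
      PrivilegedMode: true                # needed to run a Docker daemon in the build container
```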
Bug 3 — Lambda runtime python3.8 is deprecated by AWS
Current code (wrong):
    InvokeZenMLAPIFunction:
      Type: AWS::Serverless::Function
      Properties:
        Runtime: python3.8

Expected code:

      Properties:
        Runtime: python3.12

AWS officially deprecated the Python 3.8 Lambda runtime. New Lambda
functions using python3.8 will fail to deploy. Since this Lambda
auto-registers the ZenML stack during CloudFormation deployment,
the entire RegisterZenMLStack flow breaks silently for new deployments.
Reference: https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtimes.html
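One possible way to make future runtime bumps less error-prone is to expose the runtime as a template parameter. This is only a sketch, assuming a SAM-transformed template: the LambdaRuntime parameter, the handler name, and the inline code are hypothetical placeholders and do not reflect the function's real code.

```yaml
AWSTemplateFormatVersion: "2010-09-09"
Transform: AWS::Serverless-2016-10-31
Parameters:
  LambdaRuntime:                # hypothetical parameter, not in the original template
    Type: String
    Default: python3.12
    AllowedValues:
      - python3.12
      - python3.13
Resources:
  InvokeZenMLAPIFunction:
    Type: AWS::Serverless::Function
    Properties:
      Runtime: !Ref LambdaRuntime
      Handler: index.handler    # placeholder handler
      InlineCode: |
        # Placeholder body; the real function registers the ZenML stack.
        def handler(event, context):
            return {"statusCode": 200}
```

With this layout, moving to a newer runtime is a parameter change rather than an edit to the resource definition.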
Reproduction steps
Bug 1: ECR Tag
- Open infra/aws/aws-ecr-s3-sagemaker.yaml
- Find the ECRRepository resource (~line 60)
- Check the Tags section; it contains Value: !Sub TagValue
- Deploy the CloudFormation stack with TagValue = "zenml"
- Go to AWS Console → ECR → check the repository tags
- Tag value shows literal "TagValue" instead of "zenml"
Bug 2: CodeBuild PrivilegedMode
- Deploy the CloudFormation stack with parameter CodeBuild=true
- Trigger a ZenML pipeline that uses CodeBuild as image builder
- Go to AWS Console → CodeBuild → check build logs
- Build fails with a Docker daemon permission error because PrivilegedMode: false prevents Docker from running inside the container
Bug 3: Lambda Runtime
- Open infra/aws/aws-ecr-s3-sagemaker.yaml
- Find the InvokeZenMLAPIFunction resource
- Note Runtime: python3.8
- Deploy the CloudFormation stack with ZenMLServerURL and ZenMLServerAPIToken filled in
- Lambda fails to deploy or throws a deprecation warning
- ZenML stack auto-registration silently fails
...
Relevant log output
### Bug 2 — Expected CodeBuild error with PrivilegedMode: false:
Error response from daemon: cannot start a stopped process: unknown
exec /usr/local/bin/dockerd-entrypoint.sh: operation not permitted
### Bug 3 — Expected Lambda deployment warning:
The runtime parameter of python3.8 is no longer supported for
creating or updating AWS Lambda functions. We recommend you use
the new runtime (python3.12) while creating or updating functions.
Code of Conduct
- I agree to follow this project's Code of Conduct