
add sca: self-contained dstack app builder#716

Merged: kvinwang merged 8 commits into master from sca-tool on Jun 6, 2026
kvinwang (Collaborator) commented Jun 6, 2026

Summary

Adds sca (self-contained app), a CLI under tools/sca/ that packages an application directly into an app-compose.json — no Docker, no registry pull. The app's files are decoded at boot into the CVM and run under a systemd service, so the exact bytes are measured into the compose-hash / RTMR3 and supervised like a container.

The branch is purely additive (only tools/sca/**).

How it works

  • You lay out a rootfs/ tree mirroring the CVM filesystem (your binary/script, a systemd unit, an entrypoint).
  • sca build packs the whole tree into a deterministic tar.gz, base64-encodes it, and embeds it in app-compose.json. At boot the guest extracts it onto / via openssl base64 -d | tar -xz -C / (the guest busybox userland has no base64, but has openssl/tar/gzip).
  • A systemd service (Restart=always) runs the entrypoint, so crashes are supervised.
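The packing step described above can be sketched roughly as follows. This is a minimal illustration, not the code in tools/sca/sca.py: the function name `pack_rootfs` is hypothetical, and the 64-column line wrapping is an assumption made because `openssl base64 -d` without `-A` expects wrapped input.

```python
import base64
import gzip
import io
import os
import tarfile


def pack_rootfs(rootfs: str) -> str:
    """Return a base64 string holding a deterministic tar.gz of `rootfs` (hypothetical helper)."""
    raw = io.BytesIO()
    # GNU format is pinned because the tarfile default changed to PAX in Python 3.8.
    with tarfile.open(fileobj=raw, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for dirpath, dirnames, filenames in os.walk(rootfs):
            dirnames.sort()  # keep traversal order independent of the filesystem
            for name in sorted(dirnames + filenames):
                path = os.path.join(dirpath, name)
                info = tar.gettarinfo(path, arcname=os.path.relpath(path, rootfs))
                if info is None or not (info.isreg() or info.isdir() or info.issym()):
                    raise ValueError(f"unsupported file type: {path}")
                # Normalize everything that depends on the build machine.
                info.uid = info.gid = 0
                info.uname = info.gname = ""
                info.mtime = 0
                if info.isdir():
                    info.mode = 0o755
                elif info.isreg():
                    info.mode = 0o755 if info.mode & 0o111 else 0o644
                else:
                    info.mode = 0o777  # symlink permissions carry no meaning
                if info.isreg():
                    with open(path, "rb") as fh:
                        tar.addfile(info, fh)
                else:
                    tar.addfile(info)
    packed = io.BytesIO()
    # mtime=0 keeps the build timestamp out of the gzip header.
    with gzip.GzipFile(fileobj=packed, mode="wb", mtime=0) as gz:
        gz.write(raw.getvalue())
    encoded = base64.b64encode(packed.getvalue()).decode("ascii")
    # Assumption: wrap at 64 columns so a plain `openssl base64 -d` can decode it.
    return "\n".join(encoded[i : i + 64] for i in range(0, len(encoded), 64))
```

Calling `pack_rootfs("rootfs")` twice on the same tree should yield identical strings regardless of file timestamps or the builder's umask, which is the property a stable compose-hash relies on.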

Subcommands

  • sca new <dir> — scaffold config.json, README, and a rootfs/ with default entrypoint.sh + sca.service
  • sca build — emit app-compose.json and print compose-hash / app-id / payload size

Notable design points

  • Reproducible builds: pinned tarfile GNU format, normalized archive modes (dirs 0755, files 0644/0755 by exec bit) and metadata, so the compose-hash is stable across Python versions and builder umask.
  • Compose options as CLI flags on both new and build: --key-provider (none|kms|local|tpm, matching dstack-types KeyProviderKind), --gateway, --public-logs, --public-sysinfo, --secure-time, --(no-)instance-id, --allowed-env, --key-provider-id (tri-state). Validates key_provider and warns on gateway-without-kms.
  • Safety: service unit names are validated against a strict regex and shell-quoted with shlex.quote (they reach a root shell at boot); the 50 MiB cap is checked before writing; unsupported file types are rejected.
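The unit-name safety check could look something like the sketch below. The regex, the helper name `start_command`, and the use of `systemctl start` are all assumptions for illustration; the PR does not show the actual pattern or the generated boot script.

```python
import re
import shlex

# Hypothetical pattern: the real regex used by sca is not shown in this PR.
UNIT_NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9_.@-]*\.service")


def start_command(unit: str) -> str:
    """Build a boot-script line for a unit, refusing names with shell metacharacters."""
    if not UNIT_NAME_RE.fullmatch(unit):
        raise ValueError(f"invalid systemd unit name: {unit!r}")
    # Quoting is defense in depth: a validated name passes through unchanged.
    return f"systemctl start {shlex.quote(unit)}"
```

Validating first and quoting second means a name such as `x; reboot.service` is rejected outright instead of being emitted in quoted form.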

Examples

  • examples/hello-c — ~30KB static C HTTP server (musl) exposed via gateway, key_provider: kms
  • examples/heartbeat — zero-toolchain app (a shell script) with an embedded config file showing multi-file rootfs packing

Testing

Verified end-to-end on a live CVM, including KMS + gateway, crash-restart, exec-bit/symlink extraction.

kvinwang added 5 commits June 5, 2026 07:16
a CLI to package an app directly into app-compose.json (no docker, no
registry pull). the embedded binary is decoded at boot into a tmpfs
runtime dir and run under a systemd service (Restart=always), so the
exact bytes are measured into the compose-hash / RTMR3 and supervised
like a container.

- `sca new <dir>`: scaffold sca.toml + README + dist/
- `sca build`: emit app-compose.json, print compose-hash / app-id / size

decoding uses openssl (the guest busybox userland has no base64); the
unit is written to /run/systemd/system (tmpfs); payloads live under
.sca_payloads. verified end-to-end on a live CVM incl. crash-restart.
- model: lay out a rootfs/ dir mirroring the CVM fs; build packs the
  whole tree (deterministic tar.gz + base64) into .sca_rootfs and the
  boot script extracts it onto / via 'openssl base64 -d | tar -xz -C /'
  (busybox has no base64 but has openssl/tar/gzip; verified on a CVM)
- config switched from TOML to config.json (stdlib json, no version reqs)
- new scaffolds rootfs/run/sca/bin/entrypoint.sh and
  rootfs/etc/systemd/system/sca.service as editable defaults
- gzip compression raises the effective payload budget under the 50MiB cap
- reproducible build (normalized tar metadata) -> stable compose-hash
- pin tarfile GNU_FORMAT so output bytes don't shift across Python
  versions (default flipped GNU->PAX in 3.8), keeping compose-hash stable
- normalize archive modes (dirs 0755, files 0644/0755 by exec bit) so the
  hash no longer depends on the builder's umask; verified exec bit and
  symlinks still extract correctly on a live CVM
- validate service unit names against a strict regex and shell-quote them
  with shlex.quote (they reach a root shell at boot); reject shell-metachar
  names instead of emitting them
- require compose flags to be real JSON booleans (bool("false") was True)
- check the 50 MiB cap BEFORE writing so a failed build leaves no oversized
  artifact; reject unsupported file types (sockets/fifos) with a clear error
- document the in-memory build footprint
- replace the legacy kms_enabled/local_key_provider_enabled booleans with
  the modern key_provider enum (none|kms|local|tpm), matching dstack-types
  KeyProviderKind; gateway_enabled stays an independent flag. verified the
  new 'key_provider: kms' form drives KMS + gateway end-to-end on a live CVM
- expose all app-compose options as CLI flags on both 'new' (bake into
  config.json) and 'build' (override config): --key-provider, --gateway,
  --public-logs, --public-sysinfo, --secure-time, --(no-)instance-id,
  --allowed-env, --key-provider-id (tri-state so unset keeps config/default)
- validate key_provider and warn when gateway is on without kms
- README.md: what/why, boot-flow diagram, requirements, quick start,
  subcommands, config.json schema, CLI options, security model, the
  three transport size limits (nginx/prpc/in-guest), determinism notes
- examples/hello-c: ~30KB static C HTTP server exposed via gateway
  (key_provider kms); build.sh compiles with musl then runs sca build
- examples/heartbeat: zero-toolchain app (the service is a shell script)
  plus an embedded config file to show multi-file rootfs packing
- build artifacts (compiled binary, app-compose.json) are gitignored
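Two of the points in the commit notes above, strict JSON booleans and tri-state CLI flags, can be illustrated together. This is a sketch under assumptions: the helper names and the `instance_id` config key are hypothetical, and only the `--instance-id` / `--no-instance-id` flag pair is taken from the PR.

```python
import argparse


def config_bool(config: dict, key: str, default: bool) -> bool:
    """Read a flag from config, accepting only real JSON booleans."""
    value = config.get(key, default)
    # bool("false") is True, so strings must be rejected rather than coerced.
    if not isinstance(value, bool):
        raise ValueError(f"{key} must be true or false, got {value!r}")
    return value


def build_parser() -> argparse.ArgumentParser:
    """Parser with one tri-state flag: True, False, or None when not given."""
    parser = argparse.ArgumentParser(prog="sca-sketch")
    parser.add_argument("--instance-id", dest="instance_id",
                        action="store_true", default=None)
    parser.add_argument("--no-instance-id", dest="instance_id",
                        action="store_false", default=None)
    return parser


def resolve(cli_value, config: dict, key: str, default: bool) -> bool:
    """CLI value wins when set; otherwise fall back to config, then the default."""
    if cli_value is not None:
        return cli_value
    return config_bool(config, key, default)
```

Leaving the argparse default at `None` is what lets `build` distinguish "flag not passed" from "flag explicitly disabled", so an unset flag keeps whatever config.json says.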
Copilot AI review requested due to automatic review settings June 6, 2026 00:57
Contributor

Copilot AI left a comment

Pull request overview

Adds a new sca (“self-contained app”) builder under tools/sca/ that embeds an app’s rootfs/ into app-compose.json (tar.gz + base64) and generates a boot-time bash_script that extracts onto / and starts specified systemd units—enabling “no Docker / no registry pull” deployments that are still measured by compose-hash/RTMR3.

Changes:

  • Introduces tools/sca/sca.py CLI with new (scaffold) and build (deterministic pack + emit compose + print compose-hash/app-id) subcommands.
  • Adds documentation for the tool (tools/sca/README.md) including security model and size limits.
  • Adds two runnable examples (hello-c, heartbeat) with rootfs layouts, systemd units, and build/deploy instructions.

Reviewed changes

Copilot reviewed 13 out of 15 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| tools/sca/sca.py | New Python CLI to scaffold projects and build self-contained app-compose.json with embedded rootfs + generated boot script. |
| tools/sca/README.md | User documentation for sca, including workflow, requirements, config schema, and security/size notes. |
| tools/sca/examples/hello-c/src/server.c | Example static single-threaded HTTP server source for “hello-c”. |
| tools/sca/examples/hello-c/rootfs/run/sca/bin/entrypoint.sh | Example entrypoint used by systemd to launch the hello-c server binary. |
| tools/sca/examples/hello-c/rootfs/etc/systemd/system/sca.service | Example systemd unit for supervising the hello-c app. |
| tools/sca/examples/hello-c/README.md | Example-specific build and deployment instructions for hello-c (gateway + KMS). |
| tools/sca/examples/hello-c/config.json | Example config enabling key_provider: kms and gateway_enabled: true. |
| tools/sca/examples/hello-c/build.sh | Convenience script to compile with musl-gcc and run sca build. |
| tools/sca/examples/hello-c/.gitignore | Ignores generated binary and compose output for hello-c example. |
| tools/sca/examples/heartbeat/rootfs/run/sca/bin/entrypoint.sh | Example “app is a shell script” heartbeat loop reading /etc/heartbeat/interval. |
| tools/sca/examples/heartbeat/rootfs/etc/systemd/system/sca.service | Example systemd unit for supervising the heartbeat app. |
| tools/sca/examples/heartbeat/rootfs/etc/heartbeat/interval | Example embedded config file for heartbeat interval. |
| tools/sca/examples/heartbeat/README.md | Example-specific build/deploy notes for heartbeat. |
| tools/sca/examples/heartbeat/config.json | Example config using key_provider: none and no gateway. |
| tools/sca/examples/heartbeat/.gitignore | Ignores generated compose output for heartbeat example. |

Comment thread tools/sca/sca.py
Comment thread tools/sca/sca.py
Comment thread tools/sca/sca.py Outdated
Comment thread tools/sca/sca.py Outdated
kvinwang added 2 commits June 5, 2026 22:58
- validate allowed_envs is a list of strings (a stray string would
  otherwise be spread into per-character entries)
- validate key_provider_id is a hex string and swap_size_mb a
  non-negative int, instead of crashing later with a traceback
- treat 'services': [] as an explicit error, not a silent fallback to
  the default unit (only default when the key is absent/null)
- 'sca new' on a path that exists as a file now errors cleanly instead
  of raising NotADirectoryError
- add SPDX headers to example shell/service/C sources; register the
  config.json and interval data files in REUSE.toml (can't carry comments)
- add docstrings to public functions in sca.py and apply ruff-format so
  the prek ruff (E,F,I,D) + ruff-format hooks pass
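The config validation fixes listed above can be sketched as follows. The function names are hypothetical; the `allowed_envs` and `services` keys and the default `sca.service` unit come from the PR text, but the exact error messages and structure in sca.py are not shown here.

```python
def validate_allowed_envs(value):
    """Require a list of strings; a bare string would otherwise split per character."""
    # list("API_KEY") == ["A", "P", "I", "_", "K", "E", "Y"], hence the explicit check.
    if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
        raise ValueError("allowed_envs must be a list of strings")
    return value


def resolve_services(config: dict, default=("sca.service",)):
    """Use the default unit only when the key is absent or null; reject an empty list."""
    services = config.get("services")
    if services is None:
        return list(default)
    if not isinstance(services, list) or not services:
        raise ValueError("services must be a non-empty list of unit names")
    return services
```

Treating `[]` as an error rather than a fallback avoids silently starting the default unit when the author meant to list their own.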
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 14 out of 16 changed files in this pull request and generated 1 comment.

Comment thread tools/sca/sca.py
gettarinfo() only returns None for sockets; FIFOs, device nodes and
hardlinks still produce a TarInfo and were silently packed via the
addfile fallback. reject any entry that isn't a regular file, directory
or symlink so they can't be embedded and extracted onto the guest /.
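The allow-list approach in this last commit can be sketched with the standard `tarfile` type predicates. The helper name `checked_tarinfo` is hypothetical and the actual check in sca.py may be structured differently.

```python
import tarfile


def checked_tarinfo(tar: tarfile.TarFile, path: str, arcname: str) -> tarfile.TarInfo:
    """Return a TarInfo only for regular files, directories, and symlinks."""
    info = tar.gettarinfo(path, arcname=arcname)
    # gettarinfo() returns None for sockets, but FIFOs, device nodes, and
    # hardlinks still yield a TarInfo, so the type is checked explicitly.
    if info is None or not (info.isreg() or info.isdir() or info.issym()):
        raise ValueError(f"unsupported file type: {path}")
    return info
```

An allow-list is the safer shape here: any entry type not named explicitly is refused, so nothing unexpected can end up extracted onto the guest /.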
@kvinwang kvinwang merged commit 7a541ad into master Jun 6, 2026
14 checks passed