Summary
Add first-class TLA+ / TLC (and PlusCal/SANY) support to ado-aw so agentic workflows can run formal model checking of state machines and protocols. Today runtimes: covers lean, python, node, and dotnet, but there is no Java/TLA+ runtime — so a workflow that wants to model-check a spec has to hand-roll the toolchain in steps: and manually widen the network allowlist.
This is a natural companion to the existing runtimes: lean (theorem proving): TLC complements Lean — TLA+/TLC is the right tool for discovering liveness/safety gaps (it exhaustively searches and emits counterexample traces), while Lean proves a fixed design. Agentic "find stuck-state / deadlock / missing-timeout bugs in this state machine" workflows want TLC.
Motivation / use case
We want a scheduled agentic workflow that scans a codebase for event-driven state machines, builds TLA+ models, and runs TLC to find liveness gaps (states that can wedge forever) — then proposes the models + findings via safe-outputs (draft PR + work item per gap). This is fully within the agentic-workflow sweet spot, but the toolchain is currently DIY.
What's missing
TLC is a Java application distributed as a single tla2tools.jar (includes TLC, the SANY parser, and the PlusCal translator). To run it in a sandboxed AWF agent you need:
- A JRE/JDK — no Java runtime exists in
ado-aw today.
tla2tools.jar — fetched/cached (latest from the TLA+ releases, or a pinned version).
- Network allowlist entries for the download sources (e.g. a JRE provider such as Adoptium/Temurin, and the GitHub release asset for
tla2tools.jar). GitHub is in the built-in allowlist; the JRE provider domain is not.
- bash allow-list extension so the agent may invoke the toolchain (e.g.
java, plus convenience wrappers like tlc / pluscal / sany, and tar/curl for provisioning) — analogous to how runtimes: lean auto-adds lean/lake.
Proposed solution
A first-class runtime mirroring the existing lean entry:
runtimes:
tla:
version: "1.8.0" # tla2tools.jar version (optional; default latest)
jdk: "21" # JRE/JDK major (optional; sensible default)
When enabled it would, like other runtimes:
- install a JRE and stage
tla2tools.jar (cached across runs),
- auto-extend the bash allow-list with the toolchain commands,
- auto-add the network domains needed to provision it (so workflow authors don't hand-maintain
network.allowed),
- expose a stable invocation path (e.g. a
tlc/pluscal/sany shim, or documented env vars for the java -cp <jar> tlc2.TLC ... form).
Current workaround (what we have to do today)
steps:
- bash: |
set -euo pipefail
mkdir -p "$AGENT_TEMPDIRECTORY/tla"
curl -sSL "https://api.adoptium.net/v3/binary/latest/21/ga/linux/x64/jre/hotspot/normal/eclipse?project=jdk" -o "$AGENT_TEMPDIRECTORY/tla/jre.tgz"
tar -xzf "$AGENT_TEMPDIRECTORY/tla/jre.tgz" -C "$AGENT_TEMPDIRECTORY/tla"
curl -sSL "https://github.com/tlaplus/tlaplus/releases/latest/download/tla2tools.jar" -o "$AGENT_TEMPDIRECTORY/tla/tla2tools.jar"
displayName: "Provision TLA+ toolchain (JRE + TLC)"
network:
allowed:
- "api.adoptium.net"
- "*.adoptium.net"
The agent then runs java -cp tla2tools.jar tlc2.TLC -config M.cfg M.tla. This works but every author must repeat the provisioning, pin/refresh versions by hand, and curate the allowlist — exactly the boilerplate runtimes: exists to remove.
Acceptance criteria
Notes
- A general-purpose Java/JVM runtime would also satisfy this (TLA+ being one consumer); a thin
tla runtime on top of it would be ideal.
- Memory: TLC benefits from a parallel GC and a decent heap; a sensible default
-XX:+UseParallelGC and heap sizing (or a knob) would help.
Summary
Add first-class TLA+ / TLC (and PlusCal/SANY) support to
ado-awso agentic workflows can run formal model checking of state machines and protocols. Todayruntimes:coverslean,python,node, anddotnet, but there is no Java/TLA+ runtime — so a workflow that wants to model-check a spec has to hand-roll the toolchain insteps:and manually widen the network allowlist.This is a natural companion to the existing
runtimes: lean(theorem proving): TLC complements Lean — TLA+/TLC is the right tool for discovering liveness/safety gaps (it exhaustively searches and emits counterexample traces), while Lean proves a fixed design. Agentic "find stuck-state / deadlock / missing-timeout bugs in this state machine" workflows want TLC.Motivation / use case
We want a scheduled agentic workflow that scans a codebase for event-driven state machines, builds TLA+ models, and runs TLC to find liveness gaps (states that can wedge forever) — then proposes the models + findings via safe-outputs (draft PR + work item per gap). This is fully within the agentic-workflow sweet spot, but the toolchain is currently DIY.
What's missing
TLC is a Java application distributed as a single
tla2tools.jar(includes TLC, the SANY parser, and the PlusCal translator). To run it in a sandboxed AWF agent you need:ado-awtoday.tla2tools.jar— fetched/cached (latest from the TLA+ releases, or a pinned version).tla2tools.jar). GitHub is in the built-in allowlist; the JRE provider domain is not.java, plus convenience wrappers liketlc/pluscal/sany, andtar/curlfor provisioning) — analogous to howruntimes: leanauto-addslean/lake.Proposed solution
A first-class runtime mirroring the existing
leanentry:When enabled it would, like other runtimes:
tla2tools.jar(cached across runs),network.allowed),tlc/pluscal/sanyshim, or documented env vars for thejava -cp <jar> tlc2.TLC ...form).Current workaround (what we have to do today)
The agent then runs
java -cp tla2tools.jar tlc2.TLC -config M.cfg M.tla. This works but every author must repeat the provisioning, pin/refresh versions by hand, and curate the allowlist — exactly the boilerplateruntimes:exists to remove.Acceptance criteria
runtimes: tla(orruntimes: tlaplus) provisions a JRE +tla2tools.jar, cached across runs.tla2tools.jar.network.allowed).docs/runtimes.md(and ideally a sample workflow) showing a TLA+ model-checking agent.Notes
tlaruntime on top of it would be ideal.-XX:+UseParallelGCand heap sizing (or a knob) would help.