Merged
29 changes: 24 additions & 5 deletions .cursor/skills/proof/SKILL.md
Original file line number Diff line number Diff line change
@@ -29,9 +29,15 @@ You (the parent agent) author the DAG inline using your understanding of the use
{
"title": "<short human-readable title for the run>",
"models": {
"HIGH": "gpt-5.3-codex",
"HIGH": {
"id": "gpt-5.4",
"params": [{ "id": "reasoning", "value": "high" }]
},
"MED": "composer-2",
"LOW": "auto-low"
"LOW": {
"id": "gpt-5.4-nano",
"params": [{ "id": "reasoning", "value": "low" }]
}
},
"tasks": [
{
@@ -49,7 +55,7 @@ Rules:
- Every `depends_on` entry must reference another task's `id`.
- No cycles. The runner rejects cyclic DAGs at parse time.
- `complexity` controls the model the subagent uses (see table below). Pick `HIGH` for novel/complex reasoning, `MED` for typical implementation, `LOW` for mechanical/lookup tasks.
- Optional top-level `models` can override the default complexity → model map for this DAG.
- Optional top-level `models` can override the default complexity → model map for this DAG. Values can be plain SDK model id strings or model selection objects of the shape `{ "id": "...", "params": [{ "id": "...", "value": "..." }] }`, with `params` omitted when unused.
- `subtask_prompt` should read like a standalone request — the runner automatically prepends a short summary of upstream task outputs, so you do not need to repeat them.
- Do **not** put two tasks that write to the same file in the same rank (siblings within a rank run concurrently and would race).
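
The `depends_on` rule above can be checked before a run with a few lines of TypeScript. This is a minimal sketch, not Proof's parser: `SketchTask` and `findReferenceProblems` are hypothetical names, and only the field names (`id`, `depends_on`, `complexity`) come from the schema above. Reading "another task's `id`" as forbidding self-dependencies is an assumption.

```typescript
// Hypothetical task shape: only the field names come from the DAG schema above.
interface SketchTask {
  id: string;
  depends_on: string[];
  complexity: 'HIGH' | 'MED' | 'LOW';
}

// Collects human-readable problems instead of throwing on the first one,
// so a caller can report every bad reference in a single pass.
function findReferenceProblems(tasks: SketchTask[]): string[] {
  const knownIds = new Set<string>(tasks.map((task) => task.id));
  const problems: string[] = [];
  for (const task of tasks) {
    for (const dep of task.depends_on) {
      if (dep === task.id) {
        // Assumption: "another task's id" rules out self-dependencies.
        problems.push(`task "${task.id}" depends on itself`);
      } else if (!knownIds.has(dep)) {
        problems.push(`task "${task.id}" depends on unknown id "${dep}"`);
      }
    }
  }
  return problems;
}
```

An empty result means every `depends_on` entry points at a different, existing task. Cycle detection is a separate check and is not covered here.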

@@ -167,11 +173,24 @@ After the runner exits, briefly summarize what completed/failed and re-link the
| MED | `composer-2` |
| LOW | `gpt-5.4-nano` |

Override any subset inline with top-level DAG `models`, or pass a reusable profile with `--models-file <path>`. Precedence is defaults < DAG `models` < `--models-file`. The Cursor model catalog can vary by account.
Override any subset inline with top-level DAG `models`, or pass a reusable profile with `--models-file <path>`. Values can be plain SDK model id strings or SDK model selections with `params`. At run time, Proof calls `Cursor.models.list()`, validates ids and param values, and expands partial selections to the closest valid preset variant using the model's default variant for omitted params. Precedence is defaults < DAG `models` < `--models-file`. The Cursor model catalog can vary by account.
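
The precedence rule (defaults < DAG `models` < `--models-file`) amounts to layering three partial maps. The sketch below illustrates that ordering only; `layerModelMaps` and the local type names are hypothetical, and Proof's actual merge logic is not shown in this PR.

```typescript
type Complexity = 'HIGH' | 'MED' | 'LOW';

interface ModelSelection {
  id: string;
  params?: { id: string; value: string }[];
}

// A value may be a plain model id string or a selection object.
type ModelSpec = string | ModelSelection;
type ModelMap = Record<Complexity, ModelSpec>;

const LEVELS: Complexity[] = ['HIGH', 'MED', 'LOW'];

// Later layers win: defaults < DAG `models` < `--models-file`.
// Keys that a layer omits fall through to the layer beneath it.
function layerModelMaps(
  defaults: ModelMap,
  dagModels: Partial<ModelMap> = {},
  fileModels: Partial<ModelMap> = {}
): ModelMap {
  const merged: ModelMap = { ...defaults };
  for (const layer of [dagModels, fileModels]) {
    for (const level of LEVELS) {
      const spec = layer[level];
      if (spec !== undefined) merged[level] = spec;
    }
  }
  return merged;
}
```

Because each layer is partial, overriding only `HIGH` in the DAG leaves `MED` and `LOW` at their defaults unless the models file also sets them.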

To use a cheaper high-capability GPT model, use the base SDK id plus params, not a suffix-style id:

```json
{
"models": {
"HIGH": {
"id": "gpt-5.4",
"params": [{ "id": "reasoning", "value": "high" }]
}
}
}
```

### Discovering valid model ids

Many Cursor CLI catalog models encode reasoning effort and Max Mode as **slug suffixes** (e.g. `claude-opus-4-7-thinking-max`, `gpt-5.5-extra-high`, `gpt-5.3-codex-xhigh`), but the Cursor SDK may accept only base slugs. Do not compose SDK model ids from CLI suffixes by hand. For SDK-bound code, prefer `Cursor.models.list()` or the SDK's `ConfigurationError` catalog over `cursor-agent --list-models`.
Many Cursor CLI catalog models encode reasoning effort and Max Mode as **slug suffixes** (e.g. `claude-opus-4-7-thinking-max`, `gpt-5.5-extra-high`, `gpt-5.3-codex-xhigh`), but the Cursor SDK may accept only base slugs plus `params`. Do not compose SDK model ids from CLI suffixes by hand: use `{ "id": "gpt-5.4", "params": [{ "id": "reasoning", "value": "high" }] }`, not `gpt-5.4-high`. For SDK-bound code, prefer `Cursor.models.list()` or the SDK's `ConfigurationError` catalog over `cursor-agent --list-models`.

Ways to enumerate model ids:

@@ -2,9 +2,9 @@
"title": "Flatbread Flow PMF Audit (no sub-sub-agents)",
"framing": "Treat Flatbread as Git-native relational content for TypeScript apps, backed by flat files. GraphQL is one interface, not the whole product identity.",
"models": {
"HIGH": "claude-opus-4-7",
"MED": "gpt-5.5",
"LOW": "gpt-5.4-mini"
"HIGH": { "id": "claude-opus-4-7" },
"MED": { "id": "gpt-5.5" },
"LOW": { "id": "gpt-5.4-mini" }
},
"tasks": [
{
@@ -2,9 +2,9 @@
"title": "Flatbread codegen-only change (no sub-sub-agents)",
"framing": "Treat Flatbread as Git-native relational content for TypeScript apps, backed by flat files. GraphQL is one interface, not the whole product identity.",
"models": {
"HIGH": "claude-opus-4-7",
"MED": "gpt-5.5",
"LOW": "gpt-5.4-mini"
"HIGH": { "id": "claude-opus-4-7" },
"MED": { "id": "gpt-5.5" },
"LOW": { "id": "gpt-5.4-mini" }
},
"tasks": [
{
6 changes: 3 additions & 3 deletions .cursor/skills/proof/examples/flatbread/dag-docs-sync.json
@@ -2,9 +2,9 @@
"title": "Flatbread docs / README sync (no sub-sub-agents)",
"framing": "Treat Flatbread as Git-native relational content for TypeScript apps, backed by flat files. GraphQL is one interface, not the whole product identity.",
"models": {
"HIGH": "claude-opus-4-7",
"MED": "gpt-5.5",
"LOW": "gpt-5.4-mini"
"HIGH": { "id": "claude-opus-4-7" },
"MED": { "id": "gpt-5.5" },
"LOW": { "id": "gpt-5.4-mini" }
},
"tasks": [
{
@@ -2,9 +2,9 @@
"title": "Flatbread schema-breaking migration (no sub-sub-agents; pause at human checkpoint after contract-synth)",
"framing": "Treat Flatbread as Git-native relational content for TypeScript apps, backed by flat files. GraphQL is one interface, not the whole product identity.",
"models": {
"HIGH": "claude-opus-4-7",
"MED": "gpt-5.5",
"LOW": "gpt-5.4-mini"
"HIGH": { "id": "claude-opus-4-7" },
"MED": { "id": "gpt-5.5" },
"LOW": { "id": "gpt-5.4-mini" }
},
"tasks": [
{
30 changes: 29 additions & 1 deletion packages/proof/README.md
@@ -68,6 +68,33 @@ Every DAG has a `title` and a `tasks` array. Each task needs:

Proof computes ranks with Kahn topological sort and runs sibling tasks in the same rank concurrently. Avoid placing two sibling tasks in the same rank if they write the same files.
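
Rank computation with Kahn's algorithm can be sketched as follows. This is an illustration of the technique, not the body of Proof's exported `computeRanks`; `sketchRanks` and `RankTask` are hypothetical names, and the error message is invented for the example.

```typescript
interface RankTask {
  id: string;
  depends_on: string[];
}

// Kahn-style layering: rank 0 holds tasks with no dependencies; each later
// rank holds tasks whose last unmet dependency sat in the previous rank.
function sketchRanks(tasks: RankTask[]): string[][] {
  const unmet = new Map<string, number>(); // task id -> unmet dependency count
  const dependents = new Map<string, string[]>(); // task id -> tasks waiting on it
  for (const task of tasks) {
    unmet.set(task.id, task.depends_on.length);
    for (const dep of task.depends_on) {
      const waiting = dependents.get(dep) ?? [];
      waiting.push(task.id);
      dependents.set(dep, waiting);
    }
  }

  const ranks: string[][] = [];
  let frontier = tasks.filter((t) => t.depends_on.length === 0).map((t) => t.id);
  let placed = 0;
  while (frontier.length > 0) {
    ranks.push(frontier);
    placed += frontier.length;
    const next: string[] = [];
    for (const id of frontier) {
      for (const child of dependents.get(id) ?? []) {
        const left = (unmet.get(child) ?? 0) - 1;
        unmet.set(child, left);
        if (left === 0) next.push(child);
      }
    }
    frontier = next;
  }

  // Tasks that never reach zero unmet dependencies sit on a cycle
  // (or depend on an id that does not exist).
  if (placed !== tasks.length) {
    throw new Error('DAG contains a cycle or an unknown dependency');
  }
  return ranks;
}
```

Tasks that share a rank have no dependency path between them, which is why siblings can run concurrently and why two siblings writing the same file would race.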

Optional top-level `models` can override the default complexity map with plain
SDK model id strings or SDK model selections:

```json
{
"models": {
"HIGH": {
"id": "gpt-5.4",
"params": [{ "id": "reasoning", "value": "high" }]
},
"MED": "composer-2",
"LOW": {
"id": "gpt-5.4-nano",
"params": [{ "id": "reasoning", "value": "low" }]
}
}
}
```

Use the object shape when you need `params`; use a string when the model id is
enough. For example, use `{ "id": "gpt-5.4", "params": [{ "id": "reasoning", "value": "high" }] }`, not a suffix-style id like `gpt-5.4-high`.

When a DAG runs, Proof calls `Cursor.models.list()`, validates model ids and
param values, and expands partial selections to the closest valid SDK preset
variant using that model's default variant for omitted params. `--init-only`
does not call the SDK, so it can still render a canvas without `CURSOR_API_KEY`.
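
One way to read "expands partial selections using the model's default variant" is sketched below. The catalog shape is an assumption: this PR does not show what `Cursor.models.list()` returns, so `CatalogModel`, its `variants` and `isDefault` fields, the `mode` parameter in the usage note, and `resolveAgainstCatalog` are all hypothetical. The sketch also requires an exact preset match after filling defaults and does not attempt a "closest" heuristic.

```typescript
interface ParamValue {
  id: string;
  value: string;
}

interface Selection {
  id: string;
  params?: ParamValue[];
}

// Hypothetical catalog entry; the real SDK shape is not shown in this PR.
interface CatalogModel {
  id: string;
  variants: { params: ParamValue[]; isDefault?: boolean }[];
}

function resolveAgainstCatalog(requested: Selection, catalog: CatalogModel[]): Selection {
  const model = catalog.find((m) => m.id === requested.id);
  if (model === undefined) throw new Error(`unknown model id "${requested.id}"`);

  // Start from the default variant so omitted params get a value.
  const fallback = model.variants.find((v) => v.isDefault === true) ?? model.variants[0];
  if (fallback === undefined) return { id: model.id };
  const wanted = new Map<string, string>();
  for (const p of fallback.params) wanted.set(p.id, p.value);

  // Overlay the requested params, rejecting values no variant offers.
  for (const p of requested.params ?? []) {
    const offered = model.variants.some((v) =>
      v.params.some((q) => q.id === p.id && q.value === p.value)
    );
    if (!offered) throw new Error(`model "${model.id}" has no variant with ${p.id}=${p.value}`);
    wanted.set(p.id, p.value);
  }

  // Pick the preset whose params match the filled-in request exactly.
  const match = model.variants.find(
    (v) => v.params.length === wanted.size && v.params.every((q) => wanted.get(q.id) === q.value)
  );
  if (match === undefined) {
    throw new Error(`no preset variant of "${model.id}" matches the requested params`);
  }
  return { id: model.id, params: match.params.map((p) => ({ ...p })) };
}
```

Under these assumptions, requesting only `reasoning=high` for a model whose default variant also sets a second parameter (say `mode=standard`) resolves to the preset with `reasoning=high, mode=standard`.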

Optional task kinds add control gates:

- `kind: "oracle"` runs a shell command and records pass/fail evidence.
@@ -118,8 +145,9 @@ Proof also exposes helpers for tooling:
```ts
import {
computeRanks,
createModelResolver,
createModelSelectionResolver,
parseDAG,
resolveModelSelectionFromCatalog,
runDryCheck,
type DAG,
type TaskState,
68 changes: 52 additions & 16 deletions packages/proof/src/canvas_writer.ts
@@ -10,7 +10,15 @@

import { writeFile, mkdir } from 'node:fs/promises';
import { dirname } from 'node:path';
import type { Complexity, DAG, TaskKind } from './dag.js';
import {
formatModelSelection,
normalizeModelSelection,
type Complexity,
type DAG,
type ModelSelection,
type ModelSpec,
type TaskKind,
} from './dag.js';

export type TaskStatus =
| 'PENDING'
@@ -27,6 +35,7 @@ export interface TaskState {
subtask_prompt: string;
status: TaskStatus;
model: string;
modelSelection?: ModelSelection;
/** `'task'` (default), `'pause'`, or `'oracle'`. Undefined is normalized to `'task'`. */
kind?: TaskKind;
/**
@@ -91,25 +100,34 @@ export interface RunState {

export function initialRunState(
dag: DAG,
modelFor: (c: Complexity) => string
modelFor: (c: Complexity) => ModelSpec
): RunState {
return {
title: dag.title,
startedAt: Date.now(),
tasks: dag.tasks.map((t) => ({
id: t.id,
depends_on: t.depends_on,
complexity: t.complexity,
subtask_prompt: t.subtask_prompt,
status: 'PENDING',
model: modelFor(t.complexity),
// Normalize undefined kind → 'task' so downstream consumers (canvas
// template, runner dispatcher) never have to ?? again.
kind: t.kind ?? 'task',
// Surface oracle-only fields so the canvas can render the gate's
// command / expectation without reading the streamed result body.
...(t.kind === 'oracle' ? { command: t.command, expect: t.expect } : {}),
})),
tasks: dag.tasks.map((t) => {
const modelSelection = normalizeModelSelection(
modelFor(t.complexity),
`model for task ${t.id}`
);
return {
id: t.id,
depends_on: t.depends_on,
complexity: t.complexity,
subtask_prompt: t.subtask_prompt,
status: 'PENDING',
model: formatModelSelection(modelSelection),
modelSelection,
// Normalize undefined kind → 'task' so downstream consumers (canvas
// template, runner dispatcher) never have to ?? again.
kind: t.kind ?? 'task',
// Surface oracle-only fields so the canvas can render the gate's
// command / expectation without reading the streamed result body.
...(t.kind === 'oracle'
? { command: t.command, expect: t.expect }
: {}),
};
}),
};
}
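
The bodies of `normalizeModelSelection` and `formatModelSelection` live in `dag.ts` and are not part of this hunk. The sketch below shows one plausible reading of what `initialRunState` relies on: a string-or-object spec becomes a `ModelSelection`, which is then rendered as a display string. `toSelection`, `describeSelection`, the definition of `ModelSpec` as a union, and the label format are assumptions for illustration, not the real helpers.

```typescript
interface ModelParameterValue {
  id: string;
  value: string;
}

interface ModelSelection {
  id: string;
  params?: ModelParameterValue[];
}

// Assumption: a spec is either a plain model id or a selection object.
type ModelSpec = string | ModelSelection;

// Hypothetical stand-in for normalizeModelSelection: always returns the
// object shape, and omits `params` when there are none.
function toSelection(spec: ModelSpec, label: string): ModelSelection {
  const selection: ModelSelection = typeof spec === 'string' ? { id: spec } : spec;
  if (selection.id.trim() === '') {
    throw new Error(`${label}: model id must be a non-empty string`);
  }
  const params = (selection.params ?? []).map((p) => ({ id: p.id, value: p.value }));
  return params.length > 0 ? { id: selection.id, params } : { id: selection.id };
}

// Hypothetical stand-in for formatModelSelection; the real label format may differ.
function describeSelection(selection: ModelSelection): string {
  const params = selection.params ?? [];
  if (params.length === 0) return selection.id;
  return `${selection.id} (${params.map((p) => `${p.id}=${p.value}`).join(', ')})`;
}
```

Keeping both the display string (`model`) and the structured value (`modelSelection`) on each task lets the canvas show a compact label while still rendering individual params.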

@@ -215,13 +233,25 @@ type TaskStatus =
type Complexity = 'HIGH' | 'MED' | 'LOW';
type TaskKind = 'task' | 'pause' | 'oracle';

// Keep in sync with ModelParameterValue / ModelSelection in dag.ts.

Canvas template types still duplicated — silent drift risk (prior finding, still open).

The ModelParameterValue and ModelSelection interfaces here live inside the embedded canvas template string, so TypeScript cannot enforce their parity with the exported types in dag.ts. The // Keep in sync comment is the only guard.

A field added to the public ModelSelection (or a param renamed) will silently be invisible in canvas rendering until someone notices at runtime.

One pragmatic option: add a prose list of the fields being mirrored to the comment so a code reviewer has a checklist:

// Keep in sync with ModelParameterValue / ModelSelection in dag.ts.
// Fields mirrored: ModelParameterValue.{id, value}; ModelSelection.{id, params?}.
// If either type gains a field, update both this template copy and the canvas render below.

A stronger option (if the template compilation step is ever added) is to import and satisfies-assert the types before embedding.

interface ModelParameterValue {

ModelParameterValue and ModelSelection redeclared without compile-time sync enforcement (MED).

The // Keep in sync comment is the only guard. If dag.ts adds a required field to ModelSelection, TypeScript will not warn here — the local interface shadows any import, and the embedded template string is not typechecked by the package's own tsconfig. Silent canvas rendering anomalies would be the only symptom.

Minimal fix: add a structural assertion outside the template string:

type _AssertModelSelectionInSync =
  import('./dag.js').ModelSelection extends ModelSelection ? true : never;

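As written, the alias only evaluates to never on divergence, which is not itself a compile error, and the one-directional extends still holds when dag.ts gains a field. For the package's own typecheck script to catch drift, the check needs to compare both directions and be used in a position where only true is accepted.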
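
A type-equality assertion that does fail the build on drift could look like the sketch below. It assumes the mirrored interfaces are declared in type-checked TypeScript rather than only inside the template string. `CanvasModelParameterValue`, `CanvasModelSelection`, `Equal`, `Expect`, and `selectionInSync` are hypothetical names, and the `Equal` helper is a widely used community idiom rather than anything from this repository.

```typescript
// Shapes as exported from dag.ts in this PR.
interface ModelParameterValue {
  id: string;
  value: string;
}
interface ModelSelection {
  id: string;
  params?: ModelParameterValue[];
}

// Hypothetical type-checked mirror of the copies embedded in the canvas template.
interface CanvasModelParameterValue {
  id: string;
  value: string;
}
interface CanvasModelSelection {
  id: string;
  params?: CanvasModelParameterValue[];
}

// Strict equality: true only when A and B are identical types, so an added
// optional field on either side makes the result `false`.
type Equal<A, B> =
  (<T>() => T extends A ? 1 : 2) extends (<T>() => T extends B ? 1 : 2) ? true : false;

// Only `true` satisfies the constraint; `Expect<false>` is a compile error.
type Expect<T extends true> = T;

// Fails to compile if the two ModelSelection shapes ever diverge.
const selectionInSync: Expect<Equal<ModelSelection, CanvasModelSelection>> = true;
```

Adding a field such as `weight?: number` to only one of the two interfaces turns `Equal<...>` into `false`, and the `Expect` constraint then rejects it at typecheck time.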

Canvas template types still manually duplicated — compile-time sync enforcement missing.

ModelParameterValue and ModelSelection are redeclared inside the embedded template string. TypeScript cannot enforce parity with dag.ts; a new field (e.g. weight?: number) added to ModelSelection in dag.ts will silently not appear in the canvas render until a runtime mismatch surfaces.

A snapshot/integration test that exercises initialRunState with a parameterised task and asserts the rendered params string would catch this drift automatically.

id: string;
value: string;
}

interface ModelSelection {
id: string;
params?: ModelParameterValue[];
}

interface TaskState {
id: string;
depends_on: string[];
complexity: Complexity;
subtask_prompt: string;
status: TaskStatus;
model: string;
modelSelection?: ModelSelection;
kind?: TaskKind;
command?: string;
expect?: string;
@@ -684,6 +714,12 @@ function TaskList({
: ''}
{(t.iteration ?? 0) > 0 ? ' · iteration ' + t.iteration : ''}
</Text>
{t.modelSelection?.params && t.modelSelection.params.length > 0 ? (
<Text size="small" tone="tertiary" style={{ paddingLeft: 12 }}>
{'Params: ' +
t.modelSelection.params.map((p) => p.id + '=' + p.value).join(', ')}
</Text>
) : null}
{effectiveKind(t) === 'pause' && t.checkpointPath ? (
<Stack gap={4}>
<Text size="small" weight="semibold">