feat(api): track Durable Object instances with storage usage #1523

Merged
braden-w merged 13 commits into main from opencode/playful-cactus
Mar 14, 2026

braden-w commented Mar 13, 2026

Why this exists

Every Epicenter user gets their own Durable Objects—workspace rooms for structured metadata, document rooms for content with version history. Until now, the API had no visibility into which DOs exist per user or how much storage they consume. We need this for a user dashboard ("here are your workspaces and their sizes") and eventually for usage-based billing.

The challenge: we can't query Cloudflare for a list of DO instances or their storage externally. The only way to measure storage is from inside the DO via ctx.storage.sql.databaseSize. So we piggyback on existing RPC calls that are already happening.

How it works

┌─────────┐    sync(body)     ┌──────────────┐
│  Client  │ ───────────────► │  Hono Worker  │
└─────────┘                   └──────┬───────┘
                                     │
                          ┌──────────┴──────────┐
                          │                     │
                     stub.sync()          upsertDoInstance()
                          │                     │
                    ┌─────▼─────┐        ┌──────▼──────┐
                    │ Durable   │        │  Postgres   │
                    │ Object    │        │ (Hyperdrive)│
                    │           │        │             │
                    │ returns:  │        │ INSERT ...  │
                    │ {diff,    │        │ ON CONFLICT │
                    │  storage  │        │ DO UPDATE   │
                    │  Bytes}   │        └─────────────┘
                    └───────────┘

The Worker already calls stub.sync() and stub.getDoc() on every request. We added storageBytes to the return value (read from ctx.storage.sql.databaseSize inside the DO—zero extra cost) and fire a non-blocking upsert to Postgres via the afterResponse queue. The upsert completes before the DB connection closes but doesn't block the HTTP response to the client.
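As a rough illustration of the piggybacking described above, the sketch below shows a Durable Object method returning its usual payload together with the storage measurement. `SyncRoomSketch`, `StorageCtx`, `SyncResult`, and `computeDiff` are hypothetical stand-ins, not the PR's actual classes; only the `ctx.storage.sql.databaseSize` read and the `{ diff, storageBytes }` shape come from the description.

```typescript
// Minimal stand-in for the slice of Durable Object state used here
// (assumption: only ctx.storage.sql.databaseSize is needed to measure storage).
interface StorageCtx {
  storage: { sql: { databaseSize: number } };
}

// Hypothetical return shape: the existing payload plus the measurement.
interface SyncResult {
  diff: Uint8Array;
  storageBytes: number;
}

class SyncRoomSketch {
  readonly ctx: StorageCtx;

  constructor(ctx: StorageCtx) {
    this.ctx = ctx;
  }

  // Placeholder for the real sync logic, which is out of scope for this sketch.
  private computeDiff(body: Uint8Array): Uint8Array {
    return body;
  }

  sync(body: Uint8Array): SyncResult {
    const diff = this.computeDiff(body);
    // Read the size after the update is applied so the value reflects this call.
    return { diff, storageBytes: this.ctx.storage.sql.databaseSize };
  }
}
```

The Worker can then destructure `storageBytes` from the RPC result and hand it to the upsert without making a second call into the DO.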

The afterResponse pattern

// Route handler — fire and forget
c.var.afterResponse.push(
  upsertDoInstance(c.var.db, { userId, doType, resourceName, doName, storageBytes })
);
return new Response(diff, { ... }); // Response goes out immediately

// Middleware — drain queued promises, then close DB connection
finally {
  c.executionCtx.waitUntil(afterResponse.drain().then(() => client.end()));
}

createAfterResponseQueue() encapsulates the promise collection with push() and drain() methods. drain() settles all queued promises via Promise.allSettled and takes no parameters; cleanup (closing the pg connection) is chained by the caller via .then(). The Promise<unknown> typing is the semantic contract for fire-and-forget: we track promises to completion but never inspect what they resolve to.
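A minimal sketch of what such a queue could look like, assuming nothing beyond the push()/drain() behavior described above. The internals, the drain() return type, and the `closeConnection` placeholder are assumptions, not the PR's actual code.

```typescript
// Sketch of the queue: the names follow the PR text, the body is assumed.
interface AfterResponseQueue {
  push(promise: Promise<unknown>): void;
  drain(): Promise<PromiseSettledResult<unknown>[]>;
}

function createAfterResponseQueue(): AfterResponseQueue {
  const pending: Promise<unknown>[] = [];
  return {
    push(promise) {
      pending.push(promise);
    },
    drain() {
      // allSettled never rejects, so one failed upsert cannot skip the cleanup
      // that the caller chains after drain(). splice(0) empties the queue so a
      // second drain() does not re-await work that has already settled.
      return Promise.allSettled(pending.splice(0));
    },
  };
}

// Usage: queue work during the request, then chain cleanup after drain().
const queue = createAfterResponseQueue();
queue.push(Promise.resolve("upserted"));
queue.push(Promise.reject(new Error("db unavailable")));

// Hypothetical placeholder for closing the pg connection.
const closeConnection = async (): Promise<void> => {};
const done: Promise<void> = queue.drain().then(() => closeConnection());
```

Because the rejection is absorbed by Promise.allSettled, `done` still resolves and the connection is closed even when a queued upsert fails.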

DO naming convention alignment

This PR builds on the naming convention change from #1520 (user:{userId}:{type}:{name}). The 6 upsert calls in route handlers construct doName with the type segment, matching getWorkspaceStub/getDocumentStub:

doName: `user:${userId}:workspace:${resourceName}`
doName: `user:${userId}:document:${resourceName}`
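One way to keep the six call sites consistent is to build the name in a single helper. The sketch below is illustrative only: `buildDoName` and `parseDoName` are hypothetical names, and the parser assumes that userId never contains a colon.

```typescript
type DoType = "workspace" | "document";

// Builds a name in the user:{userId}:{type}:{name} convention.
function buildDoName(userId: string, doType: DoType, resourceName: string): string {
  return `user:${userId}:${doType}:${resourceName}`;
}

// Inverse of buildDoName. Returns null for names outside the convention.
// Assumption: userId contains no ':' (resourceName may contain them).
function parseDoName(
  doName: string,
): { userId: string; doType: DoType; resourceName: string } | null {
  const match = /^user:([^:]+):(workspace|document):(.+)$/.exec(doName);
  if (!match) return null;
  const [, userId, doType, resourceName] = match;
  return { userId, doType: doType as DoType, resourceName };
}

const exampleName = buildDoName("abc", "workspace", "epicenter.tab-manager");
// exampleName === "user:abc:workspace:epicenter.tab-manager"
```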

Schema

CREATE TABLE durable_object_instance (
  user_id              text      NOT NULL REFERENCES "user"(id) ON DELETE CASCADE,
  do_type              text      NOT NULL,   -- 'workspace' | 'document'
  resource_name        text      NOT NULL,   -- e.g. 'epicenter.tab-manager'
  do_name              text      PRIMARY KEY, -- e.g. 'user:abc:workspace:epicenter.tab-manager'
  storage_bytes        bigint,               -- latest measurement from DO SQLite
  created_at           timestamp NOT NULL DEFAULT now(),
  last_accessed_at     timestamp NOT NULL DEFAULT now(),
  storage_measured_at  timestamp            -- NULL when only lastAccessedAt was updated (WebSocket)
);
CREATE INDEX doi_user_id_idx ON durable_object_instance (user_id);

Design decisions:

  • doName as primary key: doName = user:{userId}:{doType}:{resourceName}, so it already encodes the full identity. A composite PK on the decomposed columns was redundant (two indexes for the same logical key). A single-column PK simplifies the upsert conflict target from 3 columns to 1.
  • userId index — needed for FK cascade delete performance and "list all DOs for user X" queries. The PK on doName can't serve prefix queries on userId.
  • doType and resourceName as data columns — derivable from doName, but kept for query convenience (avoids string parsing).
  • DoType branded union: type DoType = 'workspace' | 'document', with $type<DoType>() on the column for compile-time safety.
  • storageBytes nullable — WebSocket upgrades update lastAccessedAt only (no RPC to measure storage).
  • Separate storageMeasuredAt — distinguishes "never measured" from "measured at time T", since active WebSocket traffic can make lastAccessedAt fresh while storageBytes is stale.
  • Best-effort: .catch() on the upsert means a DB failure doesn't break sync. This is a resource registry, not a billing authority.
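The decisions above can be sketched as a single parameterized upsert. This is a hedged illustration, not the PR's upsertDoInstance: `buildUpsertQuery` and `DoInstanceUpsert` are hypothetical, the COALESCE handling of a missing measurement is an assumption about how the nullable columns might be preserved, and the SQL has not been run against a live database.

```typescript
type DoType = "workspace" | "document";

interface DoInstanceUpsert {
  userId: string;
  doType: DoType;
  resourceName: string;
  doName: string;
  // Omitted on WebSocket upgrades, where no RPC measures storage.
  storageBytes?: number;
}

// Hypothetical helper: builds a { text, values } statement for a pg-style client.
function buildUpsertQuery(row: DoInstanceUpsert): {
  text: string;
  values: (string | number | null)[];
} {
  const storageBytes = row.storageBytes ?? null;
  const text = `
    INSERT INTO durable_object_instance
      (user_id, do_type, resource_name, do_name, storage_bytes, storage_measured_at)
    VALUES
      ($1, $2, $3, $4, $5::bigint,
       CASE WHEN $5::bigint IS NULL THEN NULL ELSE now() END)
    ON CONFLICT (do_name) DO UPDATE SET
      last_accessed_at = now(),
      -- Assumption: keep the previous measurement when this call has none.
      storage_bytes = COALESCE(EXCLUDED.storage_bytes, durable_object_instance.storage_bytes),
      storage_measured_at = COALESCE(EXCLUDED.storage_measured_at, durable_object_instance.storage_measured_at)`;
  return {
    text,
    values: [row.userId, row.doType, row.resourceName, row.doName, storageBytes],
  };
}
```

A caller would pass the result to its database client and attach `.catch()` to the returned promise, so that a failed write is logged rather than propagated into the sync path.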

Also in this PR

Workspace ID standardization: the workspace ID is renamed from tab-manager to epicenter.tab-manager (clean break, no migration; local-first clients re-sync to the new DO).

deleteStorage() RPC: New method on BaseSyncRoom for cleanup of renamed/orphaned DOs.
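A minimal sketch of such a method, assuming it simply forwards to the Durable Object storage API's deleteAll(). `BaseSyncRoomSketch` and `DeletableCtx` are hypothetical stand-ins for the real base class and context type.

```typescript
// Minimal stand-in for the slice of the Durable Object storage API used here.
interface DeletableCtx {
  storage: { deleteAll(): Promise<void> };
}

class BaseSyncRoomSketch {
  readonly ctx: DeletableCtx;

  constructor(ctx: DeletableCtx) {
    this.ctx = ctx;
  }

  // RPC method: wipes everything this object has persisted.
  async deleteStorage(): Promise<void> {
    await this.ctx.storage.deleteAll();
  }
}
```

After a rename such as the workspace ID change, the Worker could call this once on the stub for the old name to reclaim its storage.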

Technical article: docs/articles/piggyback-storage-tracking-on-existing-rpcs.md documents the afterResponse queue pattern and the piggybacking approach for broader reference.

What this is NOT

This is not a billing system. For billing you'd need time-series storage measurements, request/connection counting, and AI token tracking—all separate append-only tables. This is a v1 resource registry for a user dashboard.

braden-w added 12 commits March 12, 2026 11:58
Rename workspace ID from 'tab-manager' to 'epicenter.tab-manager' to match
the documented epicenter.<app> naming convention used by other workspaces.

Exposes ctx.storage.deleteAll() as an RPC method for cleaning up
orphaned or renamed Durable Object rooms.

- Add durableObjectInstance table with composite PK (userId, doType, resourceName)
- Add unique index on doName, index on userId
- Add durableObjectInstanceRelations and update userRelations
- Generated migration: 0001_striped_silverclaw.sql

- Modify sync() and getDoc() to return { diff/data, storageBytes } via ctx.storage.sql.databaseSize
- Add afterResponse queue pattern in DB middleware to drain upserts before client.end()
- Add upsertDoInstance helper with INSERT ON CONFLICT for fire-and-forget tracking
- Update all 4 workspace/document route handlers to destructure new RPC shapes and push upserts
- WebSocket upgrades track lastAccessedAt only; HTTP paths include storageBytes

Add review section with summary, deviations, and follow-up work.

# Conflicts:
#	apps/tab-manager/src/lib/workspace.ts

Update upsert doName constructions to include the type segment,
matching the `user:{userId}:{type}:{name}` convention.

…quest lifecycle

Remove cleanup parameter from drain(); callers chain .then() instead.
Drop explicit Promise<unknown> return type from upsertDoInstance (inferred).
Add JSDoc explaining the unknown typing contract and step-by-step comments
documenting the pg.Client lifetime through waitUntil.

Add exported DoType discriminator to schema and apply $type<DoType>()
branding on the doType column. Replace uniqueIndex with .unique() on
doName (simpler Drizzle idiom). Regenerate migration and update spec
to match simplified drain() API.

The composite PK (userId, doType, resourceName) already starts with
userId, so Postgres uses it for any userId prefix query. The separate
index was dead weight: a duplicate B-tree costing writes for zero
query benefit.

doName already encodes userId + doType + resourceName, making the
composite PK redundant. Single-column PK simplifies the upsert
conflict target from 3 columns to 1. userId index added for FK
cascade performance and user-scoped queries.

braden-w changed the title from "feat(api): track Durable Object instances with storage telemetry" to "feat(api): track Durable Object instances with storage usage" on Mar 14, 2026

Covers the afterResponse queue pattern, waitUntil lifecycle, and why
Promise<unknown> is the right fire-and-forget contract. Uses real code
from the Epicenter API.
braden-w merged commit ef0c9cf into main Mar 14, 2026
1 of 8 checks passed
braden-w deleted the opencode/playful-cactus branch March 14, 2026 09:45