Use Tracing Channel for observability

I came across https://github.com/graphql/graphql-js/issues/3133. Since we (Sentry and other APM providers) are driving an ecosystem effort to adopt tracing channels in server runtimes, I think this is a good time to revisit that discussion, especially as the concerns raised there no longer apply.

I'd like to propose adding first-class [`TracingChannel`](https://nodejs.org/api/diagnostics_channel.html#class-tracingchannel) support to `graphql-js`, following the pattern established by [`undici`](https://github.com/nodejs/undici) in Node.js core.

## Motivation

graphql-js is deliberately minimal: it provides the execution engine and nothing else. There are no built-in hooks, middleware, plugins, or tracing APIs. This is a strength for a reference implementation, but it means **every** APM tool must resort to monkey-patching to observe what the engine is doing.

Today, `@opentelemetry/instrumentation-graphql` patches `parse`, `validate`, and `execute` using `import-in-the-middle` (IITM) for ESM and `require-in-the-middle` (RITM) for CJS, then recursively traverses the entire schema at instrumentation time to wrap every field resolver's `resolve` function. Datadog's `dd-trace` does the same. Sentry does the same. Every APM vendor independently patches the same three functions and walks the same schema tree, which is fragile, duplicative, and version-coupled.

Not to mention the broader ecosystem concerns:

- **Runtime lock-in:** RITM and IITM rely on Node.js-specific module loader internals (`Module._resolveFilename`, `module.register()`). They don't work on Bun or Deno, which implement the Node.js API surface but not the module loader internals.
- **ESM fragility:** IITM is built on Node.js's module customization hooks, which are still evolving and have been a persistent source of breakage in the OTEL JS ecosystem.
- **Initialization ordering:** Both require instrumentation to be set up before `graphql` is first `require()`'d / `import`'d.
- **Bundling:** Users must ensure instrumented modules are externalized, which is increasingly difficult as frameworks bundle server-side code into single executables or deployment files.
- **Schema walking is expensive.** Every APM tool recursively wraps every resolver on every type in the schema at setup time. This is O(fields) work that runs once per instrumented schema — and if the schema is rebuilt (e.g., in a gateway), it runs again. Native emission eliminates this entirely.

`TracingChannel` solves all of these. It provides structured lifecycle events (`start`, `end`, `asyncStart`, `asyncEnd`, `error`) with built-in async context propagation, zero-cost when no subscribers are attached, and a standardized subscription model that requires no monkey-patching.

Tracing channels already solve many issues with standard event emitters:

- No overhead when no subscribers are listening.
- Tracing channels can be acquired from anywhere, anytime in the code, so timing and load order won't be an issue.
- Compatible with all server-runtimes we have today (more on that below).
- Automatically handles correlation and context propagation, so no need to track executions with requestIds and no need to do the span relationship dance.
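
As a rough illustration of the load-order point, the sketch below subscribes before the "library" side ever acquires its channel. `fakeParse` is a hypothetical stand-in for the real parser, and the channel name and `source` field are taken from the proposal further down, not from any shipped API.

```javascript
const dc = require('node:diagnostics_channel');

const seen = [];

// The subscriber (for example an APM SDK) runs first, before the library is loaded.
dc.tracingChannel('graphql:parse').subscribe({
  start(ctx) {
    seen.push(`start:${ctx.source}`);
  },
  end(ctx) {
    seen.push(`end:${ctx.source}`);
  },
});

// Later, the library acquires a channel with the same name on its own.
// `fakeParse` is hypothetical; it only returns a placeholder AST.
const parseChannel = dc.tracingChannel('graphql:parse');
function fakeParse(source) {
  return parseChannel.traceSync(() => ({ kind: 'Document' }), { source });
}

fakeParse('{ hello }');
```

Both sides only share the channel name, so neither needs a reference to the other or has to be loaded in a particular order.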

## Cross-platform compatibility

A previous discussion ([#3133](https://github.com/graphql/graphql-js/issues/3133)) raised the concern that `diagnostics_channel` is Node.js-specific and graphql-js targets multiple platforms. This concern was valid in 2021 but is no longer accurate:

- **Bun** supports `node:diagnostics_channel` including `TracingChannel` ([Bun docs](https://bun.sh/docs/runtime/nodejs-apis#node-diagnostics-channel))
- **Deno** supports `node:diagnostics_channel` via its Node.js compatibility layer ([Deno docs](https://docs.deno.com/api/node/diagnostics_channel/)), and also supports the `TracingChannel` API.
- **Cloudflare Workers** has the [same level of compatibility](https://developers.cloudflare.com/workers/runtime-apis/nodejs/diagnostics-channel/#tracingchannel).

This means that every server-side JavaScript runtime that runs graphql-js in production now supports `diagnostics_channel`. Browser environments don't need APM tracing at the execution engine level, since browsers usually run GraphQL clients, not servers. Even there, the API can be loaded conditionally and used only when it is available, so the same code stays isomorphic.

The standard compatibility pattern used across the ecosystem handles this cleanly:

```js
let dc;
try {
  dc = ('getBuiltinModule' in process)
    ? process.getBuiltinModule('node:diagnostics_channel')
    : require('node:diagnostics_channel');
} catch {
  // No diagnostics_channel available — all tracing is a no-op
}
```

This is zero-cost when `diagnostics_channel` is unavailable: no import, no overhead, no behavior change.

## Runtime Overhead

Tracing channels are designed to handle far more events than `EventEmitter`, and they can be optimized to have zero overhead when there are no listeners. At runtime, `hasSubscribers` can be checked before constructing any context objects.
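
A minimal sketch of that guard, assuming a hypothetical `tracedExecute` wrapper inside the library. `hasSubscribers` on `TracingChannel` may be missing on older Node.js releases, so the sketch only takes the fast path when it is explicitly `false`.

```javascript
const dc = require('node:diagnostics_channel');

// Channel name follows this proposal; the wrapper below is hypothetical.
const executeChannel = dc.tracingChannel('graphql:execute');

function tracedExecute(args, executeImpl) {
  // Fast path: skip building the context object when nobody is listening.
  // `undefined` (older runtimes) is treated as "maybe subscribed".
  if (executeChannel.hasSubscribers === false) {
    return executeImpl(args);
  }
  const ctx = {
    operationName: args.operationName,
    document: args.document,
  };
  // traceSync publishes `start` and `end` (and `error` on throw) around the call.
  return executeChannel.traceSync(executeImpl, ctx, undefined, args);
}

// Without subscribers, the call behaves like a plain function call.
const plainResult = tracedExecute({ operationName: 'Hello' }, () => 'ok');

// With a subscriber attached, the context reaches the `start` handler.
let observedName;
executeChannel.subscribe({
  start(ctx) {
    observedName = ctx.operationName;
  },
});
const tracedResult = tracedExecute({ operationName: 'Hello' }, () => 'ok');
```

The return value is the same in both cases; only the amount of work done around the call changes.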

## Implementation

We can use OTEL's instrumentation as the baseline for what we need, since it is the most widely used `graphql` instrumentation among APM providers.

All channels use the Node.js [`TracingChannel`](https://nodejs.org/api/diagnostics_channel.html#class-tracingchannel) API, which provides `start`, `end`, `asyncStart`, `asyncEnd`, and `error` sub-channels automatically.

| TracingChannel | Tracks | Context fields |
|---|---|---|
| `graphql:execute` | `execute()` — full operation execution lifecycle | `operationType`, `operationName`, `document`, `schema`, `variableValues` |
| `graphql:parse` | `parse()` — query string to AST | `source` |
| `graphql:validate` | `validate()` — AST validation against schema | `document`, `schema` |
| `graphql:resolve` | Individual field resolver execution | `fieldName`, `fieldPath`, `fieldType`, `parentType`, `args` |
| `graphql:subscribe` | `subscribe()` — subscription setup | `operationType`, `operationName`, `document`, `schema`, `variableValues` |

We can spec out the types and exact fields in each channel's context in a PR.
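
To make the publishing side concrete, here is a minimal sketch of how `execute()` could emit on `graphql:execute` through `tracePromise`. Both `executeImpl` and the `execute` wrapper are hypothetical stand-ins, not the actual graphql-js internals, and the context fields are only the ones proposed in the table.

```javascript
const dc = require('node:diagnostics_channel');

const executeChannel = dc.tracingChannel('graphql:execute');

// Hypothetical stand-in for the engine's internal execution function.
async function executeImpl(args) {
  return { data: { hello: 'world' } };
}

// Hypothetical wrapper: builds the context fields from the table and lets
// TracingChannel publish the lifecycle events around the call.
function execute(args) {
  const ctx = {
    operationType: args.operationType,
    operationName: args.operationName,
    document: args.document,
    schema: args.schema,
    variableValues: args.variableValues,
  };
  return executeChannel.tracePromise(executeImpl, ctx, undefined, args);
}

// A subscriber that records the order in which events arrive.
const events = [];
executeChannel.subscribe({
  start: (ctx) => events.push(`start ${ctx.operationName}`),
  end: () => events.push('end'),
  asyncStart: () => events.push('asyncStart'),
  asyncEnd: (ctx) => events.push(`asyncEnd ${JSON.stringify(ctx.result)}`),
  error: () => events.push('error'),
});

const done = execute({ operationType: 'query', operationName: 'Hello' });
```

`start` and `end` bracket the synchronous part of the call, while `asyncStart` and `asyncEnd` fire once the returned promise settles, with the resolved value available as `ctx.result`.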

### Usage in the Ecosystem

```js
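const dc = require('node:diagnostics_channel');
const { print } = require('graphql');
const { trace, SpanStatusCode } = require('@opentelemetry/api');

// Tracer name is illustrative
const tracer = trace.getTracer('graphql');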

// Subscribe to operation execution — the primary span
dc.tracingChannel('graphql:execute').subscribe({
  start(ctx) {
    ctx.span = tracer.startSpan(`${ctx.operationType} ${ctx.operationName}`, {
      attributes: {
        'graphql.operation.type': ctx.operationType,
        'graphql.operation.name': ctx.operationName,
        'graphql.source': print(ctx.document),
      },
    });
  },
  asyncEnd(ctx) {
    ctx.span?.end();
  },
  error(ctx) {
    ctx.span?.setStatus({ code: SpanStatusCode.ERROR, message: ctx.error?.message });
    ctx.span?.recordException(ctx.error);
  },
});
```

That's it. All interested parties, including the user, can subscribe to the other channels in the same way and run their own telemetry and logic.
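
The `error` handler above reads `ctx.error`. As a small sketch of where that value comes from, the example below traces a hypothetical failing function on the proposed `graphql:validate` channel: `TracingChannel` stores the thrown error on the context, publishes `error`, and still rethrows to the caller.

```javascript
const dc = require('node:diagnostics_channel');

const validateChannel = dc.tracingChannel('graphql:validate');

const log = [];
validateChannel.subscribe({
  start: () => log.push('start'),
  end: () => log.push('end'),
  error: (ctx) => log.push(`error: ${ctx.error.message}`),
});

// Hypothetical failing operation standing in for library code.
function failingValidate() {
  throw new Error('boom');
}

let rethrown;
try {
  validateChannel.traceSync(failingValidate, { document: null, schema: null });
} catch (err) {
  // The original error still propagates to the caller.
  rethrown = err;
}
```

Subscribers see the failure without changing the library's own error behavior.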

## Prior Art

This approach follows the same pattern already adopted or in progress by other major libraries:

- **`undici`** (Node.js core) — ships `TracingChannel` support since Node 20.12: [`undici:request`](https://nodejs.org/api/diagnostics_channel.html#undici-channels)
- **`fastify`** — ships `TracingChannel` support natively (`tracing:fastify.request.handler`)
- **`node-redis`** — [redis/node-redis#3195](https://github.com/redis/node-redis/pull/3195) (`node-redis:command`, `node-redis:connect`)
- **`ioredis`** — [redis/ioredis#2089](https://github.com/redis/ioredis/pull/2089) (`ioredis:command`, `ioredis:connect`)
- **`pg` / `pg-pool`** — [brianc/node-postgres#3624](https://github.com/brianc/node-postgres/pull/3624) (`pg:query`, `pg:connection`, `pg:pool:connect`)
- **`mysql2`** — [sidorares/node-mysql2#4178](https://github.com/sidorares/node-mysql2/pull/4178) (`mysql2:query`, `mysql2:execute`, `mysql2:connect`, `mysql2:pool:connect`)

_Full disclosure: I'm leading the effort on most of these implementations._

---

I would love to hear whether there's appetite for this. I'm happy to spec this in a PR and fully own it until it ships.

I think this would be a great milestone for v17. The code changes are minimal, and as some of those PRs show, it takes a few minutes, or a few hours at most, to make a library or framework observable with tracing channels.


Use Tracing Channel for observability #4629
