Description
Summary
When using graphman pause <deployment> to manually pause a subgraph, the indexer-agent's reconcileDeployments loop automatically resumes it within minutes by calling subgraph_deploy on the graph-node JSON-RPC admin API. This makes it impossible to keep a subgraph paused for maintenance operations such as graphman rewind.
Steps to reproduce
- graphman pause <IPFS_HASH>
- Wait 2-5 minutes (one reconciliation cycle)
- graphman info <IPFS_HASH> --status → shows Paused: false
The subgraph has decisionBasis: always in indexer rules.
Expected behavior
The reconciliation loop should check the paused_at field before calling subgraph_deploy. If a deployment was explicitly paused via graphman, the agent should not resume it.
Impact
- Cannot perform graphman rewind safely: the subgraph resumes indexing during the rewind, causing data inconsistencies
- Any maintenance operation requiring a temporary stop is compromised
- Operators must resort to workarounds like reassigning to a non-existent node (e.g. graphman reassign <hash> maintenance_node_0) to prevent the agent from resuming the subgraph
Root cause
In packages/indexer-agent/src/agent.ts, the reconcileDeployments function calculates target deployments based on indexer rules and calls this.graphNode.ensure() for each one. There is no check for whether a deployment is currently paused before calling ensure. If the deployment is in the target list (active allocation or decisionBasis: always/offchain), it will be re-deployed, which implicitly resumes it.
Suggested fix
Before calling ensure for a deployment, check if it is currently paused (paused_at IS NOT NULL). If paused, skip the ensure call. Optionally, add a --force flag to graphman pause or a new indexer rule field (e.g. maintenancePause: true) that the agent explicitly respects.
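A minimal sketch of the skip-if-paused check, under stated assumptions: the `DeploymentStatus` shape, `partitionTargets`, and `reconcileTargets` are hypothetical names introduced for illustration, and the `ensure` callback only stands in for this.graphNode.ensure() (the real signature may differ). How the agent would obtain the paused state from graph-node is not shown here.

```typescript
// Hypothetical shape for illustration; not the indexer-agent's real types.
interface DeploymentStatus {
  ipfsHash: string;
  // Mirrors the paused_at field: null means the deployment is not paused.
  pausedAt: Date | null;
}

// Split the target deployments into those that are safe to ensure and
// those that should be skipped because they are currently paused.
function partitionTargets(
  targets: string[],
  statuses: DeploymentStatus[],
): { toEnsure: string[]; skipped: string[] } {
  const paused = new Set(
    statuses.filter((s) => s.pausedAt !== null).map((s) => s.ipfsHash),
  );
  const toEnsure: string[] = [];
  const skipped: string[] = [];
  for (const hash of targets) {
    (paused.has(hash) ? skipped : toEnsure).push(hash);
  }
  return { toEnsure, skipped };
}

// Hypothetical reconcile step: `ensure` stands in for this.graphNode.ensure().
async function reconcileTargets(
  targets: string[],
  statuses: DeploymentStatus[],
  ensure: (ipfsHash: string) => Promise<void>,
): Promise<string[]> {
  const { toEnsure, skipped } = partitionTargets(targets, statuses);
  for (const hash of toEnsure) {
    await ensure(hash);
  }
  // Returned so the caller can log which deployments were left paused.
  return skipped;
}
```

In this sketch, a target with no status entry (for example, a deployment graph-node has not seen yet) is treated as not paused and is still ensured, so first-time deployments are unaffected.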
Related issues
- Add a deployment management mode configuration similar to allocation management mode (#713): added --deployment-management manual mode as a workaround (disables the entire reconciliation loop)
- Enhancement: Indexer agent relies on graph-node to assign deployments to index-nodes (#853): aimed to make the agent respect the new pause mechanism, but the "use new pausing mechanism" task was never completed
- Support pause/resume of subgraphs (graph-node#4255): introduced first-class graphman pause/resume commands
- graphman reassign should pause the subgraph first (graph-node#5253): graphman reassign should pause the subgraph first (still OPEN)
- graphman rewind should use the same pausing mechanism as graphman pause (graph-node#5110): graphman rewind should use the pause mechanism (closed by stale bot, never implemented)
Current workaround
Reassign the subgraph to a non-existent node before maintenance:
graphman pause <IPFS_HASH>
graphman reassign <IPFS_HASH> maintenance_node_0
# perform maintenance (rewind, reindex, etc.)
graphman reassign <IPFS_HASH> index_node_0
graphman resume <IPFS_HASH>

The indexer-agent cannot resume the subgraph because maintenance_node_0 does not exist.