prombench malformed PR runs after Exhaustion errors (or other start errors) #936

We can't run more than 3 benchmarks at once. When we try, we get exhaustion errors for some node pools.

This causes `prometheus_start` to never finish: it keeps waiting for the node pool.

`prometheus_stop` then never finishes either:

```
waiting for nodepools to be deleted
infra gke nodes check-deleted -a *** \
	-v ZONE:europe-west3-a -v GKE_PROJECT_ID:macro-mile-203600 \
	-v EKS_WORKER_ROLE_ARN: -v EKS_CLUSTER_ROLE_ARN: \
	-v EKS_SUBNET_IDS: -v SEPARATOR: \
	-v CLUSTER_NAME:test-infra -v PR_NUMBER:18000 \
	-f ./manifests/prombench/nodes_gke.yaml
11:35:06 gke.go:517: nodepool running name: prometheus-18000
make: *** [Makefile:120: all_nodes_deleted] Error 1
```

Neither of those jobs has a timeout (I can see this one has been running for 2h just fine): https://github.com/prometheus/prometheus/actions/runs/21665627212/job/62460234151

We need to make this robust so that:
* Both jobs eventually time out
* Running cancel while start is still running cancels the start
* restart == start, ideally
* We don't need to manually remove the `prometheus-xyz` node pool

