We can't run more than 3 benchmarks. When we try, we get an exhaustion error for some node pools.
This causes prometheus_start to never finish, because it keeps waiting for the node pool.
prometheus_stop then never finishes either:
waiting for nodepools to be deleted
infra gke nodes check-deleted -a *** \
-v ZONE:europe-west3-a -v GKE_PROJECT_ID:macro-mile-203600 \
-v EKS_WORKER_ROLE_ARN: -v EKS_CLUSTER_ROLE_ARN: \
-v EKS_SUBNET_IDS: -v SEPARATOR: \
-v CLUSTER_NAME:test-infra -v PR_NUMBER:18000 \
-f ./manifests/prombench/nodes_gke.yaml
11:35:06 gke.go:517: nodepool running name: prometheus-18000
make: *** [Makefile:120: all_nodes_deleted] Error 1
Neither of those jobs has a timeout (I see one that has been running for 2h just fine): https://github.com/prometheus/prometheus/actions/runs/21665627212/job/62460234151
We need to make this robust, so that:
- We have some eventual timeouts
- Running cancel while start is still running cancels the start
- restart == start, ideally
- We don't need to manually remove the prometheus-xyz node pool