nats server not terminating correctly #1101
Open
Labels
defect (Suspected defect such as a bug or regression)
Description
What version were you using?
2.12.2 via helm chart
What environment was the server running in?
Digital Ocean Managed Kubernetes 1.33
Is this defect reproducible?
Not reproducible manually; it happens at random, but very often. We have around 100 installations and it occurs every day, presumably triggered by nodes scaling down and up.
Given the capability you are leveraging, describe your expectation?
When the nats-0 pod is terminated (due to node rescheduling etc.), it should shut down cleanly within the grace period; instead it sometimes gets stuck in the Terminating state indefinitely.
Given the expectation, what is the defect you are observing?
apiVersion: v1
kind: Pod
metadata:
  name: nats-0
  generateName: nats-
  namespace: company123
  uid: a438e6b2-6f99-4190-9535-4b1da5fe8ca5
  resourceVersion: '1907059241'
  generation: 2
  creationTimestamp: '2025-12-18T10:55:39Z'
  deletionTimestamp: '2025-12-18T11:07:25Z'
  deletionGracePeriodSeconds: 60
  labels:
    app.kubernetes.io/component: nats
    app.kubernetes.io/instance: nats
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: nats
    app.kubernetes.io/version: 2.12.2
    apps.kubernetes.io/pod-index: '0'
    controller-revision-hash: nats-6b7dcb6c9d
    helm.sh/chart: nats-2.12.2
    statefulset.kubernetes.io/pod-name: nats-0
  annotations:
    checksum/config: 303f0fedc13b8bb4a554be83f98214ee1af1bb4010e4ddc46987550f2c327c8f
    cluster-autoscaler.kubernetes.io/safe-to-evict: 'true'
  ownerReferences:
    - apiVersion: apps/v1
      kind: StatefulSet
      name: nats
      uid: f56e0705-1b3d-4376-9adb-650102c817ce
      controller: true
      blockOwnerDeletion: true
  selfLink: /api/v1/namespaces/company123/pods/nats-0
status:
  phase: Running
  conditions:
    - type: PodReadyToStartContainers
      status: 'True'
      lastProbeTime: null
      lastTransitionTime: '2025-12-18T10:55:40Z'
    - type: Initialized
      status: 'True'
      lastProbeTime: null
      lastTransitionTime: '2025-12-18T10:55:39Z'
    - type: Ready
      status: 'False'
      lastProbeTime: null
      lastTransitionTime: '2025-12-18T11:06:49Z'
      reason: ContainersNotReady
      message: 'containers with unready status: [nats reloader prom-exporter]'
    - type: ContainersReady
      status: 'False'
      lastProbeTime: null
      lastTransitionTime: '2025-12-18T11:06:49Z'
      reason: ContainersNotReady
      message: 'containers with unready status: [nats reloader prom-exporter]'
    - type: PodScheduled
      status: 'True'
      lastProbeTime: null
      lastTransitionTime: '2025-12-18T10:55:39Z'
  hostIP: 10.135.255.85
  hostIPs:
    - ip: 10.135.255.85
  podIP: 10.244.5.236
  podIPs:
    - ip: 10.244.5.236
  startTime: '2025-12-18T10:55:39Z'
  containerStatuses:
    - name: nats
      state:
        running:
          startedAt: '2025-12-18T10:55:39Z'
      lastState: {}
      ready: false
      restartCount: 0
      image: docker.io/library/nats:2.12.2-alpine
      imageID: >-
        docker.io/library/nats@sha256:2d5fce3229ae5741f4ef9225aff95dc4dc036455931eaf77a3eec33fddaa192d
      containerID: >-
        containerd://24ebae4f6b5057dd5d32054241cbc33855dc1e4ac8895b5770ea8096a2640b6a
      started: true
      resources: {}
      volumeMounts:
        - name: config
          mountPath: /etc/nats-config
        - name: pid
          mountPath: /var/run/nats
        - name: kube-api-access-78h7k
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
          readOnly: true
          recursiveReadOnly: Disabled
    - name: prom-exporter
      state:
        terminated:
          exitCode: 0
          reason: Completed
          startedAt: '2025-12-18T10:55:40Z'
          finishedAt: '2025-12-18T11:06:25Z'
          containerID: >-
            containerd://1643c23f8daf017f10c901bc1af3db63378f1f39f401e0fd525dc9d1d8d21d21
      lastState: {}
      ready: false
      restartCount: 0
      image: docker.io/natsio/prometheus-nats-exporter:0.17.3
      imageID: >-
        docker.io/natsio/prometheus-nats-exporter@sha256:26c826662ac8424597cc9bdf89ea5b606eb66e3c11db9b1215c27d2076bbb01b
      containerID: >-
        containerd://1643c23f8daf017f10c901bc1af3db63378f1f39f401e0fd525dc9d1d8d21d21
      started: false
      resources: {}
      volumeMounts:
        - name: kube-api-access-78h7k
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
          readOnly: true
          recursiveReadOnly: Disabled
    - name: reloader
      state:
        terminated:
          exitCode: 0
          reason: Completed
          startedAt: '2025-12-18T10:55:40Z'
          finishedAt: '2025-12-18T11:06:25Z'
          containerID: >-
            containerd://3418820c117c3096322757d578a88e7f30d9a0e1b25620bda16e7b8788bb445d
      lastState: {}
      ready: false
      restartCount: 0
      image: docker.io/natsio/nats-server-config-reloader:0.20.1
      imageID: >-
        docker.io/natsio/nats-server-config-reloader@sha256:47094fcae2f4ce163ba2ff8b1ca5b0eead8bd642f0b05b6339ae1ff64fcf9e21
      containerID: >-
        containerd://3418820c117c3096322757d578a88e7f30d9a0e1b25620bda16e7b8788bb445d
      started: false
      resources: {}
      volumeMounts:
        - name: pid
          mountPath: /var/run/nats
        - name: config
          mountPath: /etc/nats-config
        - name: kube-api-access-78h7k
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
          readOnly: true
          recursiveReadOnly: Disabled
  qosClass: BestEffort
spec:
  volumes:
    - name: config
      configMap:
        name: nats-config
        defaultMode: 420
    - name: pid
      emptyDir: {}
    - name: kube-api-access-78h7k
      projected:
        sources:
          - serviceAccountToken:
              expirationSeconds: 3607
              path: token
          - configMap:
              name: kube-root-ca.crt
              items:
                - key: ca.crt
                  path: ca.crt
          - downwardAPI:
              items:
                - path: namespace
                  fieldRef:
                    apiVersion: v1
                    fieldPath: metadata.namespace
        defaultMode: 420
  containers:
    - name: nats
      image: nats:2.12.2-alpine
      args:
        - '--config'
        - /etc/nats-config/nats.conf
      ports:
        - name: nats
          containerPort: 4222
          protocol: TCP
        - name: websocket
          containerPort: 1337
          protocol: TCP
        - name: cluster
          containerPort: 6222
          protocol: TCP
        - name: monitor
          containerPort: 8222
          protocol: TCP
      env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: SERVER_NAME
          value: $(POD_NAME)
        - name: GOMEMLIMIT
          value: 5GiB
      resources: {}
      volumeMounts:
        - name: config
          mountPath: /etc/nats-config
        - name: pid
          mountPath: /var/run/nats
        - name: kube-api-access-78h7k
          readOnly: true
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      livenessProbe:
        httpGet:
          path: /healthz?js-enabled-only=true
          port: monitor
          scheme: HTTP
        initialDelaySeconds: 10
        timeoutSeconds: 5
        periodSeconds: 30
        successThreshold: 1
        failureThreshold: 3
      readinessProbe:
        httpGet:
          path: /healthz?js-server-only=true
          port: monitor
          scheme: HTTP
        initialDelaySeconds: 10
        timeoutSeconds: 5
        periodSeconds: 10
        successThreshold: 1
        failureThreshold: 3
      startupProbe:
        httpGet:
          path: /healthz
          port: monitor
          scheme: HTTP
        initialDelaySeconds: 10
        timeoutSeconds: 5
        periodSeconds: 10
        successThreshold: 1
        failureThreshold: 90
      lifecycle:
        preStop:
          exec:
            command:
              - nats-server
              - '-sl=ldm=/var/run/nats/nats.pid'
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      imagePullPolicy: IfNotPresent
    - name: reloader
      image: natsio/nats-server-config-reloader:0.20.1
      args:
        - '-pid'
        - /var/run/nats/nats.pid
        - '-config'
        - /etc/nats-config/nats.conf
      resources: {}
      volumeMounts:
        - name: pid
          mountPath: /var/run/nats
        - name: config
          mountPath: /etc/nats-config
        - name: kube-api-access-78h7k
          readOnly: true
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      imagePullPolicy: IfNotPresent
    - name: prom-exporter
      image: natsio/prometheus-nats-exporter:0.17.3
      args:
        - '-port=7777'
        - '-connz'
        - '-routez'
        - '-subz'
        - '-varz'
        - '-prefix=nats'
        - '-use_internal_server_id'
        - http://localhost:8222/
      ports:
        - name: prom-metrics
          containerPort: 7777
          protocol: TCP
      resources: {}
      volumeMounts:
        - name: kube-api-access-78h7k
          readOnly: true
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      imagePullPolicy: IfNotPresent
  restartPolicy: Always
  terminationGracePeriodSeconds: 60
  dnsPolicy: ClusterFirst
  serviceAccountName: default
  serviceAccount: default
  nodeName: hardly-moveable-s4x5p
  shareProcessNamespace: true
  securityContext: {}
  imagePullSecrets:
    - name: company
  hostname: nats-0
  subdomain: nats-headless
  schedulerName: default-scheduler
  tolerations:
    - key: node.kubernetes.io/not-ready
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 300
    - key: node.kubernetes.io/unreachable
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 300
  priorityClassName: low
  priority: -5
  enableServiceLinks: false
  preemptionPolicy: PreemptLowerPriority
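The telling detail in the status dump: the reloader and prom-exporter sidecars exited cleanly at 11:06:25, but the nats container was still running well past deletionTimestamp (11:07:25) plus the 60-second grace period. A minimal Python sketch of how one might detect this condition from a pod object (not part of the chart; the pod dict below is condensed from the dump, field names follow the Kubernetes Pod API):

```python
from datetime import datetime, timedelta, timezone

def is_stuck_terminating(pod: dict, now: datetime) -> bool:
    """True if the pod is past deletionTimestamp + grace period
    but still has at least one running container."""
    meta = pod["metadata"]
    ts = meta.get("deletionTimestamp")
    if ts is None:
        return False  # pod is not being deleted at all
    deadline = datetime.fromisoformat(ts.replace("Z", "+00:00")) + timedelta(
        seconds=meta.get("deletionGracePeriodSeconds", 30)
    )
    still_running = [
        c["name"]
        for c in pod["status"].get("containerStatuses", [])
        if "running" in c.get("state", {})
    ]
    return now > deadline and bool(still_running)

# Condensed from the dump above: sidecars terminated, nats still running.
pod = {
    "metadata": {
        "deletionTimestamp": "2025-12-18T11:07:25Z",
        "deletionGracePeriodSeconds": 60,
    },
    "status": {
        "containerStatuses": [
            {"name": "nats", "state": {"running": {}}},
            {"name": "prom-exporter", "state": {"terminated": {"exitCode": 0}}},
            {"name": "reloader", "state": {"terminated": {"exitCode": 0}}},
        ]
    },
}

now = datetime(2025, 12, 18, 12, 0, tzinfo=timezone.utc)
print(is_stuck_terminating(pod, now))  # True: ~52 min past the grace deadline
```

This only flags the symptom; whether the root cause is the preStop lame-duck hook (`nats-server -sl=ldm=...`) failing to shut the server down, or the kubelet failing to escalate to SIGKILL, is what this issue is about.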