
nats server not terminating correctly #1101

@h0jeZvgoxFepBQ2C

Description

What version were you using?

2.12.2 via helm chart

What environment was the server running in?

Digital Ocean Managed Kubernetes 1.33

Is this defect reproducible?

Not manually; it happens randomly but very often. We have around 100 installations and it occurs every day, presumably due to node down-/upscaling.

Given the capability you are leveraging, describe your expectation?

When the nats-0 pod is terminated (due to node scheduling etc.), it sometimes gets stuck indefinitely in the Terminating state.
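One way to confirm the stuck state (a diagnostic sketch only, using the pod name and namespace from the dump below; not a fix) is to check that the deletion timestamp is set while containers are still reported running, and, as a last resort, force-remove the pod object:

```shell
# Confirm the pod is stuck: deletionTimestamp is set but the pod object remains.
kubectl get pod nats-0 -n company123 \
  -o jsonpath='{.metadata.deletionTimestamp}{"\n"}{.status.phase}{"\n"}'

# Inspect which containers are still running inside the stuck pod.
kubectl get pod nats-0 -n company123 \
  -o jsonpath='{range .status.containerStatuses[*]}{.name}={.state}{"\n"}{end}'

# Last-resort workaround: skip the grace period and delete the API object.
# Note this only removes the object from the API server; it does not
# guarantee the process on the node is gone.
kubectl delete pod nats-0 -n company123 --grace-period=0 --force
```

Force deletion should only be a stop-gap while the root cause is investigated.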

Given the expectation, what is the defect you are observing?

apiVersion: v1
kind: Pod
metadata:
  name: nats-0
  generateName: nats-
  namespace: company123
  uid: a438e6b2-6f99-4190-9535-4b1da5fe8ca5
  resourceVersion: '1907059241'
  generation: 2
  creationTimestamp: '2025-12-18T10:55:39Z'
  deletionTimestamp: '2025-12-18T11:07:25Z'
  deletionGracePeriodSeconds: 60
  labels:
    app.kubernetes.io/component: nats
    app.kubernetes.io/instance: nats
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: nats
    app.kubernetes.io/version: 2.12.2
    apps.kubernetes.io/pod-index: '0'
    controller-revision-hash: nats-6b7dcb6c9d
    helm.sh/chart: nats-2.12.2
    statefulset.kubernetes.io/pod-name: nats-0
  annotations:
    checksum/config: 303f0fedc13b8bb4a554be83f98214ee1af1bb4010e4ddc46987550f2c327c8f
    cluster-autoscaler.kubernetes.io/safe-to-evict: 'true'
  ownerReferences:
    - apiVersion: apps/v1
      kind: StatefulSet
      name: nats
      uid: f56e0705-1b3d-4376-9adb-650102c817ce
      controller: true
      blockOwnerDeletion: true
  selfLink: /api/v1/namespaces/company123/pods/nats-0
status:
  phase: Running
  conditions:
    - type: PodReadyToStartContainers
      status: 'True'
      lastProbeTime: null
      lastTransitionTime: '2025-12-18T10:55:40Z'
    - type: Initialized
      status: 'True'
      lastProbeTime: null
      lastTransitionTime: '2025-12-18T10:55:39Z'
    - type: Ready
      status: 'False'
      lastProbeTime: null
      lastTransitionTime: '2025-12-18T11:06:49Z'
      reason: ContainersNotReady
      message: 'containers with unready status: [nats reloader prom-exporter]'
    - type: ContainersReady
      status: 'False'
      lastProbeTime: null
      lastTransitionTime: '2025-12-18T11:06:49Z'
      reason: ContainersNotReady
      message: 'containers with unready status: [nats reloader prom-exporter]'
    - type: PodScheduled
      status: 'True'
      lastProbeTime: null
      lastTransitionTime: '2025-12-18T10:55:39Z'
  hostIP: 10.135.255.85
  hostIPs:
    - ip: 10.135.255.85
  podIP: 10.244.5.236
  podIPs:
    - ip: 10.244.5.236
  startTime: '2025-12-18T10:55:39Z'
  containerStatuses:
    - name: nats
      state:
        running:
          startedAt: '2025-12-18T10:55:39Z'
      lastState: {}
      ready: false
      restartCount: 0
      image: docker.io/library/nats:2.12.2-alpine
      imageID: >-
        docker.io/library/nats@sha256:2d5fce3229ae5741f4ef9225aff95dc4dc036455931eaf77a3eec33fddaa192d
      containerID: >-
        containerd://24ebae4f6b5057dd5d32054241cbc33855dc1e4ac8895b5770ea8096a2640b6a
      started: true
      resources: {}
      volumeMounts:
        - name: config
          mountPath: /etc/nats-config
        - name: pid
          mountPath: /var/run/nats
        - name: kube-api-access-78h7k
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
          readOnly: true
          recursiveReadOnly: Disabled
    - name: prom-exporter
      state:
        terminated:
          exitCode: 0
          reason: Completed
          startedAt: '2025-12-18T10:55:40Z'
          finishedAt: '2025-12-18T11:06:25Z'
          containerID: >-
            containerd://1643c23f8daf017f10c901bc1af3db63378f1f39f401e0fd525dc9d1d8d21d21
      lastState: {}
      ready: false
      restartCount: 0
      image: docker.io/natsio/prometheus-nats-exporter:0.17.3
      imageID: >-
        docker.io/natsio/prometheus-nats-exporter@sha256:26c826662ac8424597cc9bdf89ea5b606eb66e3c11db9b1215c27d2076bbb01b
      containerID: >-
        containerd://1643c23f8daf017f10c901bc1af3db63378f1f39f401e0fd525dc9d1d8d21d21
      started: false
      resources: {}
      volumeMounts:
        - name: kube-api-access-78h7k
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
          readOnly: true
          recursiveReadOnly: Disabled
    - name: reloader
      state:
        terminated:
          exitCode: 0
          reason: Completed
          startedAt: '2025-12-18T10:55:40Z'
          finishedAt: '2025-12-18T11:06:25Z'
          containerID: >-
            containerd://3418820c117c3096322757d578a88e7f30d9a0e1b25620bda16e7b8788bb445d
      lastState: {}
      ready: false
      restartCount: 0
      image: docker.io/natsio/nats-server-config-reloader:0.20.1
      imageID: >-
        docker.io/natsio/nats-server-config-reloader@sha256:47094fcae2f4ce163ba2ff8b1ca5b0eead8bd642f0b05b6339ae1ff64fcf9e21
      containerID: >-
        containerd://3418820c117c3096322757d578a88e7f30d9a0e1b25620bda16e7b8788bb445d
      started: false
      resources: {}
      volumeMounts:
        - name: pid
          mountPath: /var/run/nats
        - name: config
          mountPath: /etc/nats-config
        - name: kube-api-access-78h7k
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
          readOnly: true
          recursiveReadOnly: Disabled
  qosClass: BestEffort
spec:
  volumes:
    - name: config
      configMap:
        name: nats-config
        defaultMode: 420
    - name: pid
      emptyDir: {}
    - name: kube-api-access-78h7k
      projected:
        sources:
          - serviceAccountToken:
              expirationSeconds: 3607
              path: token
          - configMap:
              name: kube-root-ca.crt
              items:
                - key: ca.crt
                  path: ca.crt
          - downwardAPI:
              items:
                - path: namespace
                  fieldRef:
                    apiVersion: v1
                    fieldPath: metadata.namespace
        defaultMode: 420
  containers:
    - name: nats
      image: nats:2.12.2-alpine
      args:
        - '--config'
        - /etc/nats-config/nats.conf
      ports:
        - name: nats
          containerPort: 4222
          protocol: TCP
        - name: websocket
          containerPort: 1337
          protocol: TCP
        - name: cluster
          containerPort: 6222
          protocol: TCP
        - name: monitor
          containerPort: 8222
          protocol: TCP
      env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: SERVER_NAME
          value: $(POD_NAME)
        - name: GOMEMLIMIT
          value: 5GiB
      resources: {}
      volumeMounts:
        - name: config
          mountPath: /etc/nats-config
        - name: pid
          mountPath: /var/run/nats
        - name: kube-api-access-78h7k
          readOnly: true
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      livenessProbe:
        httpGet:
          path: /healthz?js-enabled-only=true
          port: monitor
          scheme: HTTP
        initialDelaySeconds: 10
        timeoutSeconds: 5
        periodSeconds: 30
        successThreshold: 1
        failureThreshold: 3
      readinessProbe:
        httpGet:
          path: /healthz?js-server-only=true
          port: monitor
          scheme: HTTP
        initialDelaySeconds: 10
        timeoutSeconds: 5
        periodSeconds: 10
        successThreshold: 1
        failureThreshold: 3
      startupProbe:
        httpGet:
          path: /healthz
          port: monitor
          scheme: HTTP
        initialDelaySeconds: 10
        timeoutSeconds: 5
        periodSeconds: 10
        successThreshold: 1
        failureThreshold: 90
      lifecycle:
        preStop:
          exec:
            command:
              - nats-server
              - '-sl=ldm=/var/run/nats/nats.pid'
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      imagePullPolicy: IfNotPresent
    - name: reloader
      image: natsio/nats-server-config-reloader:0.20.1
      args:
        - '-pid'
        - /var/run/nats/nats.pid
        - '-config'
        - /etc/nats-config/nats.conf
      resources: {}
      volumeMounts:
        - name: pid
          mountPath: /var/run/nats
        - name: config
          mountPath: /etc/nats-config
        - name: kube-api-access-78h7k
          readOnly: true
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      imagePullPolicy: IfNotPresent
    - name: prom-exporter
      image: natsio/prometheus-nats-exporter:0.17.3
      args:
        - '-port=7777'
        - '-connz'
        - '-routez'
        - '-subz'
        - '-varz'
        - '-prefix=nats'
        - '-use_internal_server_id'
        - http://localhost:8222/
      ports:
        - name: prom-metrics
          containerPort: 7777
          protocol: TCP
      resources: {}
      volumeMounts:
        - name: kube-api-access-78h7k
          readOnly: true
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      imagePullPolicy: IfNotPresent
  restartPolicy: Always
  terminationGracePeriodSeconds: 60
  dnsPolicy: ClusterFirst
  serviceAccountName: default
  serviceAccount: default
  nodeName: hardly-moveable-s4x5p
  shareProcessNamespace: true
  securityContext: {}
  imagePullSecrets:
    - name: company
  hostname: nats-0
  subdomain: nats-headless
  schedulerName: default-scheduler
  tolerations:
    - key: node.kubernetes.io/not-ready
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 300
    - key: node.kubernetes.io/unreachable
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 300
  priorityClassName: low
  priority: -5
  enableServiceLinks: false
  preemptionPolicy: PreemptLowerPriority
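The preStop hook in the spec above (`nats-server -sl=ldm=/var/run/nats/nats.pid`) asks the running server, located via its pid file, to enter lame duck mode: stop accepting new clients, drain existing ones, then exit. A minimal local sketch of that signal path (assuming a `nats-server` binary on PATH; paths match the pod spec):

```shell
# Start a server that writes its pid to the same path the pod uses.
nats-server -p 4222 -P /var/run/nats/nats.pid &

# What the preStop hook runs: signal lame duck mode via the pid file.
# The server drains client connections and exits on its own; if it never
# exits, the kubelet's SIGKILL after terminationGracePeriodSeconds
# (60s in this spec) is the only backstop.
nats-server -sl=ldm=/var/run/nats/nats.pid
```

In the dump above, the `reloader` and `prom-exporter` containers have already terminated while the `nats` container is still `running` and unready, which is consistent with the server never completing its lame-duck shutdown.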

Labels: defect (suspected defect such as a bug or regression)