
Fix OTel collector advertise host in direct Kubernetes mode #823

Merged: edavidaja merged 8 commits into main from fix/otelcol-advertise-host on Apr 3, 2026


@mconflitti-pbc (Contributor) commented Mar 31, 2026

Summary

  • Removes the default that set OpenTelemetry.CollectorAdvertiseHost to the Connect Service DNS name in the gcfg config. The OTel collector listens on an ephemeral port that is not exposed on the Service, so content job pods timed out on every log send.
  • Injects status.podIP via the Kubernetes Downward API as CONNECT_OPENTELEMETRY_COLLECTORADVERTISEHOST when config.OpenTelemetry.Enabled: true. In off-host mode the OTel collector already binds to 0.0.0.0, so content pods can reach it directly at the Connect pod's IP.
  • Adds tests/otel_test.yaml to cover the new behavior.
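A minimal sketch of how a chart template could express the injection rule above, together with the later commit's rule of skipping injection when the variable is already set in pod.env. This is hypothetical and not the chart's actual template. The excerpt's placement and indentation, the use of sprig's `dig`, and the assumption that `.Values.config` is a map and `.Values.pod.env` is a list of EnvVar objects are all illustrative.

```yaml
# templates/deployment.yaml, container spec excerpt (hypothetical sketch).
# Assumes .Values.config is a map and .Values.pod.env is a list of EnvVar objects.
          env:
          {{- $otelEnabled := dig "OpenTelemetry" "Enabled" false .Values.config }}
          {{- /* Skip injection if the user already set the variable in pod.env */}}
          {{- $userSet := false }}
          {{- range .Values.pod.env }}
          {{- if eq .name "CONNECT_OPENTELEMETRY_COLLECTORADVERTISEHOST" }}
          {{- $userSet = true }}
          {{- end }}
          {{- end }}
          {{- if and $otelEnabled (not $userSet) }}
            # Downward API: the kubelet fills in this pod's own IP at container start
            - name: CONNECT_OPENTELEMETRY_COLLECTORADVERTISEHOST
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
          {{- end }}
          {{- with .Values.pod.env }}
            {{- toYaml . | nindent 12 }}
          {{- end }}
```

The duplicate check matters because Kubernetes rejects a container spec that lists the same env var name twice in a way that conflicts with user intent. Letting a user-supplied entry win keeps existing manual workarounds working after an upgrade.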

Validation

The env var is set on the Connect pod:

➜  otelcol-advertise-test git:(otelcol-advertise-test) ✗ PULUMI_ARCH=aws/k8s PULUMI_STACK=otelcol-advertise-test just kubectl exec -it deployment/rstudio-connect-prod -- env | grep CONNECT_OPENTELEMETRY
uv run -- pulumi -C architectures/aws/k8s -s otelcol-advertise-test stack output kubeconfig
CONNECT_OPENTELEMETRY_COLLECTORADVERTISEHOST=10.11.128.216

The correct host IP is passed into the content pod:

Environment:
      USER:                            rstudio-connect
      USERNAME:                        rstudio-connect
      LOGNAME:                         rstudio-connect
      HOME:                            /tmp
      TMPDIR:                          /tmp
      PICOTEL_PREFIX:                  PICOTEL
      PICOTEL_SERVICE_NAME:            posit-connect-python
      PICOTEL_EXPORTER_OTLP_ENDPOINT:  http://10.11.128.216:32845/ <<<<<<<<<
      PICOTEL_TRACEPARENT:             00-4272be7a485e5338f7f21f268ed59265-2a45a203df4d0f60-01
      PICOTEL_RESOURCE_ATTRIBUTES:     content.guid=ac5d7e36-b2ed-4d80-ad8b-1bfa155f5182,content.id=7,content.bundle.id=8,job.key=ckTIUYqrLnGdCrwr

No picotel errors appear in the content logs.

[Screenshot, 2026-04-02 4:29 PM: content logs with no picotel errors]

Test plan

  • Deploy Connect on Kubernetes with config.OpenTelemetry.Enabled: true
  • Verify CONNECT_OPENTELEMETRY_COLLECTORADVERTISEHOST is set to the pod IP in the Connect pod
  • Deploy content requiring a fresh Python environment restore and confirm the job completes without stalling

@mconflitti-pbc (Contributor, Author)

This may end up serving only as a reference workaround until we fix Connect to determine its own IP:

➜  otelcol-advertise-test git:(otelcol-advertise-test) ✗ PULUMI_ARCH=aws/k8s PULUMI_STACK=otelcol-advertise-test just kubectl exec -it deployment/rstudio-connect-prod -- env | grep CONNECT_OPENTELEMETRY
uv run -- pulumi -C architectures/aws/k8s -s otelcol-advertise-test stack output kubeconfig
CONNECT_OPENTELEMETRY_COLLECTORADVERTISEHOST=10.11.128.216
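On a chart version without this change, the same workaround could be applied by hand through pod.env. This is the situation the later "already in pod.env" commit guards against. The override below is a sketch and assumes the chart passes pod.env through as a list of full Kubernetes EnvVar objects, including valueFrom.

```yaml
# values.yaml override (sketch): manually advertise the Connect pod's own IP
# to content job pods. Assumes pod.env is rendered as a list of EnvVar objects.
config:
  OpenTelemetry:
    Enabled: true
pod:
  env:
    - name: CONNECT_OPENTELEMETRY_COLLECTORADVERTISEHOST
      valueFrom:
        fieldRef:
          # Downward API: resolved by the kubelet to the pod's IP at container start
          fieldPath: status.podIP
```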

Inject status.podIP via Downward API as CONNECT_OPENTELEMETRY_COLLECTORADVERTISEHOST
when OpenTelemetry is enabled, replacing the Service DNS default that caused content
job pods to time out on the unexposed ephemeral collector port.

Closes posit-dev/connect#38219
… already in pod.env

Prevents duplicate env var error when users had manually set this in pod.env
as a workaround before the chart-level fix was available.
deployment.yaml includes configmap-prestart.yaml in its checksum
annotation, so helm-unittest needs it in the templates list.
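As the commit above notes, deployment.yaml's checksum annotation renders configmap-prestart.yaml, so a helm-unittest suite that renders deployment.yaml has to list it. A hypothetical suite in the spirit of tests/otel_test.yaml might look like the one below. The suite and test names, the extra configmap.yaml entry, and the assumption that the Connect container is `containers[0]` are illustrative, and the PR's actual file may differ.

```yaml
# tests/otel_test.yaml (hypothetical sketch; the PR's actual suite may differ)
suite: OpenTelemetry collector advertise host
templates:
  - deployment.yaml
  # deployment.yaml's checksum annotation renders this template, so it must be listed
  - configmap-prestart.yaml
  # Assumed: the main config map is also part of the checksum
  - configmap.yaml
tests:
  - it: injects the Connect pod IP when OpenTelemetry is enabled
    template: deployment.yaml
    set:
      config.OpenTelemetry.Enabled: true
    asserts:
      - contains:
          # Assumes the Connect container is the first container in the pod spec
          path: spec.template.spec.containers[0].env
          content:
            name: CONNECT_OPENTELEMETRY_COLLECTORADVERTISEHOST
            valueFrom:
              fieldRef:
                fieldPath: status.podIP
```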
mconflitti-pbc force-pushed the fix/otelcol-advertise-host branch from 6367424 to b833a41 on April 2, 2026 19:22
mconflitti-pbc marked this pull request as ready for review on April 2, 2026 20:24
mconflitti-pbc requested a review from a team as a code owner on April 2, 2026 20:24
dbkegley requested a review from lucasrod16 on April 2, 2026 20:41
@dbkegley (Contributor) commented Apr 2, 2026

Looks right to me. Just to confirm what I'm reading: if we set config.OpenTelemetry.CollectorAdvertiseHost explicitly in the config, the env var is not added to the deployment, right?

edavidaja merged commit b623964 into main on Apr 3, 2026
6 checks passed
edavidaja deleted the fix/otelcol-advertise-host branch on April 3, 2026 15:14