Incomplete log records in Kubernetes #11602

@vadimalekseev

Describe the bug

Fluent Bit emits incomplete (split) log records during container log file rotation managed by containerd.
When containerd splits a log record across two files at rotation time,
Fluent Bit forwards each fragment as a separate log record instead of joining them.
The result is malformed JSON records with missing fields that arrive at the destination silently broken.

The bug appears only under high load, because only then does containerd split a log record across two files.
Over 1 hour of testing at 10k logs/sec from 2 Pods, Fluent Bit produced 34 split records.

To Reproduce

  1. Clone the benchmark repository:
     git clone https://github.com/VictoriaMetrics/log-collectors-benchmark
     cd log-collectors-benchmark
  2. Create a kind Kubernetes cluster (requires kubectl, kind, helm, docker, make):
     kind create cluster --name log-collectors-bench
  3. Install VictoriaLogs as the log storage backend:
     helm repo add vm https://victoriametrics.github.io/helm-charts/
     helm install vls vm/victoria-logs-single --namespace logging --create-namespace
  4. Configure Fluent Bit to write to VictoriaLogs:
     make set-endpoint VLS_HOST='vls-victoria-logs-single-server.logging.svc.cluster.local' VLS_PORT=9428
  5. Deploy Fluent Bit:
     make bench-up-fluent-bit
  6. Start the load generator:
     make bench-up-generator GENERATOR_REPLICAS=1 LOGS_PER_SECOND=10000 RAMP_UP=false

You can increase the number of load generator replicas (GENERATOR_REPLICAS) if your machine is fast enough.
This increases the load and the chance of reproducing the bug.

  7. Forward the VictoriaLogs port to your local machine:
     kubectl port-forward -n logging vls-victoria-logs-single-server-0 9428:9428
  8. Wait approximately 30 minutes (the bug is intermittent and appears only under sustained load).
  9. Query VictoriaLogs for malformed records with the expression sequence_id:"".
     It matches all records missing the sequence_id field, which are the split fragments.
  10. Clean up:
     make bench-down-all
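
The query step above (finding records with an empty sequence_id) can also be scripted. The following is a minimal sketch, assuming the port-forward is still active on localhost:9428 and that VictoriaLogs serves its `/select/logsql/query` HTTP endpoint with JSON-lines output. The helper names and the `limit` default are illustrative and not part of the benchmark repository.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List


def build_query_url(base_url: str, logsql: str, limit: int = 100) -> str:
    """Build a VictoriaLogs query URL for the given LogsQL expression."""
    params = urllib.parse.urlencode({"query": logsql, "limit": limit})
    return f"{base_url}/select/logsql/query?{params}"


def parse_json_lines(body: str) -> List[Dict[str, Any]]:
    """Parse a JSON-lines response body: one JSON object per non-empty line."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]


def find_split_fragments(base_url: str = "http://localhost:9428") -> List[Dict[str, Any]]:
    """Return stored records whose sequence_id field is empty or missing."""
    url = build_query_url(base_url, 'sequence_id:""')
    with urllib.request.urlopen(url) as resp:
        return parse_json_lines(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires the kubectl port-forward from the steps above to be running.
    fragments = find_split_fragments()
    print(f"found {len(fragments)} records without sequence_id")
```

Run it before the clean-up step, while the cluster and the port-forward are still up. A non-zero count means split fragments reached the storage backend.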

Expected behavior

Fluent Bit should detect that a log record was split at a file rotation boundary and reconstruct the complete record before forwarding it.
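
As an illustration of the expected behavior, here is a minimal sketch of joining CRI partial records across a rotation boundary. It is not Fluent Bit's implementation. It assumes the CRI line layout `<timestamp> <stream> <tag> <message>`, where the tag is P (partial) or F (full), and it assumes the reader feeds lines from the rotated file followed by lines from the new file. The function names and the sample lines are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_cri_line(line: str) -> Tuple[str, str, str, str]:
    """Split a CRI log line into (timestamp, stream, tag, message)."""
    parts = line.rstrip("\n").split(" ", 3)
    if len(parts) == 3:
        parts.append("")  # a record with an empty message
    timestamp, stream, tag, message = parts
    # Only the first ':'-separated tag component is treated as P or F here.
    return timestamp, stream, tag.split(":")[0], message


def join_partial_records(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (timestamp, stream, message) for complete records only.

    Fragments tagged P are buffered per stream until the closing F fragment
    arrives, even when that fragment comes from the post-rotation file.
    The timestamp of the first fragment is kept for the joined record.
    """
    pending: Dict[str, Tuple[str, List[str]]] = {}
    for line in lines:
        timestamp, stream, tag, message = parse_cri_line(line)
        first_ts, chunks = pending.setdefault(stream, (timestamp, []))
        chunks.append(message)
        if tag == "F":
            del pending[stream]
            yield first_ts, stream, "".join(chunks)


if __name__ == "__main__":
    # Hypothetical sample: the last record of the old file is tagged P and
    # its continuation is the first line of the new file.
    old_file = ['2024-01-01T00:00:00.000000001Z stdout P {"sequence_id":"2","ms']
    new_file = ['2024-01-01T00:00:00.000000002Z stdout F g":"hello"}']
    for record in join_partial_records(old_file + new_file):
        print(record)
```

The key design point is that the pending buffer outlives the file handle: the state is keyed by stream rather than by file, so a rotation between the P and F fragments does not flush an incomplete record.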

Screenshots

Image

Your Environment

Additional context

The root cause appears to be specific to the last log record of a file at rotation time.
The record is split across two files by containerd and is marked with the partial flag (P in CRI format),
even though its size does not exceed the standard 16 KiB threshold at which containerd normally splits long lines.
Fluent Bit forwards each part as a separate record instead of waiting for and joining the continuation from the new file.
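
The condition described above can be checked on a rotated log file with a small helper. This is a sketch under assumptions: it treats the third space-separated field of each CRI line as the tag, uses the 16 KiB figure from the text as the threshold, and the function name is hypothetical.

```python
from typing import Sequence

# The 16 KiB threshold mentioned above for containerd's normal line splitting.
CRI_MAX_LINE_BYTES = 16 * 1024


def ends_with_short_partial(lines: Sequence[str]) -> bool:
    """Return True if the last well-formed CRI line is tagged P although its
    message is shorter than the normal split threshold."""
    for line in reversed(lines):
        parts = line.rstrip("\n").split(" ", 3)
        if len(parts) < 3:
            continue  # skip blank or malformed trailing lines
        tag = parts[2].split(":")[0]
        message = parts[3] if len(parts) == 4 else ""
        return tag == "P" and len(message.encode("utf-8")) < CRI_MAX_LINE_BYTES
    return False
```

A True result for a rotated file means its final record continues in the next file, so a collector has to keep that fragment buffered rather than emit it.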

We modified our own collector to verify that the issue is rotation-specific. Other collectors do not encounter it, which confirms that the application writes its logs correctly and is not the source of the truncated or partial log lines.
