Threaded mode always re-reads files from beginning regardless of DB offset or Read_from_Head setting #11357

@djordjevicmladen

Description

Describe the bug
The tail input plugin with Threaded true fails to track file position correctly, causing continuous re-reading of log files from the beginning. This occurs whether a database file is configured or not. The stored offset in the database is completely ignored, and the Read_from_Head parameter has no effect. Every time the plugin processes the file (including after rotation), it starts from the beginning, resulting in massive duplication of log entries in outputs.

To Reproduce

  • Configuration file:
[SERVICE]
    Log_Level         debug
    Flush             1
    Daemon            Off
    Parsers_File      parsers.conf

[INPUT]
    Name              tail
    Tag               nginx_logs.access
    Path              /var/log/nginx/access.log
    Path_Key          filepath
    Key               data
    Read_from_Head    true
    DB                /var/lib/fluent-bit/db/nginx-access.db
    DB.locking        true
    Refresh_Interval  10
    Threaded          true

[INPUT]
    Name              tail
    Tag               nginx_logs.error
    Path              /var/log/nginx/error.log
    Path_Key          filepath
    Key               data
    Read_from_Head    true
    DB                /var/lib/fluent-bit/db/nginx-error.db
    DB.locking        true
    Refresh_Interval  10
    Threaded          true

[OUTPUT]
    Name              stdout
    Match             *

[OUTPUT]
    Name              file
    Match             nginx_logs.access
    Path              /var/log/fluent-bit/output/
    File              nginx-access.log

[OUTPUT]
    Name              file
    Match             nginx_logs.error
    Path              /var/log/fluent-bit/output/
    File              nginx-error.log
  • Directory structure:
/var/log/nginx/                    # Nginx log files (input source)
├── access.log
└── error.log

/var/lib/fluent-bit/db/           # Fluent Bit state database files
├── nginx-access.db
└── nginx-error.db

/var/log/fluent-bit/output/       # Processed output files
├── nginx-access.log
└── nginx-error.log
  • Steps to reproduce:
    1. Start Fluent Bit with Threaded true in INPUT configuration
    2. Observe initial processing of log files
    3. Trigger file rotation, restart Fluent Bit, or wait a few seconds for the file to be re-read and checked for changes
    4. Notice that logs are re-processed from the beginning
    5. Check output files for duplicated entries
    6. Verify database file exists and contains offset data (if DB is configured)
    7. Repeat the test without the DB option; the behavior is identical
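For step 5, a small Python helper can put a number on the duplication. This is a sketch that rests on assumptions not in the original report. It expects the [OUTPUT] file sections to also set Format plain, so that each line is one JSON record without the tag and timestamp. It also treats two records as duplicates when their filepath and data fields match (the field names come from Path_Key and Key in the configuration above). nginx can legitimately write identical lines, so a handful of hits proves little, while thousands of repeats point to re-reading.

```python
import json
from collections import Counter
from typing import Iterable


def count_duplicate_records(lines: Iterable[str]) -> dict:
    """Return {record_key: count} for records that occur more than once.

    Assumption: lines come from a file output configured with
    Format plain (one JSON object per line). Lines that are not valid
    JSON are compared verbatim.
    """
    counts = Counter()
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            key = line
        else:
            if isinstance(record, dict):
                # Same source file + same raw log line = same record.
                key = (record.get("filepath"), record.get("data"))
            else:
                key = line
        counts[key] += 1
    return {k: n for k, n in counts.items() if n > 1}


def check_file(path: str) -> None:
    """Print a short duplicate summary for one output file."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        dups = count_duplicate_records(fh)
    print(f"{path}: {len(dups)} distinct records appear more than once")
    for key, n in list(dups.items())[:10]:
        print(n, key)
```

For example, check_file("/var/log/fluent-bit/output/nginx-access.log") after the file has been re-read once should report most access-log lines with a count of 2 or more.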

Expected behavior
When Threaded true is enabled, Fluent Bit should:

  • Maintain file position tracking through the configured database or internal state
  • Resume reading from the last successfully processed offset after restart or file rotation
  • Respect the Read_from_Head setting (false = skip to end, true = read from beginning on first run only)
  • Prevent re-processing of already-read log entries
  • Behave consistently whether DB is configured or not (DB should enhance persistence, not be ignored)
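The expected behavior above comes down to a single decision per file: where to start reading. The Python function below is a simplified model of that decision as described in this report. It is not Fluent Bit's implementation, and the truncation fallback (restart at 0 when the saved offset exceeds the file size) is an added assumption that the report does not state.

```python
from typing import Optional


def expected_start_offset(stored_offset: Optional[int],
                          read_from_head: bool,
                          file_size: int) -> int:
    """Byte offset at which tail should start reading a file.

    stored_offset: offset saved for this file (DB or in-memory state),
    or None if the file has never been seen before.
    Simplified model of the expected behavior, not Fluent Bit code.
    """
    if stored_offset is not None:
        # Known file: resume after the last processed byte.
        # Assumption: if the file shrank (truncation), start over.
        return stored_offset if stored_offset <= file_size else 0
    # First time the file is seen: Read_from_Head decides.
    return 0 if read_from_head else file_size
```

In these terms, the reported bug is that with Threaded true the plugin behaves as if this function always returned 0, whatever the stored offset or Read_from_Head value.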

Your Environment

  • Version used: 4.2
  • Configuration: (provided above)
  • Operating System and version: macOS on Apple Silicon (M-series chips)
  • Filters and plugins: tail input plugin, file and stdout output plugins

Additional context

  • The bug occurs only when Threaded true is set; with Threaded false (or the option omitted), offset tracking works correctly
  • The database configuration has no effect: the issue persists whether DB is configured or not, and when it is configured, the offset is written but never read back
  • The Read_from_Head parameter is ignored entirely when threading is enabled
  • Position tracking appears to reset on every file access, causing the whole file to be re-processed
  • This causes severe data duplication in production environments
  • The problem makes the tail plugin unusable with threading in scenarios that require exactly-once (or at-most-once) delivery, since every re-read produces duplicates
  • Tested and confirmed only on macOS on Apple Silicon; the issue may be platform- or architecture-specific
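To back up the observation that the offset is written but never read, the stored offsets can be compared with the current file sizes. The sketch below uses Python's sqlite3 module and assumes the tail plugin's table is named in_tail_files, with name, offset and inode columns (as in recent releases). If the query fails, run .schema in the sqlite3 shell to confirm the layout. Because DB.locking true keeps the database exclusively locked, run this check after stopping Fluent Bit.

```python
import os
import sqlite3
from contextlib import closing


def stored_offsets(db_path: str) -> list:
    """Return [(name, offset, inode)] rows from a tail plugin DB file.

    Assumption: table in_tail_files with columns name, offset, inode.
    """
    uri = f"file:{db_path}?mode=ro"  # read-only: never modify the DB
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        # "offset" is quoted because OFFSET is also an SQL keyword.
        return conn.execute(
            'SELECT name, "offset", inode FROM in_tail_files'
        ).fetchall()


def report(db_path: str) -> None:
    """Print each tracked file's stored offset next to its current size."""
    for name, offset, inode in stored_offsets(db_path):
        size = os.path.getsize(name) if os.path.exists(name) else None
        print(f"{name}: stored offset={offset} inode={inode} size={size}")
```

If report("/var/lib/fluent-bit/db/nginx-access.db") shows an offset close to the file size while the output still contains the full file again after a restart, the saved offset is not being used on startup.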
