fix: address AOF unbounded growth and slow leader memory leak (#685, #769) #802
Open
Mukund2900 wants to merge 1 commit into tidwall:master from
This commit addresses two related issues affecting tile38 at scale with many Kafka hooks (tidwall#685, tidwall#769):
1. Auto-AOFSHRINK: Add configurable automatic AOF compaction via the new `aofshrink-min-size` config property. When set (e.g. "256mb"), a background goroutine monitors AOF size and triggers AOFSHRINK automatically, eliminating the need for external cron jobs.
2. Skip redundant AOF writes on followers: Followers can deterministically expire items via TTL without writing DEL/DELHOOK commands to their AOF. This prevents unnecessary AOF growth on follower nodes.
3. Fix hook proc() full-scan bug: The webhook queue processor was scanning the entire buntdb index instead of stopping after processing entries for the current hook. With 70K+ hooks, each proc() call performed an O(N) scan of all remaining entries, causing massive allocation pressure and heap fragmentation over time. This is the root cause of the slow leader memory leak.
4. Kafka connection recycling: Add a 30-minute maximum lifetime to Kafka producer connections. Previously, connections were only recycled after 30 seconds of inactivity, which never occurred under sustained load. This is a precautionary measure for long-running services.
Made-with: Cursor
6cfa2b6 to f551bfd
Summary
This PR addresses two related issues affecting tile38 at scale with many Kafka hooks:
- AOF unbounded growth (#685): `tile38_aof_size_bytes` grows to multiple GB in minutes, requiring manual `AOFSHRINK` + `GC` via external cron
- Slow leader memory leak (#769): the leader's free memory slowly decreases over 1-1.5 months until OOM
Both issues share the same root architecture: tile38's hook processing at high throughput (thousands of SETs/sec with 70K+ hooks) creates unbounded AOF growth and massive allocation pressure that fragments the Go heap over time.
What's wrong today
1. No automatic AOF compaction
Every write command (`SET`, `SETHOOK`, `DEL`, `DELHOOK`) is appended to the AOF. With thousands of vehicle SETs per second and 70K+ geofence hooks, the AOF grows to several GB in minutes. There is no built-in mechanism to automatically compact it; users must run `AOFSHRINK` manually or via external cron jobs.
2. Redundant AOF writes on followers
When objects/hooks expire via TTL, `backgroundExpireObjects()` and `backgroundExpireHooks()` write `DEL`/`DELHOOK` commands to the AOF on both leader and followers. Followers don't need these writes: they can deterministically compute the same expirations from the original commands (which contain the TTL). These redundant writes cause the follower's AOF to bloat just as fast as the leader's.
3. Hook proc() scans the entire qdb index (BUG: primary root cause of #769)
Each hook's `proc()` function (webhook queue processor) calls `tx.AscendGreaterOrEqual("hooks", h.query, callback)` where the callback always returns true. The buntdb "hooks" index is sorted by hook name, so entries for each hook are contiguous. But because the callback never returns false, every `proc()` call scans all remaining entries in the entire index past this hook's position.
With 70K hooks and thousands of pending events:
- Each `proc()` call performs an O(N) scan instead of O(entries-for-this-hook)
The 70K manager goroutines running concurrently create enormous allocation pressure. Over weeks, Go's non-compacting garbage collector fragments the heap: pages with mixed live/dead objects can't be returned to the OS, so process RSS grows while HeapAlloc stays stable. This is why the leader's free memory slowly decreases over 1-1.5 months until OOM.
This only affects the leader because `queueHooks()` (which populates qdb) only runs on the leader. Followers' hook goroutines are idle: they never call `proc()`.
4. Kafka connections are never recycled under sustained load
KafkaConn expires after 30 seconds of inactivity. With 70K hooks constantly sending events, the connection is never idle — the same sarama SyncProducer runs for the entire lifetime of the process (potentially months).
Note: This is a defensive/precautionary fix. There is no confirmed memory leak in the sarama library for long-lived producers — sarama's metadata refresh allocations are GC-eligible, its batching buffers are bounded, and its metrics registry is stable for a single producer instance. However, periodically recycling long-lived connections is reasonable production hygiene.
Changes
Change 1: Auto-AOFSHRINK (aofshrink.go, config.go, server.go) — fixes #685
New `aofshrink-min-size` config property. When set (e.g. `CONFIG SET aofshrink-min-size 256mb`), a background goroutine checks AOF size every 10 seconds and triggers AOFSHRINK automatically when the threshold is exceeded, with a 1-minute cooldown. Disabled by default (value 0).
Change 2: Skip AOF writes for TTL expirations on followers (expire.go) — fixes #685 on followers
Skip writeAOF() in backgroundExpireObjects() and backgroundExpireHooks() when running as a follower. In-memory state is still updated by cmdDEL()/cmdDelHook() — only the redundant disk write is skipped. On restart, the follower replays the leader's AOF which contains the original SET/SETHOOK commands with TTLs, and the items re-expire naturally.
Change 3: Fix proc() full-scan bug (hooks.go) — primary fix for #769
Changed the AscendGreaterOrEqual callback to return false when it encounters an entry for a different hook name. This changes each hook's scan from O(total qdb entries) to O(entries for this specific hook). This is the primary fix for the slow leader memory leak — it eliminates the massive allocation churn that causes Go heap fragmentation over time.
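The effect of the early return can be shown with a standard-library stand-in for an ordered index scan. This sketch does not use buntdb: `ascendGreaterOrEqual`, `entry`, `scanAll`, and `scanBounded` are hypothetical helpers that only imitate "visit entries at or after a pivot until the callback returns false".

```go
package main

import (
	"fmt"
	"sort"
)

// entry is a hypothetical queued event, keyed by the hook it belongs to.
type entry struct{ hook, msg string }

// ascendGreaterOrEqual imitates an ordered-index scan: it visits entries with
// hook >= pivot in order until fn returns false, and returns how many entries
// were visited. idx must be sorted by hook name.
func ascendGreaterOrEqual(idx []entry, pivot string, fn func(entry) bool) int {
	start := sort.Search(len(idx), func(i int) bool { return idx[i].hook >= pivot })
	visited := 0
	for _, e := range idx[start:] {
		visited++
		if !fn(e) {
			break
		}
	}
	return visited
}

// scanAll models the old behavior: the callback never stops the scan.
func scanAll(idx []entry, hook string) int {
	return ascendGreaterOrEqual(idx, hook, func(e entry) bool { return true })
}

// scanBounded models the fix: stop at the first entry for another hook.
func scanBounded(idx []entry, hook string) int {
	return ascendGreaterOrEqual(idx, hook, func(e entry) bool { return e.hook == hook })
}

func main() {
	idx := []entry{
		{"a", "1"}, {"a", "2"}, {"b", "1"}, {"c", "1"}, {"c", "2"}, {"d", "1"},
	}
	fmt.Println(scanAll(idx, "a"), scanBounded(idx, "a")) // 6 3
}
```

The bounded scan still touches one entry past the hook's range (the one that makes the callback return false), so its cost is the hook's own entries plus one, independent of index size.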
Change 4: Kafka connection max lifetime (kafka.go) — precautionary
Added kafkaMaxLifetime = 30 minutes. When a producer connection exceeds this age, it's closed and a fresh one is created on the next send.
This is a defensive measure, not a fix for a confirmed sarama bug. Long-lived connections work fine in sarama, but periodic recycling is reasonable production hygiene for services that run for months.
Impact
How to enable auto-AOFSHRINK
```
CONFIG SET aofshrink-min-size 256mb
CONFIG REWRITE
```
The value accepts the same format as maxmemory (kb, mb, gb suffixes or raw bytes). Set to 0 to disable (default).
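For readers unfamiliar with this format, a size string could be parsed along these lines. This is a sketch, not tile38's parser: `parseSize` is a hypothetical helper, and the 1024-based multipliers and case-insensitive suffixes are assumptions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSize converts strings such as "256mb" or "1048576" to a byte count.
// Assumption: kb, mb and gb are powers of 1024 and case-insensitive.
func parseSize(s string) (int64, error) {
	num := strings.ToLower(strings.TrimSpace(s))
	mult := int64(1)
	switch {
	case strings.HasSuffix(num, "kb"):
		mult, num = 1<<10, strings.TrimSuffix(num, "kb")
	case strings.HasSuffix(num, "mb"):
		mult, num = 1<<20, strings.TrimSuffix(num, "mb")
	case strings.HasSuffix(num, "gb"):
		mult, num = 1<<30, strings.TrimSuffix(num, "gb")
	}
	n, err := strconv.ParseInt(num, 10, 64)
	if err != nil || n < 0 {
		return 0, fmt.Errorf("invalid size %q", s)
	}
	return n * mult, nil
}

func main() {
	for _, in := range []string{"256mb", "1gb", "4096", "0", "lots"} {
		n, err := parseSize(in)
		fmt.Println(in, n, err)
	}
}
```

A result of 0 corresponds to the disabled default described above.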
Test plan