Add INFO COMMANDSTATS tracking with per-command success rates by Mathos1432 · Pull Request #1702 · microsoft/garnet

Mathos1432 · 2026-04-14T20:04:02Z

Why is this change being made?

Garnet currently has no way to track per-command usage statistics. Redis provides INFO COMMANDSTATS which reports how many times each command was called, how many failed, and how many were rejected. This is critical for production observability — operators need to understand command usage patterns, identify failing commands, and monitor success rates.

What changed?

Adds Redis-compatible INFO COMMANDSTATS support, gated behind --commandstats-monitor (disabled by default, zero overhead when off). When enabled, tracks per-command calls, failed_calls, and rejected_calls.

Key design decisions:

Counter-only tracking — No latency (usec) on the hot path. Reports usec=0,usec_per_call=0.00 for Redis format compatibility. Latency can be added later via sampling.
Per-session single-writer — Each session owns its own CommandStats array indexed by RespCommand enum (O(1) access). No locking on the hot path; aggregation happens on-demand when INFO COMMANDSTATS is requested.
Error detection — Dual strategy: commandErrorWritten flag set by WriteError/AbortWithErrorMessage, plus RESP - prefix check guarded by a bufferFlushedDuringCommand flag.
Excluded from default INFO — Only appears when explicitly requested via INFO COMMANDSTATS.
Bug fix: GarnetServerMonitor.Dispose() deadlocked when Start() was never called (the ManualResetEvent was never signaled). This affected EmbeddedRespServer used by BDN benchmarks. Fixed by tracking a started flag.

How was this validated?

Unit tests (17 passing)

8 new RespCommandStatsTests: DisabledByDefault, CallsTracking, FailedCalls, UsecFieldsZero, MultipleCommands, SuccessRate, InfoServerShowsMonitorStatus, FormatMatchesRedisConvention
9 existing metrics/info tests verified unbroken

Performance benchmarks (BenchmarkDotNet)

ACL — Server started with an ACL file (authentication enabled)
AOF — Server started with Append-Only File (persistence enabled)
None — Bare server, no ACL or AOF

PING (100 commands/op, .NET 10, Server GC)

Params	Disabled	Enabled	Overhead
ACL	1.687 μs	1.820 μs	+7.9%
AOF	1.680 μs	1.794 μs	+6.8%
None	1.690 μs	1.816 μs	+7.5%

PING is ~17ns/command, so ~1.3ns tracking overhead is proportionally large. This is the worst case.

SET (100 commands/op, .NET 10, Server GC)

Params	Disabled	Enabled	Overhead
ACL	13.10 μs	13.18 μs	+0.6%
AOF	21.13 μs	20.62 μs	noise
None	12.62 μs	13.03 μs	+3.2%

GET (100 commands/op, .NET 10, Server GC)

Params	Disabled	Enabled	Overhead
ACL	14.93 μs	14.65 μs	noise
AOF	14.70 μs	14.68 μs	noise
None	14.24 μs	14.82 μs	+4.1%

Summary: For real-world commands (SET/GET), overhead is 0-4% and within measurement noise for most configurations. Disabled by default = zero overhead unless opted in.

Implements Redis-compatible INFO COMMANDSTATS support, gated behind --commandstats-monitor config flag. Tracks per-command calls, failed_calls, and rejected_calls using lightweight counter increments (no Stopwatch). Key design decisions: - Array-indexed by RespCommand enum for O(1) access - Per-session counters (single-writer, no locking on hot path) - On-demand aggregation from active sessions + disposed session history - No latency tracking (usec=0) to avoid Stopwatch overhead (~4.4x perf hit) - Monitor dispose handles case where Start() was never called (EmbeddedServer) Benchmark results (100 PINGs, .NET 10): Disabled: 1.687 us | Enabled: 1.820 us | Overhead: ~7.8% SET/GET overhead: 0-4% (within noise for real workloads) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…latency references When MetricsSamplingFrequency > 0, globalCommandStats already includes history from disposed sessions. PopulateCommandStatsInfo was adding historyCommandStats on top, double-counting disposed session stats. Also removed stale 'latency' references from doc comments and help text since per-command latency tracking was removed to avoid Stopwatch overhead. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 5 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-17T17:00:34Z

+                            if (!failed && !bufferFlushedDuringCommand &&
+                                responseStartBefore != null && dcurr > responseStartBefore)
+                                failed = *responseStartBefore == '-';
+                            if (failed)
+                                commandStats.IncrementFailed(cmd);


The failure-detection logic dereferences responseStartBefore assuming the output buffer was never flushed (!bufferFlushedDuringCommand). However bufferFlushedDuringCommand is only set in SendAndReset(); other flush paths (e.g., WriteDirectLarge and SendAndReset(IMemoryOwner<byte>, ...)) also call Send(...)/GetResponseObject() without setting this flag. If a command triggers one of those flushes, responseStartBefore may point to an old response buffer, and *responseStartBefore can read invalid memory / crash. Consider tracking flushes in all buffer-rotation paths, or avoid raw-pointer inspection by relying solely on an explicit commandErrorWritten signal (ensure every error write sets it), or store an offset within the current response object rather than a raw pointer.

Copilot · 2026-04-17T17:00:34Z

+                if (entry.Calls == 0 && entry.RejectedCalls == 0)
+                    continue;
+
+                string cmdName = cmd.ToString().ToLowerInvariant().Replace('_', '|');


Command names for INFO output should use the canonical Redis command name mapping rather than cmd.ToString(). Using the enum name risks mismatches for commands/subcommands/aliases and duplicates the naming logic already centralized in RespCommandsInfo.GetRespCommandName(cmd) (which already encodes the '|' subcommand separator). Suggest building cmdName from RespCommandsInfo.GetRespCommandName(cmd) and then applying the cmdstat_/lowercasing formatting.

Suggested change

string cmdName = cmd.ToString().ToLowerInvariant().Replace('_', '|');

string cmdName = RespCommandsInfo.GetRespCommandName(cmd).ToLowerInvariant();

Copilot · 2026-04-17T17:00:34Z

+            TimeSpan metricsUpdateDelay = TimeSpan.FromSeconds(1.5);
+
+            using ConnectionMultiplexer redis = ConnectionMultiplexer.Connect(TestUtils.GetConfig());
+            IDatabase db = redis.GetDatabase(0);
+
+            db.StringSet("key1", "value1");
+            Thread.Sleep(metricsUpdateDelay);
+


These tests rely on fixed Thread.Sleep delays to wait for metrics sampling, which can make the suite slow and flaky under load/CI variance. Since COMMANDSTATS can be aggregated on-demand when sampling is disabled, consider starting the server with metricsSamplingFreq <= 0 and polling INFO until the expected entry appears (with a timeout) instead of sleeping a fixed duration.

Copilot · 2026-04-17T17:00:35Z

+        private void StartServer(bool commandStatsMonitor = true, int metricsSamplingFreq = 1)
+        {
+            server = TestUtils.CreateGarnetServer(
+                TestUtils.MethodTestDir,
+                metricsSamplingFreq: metricsSamplingFreq,
+                commandStatsMonitor: commandStatsMonitor);
+            server.Start();


StartServer defaults metricsSamplingFreq to 1, which forces these tests to depend on periodic sampling (hence repeated sleeps). Since COMMANDSTATS supports on-demand aggregation when sampling is disabled, consider defaulting metricsSamplingFreq to a non-positive value for this test class (or adding a helper that waits/polls), to avoid timing-based assertions throughout the suite.

Copilot · 2026-04-17T17:00:35Z

            this.loggerFactory = loggerFactory;
            this.databaseManager = databaseManager ?? DatabaseManagerFactory.CreateDatabaseManager(serverOptions, createDatabaseDelegate, this);
-            this.monitor = serverOptions.MetricsSamplingFrequency > 0
+            this.monitor = serverOptions.MetricsSamplingFrequency > 0 || serverOptions.CommandStatsMonitor


GarnetLatencyMetricsSession requires a non-null GarnetServerMonitor (it dereferences monitor.monitor_iterations), but StoreWrapper still does not create monitor when only LatencyMonitor is enabled and MetricsSamplingFrequency <= 0. With the current condition, --latency-monitor can still lead to a NullReferenceException at session creation unless metrics sampling (or now commandstats) also happens to be enabled. Consider including serverOptions.LatencyMonitor in the monitor-creation condition, or adding explicit validation that latency monitoring requires a positive MetricsSamplingFrequency.

Suggested change

this.monitor = serverOptions.MetricsSamplingFrequency > 0 || serverOptions.CommandStatsMonitor

this.monitor = serverOptions.MetricsSamplingFrequency > 0 || serverOptions.CommandStatsMonitor || serverOptions.LatencyMonitor

Mathos1432 force-pushed the users/matrembl/commandstats branch 5 times, most recently from 0460835 to 60dc7f0 Compare April 15, 2026 13:35

Mathos1432 force-pushed the users/matrembl/commandstats branch from 60dc7f0 to f26aea3 Compare April 15, 2026 15:14

Mathos1432 marked this pull request as ready for review April 15, 2026 19:57

Copilot AI review requested due to automatic review settings April 15, 2026 19:57

Copilot AI reviewed Apr 15, 2026

View reviewed changes

Copilot started reviewing on behalf of Mathos1432 April 15, 2026 20:43 View session

Mathos1432 requested a review from Copilot April 17, 2026 16:54

Copilot started reviewing on behalf of Mathos1432 April 17, 2026 16:54 View session

Copilot AI reviewed Apr 17, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add INFO COMMANDSTATS tracking with per-command success rates#1702

Add INFO COMMANDSTATS tracking with per-command success rates#1702
Mathos1432 wants to merge 2 commits intomainfrom
users/matrembl/commandstats

Mathos1432 commented Apr 14, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 17, 2026

Uh oh!

Copilot AI Apr 17, 2026

Uh oh!

Copilot AI Apr 17, 2026

Uh oh!

Copilot AI Apr 17, 2026

Uh oh!

Copilot AI Apr 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	string cmdName = cmd.ToString().ToLowerInvariant().Replace('_', '\|');
	string cmdName = RespCommandsInfo.GetRespCommandName(cmd).ToLowerInvariant();

	this.monitor = serverOptions.MetricsSamplingFrequency > 0 \|\| serverOptions.CommandStatsMonitor
	this.monitor = serverOptions.MetricsSamplingFrequency > 0 \|\| serverOptions.CommandStatsMonitor \|\| serverOptions.LatencyMonitor

Conversation

Mathos1432 commented Apr 14, 2026

Why is this change being made?

What changed?

Key design decisions:

How was this validated?

Unit tests (17 passing)

Performance benchmarks (BenchmarkDotNet)

PING (100 commands/op, .NET 10, Server GC)

SET (100 commands/op, .NET 10, Server GC)

GET (100 commands/op, .NET 10, Server GC)

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants