
Adding metrics information on token usage and performance#120

Open
miroovh wants to merge 1 commit into thushan:main from miroovh:metrics

Conversation


@miroovh miroovh commented Mar 11, 2026

Metrics are available through two URLs:

  • /internal/stats/requests
  • /internal/stats/summary

Main information retrieved:

  • Model
  • Endpoint name
  • Tokens in
  • Tokens out
  • Time to first token

Summary by CodeRabbit

Release Notes

  • New Features

    • Request metrics collection now tracks tokens, response times, and throughput across your requests.
    • New API endpoints for retrieving recent requests and aggregated statistics summaries.
    • Time to First Token (TTFT) measurement for streaming responses to monitor response latency.
  • Chores

    • Version updated to v0.0.24m.

@miroovh miroovh requested a review from thushan as a code owner March 11, 2026 00:24

coderabbitai Bot commented Mar 11, 2026

Walkthrough

This pull request introduces a comprehensive request metrics collection system featuring asynchronous, non-blocking metric recording with ring buffer storage, streaming time-to-first-token (TTFT) measurement via atomic operations, aggregated statistics computation with percentile support, and new HTTP endpoints for metrics retrieval integrated throughout the proxy service pipeline.

Changes

Cohort / File(s) Summary
Metrics Core Infrastructure
internal/adapter/metrics/types.go, internal/core/ports/stats.go
Introduces RequestMetricsEvent interface/data structure, RequestMetricsRecorder interface, and three public aggregation types (AggregatedStats, ModelAggregatedStats, EndpointAggregatedStats) for metrics representation and JSON serialisation.
Streaming TTFT Measurement
internal/adapter/metrics/stream_tap.go
Adds StreamTap type implementing io.Writer for passive SSE stream observation, recording time-to-first-token and total bytes via atomic operations without blocking the hot path.
Metrics Collection & Aggregation
internal/adapter/metrics/collector.go, internal/adapter/metrics/collector_test.go
Implements RequestCollector with ring buffer storage, non-blocking channel-based metric recording, dropping events under backpressure, GetRecentRequests(), GetAggregatedStats() with filtering, and comprehensive test coverage including ring buffer wraparound and per-model aggregation.
Proxy Integration
internal/adapter/proxy/core/base.go, internal/adapter/proxy/olla/service_retry.go
Adds optional RequestMetricsRecorder field and setter to BaseProxyComponents; injects StreamTap via io.TeeReader into response body and records collected streaming metrics (TTFT, bytes) merged with provider metrics.
Application Layer Integration
internal/app/handlers/application.go, internal/app/services/http.go
Initialises metrics.RequestCollector in Application struct with public accessor; wires the collector to proxy service via SetRequestMetricsRecorder() during service setup.
Metrics HTTP Endpoints
internal/app/handlers/handler_stats_requests.go, internal/app/handlers/server_routes.go
Introduces RequestStatsResponse and RequestSummaryResponse types; adds /internal/stats/requests endpoint (supports limit query param, default 50, max 1000) and /internal/stats/summary endpoint (supports since duration param) for metrics retrieval.
Configuration
makefile
Updates embedded version string from v0.0.1 to v0.0.24m.

Sequence Diagram(s)

sequenceDiagram
    actor Client
    participant Proxy as Proxy Service
    participant Tap as StreamTap
    participant Collector as Metrics Collector
    participant Buffer as Ring Buffer

    Client->>Proxy: Streaming LLM Request
    Proxy->>Tap: io.TeeReader wraps response
    Note over Tap: Records start time
    
    loop Streaming Response
        Proxy->>Tap: Write(data chunks)
        Tap->>Tap: Record TTFT on first token<br/>(atomic operations)
        Tap-->>Proxy: Pass through bytes
    end
    
    Proxy->>Proxy: Extract provider metrics<br/>(tokens, throughput)
    Proxy->>Collector: RecordRequestMetrics(event)
    Collector->>Collector: Non-blocking channel send<br/>(drop if full)
    
    Collector->>Buffer: Consumer goroutine<br/>drains channel
    Buffer->>Buffer: Append to ring buffer<br/>(with wraparound)
    
    Note over Collector,Buffer: Asynchronous processing
sequenceDiagram
    actor User
    participant HTTP as HTTP Handler
    participant Collector as Metrics Collector
    participant Buffer as Ring Buffer
    participant Aggregator as Stats Engine

    User->>HTTP: GET /internal/stats/requests?limit=50
    HTTP->>Collector: GetRecentRequests(50)
    Collector->>Buffer: Read N most recent entries
    Buffer-->>Collector: Return ordered slice
    Collector-->>HTTP: RequestMetrics slice
    HTTP-->>User: RequestStatsResponse (JSON)

    User->>HTTP: GET /internal/stats/summary?since=5m
    HTTP->>Collector: GetAggregatedStats(since timestamp)
    Collector->>Buffer: Iterate all entries
    Buffer->>Aggregator: Filter by timestamp
    Aggregator->>Aggregator: Compute aggregates:<br/>counts, percentiles (p50/p95/p99),<br/>per-model, per-endpoint
    Aggregator-->>Collector: AggregatedStats
    Collector-->>HTTP: Return stats
    HTTP-->>User: RequestSummaryResponse (JSON)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • feat: better stats #52: Implements and wires per-request and provider metrics collection, with overlapping changes in metrics extraction, recording pipeline, and streaming metrics integration.

Suggested labels

enhancement, performance

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 33.33%, which is below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Description Check (✅ Passed): check skipped because CodeRabbit's high-level summary is enabled.
  • Title check (✅ Passed): the title accurately describes the main change, introducing metrics collection for token usage and performance, which is the primary focus of the 269+ lines added to the metrics collector and related infrastructure.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (4)
makefile (1)

3-3: Non-standard version suffix "m" may cause tooling issues.

The m suffix in v0.0.24m doesn't follow semantic versioning. If this is a metrics-branch marker, consider using a pre-release format like v0.0.24-metrics or v0.0.24+metrics for better compatibility with version parsing tools.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@makefile` at line 3, The VERSION variable currently uses a non-standard
suffix ("v0.0.24m"); update the VERSION value to a semver-compatible form such
as "v0.0.24-metrics" or "v0.0.24+metrics" to avoid tooling parse issues—edit the
VERSION assignment in the Makefile (symbol: VERSION) to replace "v0.0.24m" with
your chosen pre-release or build metadata format and ensure any automation that
reads VERSION expects the new format.
internal/adapter/metrics/stream_tap.go (2)

15-20: Thread-safety note: firstTokenAt is not atomically protected.

The firstTokenAt field is accessed non-atomically while hasFirstData uses atomics. This is safe given the sequential nature of io.TeeReader usage, but if StreamTap were ever used with concurrent writers, this would be a data race. The design comment should clarify this single-writer assumption.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/adapter/metrics/stream_tap.go` around lines 15 - 20, The comment
notes that StreamTap.firstTokenAt is not atomically protected while hasFirstData
uses atomics; update the StreamTap type comment to explicitly state the
single-writer / sequential-io assumption (e.g., that StreamTap is only safe when
used by a single writer such as via io.TeeReader) and warn that concurrent
writers would cause a data race on firstTokenAt; reference StreamTap,
firstTokenAt and hasFirstData in the comment so future maintainers know the
concurrency contract and consider promoting atomic/time-safe access if
concurrency is needed.

34-51: Consider early return after SSE data detected to avoid repeated bytes.Contains.

Once hasFirstData reaches state 2, the bytes.Contains check on line 45 still executes on every subsequent chunk despite never storing. For high-throughput streams, this adds unnecessary overhead.

♻️ Suggested optimisation
 func (t *StreamTap) Write(p []byte) (n int, err error) {
 	now := time.Now()
 	t.totalBytes.Add(int64(len(p)))
 
+	// Fast path: already captured SSE data timing
+	if t.hasFirstData.Load() >= 2 {
+		return len(p), nil
+	}
+
 	// Record first byte timestamp
 	if t.hasFirstData.CompareAndSwap(0, 1) {
 		t.firstTokenAt = now
 	}
 
 	// Detect first SSE data line (contains actual token content)
-	// This gives a more accurate TTFT than just first byte which may be headers
-	if t.hasFirstData.Load() < 2 && bytes.Contains(p, sseDataPrefix) {
+	if bytes.Contains(p, sseDataPrefix) {
 		t.hasFirstData.Store(2)
 		t.firstTokenAt = now
 	}
 
 	return len(p), nil
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/adapter/metrics/stream_tap.go` around lines 34 - 51, The Write
method on StreamTap is repeatedly calling bytes.Contains(p, sseDataPrefix) even
after hasFirstData has been set to 2; update StreamTap.Write to short-circuit
before doing the expensive bytes.Contains call by reading hasFirstData (via
hasFirstData.Load()) and returning early (or skipping the contains block) when
it is already >= 2, while preserving the existing CAS logic that sets
firstTokenAt and transitions from 0→1 and 1→2; ensure you still set firstTokenAt
when transitioning to states 1 or 2 and reference the hasFirstData,
sseDataPrefix, Write, and firstTokenAt symbols when making the change.
internal/adapter/metrics/collector_test.go (1)

30-31: Sleep-based synchronisation can cause flaky tests.

Using time.Sleep(50 * time.Millisecond) to wait for the consumer goroutine is timing-dependent and may cause intermittent failures under load. Consider adding a helper that polls GetRecentRequests with a timeout, or exposing a sync mechanism for tests.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/adapter/metrics/collector_test.go` around lines 30 - 31, Replace the
fragile time.Sleep-based sync in the test by waiting deterministically:
implement a small polling helper (e.g., waitForRecentRequests(timeout,
wantCount) used from the test) that repeatedly calls GetRecentRequests() until
the expected number of entries appears or a timeout elapses, or alternatively
expose a test-only synchronization point (channel or sync.WaitGroup) from the
consumer so the test can block until processing completes; update the test to
call that helper or use the exposed sync instead of time.Sleep(50 *
time.Millisecond) so the consumer goroutine is reliably observed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4d1a156f-dcb2-4b24-9412-5df7c5a1cfed

📥 Commits

Reviewing files that changed from the base of the PR and between 83712ca and 2f187ca.

📒 Files selected for processing (12)
  • internal/adapter/metrics/collector.go
  • internal/adapter/metrics/collector_test.go
  • internal/adapter/metrics/stream_tap.go
  • internal/adapter/metrics/types.go
  • internal/adapter/proxy/core/base.go
  • internal/adapter/proxy/olla/service_retry.go
  • internal/app/handlers/application.go
  • internal/app/handlers/handler_stats_requests.go
  • internal/app/handlers/server_routes.go
  • internal/app/services/http.go
  • internal/core/ports/stats.go
  • makefile

Comment on lines +174 to +189
	// Per-model stats
	if m.Model != "" {
		ms, ok := stats.ByModel[m.Model]
		if !ok {
			ms = &ModelAggregatedStats{}
			stats.ByModel[m.Model] = ms
		}
		ms.TotalRequests++
		ms.TotalInputTokens += int64(m.InputTokens)
		ms.TotalOutputTokens += int64(m.OutputTokens)
		ms.AvgTTFTMs += m.TTFTMs
		ms.AvgDurationMs += m.TotalDurationMs
		if m.TokensPerSecond > 0 {
			ms.AvgTokensPerSec += float64(m.TokensPerSecond)
		}
	}


⚠️ Potential issue | 🟡 Minor

Per-model TTFT average is skewed by requests with zero TTFT.

AvgTTFTMs accumulates all values including zeros (non-streaming requests), but divides by TotalRequests at line 229. This differs from the global TTFTAvgMs which only considers non-zero values (lines 164-166, 211). Requests without TTFT data will dilute the per-model average.

Consider tracking a separate count for non-zero TTFT entries per model:

Proposed fix sketch
 type ModelAggregatedStats struct {
     // ...
+    ttftCount int64 // internal counter for averaging
 }

 // In accumulation:
-ms.AvgTTFTMs += m.TTFTMs
+if m.TTFTMs > 0 {
+    ms.AvgTTFTMs += m.TTFTMs
+    ms.ttftCount++
+}

 // In averaging:
-ms.AvgTTFTMs /= ms.TotalRequests
+if ms.ttftCount > 0 {
+    ms.AvgTTFTMs /= ms.ttftCount
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/adapter/metrics/collector.go` around lines 174 - 189, The per-model
TTFT average is being diluted because ModelAggregatedStats.AvgTTFTMs is summed
for every request but later divided by TotalRequests; update the aggregation to
track a separate non-zero TTFT counter (e.g., add
ModelAggregatedStats.NonzeroTTFTCount) and only increment it when m.TTFTMs > 0
inside the block that updates per-model stats in collector.go (where
ModelAggregatedStats is referenced), and then change the per-model average
calculation to divide AvgTTFTMs by NonzeroTTFTCount (falling back to 0 when that
count is zero) instead of TotalRequests so the per-model TTFTAvgMs matches the
global TTFTAvgMs behavior.

Comment on lines +191 to +206
	// Per-endpoint stats
	if m.EndpointName != "" {
		es, ok := stats.ByEndpoint[m.EndpointName]
		if !ok {
			es = &EndpointAggregatedStats{}
			stats.ByEndpoint[m.EndpointName] = es
		}
		es.TotalRequests++
		es.TotalInputTokens += int64(m.InputTokens)
		es.TotalOutputTokens += int64(m.OutputTokens)
		es.AvgTTFTMs += m.TTFTMs
		es.AvgDurationMs += m.TotalDurationMs
		if m.TokensPerSecond > 0 {
			es.AvgTokensPerSec += float64(m.TokensPerSecond)
		}
	}


⚠️ Potential issue | 🟡 Minor

Same TTFT averaging issue applies to per-endpoint stats.

See comment on per-model stats above. The same fix pattern would apply here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/adapter/metrics/collector.go` around lines 191 - 206, The
per-endpoint aggregation is incorrectly summing values into AvgTTFTMs,
AvgDurationMs and AvgTokensPerSec; change these to running averages like you did
for per-model stats: after you increment es.TotalRequests (use n :=
es.TotalRequests) set es.AvgTTFTMs = (es.AvgTTFTMs*float64(n-1) + m.TTFTMs) /
float64(n) and likewise es.AvgDurationMs = (es.AvgDurationMs*float64(n-1) +
m.TotalDurationMs) / float64(n). For TokensPerSecond, introduce a new counter
field on EndpointAggregatedStats (e.g., TokensPerSecondSamples or
TokenSamplesCount), increment it only when m.TokensPerSecond > 0, and compute
es.AvgTokensPerSec = (es.AvgTokensPerSec*float64(samples-1) +
float64(m.TokensPerSecond)) / float64(samples) instead of summing; reference
symbols: m.EndpointName, stats.ByEndpoint, EndpointAggregatedStats, AvgTTFTMs,
AvgDurationMs, AvgTokensPerSec, TotalRequests.

Comment on lines +191 to +221
	// Record per-request LLM metrics (tokens, TTFT, throughput)
	if s.RequestMetricsRecorder != nil {
		event := ports.RequestMetricsEvent{
			StartTime:        stats.StartTime,
			EndTime:          stats.EndTime,
			RequestID:        stats.RequestID,
			Model:            stats.Model,
			EndpointName:     endpoint.Name,
			EndpointURL:      endpoint.URLString,
			TotalDurationMs:  stats.Latency,
			BackendLatencyMs: stats.BackendResponseMs,
			StreamingMs:      stats.StreamingMs,
			TotalBytes:       int64(stats.TotalBytes),
			Success:          true,
			IsStreaming:       true,
		}
		// Use real TTFT from StreamTap if available
		if streamTap != nil && streamTap.HasReceivedData() {
			event.FirstTokenAt = streamTap.FirstTokenTime()
			event.TTFTMs = streamTap.TTFT()
		}
		// Merge provider metrics (tokens, throughput) if extracted
		if stats.ProviderMetrics != nil {
			pm := stats.ProviderMetrics
			event.InputTokens = pm.InputTokens
			event.OutputTokens = pm.OutputTokens
			event.TotalTokens = pm.TotalTokens
			event.TokensPerSecond = pm.TokensPerSecond
		}
		s.RequestMetricsRecorder.RecordRequestMetrics(event)
	}


⚠️ Potential issue | 🟡 Minor

Hardcoded IsStreaming: true may not reflect actual request type.

Line 205 hardcodes IsStreaming: true, but this proxy path may also handle non-streaming requests. Consider deriving this from the request or response headers (e.g., checking for text/event-stream content type or stream parameter in the request body).

🔧 Suggested fix
 		event := ports.RequestMetricsEvent{
 			// ... other fields ...
 			TotalBytes:       int64(stats.TotalBytes),
 			Success:          true,
-			IsStreaming:       true,
+			IsStreaming:      stats.IsStreaming, // Or derive from Content-Type header
 		}

Additionally, there's an extra space before true on line 205 (IsStreaming: true) which doesn't match the alignment of other fields.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/adapter/proxy/olla/service_retry.go` around lines 191 - 221, The
RequestMetricsEvent currently hardcodes IsStreaming: true which can be
incorrect; update the logic in the block that builds the
ports.RequestMetricsEvent (used when s.RequestMetricsRecorder != nil) to set
IsStreaming dynamically—derive it from the request/response context (e.g.,
inspect stats/response headers for "text/event-stream", a streaming flag on
stats or the incoming request, or consult streamTap.HasReceivedData()), and set
IsStreaming = streamDetectedBool; also fix the stray spacing so the IsStreaming
field aligns with other struct fields; keep all other metrics population
(streamTap, stats.ProviderMetrics) unchanged.

Comment on lines +169 to +177
	// Create per-request LLM metrics collector (tokens, TTFT, throughput)
	metricsCollector := metrics.NewRequestCollector()

	return &Application{
		Config:           cfg,
		logger:           logger,
		proxyService:     proxyService,
		statsCollector:   statsCollector,
		metricsCollector: metricsCollector,


⚠️ Potential issue | 🟠 Major

Metrics collector goroutine leaks on Application teardown.

The RequestCollector starts a background goroutine (consumeLoop) that only exits when Shutdown() is called. However, Application doesn't expose a shutdown path, and HTTPService.Stop() (see internal/app/services/http.go:182-199) only shuts down the HTTP server without calling metricsCollector.Shutdown().

Consider adding a Close() method to Application that shuts down the collector:

Proposed fix
// Close releases resources held by the Application.
func (a *Application) Close() {
	if a.metricsCollector != nil {
		a.metricsCollector.Shutdown()
	}
}

Then call app.Close() from HTTPService.Stop().

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/app/handlers/application.go` around lines 169 - 177, The
RequestCollector starts a background goroutine (consumeLoop) that never stops
unless Shutdown() is called, but Application currently never calls Shutdown on
its metricsCollector; add a Close method on Application (e.g., func (a
*Application) Close()) that checks a.metricsCollector != nil and calls
a.metricsCollector.Shutdown(), and then invoke app.Close() from
HTTPService.Stop() so the collector goroutine is cleanly terminated on teardown
(referencing Application, metricsCollector, RequestCollector, Shutdown,
HTTPService.Stop).

Comment on lines +67 to +72
	var since time.Time
	if v := r.URL.Query().Get("since"); v != "" {
		if d, err := time.ParseDuration(v); err == nil {
			since = time.Now().Add(-d)
		}
	}


⚠️ Potential issue | 🟡 Minor

Invalid since parameter silently falls back to unbounded query.

If the user provides a malformed duration (e.g., ?since=invalid), the handler silently ignores it and returns all data. Consider returning a 400 Bad Request so callers know their input was invalid.

Proposed fix
 	var since time.Time
 	if v := r.URL.Query().Get("since"); v != "" {
-		if d, err := time.ParseDuration(v); err == nil {
+		d, err := time.ParseDuration(v)
+		if err != nil {
+			http.Error(w, "Invalid 'since' duration format", http.StatusBadRequest)
+			return
+		}
-			since = time.Now().Add(-d)
-		}
+		since = time.Now().Add(-d)
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/app/handlers/handler_stats_requests.go` around lines 67 - 72, The
handler currently parses the "since" query using time.ParseDuration and silently
ignores parse errors, causing malformed inputs to return unbounded results;
update the parsing logic around r.URL.Query().Get("since")/time.ParseDuration so
that if the parameter is non-empty and time.ParseDuration returns an error you
respond with a 400 Bad Request (e.g., using http.Error or w.WriteHeader + error
body) and return immediately, otherwise set since = time.Now().Add(-d) as
before; reference the existing "since" variable, the r.URL.Query().Get("since")
call, and time.ParseDuration to locate and modify the code.

@thushan thushan self-assigned this Mar 11, 2026

thushan (Owner) commented Mar 11, 2026

Thanks for your submission, it's greatly appreciated @miroovh!

To pass CI, maybe run this locally:

make ready

That will run tests and do all the formatting, as well as the pre-commit checks. See here:
https://thushan.github.io/olla/development/overview/

