…cing

Implement comprehensive observability infrastructure for all services with Prometheus metrics collection, OpenTelemetry distributed tracing, and standardized monitoring endpoints.

Changes:
- Create shared/observability package with metrics, tracing, and middleware
- Add Prometheus metrics for HTTP requests, database operations, message queues, cache, and business events
- Add OpenTelemetry distributed tracing with console and OTLP exporters
- Create MetricsMiddleware for automatic HTTP metrics collection
- Add /metrics endpoint to all services (Sessions, Memory, API Gateway)
- Instrument FastAPI applications with OpenTelemetry
- Configure tracing in service lifespan with service name and version

Metrics collected:
- http_requests_total (counter by service, method, endpoint, status)
- http_request_duration_seconds (histogram by service, method, endpoint)
- http_requests_in_progress (gauge by service, method, endpoint)
- db_operations_total, db_operation_duration_seconds, db_connections_active
- mq_messages_published/consumed_total, mq_message_processing_duration_seconds
- cache_operations_total, cache_hit_rate
- memories_created/retrieved_total, sessions_created_total, events_added_total

Tracing features:
- Service-level tracing with resource attributes (service.name, service.version)
- FastAPI automatic instrumentation for request/response traces
- Console export for development, OTLP export for production

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Summary
- /metrics endpoint to all services for Prometheus scraping
Implementation Details
Shared Observability Package (shared/observability/)
- metrics.py: Prometheus metrics definitions for HTTP, database, message queues, cache, and business events
- tracing.py: OpenTelemetry setup with resource configuration and FastAPI instrumentation
- middleware.py: MetricsMiddleware for automatic HTTP metrics collection
Prometheus Metrics (metrics.py:16-116)
HTTP Metrics:
- http_requests_total - Counter by service, method, endpoint, status_code
- http_request_duration_seconds - Histogram with 11 buckets (5ms - 10s)
- http_requests_in_progress - Gauge by service, method, endpoint
Database Metrics:
- db_operations_total - Counter by service, operation, table, status
- db_operation_duration_seconds - Histogram with 9 buckets (1ms - 1s)
- db_connections_active - Gauge by service
Message Queue Metrics:
- mq_messages_published_total / mq_messages_consumed_total
- mq_message_processing_duration_seconds - Histogram (100ms - 2min)
Cache Metrics:
- cache_operations_total - Counter by operation and status
- cache_hit_rate - Gauge (0-1)
Business Metrics:
- memories_created_total, memories_retrieved_total
- sessions_created_total, events_added_total
OpenTelemetry Tracing (tracing.py:17-75)
Service Integration
All services now include:
- /metrics endpoint returning Prometheus-formatted metrics
Sessions Service (services/sessions/app/main.py):
- /metrics
Memory Service (services/memory/app/main.py):
- /metrics
API Gateway (services/gateway/app/main.py):
- /metrics
- /metrics link
Metrics Endpoint Usage
All services now expose Prometheus metrics:
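To make the endpoint behaviour concrete, here is a hedged sketch of what could sit behind /metrics, using prometheus_client. The metric names and labels follow the HTTP metrics listed in this PR. The private registry, the exact bucket boundaries (11 values spanning 5ms to 10s), and the helpers `track_request` and `render_metrics` are assumptions for illustration and may differ from metrics.py and middleware.py.

```python
import time
from contextlib import contextmanager

from prometheus_client import CollectorRegistry, Counter, Gauge, Histogram, generate_latest

# A private registry keeps this sketch isolated from the process-wide default.
registry = CollectorRegistry()

HTTP_REQUESTS_TOTAL = Counter(
    "http_requests_total", "Total HTTP requests",
    ["service", "method", "endpoint", "status_code"], registry=registry,
)
HTTP_REQUEST_DURATION = Histogram(
    "http_request_duration_seconds", "HTTP request duration in seconds",
    ["service", "method", "endpoint"],
    # Assumed boundaries: 11 buckets from 5ms to 10s.
    buckets=(0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0),
    registry=registry,
)
HTTP_REQUESTS_IN_PROGRESS = Gauge(
    "http_requests_in_progress", "HTTP requests currently being handled",
    ["service", "method", "endpoint"], registry=registry,
)


@contextmanager
def track_request(service: str, method: str, endpoint: str):
    """Hypothetical helper: record one request across all three HTTP metrics."""
    outcome = {"status_code": "500"}  # assume failure unless the caller overrides it
    in_progress = HTTP_REQUESTS_IN_PROGRESS.labels(service, method, endpoint)
    in_progress.inc()
    start = time.perf_counter()
    try:
        yield outcome
    finally:
        elapsed = time.perf_counter() - start
        HTTP_REQUEST_DURATION.labels(service, method, endpoint).observe(elapsed)
        HTTP_REQUESTS_TOTAL.labels(
            service, method, endpoint, outcome["status_code"]
        ).inc()
        in_progress.dec()


def render_metrics() -> bytes:
    """Serialize the registry in the Prometheus text exposition format."""
    return generate_latest(registry)
```

A FastAPI route could return `render_metrics()` with the `CONTENT_TYPE_LATEST` media type from prometheus_client. A middleware would wrap each request in `track_request(...)` and set `outcome["status_code"]` from the response.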
Example metrics output:
Distributed Tracing
OpenTelemetry traces are automatically generated for:
In development, traces are exported to console. In production, configure OTLP endpoint:
Future Enhancements
This PR provides the foundation for observability. Future PRs can add:
Test Plan
🤖 Generated with Claude Code