Lab 5: Adding OpenTelemetry Metrics with Aspire Dashboard

Overview

In this lab, you'll enhance the Chain of Responsibility pipeline by adding OpenTelemetry (OTEL) metrics using Aspire 9.5.1. You'll create a custom pipeline behavior that tracks three key metrics: cache hits, cache misses, and successful role additions. These metrics will be visible in the Aspire Dashboard for observability.

Learning Objectives

By the end of this lab, you will be able to:

  • Configure OpenTelemetry metrics in an Aspire 9.5.1 application
  • Create custom OTEL metrics using System.Diagnostics.Metrics
  • Build a pipeline behavior for telemetry collection
  • Integrate metrics tracking with caching behavior
  • View metrics in the Aspire Dashboard
  • Understand the benefits of observability in distributed systems

Prerequisites

  • Completion of Lab 4 (Chain of Responsibility with Pipeline Behaviors)
  • Understanding of pipeline behaviors and the Mediator pattern
  • Basic knowledge of observability concepts
  • Aspire 9.5.1 project structure (already configured in this solution)
  • Running Aspire application (use dotnet run in the AppHost project)

Important Notes

⚠️ Meter Name Consistency: Ensure the meter name in ServiceDefaults matches exactly with the meter name in TelemetryService ("NimblePros.DAB.Web.Telemetry").

⚠️ Build Before Testing: Always run dotnet build before testing to ensure there are no compile errors.

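⚠️ Behavior Registration Order: The order of behavior registration in Program.cs affects execution order. Register TelemetryBehavior after the other universal behaviors (logging, authorization, validation) and immediately before AttributedBehaviorExecutor, so that it wraps attributed behaviors such as CachingBehavior and can flush the events they record.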

Part 1: Understanding OpenTelemetry Metrics

What are OpenTelemetry Metrics?

OpenTelemetry (OTEL) is an open-source observability framework that provides APIs, libraries, and instrumentation for collecting, processing, and exporting telemetry data (metrics, logs, and traces).

Metrics are numerical measurements that represent the state of your application over time:

  • Counters: Cumulative values that only increase (e.g., total requests, cache misses)
  • Gauges: Values that can go up or down (e.g., active connections, memory usage)
  • Histograms: Distributions of values (e.g., request duration)
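
The three metric types map onto instrument classes in System.Diagnostics.Metrics. The following is a minimal sketch, not part of the lab solution: the meter and instrument names are hypothetical, and it assumes .NET 7 or later (for UpDownCounter). Without a listener or exporter attached, the recorded values are simply discarded.

```csharp
using System;
using System.Diagnostics.Metrics;

// Hypothetical meter and instrument names, used only for this illustration.
using var meter = new Meter("Example.Instruments");

// Counter: cumulative, only ever increases.
Counter<long> requests = meter.CreateCounter<long>("example_requests_total");

// Gauge-like values: an ObservableGauge reports the current value when it is polled...
int queueLength = 0;
ObservableGauge<int> queueGauge =
    meter.CreateObservableGauge<int>("example_queue_length", () => queueLength);

// ...while an UpDownCounter is adjusted by deltas in either direction.
UpDownCounter<int> activeConnections =
    meter.CreateUpDownCounter<int>("example_active_connections");

// Histogram: records individual values so a backend can build a distribution.
Histogram<double> durationMs =
    meter.CreateHistogram<double>("example_request_duration_ms", unit: "ms");

requests.Add(1);
queueLength = 3;            // picked up the next time the gauge is observed
activeConnections.Add(1);   // a connection opened
activeConnections.Add(-1);  // a connection closed
durationMs.Record(12.5);

Console.WriteLine(
    $"{requests.Name}, {queueGauge.Name}, {activeConnections.Name}, {durationMs.Name}");
```

This lab only uses Counter<long>; the other instruments are shown for comparison.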

The Three Metrics We'll Track

  1. Cache Hits (role_cache_hits_total) - Counter

    • Incremented when ListRoles finds data in the cache
    • Helps measure cache effectiveness
  2. Cache Misses (role_cache_misses_total) - Counter

    • Incremented when ListRoles needs to query the database
    • Indicates when cache is cold or data has expired
  3. Roles Added (roles_added_total) - Counter

    • Incremented when a role is successfully created
    • Tracks business functionality usage

Benefits of Metrics

Performance Monitoring: Track application performance over time
Capacity Planning: Understand usage patterns and resource needs
SLA Monitoring: Measure against service level agreements
Alerting: Set up alerts when metrics exceed thresholds
Debugging: Identify performance bottlenecks and issues

Part 2: Reviewing Current Aspire Setup

Current Aspire Configuration

Our solution already includes Aspire 9.5.1 with basic OTEL configuration. Let's examine the current setup:

ServiceDefaults (src/NimblePros.DAB.ServiceDefaults/Extensions.cs):

public static IHostApplicationBuilder ConfigureOpenTelemetry(this IHostApplicationBuilder builder)
{
    builder.Logging.AddOpenTelemetry(logging =>
    {
        logging.IncludeFormattedMessage = true;
        logging.IncludeScopes = true;
    });

    builder.Services.AddOpenTelemetry()
        .WithMetrics(metrics =>
        {
            metrics.AddAspNetCoreInstrumentation()
                .AddHttpClientInstrumentation()
                .AddRuntimeInstrumentation();
        })
        .WithTracing(tracing =>
        {
            tracing.AddAspNetCoreInstrumentation()
                .AddHttpClientInstrumentation();
        });

    builder.AddOpenTelemetryExporters();

    return builder;
}

Program.cs already calls:

builder.AddServiceDefaults(); // This configures OTEL

This gives us:

  • ✅ Basic OTEL setup with OTLP exporter
  • ✅ ASP.NET Core instrumentation (HTTP requests, etc.)
  • ✅ HttpClient instrumentation
  • ✅ .NET Runtime instrumentation
  • ✅ Aspire Dashboard integration

Part 3: Creating the Telemetry Pipeline Behavior

Step 1: Update ServiceDefaults for Custom Metrics

First, we need to configure our custom metrics source. Edit src/NimblePros.DAB.ServiceDefaults/Extensions.cs:

Add this line to the WithMetrics configuration:

public static IHostApplicationBuilder ConfigureOpenTelemetry(this IHostApplicationBuilder builder)
{
    builder.Logging.AddOpenTelemetry(logging =>
    {
        logging.IncludeFormattedMessage = true;
        logging.IncludeScopes = true;
    });

    builder.Services.AddOpenTelemetry()
        .WithMetrics(metrics =>
        {
            metrics.AddAspNetCoreInstrumentation()
                .AddHttpClientInstrumentation()
                .AddRuntimeInstrumentation()
                .AddMeter("NimblePros.DAB.Web.Telemetry"); // Add this line
        })
        .WithTracing(tracing =>
        {
            tracing.AddAspNetCoreInstrumentation()
                .AddHttpClientInstrumentation();
        });

    builder.AddOpenTelemetryExporters();

    return builder;
}

Step 2: Create the Communication Service

Create src/NimblePros.DAB.Web/04_Chain/Services/TelemetryContext.cs:

namespace NimblePros.DAB.Web._04_Chain.Services;

public interface ITelemetryContext
{
  void RecordCacheHit();
  void RecordCacheMiss();
  void RecordRoleAdded(string roleName);
  void FlushToTelemetry(ITelemetryService telemetryService);
}

public class TelemetryContext : ITelemetryContext
{
  private readonly List<Action<ITelemetryService>> _pendingActions = new();
  private readonly ILogger<TelemetryContext> _logger;

  public TelemetryContext(ILogger<TelemetryContext> logger)
  {
    _logger = logger;
  }

  public void RecordCacheHit()
  {
    _pendingActions.Add(ts => ts.TrackCacheHit());
  }

  public void RecordCacheMiss()
  {
    _pendingActions.Add(ts => ts.TrackCacheMiss());
  }

  public void RecordRoleAdded(string roleName)
  {
    _pendingActions.Add(ts => ts.TrackRoleAdded(roleName));
  }

  public void FlushToTelemetry(ITelemetryService telemetryService)
  {
    foreach (var action in _pendingActions)
    {
      try
      {
        action(telemetryService);
      }
      catch (Exception ex)
      {
        // Log but don't throw - telemetry shouldn't break the pipeline
        _logger.LogWarning(ex, "Failed to execute telemetry action");
      }
    }
    
    _pendingActions.Clear();
  }
}

Step 3: Create the Shared Telemetry Service

Create the telemetry service: src/NimblePros.DAB.Web/04_Chain/Services/TelemetryService.cs

using System.Diagnostics.Metrics;

namespace NimblePros.DAB.Web._04_Chain.Services;

public interface ITelemetryService
{
  void TrackCacheHit();
  void TrackCacheMiss();
  void TrackRoleAdded(string roleName);
}

public class TelemetryService : ITelemetryService, IDisposable
{
  private readonly ILogger<TelemetryService> _logger;
  private readonly Meter _meter;
  private readonly Counter<long> _cacheHitsCounter;
  private readonly Counter<long> _cacheMissesCounter;
  private readonly Counter<long> _rolesAddedCounter;

  public TelemetryService(ILogger<TelemetryService> logger)
  {
    _logger = logger;
    
    // Create a single meter for the entire application
    _meter = new Meter("NimblePros.DAB.Web.Telemetry");
    
    // Create counters once and reuse them
    _cacheHitsCounter = _meter.CreateCounter<long>(
      name: "role_cache_hits_total",
      description: "Total number of role cache hits");
      
    _cacheMissesCounter = _meter.CreateCounter<long>(
      name: "role_cache_misses_total", 
      description: "Total number of role cache misses");
      
    _rolesAddedCounter = _meter.CreateCounter<long>(
      name: "roles_added_total",
      description: "Total number of roles successfully added");
  }

  public void TrackCacheHit()
  {
    _cacheHitsCounter.Add(1);
    _logger.LogDebug("Tracked cache hit");
  }

  public void TrackCacheMiss()
  {
    _cacheMissesCounter.Add(1);
    _logger.LogDebug("Tracked cache miss");
  }

  public void TrackRoleAdded(string roleName)
  {
    _rolesAddedCounter.Add(1);
    _logger.LogDebug("Tracked role creation: {RoleName}", roleName);
  }

  public void Dispose()
  {
    _meter?.Dispose();
  }
}

Step 3b: Create the Clean Telemetry Behavior

Now create: src/NimblePros.DAB.Web/04_Chain/PipelineBehaviors/TelemetryBehavior.cs

using Ardalis.GuardClauses;
using Ardalis.Result;
using Mediator;
using NimblePros.DAB.Web._04_Chain.Services;
using NimblePros.DAB.Web._04_Chain.UseCases.Create;
using NimblePros.DAB.Web.UseCases;

namespace NimblePros.DAB.Web._04_Chain.PipelineBehaviors;

public class TelemetryBehavior<TRequest, TResponse> : IPipelineBehavior<TRequest, TResponse>
  where TRequest : IMessage
{
  private readonly ITelemetryService _telemetryService;
  private readonly ITelemetryContext _telemetryContext;
  private readonly ILogger<TelemetryBehavior<TRequest, TResponse>> _logger;

  public TelemetryBehavior(
    ITelemetryService telemetryService,
    ITelemetryContext telemetryContext,
    ILogger<TelemetryBehavior<TRequest, TResponse>> logger)
  {
    _telemetryService = telemetryService;
    _telemetryContext = telemetryContext;
    _logger = logger;
  }

  public async ValueTask<TResponse> Handle(
    TRequest request,
    MessageHandlerDelegate<TRequest, TResponse> next,
    CancellationToken cancellationToken)
  {
    Guard.Against.Null(request, nameof(request));

    // Execute the pipeline - other behaviors can record events in ITelemetryContext
    var response = await next(request, cancellationToken);

    // Track business-level telemetry directly
    TrackBusinessEvents(request, response);

    // Flush all recorded events to telemetry service
    _telemetryContext.FlushToTelemetry(_telemetryService);

    return response;
  }

  private void TrackBusinessEvents(TRequest request, TResponse response)
  {
    try
    {
      // Track successful role creation directly (business logic)
      if (request is CreateRoleCommand createCommand && 
          response is Result<RoleDetails> createResult && 
          createResult.IsSuccess)
      {
        _telemetryContext.RecordRoleAdded(createCommand.RoleName);
      }
      
      // Infrastructure events (cache hits/misses) are recorded by other behaviors
    }
    catch (Exception ex)
    {
      _logger.LogWarning(ex, "Failed to track business telemetry for request {RequestType}", 
        typeof(TRequest).Name);
    }
  }
}

Key Features:

  • Wraps the entire pipeline - ensures all events are flushed
  • Handles both direct and indirect events - business logic + infrastructure signals
  • Clean separation - telemetry logic isolated here
  • Error handling - telemetry failures don't break the pipeline

Step 4: Enhanced Caching Behavior (Clean Separation)

Using the recommended scoped service approach, update the CachingBehavior:

using Ardalis.GuardClauses;
using Ardalis.Result;
using Mediator;
using Microsoft.AspNetCore.Identity;
using Microsoft.Extensions.Caching.Memory;
using NimblePros.DAB.Web._04_Chain.Services;
using NimblePros.DAB.Web._04_Chain.UseCases.Create;
using NimblePros.DAB.Web._04_Chain.UseCases.List;

namespace NimblePros.DAB.Web._04_Chain.PipelineBehaviors;

public class CachingBehavior<TRequest, TResponse> : IPipelineBehavior<TRequest, TResponse>
  where TRequest : IMessage
{
  private readonly IMemoryCache _cache;
  private readonly ITelemetryContext _telemetryContext;
  private readonly ILogger<CachingBehavior<TRequest, TResponse>> _logger;
  private readonly MemoryCacheEntryOptions _cacheOptions;

  public CachingBehavior(
    IMemoryCache cache,
    ITelemetryContext telemetryContext,
    ILogger<CachingBehavior<TRequest, TResponse>> logger)
  {
    _cache = cache;
    _telemetryContext = telemetryContext;
    _logger = logger;
    
    _cacheOptions = new MemoryCacheEntryOptions()
      .SetAbsoluteExpiration(relative: TimeSpan.FromSeconds(Constants.DEFAULT_CACHE_SECONDS));
  }

  public async ValueTask<TResponse> Handle(
    TRequest request,
    MessageHandlerDelegate<TRequest, TResponse> next,
    CancellationToken cancellationToken)
  {
    Guard.Against.Null(request, nameof(request));

    // Only cache ListRoles requests
    if (request is ListRolesRequest listRequest)
    {
      return await HandleListRolesWithCaching(listRequest, next, cancellationToken);
    }

    // For non-cacheable requests (like CreateRoleCommand), 
    // clear the cache and proceed normally
    var response = await next(request, cancellationToken);
    
    if (request is CreateRoleCommand createCommand)
    {
      ClearCache(createCommand);
    }

    return response;
  }

  private async ValueTask<TResponse> HandleListRolesWithCaching(
    ListRolesRequest request, 
    MessageHandlerDelegate<TRequest, TResponse> next,
    CancellationToken cancellationToken)
  {
    var cacheKey = GenerateCacheKey(request);
    
    // Try to get from cache first
    if (_cache.TryGetValue(cacheKey, out var cachedResult) && 
        cachedResult is TResponse cachedResponse)
    {
      // Record cache hit event for telemetry
      _telemetryContext.RecordCacheHit();
      _logger.LogDebug("Cache hit for ListRoles request");
      return cachedResponse;
    }

    // Record cache miss event for telemetry
    _telemetryContext.RecordCacheMiss();
    _logger.LogDebug("Cache miss for ListRoles request, fetching from database");
    
    var response = await next((TRequest)(object)request, cancellationToken);
    
    // Cache the response if it's successful
    if (response is Result<List<IdentityRole>> result && result.IsSuccess)
    {
      _cache.Set(cacheKey, response, _cacheOptions);
      _logger.LogDebug("Cached ListRoles response for {CacheSeconds} seconds", 
        Constants.DEFAULT_CACHE_SECONDS);
    }

    return response;
  }

  private void ClearCache(CreateRoleCommand command)
  {
    // Clear the list cache when a new role is added
    var listCacheKey = GenerateCacheKey(new ListRolesRequest());
    _cache.Remove(listCacheKey);
    _logger.LogDebug("Cleared ListRoles cache after creating role: {RoleName}", 
      command.RoleName);
  }

  private static string GenerateCacheKey(object request)
  {
    // Simple cache key generation based on request type
    return $"{request.GetType().Name}";
  }
}

Key Benefits:

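  • No dependency on ITelemetryService or OpenTelemetry types in the caching behavior - it only records events on ITelemetryContext
  • Clean separation of caching and telemetry concerns
  • Easy to test - mock ITelemetryContext
  • Type-safe event recording - method calls instead of magic strings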

Step 5: Update the CachingBehavior

If you would rather patch your existing CachingBehavior.cs than replace it with the full listing in Step 4, these are the changes it needs in order to track cache hits and misses. Find the file in src/NimblePros.DAB.Web/04_Chain/PipelineBehaviors/CachingBehavior.cs and update it:

// Add ITelemetryContext to the constructor
private readonly ITelemetryContext _telemetryContext;

public CachingBehavior(
  IMemoryCache cache, 
  ILogger<CachingBehavior<TRequest, TResponse>> logger,
  ITelemetryContext telemetryContext) // Add this parameter
{
  _cache = cache;
  _logger = logger;
  _telemetryContext = telemetryContext; // Add this assignment
}

Then update the caching methods to record telemetry events:

// In cache hit scenario:
_telemetryContext.RecordCacheHit();
_logger.LogDebug("Cache hit for {RequestType} request", typeof(TRequest).Name);

// In cache miss scenario:
_telemetryContext.RecordCacheMiss();
_logger.LogDebug("Cache miss for {RequestType} request, fetching from database", typeof(TRequest).Name);

Important: The CachingBehavior may need to handle both ListRolesRequest and ListRolesWithAttributesRequest. Ensure you have a generic method like HandleGenericListRolesCaching to handle both request types.

Step 6: Register Services and Behaviors

Edit src/NimblePros.DAB.Web/Program.cs to register all the services:

// Register telemetry services
builder.Services.AddSingleton<ITelemetryService, TelemetryService>();
builder.Services.AddScoped<ITelemetryContext, TelemetryContext>();

// Add pipeline behaviors (order matters!)
builder.Services.AddScoped(typeof(IPipelineBehavior<,>), typeof(LoggingBehavior<,>)); // universal
builder.Services.AddScoped(typeof(IPipelineBehavior<,>), typeof(AuthorizationBehavior<,>)); // universal  
builder.Services.AddScoped(typeof(IPipelineBehavior<,>), typeof(ValidationBehavior<,>)); // universal
builder.Services.AddScoped(typeof(IPipelineBehavior<,>), typeof(TelemetryBehavior<,>)); // universal - after the other universal behaviors, before the executor
builder.Services.AddScoped(typeof(IPipelineBehavior<,>), typeof(AttributedBehaviorExecutor<,>)); // universal - handles attributed behaviors

Critical Notes:

  • Order of registration determines execution order in the pipeline: a behavior registered earlier wraps the behaviors registered after it
  • TelemetryBehavior should be registered after the other universal behaviors and immediately before AttributedBehaviorExecutor, so it can flush all telemetry events accumulated by the behaviors it wraps
  • AttributedBehaviorExecutor must be registered last so that attributed behaviors like CachingBehavior execute inside TelemetryBehavior

Architecture Flow:

Request
  → TelemetryBehavior (wraps everything registered after it)
  → AttributedBehaviorExecutor
  → CachingBehavior (records events to ITelemetryContext)
  → Handler
  ← Response flows back
  ← TelemetryBehavior (flushes ITelemetryContext to metrics)
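
The wrapping order can be modeled with plain delegates. This is a simplified sketch, not the Mediator library's API: the `wrap` helper and the behavior names are stand-ins, and it assumes (as this lab describes) that behaviors registered earlier end up outermost.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Records the order in which each stand-in behavior runs.
var log = new List<string>();

// A stand-in "behavior": logs before and after calling the next delegate.
Func<string, Func<Task<string>>, Func<Task<string>>> wrap = (name, next) => async () =>
{
    log.Add($"{name}: before");
    var result = await next();
    log.Add($"{name}: after");
    return result;
};

// Registration order, outermost first (mirrors the order used in Program.cs).
var registered = new[]
{
    "Logging", "Authorization", "Validation", "Telemetry", "AttributedExecutor"
};

// The innermost delegate plays the role of the request handler.
Func<Task<string>> pipeline = () =>
{
    log.Add("Handler");
    return Task.FromResult("ok");
};

// Build from the inside out so the first registered behavior ends up outermost.
for (var i = registered.Length - 1; i >= 0; i--)
{
    pipeline = wrap(registered[i], pipeline);
}

await pipeline();
Console.WriteLine(string.Join(Environment.NewLine, log));
```

In the printed log, "Telemetry: after" appears only after "AttributedExecutor: after", which is why TelemetryBehavior can safely flush events recorded by CachingBehavior.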


Benefits of This Architecture:

  • ✅ Single Responsibility: Each behavior has one job
  • ✅ Separation of Concerns: Telemetry logic isolated in TelemetryBehavior
  • ✅ Communication: Behaviors can signal events without tight coupling
  • ✅ Performance: Minimal overhead, shared services
  • ✅ Testability: Easy to mock and test each component

Step 7: Build and Test the Implementation

Before testing, ensure your code compiles:

# Navigate to the Web project
cd src/NimblePros.DAB.Web

# Build the project to check for errors
dotnet build

# If successful, start the Aspire application
cd ../NimblePros.DAB.AppHost
dotnet run

Look for the dashboard URL in the console output (usually https://localhost:17xxx).

Part 4: Testing the Telemetry Implementation

Step 1: Access the Aspire Dashboard

  1. Start the application using dotnet run in the AppHost project
  2. Copy the dashboard URL from the console output
  3. Open the dashboard in your browser
  4. Navigate to the Metrics section

Step 2: Test Cache Metrics

Use the following endpoints to test cache behavior:

For Chain pattern endpoints:

  • GET https://localhost:7011/Chain/Roles (first call - cache miss)
  • GET https://localhost:7011/Chain/Roles (second call - cache hit)

For Pipeline Attributes pattern endpoints:

  • GET https://localhost:7011/PipelineAttributes/Roles (first call - cache miss)
  • GET https://localhost:7011/PipelineAttributes/Roles (second call - cache hit)

Test role creation:

  • POST https://localhost:7011/Chain/Roles or https://localhost:7011/PipelineAttributes/Roles
  • Body: {"roleName": "TestRole"}

Part 4: Testing the Metrics

Step 1: Run the Application

  1. Start the solution:

    cd RefactorToPipelineArchitecture   # run from wherever you cloned the repository
    dotnet run --project src/NimblePros.DAB.AppHost
  2. The Aspire Dashboard should open automatically at https://localhost:17191 (or similar)

  3. Navigate to the Metrics section in the Aspire Dashboard

Step 2: Generate Metrics Data

Use the API endpoints to generate telemetry data:

  1. Generate Cache Misses (first request hits database):

    GET https://localhost:7070/04_Chain/roles/pipeline
    Authorization: Bearer {your-jwt-token}
  2. Generate Cache Hits (subsequent requests within cache window):

    GET https://localhost:7070/04_Chain/roles/pipeline
    Authorization: Bearer {your-jwt-token}
  3. Generate Role Creation Metrics:

    POST https://localhost:7070/04_Chain/roles/pipeline
    Authorization: Bearer {your-jwt-token}
    Content-Type: application/json
    
    {
      "roleName": "TestRole1"
    }
  4. Generate More Cache Misses (cache cleared after role creation):

    GET https://localhost:7070/04_Chain/roles/pipeline
    Authorization: Bearer {your-jwt-token}

Step 3: View Metrics in Aspire Dashboard

In the Aspire Dashboard:

  1. Navigate to Metrics section

  2. Look for your custom metrics:

    • role_cache_hits_total
    • role_cache_misses_total
    • roles_added_total
  3. Create visualizations:

    • Add charts for each metric
    • Set appropriate time ranges
    • Watch metrics update in real-time

Expected Behavior

First GET request to /Chain/Roles or /PipelineAttributes/Roles:

  • role_cache_misses_total increments (cold cache)
  • No cache hit
  • Database query executes

Second GET request (within 30-second cache window):

  • role_cache_hits_total increments (warm cache)
  • No database query
  • Response served from cache

POST request to create role:

  • roles_added_total increments (successful creation)
  • Cache gets cleared automatically
  • New role appears in subsequent GET requests

Third GET request (after role creation):

  • role_cache_misses_total increments again (cache cleared)
  • Fresh data loaded from database
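
As a worked example, the expected counter values after this four-request sequence can be turned into a cache hit ratio. This sketch assumes the cache starts cold, the requests arrive within the cache window, and the POST succeeds; the variable names are illustrative only.

```csharp
using System;

// Counter values after the sequence above (GET, GET, POST, GET).
long cacheMisses = 2; // first GET (cold cache) and third GET (cache cleared by the POST)
long cacheHits = 1;   // second GET
long rolesAdded = 1;  // the successful POST

// Hit ratio = hits / (hits + misses); guard against division by zero before any lookups.
long lookups = cacheHits + cacheMisses;
double hitRatio = lookups == 0 ? 0.0 : (double)cacheHits / lookups;

Console.WriteLine($"lookups={lookups}, rolesAdded={rolesAdded}, hitRatio={hitRatio:F2}");
```

A ratio of one hit in three lookups is expected for this short test; in steady-state traffic with few role changes, the ratio should be much higher.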

Troubleshooting

If metrics don't appear in Aspire Dashboard:

  1. Check meter name consistency: Verify ServiceDefaults/Extensions.cs and TelemetryService.cs use the exact same meter name
  2. Verify behavior registration: Ensure both ITelemetryContext and TelemetryBehavior are registered in Program.cs
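  3. Check logs: Look for the telemetry-related Debug messages logged by the code in this lab, such as "Tracked cache hit", "Tracked cache miss", and "Cache hit for ListRoles request" (or the 📊 and 🔄 emoji markers, if your log messages include them)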
  4. Build verification: Run dotnet build to ensure no compile errors
  5. Endpoint testing: Test the correct endpoints (/Chain/Roles or /PipelineAttributes/Roles) that have caching behavior

Part 5: Understanding the Pipeline Flow with Telemetry

Enhanced Pipeline Execution Order

With our new TelemetryBehavior, the pipeline now executes in this order:

Request → LoggingBehavior → AuthorizationBehavior → ValidationBehavior → TelemetryBehavior → AttributedBehaviorExecutor → Handler

For ListRoles with CachingBehavior Attribute:

ListRolesRequest
├── LoggingBehavior (universal)
├── AuthorizationBehavior (universal) 
├── ValidationBehavior (universal)
├── TelemetryBehavior (universal)
├── AttributedBehaviorExecutor (universal)
│   └── CachingBehavior (attributed)
│       ├── Cache Check
│       ├── Cache Hit → Record hit (role_cache_hits_total incremented on flush)
│       └── Cache Miss → Record miss (role_cache_misses_total incremented on flush) → Call Handler
└── ListRolesHandler → Return roles

For CreateRoleCommand:

CreateRoleCommand  
├── LoggingBehavior (universal)
├── AuthorizationBehavior (universal)
├── ValidationBehavior (universal) 
├── TelemetryBehavior (universal)
│   └── [On Response] Increment roles_added_total (if successful)
└── CreateRoleHandler → Add role → Return result

Part 6: Advanced Pattern - Behavior Communication

Better Separation of Concerns

Instead of having the CachingBehavior directly call telemetry, we can use a communication pattern where behaviors pass information through the pipeline. This keeps telemetry concerns isolated in the TelemetryBehavior and works in any context (web, CLI, background services, etc.).

The Clean Approach: Scoped Service Communication

Why This Approach?

  • Context-Independent: Works in web, CLI, background services
  • Clean DI: Uses dependency injection properly
  • Type-Safe: No magic strings or casting
  • Testable: Easy to mock dependencies
  • Performance: Minimal overhead

Step 1: Reuse the Scoped Telemetry Context Service

This pattern is implemented by the ITelemetryContext interface and TelemetryContext class created in Part 3, Step 2 (src/NimblePros.DAB.Web/04_Chain/Services/TelemetryContext.cs). No new code is needed here: behaviors record events on the scoped context, and TelemetryBehavior flushes them to ITelemetryService once the rest of the pipeline has run.

Key Benefits:

  • Scoped per request: Each request gets its own context
  • Context-agnostic: No dependency on HttpContext or web infrastructure
  • Deferred execution: Actions are recorded and executed at the end
  • Error isolation: Telemetry failures don't break business logic

Part 7: Performance Considerations and Analysis

The Problem with Naive Implementations

❌ Anti-Pattern: Creating Metrics in Every Behavior Constructor

// DON'T DO THIS - Performance Issues!
public class BadTelemetryBehavior<TRequest, TResponse> : IPipelineBehavior<TRequest, TResponse>
{
  private readonly Meter _meter;
  private readonly Counter<long> _counter;

  public BadTelemetryBehavior()
  {
    // ❌ Creates new meter for EVERY request type!
    _meter = new Meter("MyApp"); 
    _counter = _meter.CreateCounter<long>("my_counter");
  }
}

Problems:

  1. Memory Waste: Creates separate meters/counters for every TRequest, TResponse combination
  2. Registration Overhead: 100 request types = 100 meter instances
  3. Metric Duplication: Same metric registered multiple times with OTEL
  4. Unnecessary Allocation: Metrics created even for requests that never use them

✅ Our Optimized Solution

Shared Telemetry Service Pattern:

// ✅ Efficient: One meter, shared across all requests
[Singleton] ITelemetryService -> Creates the meter and counters once
     ↓
[Scoped] TelemetryBehavior<T,R> -> Lightweight, just calls service
     ↓
[Scoped] CachingBehavior<T,R> -> Records events via ITelemetryContext, flushed to the same shared service

Benefits:

  1. Single Meter Instance: One meter for entire application
  2. Minimal Memory: Behaviors only hold service reference
  3. Fast Instantiation: No expensive meter creation in constructors
  4. Selective Tracking: Only track metrics for requests that need them
  5. Easy Testing: Mock ITelemetryService for unit tests

Part 7: Advanced Metrics Scenarios

Adding Custom Metrics Tags/Labels

You can enhance metrics with tags for better filtering and analysis:

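// Enhanced counter with tags (inside TelemetryService.TrackRoleAdded(string roleName))
_rolesAddedCounter.Add(1,
  // role_name is a reasonable tag only while the set of role names stays small
  new KeyValuePair<string, object?>("role_name", roleName),
  // placeholder value; substitute a real low-cardinality attribute
  new KeyValuePair<string, object?>("user", "current_user"));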
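
When a measurement carries several tags, the TagList struct from System.Diagnostics is an alternative to building an array on each call. A minimal sketch follows, with a hypothetical meter name and tag values that are not part of the lab solution:

```csharp
using System;
using System.Diagnostics;
using System.Diagnostics.Metrics;

// Hypothetical meter and counter names for illustration.
using var meter = new Meter("Example.Tags");
var rolesAdded = meter.CreateCounter<long>("example_roles_added_total");

// TagList is a struct, so a small set of tags does not require a heap-allocated array.
var tags = new TagList
{
    { "operation", "create_role" },
    { "outcome", "success" }
};

// Record one measurement carrying both tags.
rolesAdded.Add(1, tags);

Console.WriteLine($"Recorded 1 measurement with {tags.Count} tags");
```

Both tag values here come from a small fixed set, which keeps the number of distinct time series bounded.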

Creating Histograms for Response Times

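// Add to TelemetryService (the singleton that owns the shared Meter), not to
// TelemetryBehavior: creating instruments in a generic behavior is the anti-pattern above.
private readonly Histogram<double> _requestDurationHistogram;

// In the TelemetryService constructor, after the counters:
_requestDurationHistogram = _meter.CreateHistogram<double>(
  name: "request_duration_ms",
  unit: "ms",
  description: "Request processing duration in milliseconds");

// New member on ITelemetryService / TelemetryService
public void TrackRequestDuration(string requestType, double milliseconds)
{
  _requestDurationHistogram.Record(milliseconds,
    new KeyValuePair<string, object?>("request_type", requestType));
}

// In TelemetryBehavior.Handle (requires: using System.Diagnostics;)
public async ValueTask<TResponse> Handle(/* ... */)
{
  var stopwatch = Stopwatch.StartNew();
  
  var response = await next(request, cancellationToken);
  
  stopwatch.Stop();
  _telemetryService.TrackRequestDuration(
    typeof(TRequest).Name, stopwatch.Elapsed.TotalMilliseconds);
  
  return response;
}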

Part 7: Production Considerations

Metrics Best Practices

Use appropriate metric types:

  • Counters for things that only increase
  • Gauges for values that fluctuate
  • Histograms for distributions

Add meaningful labels but avoid high cardinality:

// Good: Low cardinality
new KeyValuePair<string, object?>("operation", "list_roles")

// Bad: High cardinality (unique per request)
new KeyValuePair<string, object?>("request_id", Guid.NewGuid().ToString())

Use descriptive names and descriptions

Monitor metric performance impact

Set up alerting on key metrics

Deployment Considerations

For Production:

  • Configure OTLP endpoint to send to production observability stack
  • Set up proper metric retention policies
  • Configure alerting rules
  • Monitor metric collection overhead

Environment Variables for Production:

OTEL_EXPORTER_OTLP_ENDPOINT=https://your-otel-collector.company.com
OTEL_RESOURCE_ATTRIBUTES=service.name=NimblePros.DAB.Web,service.version=1.0.0

Part 8: Testing the Implementation

Unit Testing Pipeline Behaviors

You can unit test the telemetry behavior:

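[Test]
public async Task TelemetryBehavior_Should_Record_RoleAdded_And_Flush()
{
  // Arrange: the constructor takes the telemetry service, the scoped context, and a logger
  var telemetryService = Mock.Of<ITelemetryService>();
  var telemetryContext = new Mock<ITelemetryContext>();
  var logger = Mock.Of<ILogger<TelemetryBehavior<CreateRoleCommand, Result<RoleDetails>>>>();
  var behavior = new TelemetryBehavior<CreateRoleCommand, Result<RoleDetails>>(
    telemetryService, telemetryContext.Object, logger);
  
  var request = new CreateRoleCommand("TestRole");
  var expectedResponse = Result<RoleDetails>.Success(new RoleDetails("1", "TestRole"));
  
  var nextCalled = false;
  ValueTask<Result<RoleDetails>> Next(CreateRoleCommand req, CancellationToken ct)
  {
    nextCalled = true;
    return ValueTask.FromResult(expectedResponse);
  }
  
  // Act  
  var result = await behavior.Handle(request, Next, CancellationToken.None);
  
  // Assert
  Assert.That(nextCalled, Is.True);
  Assert.That(result.IsSuccess, Is.True);
  // The behavior records the event on the context, then flushes it to the service
  telemetryContext.Verify(c => c.RecordRoleAdded("TestRole"), Times.Once);
  telemetryContext.Verify(c => c.FlushToTelemetry(telemetryService), Times.Once);
  // Note: Testing actual metric increments requires more complex setup
}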
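
One way to observe actual metric increments in-process is MeterListener from System.Diagnostics.Metrics. The sketch below is an assumption about how such a test could look, not code from the lab solution. To stay self-contained it creates the meter and counter directly, reusing the names from TelemetryService; a real test would construct TelemetryService and call TrackRoleAdded instead.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Same meter and counter names as TelemetryService, created here for a standalone demo.
using var meter = new Meter("NimblePros.DAB.Web.Telemetry");
var rolesAdded = meter.CreateCounter<long>("roles_added_total");

// Running totals per instrument name, updated synchronously by the listener callback.
var totals = new Dictionary<string, long>();

using var listener = new MeterListener();
listener.InstrumentPublished = (instrument, l) =>
{
    // Subscribe only to instruments that belong to this exact Meter instance.
    if (ReferenceEquals(instrument.Meter, meter))
    {
        l.EnableMeasurementEvents(instrument);
    }
};
listener.SetMeasurementEventCallback<long>((instrument, value, tags, state) =>
{
    totals[instrument.Name] = totals.GetValueOrDefault(instrument.Name) + value;
});
listener.Start();

rolesAdded.Add(1);
rolesAdded.Add(1);

Console.WriteLine(totals["roles_added_total"]); // prints 2
```

Because TelemetryService keeps its Meter private, a test against the real service would filter on instrument.Meter.Name rather than on the instance.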

Summary and Reflection

What You've Learned

  1. OpenTelemetry Metrics: Understanding of OTEL metrics types and usage
  2. Aspire Integration: How to configure custom metrics in Aspire 9.5.1
  3. Pipeline Telemetry: Creating behaviors that track business metrics
  4. Cache Metrics: Tracking cache effectiveness with hit/miss ratios
  5. Business Metrics: Tracking meaningful business operations
  6. Observability: Understanding the value of metrics for production systems

Key Takeaways

Benefits of Metrics in Pipeline Architecture:

  1. Non-Intrusive: Metrics tracking doesn't affect business logic
  2. Consistent: All requests automatically get telemetry tracking
  3. Flexible: Easy to add new metrics by modifying behaviors
  4. Testable: Behaviors can be unit tested independently
  5. Observable: Real-time visibility into application performance

Production Value:

  • Performance Monitoring: Track cache hit ratios, request counts
  • Capacity Planning: Understand usage patterns
  • SLA Monitoring: Measure against service level agreements
  • Incident Response: Quickly identify performance issues
  • Business Intelligence: Track feature usage and adoption

Pattern Evolution Summary

From Labs 1-5, we've built a complete observability story:

  • Lab 1 (Spaghetti): No observability, everything mixed together
  • Lab 2 (Template Method): Basic logging in template methods
  • Lab 3 (Decorator): Logging decorators for specific services
  • Lab 4 (Chain of Responsibility): Universal logging behavior
  • Lab 5 (OTEL Metrics): Complete observability with metrics, logs, and traces

Final Architecture Benefits

| Aspect | Lab 1 | Lab 2 | Lab 3 | Lab 4 | Lab 5 |
|---|---|---|---|---|---|
| Business Logic Separation | None | Better | Good | Excellent | Excellent |
| Observability | None | Basic | Service-Level | Request-Level | Full OTEL |
| Metrics | None | None | None | None | Custom + System |
| Testability | Poor | Better | Good | Excellent | Excellent |
| Production-Ready | No | Partially | Yes | Yes | Production-Ready |

Next Steps

In Your Own Projects:

  • Start with Aspire ServiceDefaults for instant OTEL setup
  • Add custom metrics for business-critical operations
  • Create dashboards for key metrics in your observability platform
  • Set up alerting on important thresholds
  • Use metrics to guide performance optimization efforts

Advanced Topics to Explore:

  • Distributed Tracing: Track requests across microservices
  • Custom Exporters: Send metrics to specific APM systems
  • Metric Aggregation: Create business dashboards from OTEL data
  • Alerting Rules: Set up monitoring alerts based on metrics
  • Correlation: Link metrics, logs, and traces together

Common Pitfalls and Best Practices

❌ Common Mistakes:

  1. Meter Name Mismatch: Different names in ServiceDefaults and TelemetryService
  2. Wrong Behavior Order: TelemetryBehavior not registered after the other universal behaviors and before AttributedBehaviorExecutor
  3. Missing Dependencies: Forgetting to inject ITelemetryContext into CachingBehavior
  4. Testing Wrong Endpoints: Using endpoints without caching behavior
  5. Build Errors: Not running dotnet build before testing

✅ Best Practices:

  1. Consistent Naming: Use the same meter name across all configurations
  2. Descriptive Metrics: Use clear, business-meaningful metric names
  3. Proper Logging: Add emoji markers (📊, 🔄) for easy log filtering
  4. Unit Testing: Mock ITelemetryContext for behavior testing
  5. Documentation: Document which endpoints support which behaviors


Congratulations! You've successfully built a production-ready pipeline architecture with complete observability using OpenTelemetry and Aspire 9.5.1. Your application now provides real-time insights into cache performance, business operations, and system health.

Questions or Issues? Open an issue in the GitHub repository or ask your instructor for clarification.