Approach #11: Runtime Verification & Telemetry for Production Safety #9

@ikennaokpala

Runtime Verification: Monitor Behavior in Production

Instrument code with telemetry. Check invariants at runtime. Detect violations early, ideally before users are impacted.

Architecture

Future<void> approveBooking(String id) async {
  final span = telemetry.startSpan('approve_booking');
  
  try {
    span.setAttribute('booking_id', id);
    await _apiService.approve(id);
    
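    // Runtime invariant check. It runs after the state change, so a
    // violation is detected and recorded here rather than prevented.
    _validateInvariants();  // Throws InvariantViolation if violated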
    
    span.setStatus(SpanStatus.ok);
  } catch (e) {
    span.recordException(e);
    telemetry.recordMetric('booking_errors', 1);
    rethrow;
  } finally {
    span.end();
  }
}

void _validateInvariants() {
  if (trip.availableSeats < 0) {
    throw InvariantViolation('Seats cannot be negative');
  }
  if (totalBooked > trip.capacity) {
    throw InvariantViolation('Capacity exceeded');
  }
}
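
The try/catch/finally pattern above has to be repeated for every instrumented operation, so it may be worth factoring into a helper. The following is a minimal sketch, not the project's actual telemetry API: the `Span` and `Telemetry` classes and the `traced` method are hypothetical in-memory stand-ins for whatever tracing SDK is in use.

```dart
import 'dart:async';

/// Hypothetical in-memory span; a stand-in for a real tracing SDK's span type.
class Span {
  Span(this.name);
  final String name;
  final Map<String, Object> attributes = {};
  Object? exception;
  bool ok = false;
  bool ended = false;
}

/// Hypothetical telemetry facade that keeps finished spans and counters in memory.
class Telemetry {
  final List<Span> finished = [];
  final Map<String, int> counters = {};

  /// Runs [body] inside a span named [name]. On failure it records the
  /// exception and increments [errorMetric], then rethrows so callers still
  /// see the error. The span is always ended.
  Future<T> traced<T>(
    String name,
    Future<T> Function(Span span) body, {
    String errorMetric = 'errors',
  }) async {
    final span = Span(name);
    try {
      final result = await body(span);
      span.ok = true;
      return result;
    } catch (e) {
      span.exception = e;
      counters[errorMetric] = (counters[errorMetric] ?? 0) + 1;
      rethrow;
    } finally {
      span.ended = true;
      finished.add(span);
    }
  }
}

Future<void> main() async {
  final telemetry = Telemetry();

  // Successful operation: the span ends with ok == true.
  await telemetry.traced<void>('approve_booking', (span) async {
    span.attributes['booking_id'] = 'b-1';
  });

  // Failing operation: the error is counted and still reaches the caller.
  try {
    await telemetry.traced<void>('approve_booking', (span) async {
      throw StateError('Capacity exceeded');
    }, errorMetric: 'booking_errors');
  } on StateError {
    // Expected: the wrapper rethrows after recording the failure.
  }

  print(telemetry.finished.map((s) => '${s.name}:${s.ok}').toList());
  print(telemetry.counters);
}
```

With a helper like this, `approveBooking` would reduce to the API call plus the invariant check, and the span bookkeeping could not be forgotten in individual call sites.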

Monitors

monitor:
  name: Booking Health
  
  metrics:
    - approval_success_rate: '>95%'
    - approval_latency_p99: '<3000ms'
  
  invariants:
    - name: non_negative_seats
      query: 'SELECT COUNT(*) FROM trips WHERE seats < 0'
      threshold: '== 0'
      alert: 'CRITICAL: Negative seats detected!'
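
To make the threshold strings concrete, here is one way a monitor could evaluate them. This is a sketch under assumptions: the `meetsThreshold` function and the observed values are hypothetical, and the monitor format above is not tied to any particular tool. Units (`%`, `ms`) are ignored, so the observed value is assumed to be in the same unit the threshold is written in.

```dart
/// Evaluates a threshold expression such as '>95%', '<3000ms' or '== 0'
/// against an observed value. The unit suffix is parsed but not converted.
bool meetsThreshold(num observed, String threshold) {
  final match = RegExp(r'^\s*(>=|<=|==|>|<)\s*(-?\d+(?:\.\d+)?)\s*(%|ms)?\s*$')
      .firstMatch(threshold);
  if (match == null) {
    throw FormatException('Unsupported threshold', threshold);
  }
  final op = match.group(1)!;
  final limit = num.parse(match.group(2)!);
  switch (op) {
    case '>':
      return observed > limit;
    case '>=':
      return observed >= limit;
    case '<':
      return observed < limit;
    case '<=':
      return observed <= limit;
    default: // '=='
      return observed == limit;
  }
}

void main() {
  // Thresholds copied from the monitor definition.
  final thresholds = {
    'approval_success_rate': '>95%',
    'approval_latency_p99': '<3000ms',
    'non_negative_seats': '== 0',
  };
  // Made-up observations, for illustration only.
  final observed = <String, num>{
    'approval_success_rate': 97.2,
    'approval_latency_p99': 3400,
    'non_negative_seats': 0,
  };

  for (final entry in thresholds.entries) {
    final healthy = meetsThreshold(observed[entry.key]!, entry.value);
    final status = healthy ? 'ok' : 'ALERT';
    print('${entry.key}: $status');
  }
}
```

In this made-up run only `approval_latency_p99` would alert, since 3400 is not below 3000.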

Implementation

  1. Wrap every async operation in a telemetry span
  2. Add invariant checks immediately after each state change
  3. Run the app with monitoring enabled
  4. For each violation: inspect the trace → open a GitHub issue → fix
  5. Confirm telemetry returns to healthy levels after the fix
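
The step about checking invariants after state changes can be sketched as a small registry of named predicates that runs after every mutation. Everything here is an assumption made for illustration: the `Trip` fields, the `Invariant` class, and the `applyChange` helper are hypothetical, and `InvariantViolation` is defined locally because the issue does not show its definition.

```dart
/// Exception type named after the one used in the issue's snippet; this
/// definition is an assumption.
class InvariantViolation implements Exception {
  InvariantViolation(this.message);
  final String message;

  @override
  String toString() => 'InvariantViolation: $message';
}

/// Hypothetical trip state, mirroring the fields referenced in the issue.
class Trip {
  Trip({required this.capacity, this.booked = 0});
  final int capacity;
  int booked;
  int get availableSeats => capacity - booked;
}

/// A named predicate over the trip state.
class Invariant {
  Invariant(this.name, this.holds);
  final String name;
  final bool Function(Trip trip) holds;
}

final invariants = <Invariant>[
  Invariant('non_negative_seats', (t) => t.availableSeats >= 0),
  Invariant('capacity_not_exceeded', (t) => t.booked <= t.capacity),
];

/// Applies [change] and then checks every invariant, throwing on the first
/// one that fails. The change is not rolled back; this only detects.
void applyChange(Trip trip, void Function(Trip trip) change) {
  change(trip);
  for (final invariant in invariants) {
    if (!invariant.holds(trip)) {
      throw InvariantViolation(invariant.name);
    }
  }
}

void main() {
  final trip = Trip(capacity: 2);

  // Fills the trip exactly; both invariants still hold.
  applyChange(trip, (t) => t.booked += 2);

  try {
    // Overbooks by one seat, which breaks the first invariant checked.
    applyChange(trip, (t) => t.booked += 1);
  } on InvariantViolation catch (e) {
    print(e); // prints "InvariantViolation: non_negative_seats"
  }
}
```

Naming each invariant keeps the thrown error aligned with the monitor's invariant names, which should make it easier to go from an alert to the failing check.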

Strengths

✅ Real-world validation (catches production issues)
✅ Continuous monitoring (always watching)
✅ Evidence-based (data shows what's happening)
✅ Fast feedback (alerts within minutes)

Rating: ⭐⭐⭐⭐⭐ (5/5) Essential for production
