This document describes the DevOps infrastructure and deployment processes for FlakeGuard.
FlakeGuard uses a DevOps stack built on containerized services, automated CI/CD pipelines, Prometheus and Grafana monitoring, and infrastructure-as-code principles.
Prerequisites
- Node.js 20+
- pnpm 8.15.1+
- Docker & Docker Compose
- Git
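Before running the setup steps, a small shell helper can confirm that the tools listed above are installed and recent enough. This is a minimal sketch, not a script that ships with FlakeGuard. The file name `check-prereqs.sh` and the functions `version_ge` and `check_prereqs` are hypothetical; only the minimum versions come from this document.

```bash
#!/usr/bin/env bash
# Hypothetical prerequisite check; minimum versions come from the Prerequisites list.

# version_ge A B: succeed when version A >= version B (major.minor.patch).
# A leading "v" (as printed by `node --version`) and any "-suffix" are ignored.
version_ge() {
  local v1=${1#v} v2=${2#v}
  v1=${v1%%-*}
  v2=${v2%%-*}
  local IFS=.
  local -a a b
  read -r -a a <<< "$v1"
  read -r -a b <<< "$v2"
  local i x y
  for i in 0 1 2; do
    x=${a[i]:-0}
    y=${b[i]:-0}
    # Force base 10 so components such as "08" are not read as octal.
    if ((10#$x > 10#$y)); then return 0; fi
    if ((10#$x < 10#$y)); then return 1; fi
  done
  return 0
}

# check_prereqs: print every missing or outdated tool to stderr; non-zero if any.
check_prereqs() {
  local status=0 tool v
  for tool in node pnpm docker git; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      status=1
    fi
  done
  if command -v node >/dev/null 2>&1; then
    v=$(node --version)
    version_ge "$v" 20.0.0 || { echo "node $v is older than 20" >&2; status=1; }
  fi
  if command -v pnpm >/dev/null 2>&1; then
    v=$(pnpm --version)
    version_ge "$v" 8.15.1 || { echo "pnpm $v is older than 8.15.1" >&2; status=1; }
  fi
  # Accept either the Compose v2 plugin or the standalone docker-compose binary.
  if ! docker compose version >/dev/null 2>&1 && ! command -v docker-compose >/dev/null 2>&1; then
    echo "missing: Docker Compose" >&2
    status=1
  fi
  return "$status"
}
```

Run it with `source ./check-prereqs.sh && check_prereqs`. The check only reports problems and returns non-zero, so it can gate a CI step without closing an interactive shell.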
Setup
# Clone and set up
git clone <repository>
cd flakeguard
cp .env.example .env
# Install dependencies
pnpm install
# Start infrastructure
docker-compose up -d postgres redis
# Run migrations
pnpm migrate:dev
# Seed development data (optional)
./scripts/seed-data.sh development
# Start development services
docker-compose -f docker-compose.yml -f docker-compose.dev.yml --profile dev up -d
Access Services
- API: http://localhost:3000
- Web: http://localhost:3002
- Proxy: http://localhost:8080
- Database: localhost:5432
- Redis: localhost:6379
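Services started with `docker-compose up -d` can take a few seconds to accept connections, so a script that calls them immediately may fail. Below is a minimal sketch of a polling helper. It assumes `curl` is installed. `retry` and `wait_for_http` are hypothetical names. The URL to poll is the API address listed above, because this document does not name a dedicated health endpoint.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for waiting on services started by docker-compose.

# retry ATTEMPTS DELAY CMD [ARGS...]: run CMD until it succeeds, at most ATTEMPTS
# times, sleeping DELAY seconds between attempts. Returns 0 on success, 1 otherwise.
retry() {
  local attempts=$1 delay=$2 n
  shift 2
  for ((n = 1; n <= attempts; n++)); do
    "$@" && return 0
    if ((n < attempts)); then sleep "$delay"; fi
  done
  return 1
}

# wait_for_http URL [ATTEMPTS]: poll URL every 2 seconds (default 30 attempts).
# curl --fail treats HTTP status >= 400 as an error, so only 2xx/3xx count as "up".
wait_for_http() {
  retry "${2:-30}" 2 curl --fail --silent --show-error --output /dev/null --max-time 5 "$1"
}
```

For example, `wait_for_http http://localhost:3000 && echo "API is up"` blocks for roughly a minute at most before giving up. If the API exposes a health route, polling that route is a stricter check than polling the root URL.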
Deployment
# Development with monitoring and seed data
./scripts/deploy.sh --environment development --monitoring --seed-data
# Staging deployment
./scripts/deploy.sh --environment staging --tag v1.2.3
# Production deployment
./scripts/deploy.sh --environment production --tag v1.2.3
Services
- API (apps/api): Backend REST API service
- Web (apps/web): Frontend React application
- Worker (apps/worker): Background job processor
- PostgreSQL: Primary database
- Redis: Cache and job queue
- Prometheus: Metrics collection
- Grafana: Metrics visualization
- NGINX: Reverse proxy and load balancer
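The `deploy.sh` examples in this document pass `--environment`, `--tag`, `--monitoring`, and `--seed-data`, and the rollback command passes `--rollback`. The sketch below shows one way such flags might be parsed and validated. It is not the real script. The function name `parse_deploy_args`, the `latest` default tag, and the rule that staging and production deployments need an explicit tag are all assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of parsing the deploy.sh flags used in this document.
# Defaults and validation rules are assumptions, not the real script's behavior.
parse_deploy_args() {
  ENVIRONMENT=development
  TAG=latest # assumed default image tag
  MONITORING=false
  SEED_DATA=false
  ROLLBACK=false
  while (($# > 0)); do
    case "$1" in
      --environment | --tag)
        if (($# < 2)); then
          echo "$1 requires a value" >&2
          return 2
        fi
        if [[ $1 == --tag ]]; then TAG=$2; else ENVIRONMENT=$2; fi
        shift 2
        ;;
      --monitoring) MONITORING=true; shift ;;
      --seed-data) SEED_DATA=true; shift ;;
      --rollback) ROLLBACK=true; shift ;;
      *)
        echo "unknown option: $1" >&2
        return 2
        ;;
    esac
  done
  case "$ENVIRONMENT" in
    development | staging | production) ;;
    *)
      echo "invalid environment: $ENVIRONMENT" >&2
      return 2
      ;;
  esac
  # Assumption: staging and production deploys must pin an explicit image tag.
  if [[ $ENVIRONMENT != development && $TAG == latest && $ROLLBACK == false ]]; then
    echo "--tag is required for $ENVIRONMENT deployments" >&2
    return 2
  fi
  return 0
}
```

With this approach, bad input fails before any container is touched. Requiring a pinned tag outside development keeps staging and production tied to a specific, reproducible image.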
Architecture
┌───────────────────────────────────────┐
│             Load Balancer             │
│                (NGINX)                │
└───────────────────┬───────────────────┘
                    │
           ┌────────┴────────┐
           │                 │
      ┌────▼────┐       ┌────▼────┐
      │   Web   │       │   API   │
      │ (React) │       │(Node.js)│
      └─────────┘       └────┬────┘
                             │
              ┌──────────────┼──────────────┐
              │              │              │
       ┌──────▼─────┐   ┌────▼────┐    ┌────▼────┐
       │ PostgreSQL │   │  Redis  │    │ Worker  │
       │ (Database) │   │ (Cache) │    │ (Jobs)  │
       └────────────┘   └─────────┘    └─────────┘
CI/CD Pipelines
CI Pipeline
- Triggers: Pull requests, pushes to main/develop
- Jobs:
  - Lint & format check
  - Type checking
  - Unit & integration tests
  - Security scanning
  - Build verification
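The CI pipeline's lint, type-check, and test jobs can be approximated locally before pushing, using the pnpm scripts listed in this document. Below is a minimal sketch, assuming bash. `run_steps` is a hypothetical helper, not part of the repository.

```bash
#!/usr/bin/env bash
# Hypothetical local quality gate mirroring the CI jobs above.

# run_steps "CMD ARGS"...: run each command string in order and stop at the first
# failure. Strings are split on whitespace, so keep them to simple commands.
run_steps() {
  local step
  local -a cmd
  for step in "$@"; do
    echo "==> $step"
    read -r -a cmd <<< "$step"
    if ! "${cmd[@]}"; then
      echo "FAILED: $step" >&2
      return 1
    fi
  done
}
```

For example, calling `run_steps "pnpm lint" "pnpm typecheck" "pnpm test"` from a Git `pre-push` hook (`.git/hooks/pre-push`) aborts the push when any check fails. Security scanning and build verification are still left to CI.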
Build and Deploy Pipeline
- Triggers: Pushes to main/develop, tags
- Jobs:
  - Multi-service Docker builds
  - Security scanning
  - Image publishing to GHCR
  - Automated deployments
Release Pipeline
- Triggers: Version tags (v*.*.*, e.g. v1.2.3)
- Jobs:
  - Full test suite
  - Release artifact creation
  - GitHub release creation
  - Production deployment triggers
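The release pipeline runs on version tags. Glob-style tag filters such as `v*.*.*` are permissive: they also match `v1.2.3-rc.1`. A stricter check inside the release job can reject anything that is not a plain `vMAJOR.MINOR.PATCH` tag. Below is a minimal sketch. `is_release_tag` is a hypothetical helper, and it assumes that releases never use pre-release suffixes.

```bash
#!/usr/bin/env bash
# is_release_tag TAG: succeed only for tags of the form vMAJOR.MINOR.PATCH (e.g. v1.2.3).
# Assumption: releases never carry pre-release suffixes; leading zeros are rejected
# as in Semantic Versioning.
is_release_tag() {
  local re='^v(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)$'
  [[ $1 =~ $re ]]
}
```

Assuming the pipeline runs on GitHub Actions, where `GITHUB_REF_NAME` holds the tag name for tag pushes, the job could start with `is_release_tag "$GITHUB_REF_NAME" || exit 1`.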
graph TD
A[Code Commit] --> B[CI Pipeline]
B --> C{Tests Pass?}
C -->|No| D[Fail Build]
C -->|Yes| E[Build Docker Images]
E --> F[Security Scan]
F --> G{Branch?}
G -->|develop| H[Deploy to Staging]
G -->|main| I[Deploy to Production]
G -->|feature| J[End]
H --> K[Run Integration Tests]
I --> L[Create Release]
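The branch decision in the flow above (develop deploys to staging, main deploys to production, and other branches stop after the build) can be expressed as a small lookup. Below is a minimal sketch. `deploy_target` is a hypothetical helper. It accepts either a bare branch name or a full ref, on the assumption that a CI system such as GitHub Actions supplies refs like `refs/heads/main` in `GITHUB_REF`.

```bash
#!/usr/bin/env bash
# deploy_target REF: print the environment a branch deploys to, following the flow
# above (develop -> staging, main -> production, anything else -> none).
# Accepts a bare branch name or a full ref such as refs/heads/main.
deploy_target() {
  local branch=${1#refs/heads/}
  case "$branch" in
    develop) echo staging ;;
    main) echo production ;;
    *) echo none ;;
  esac
}
```

A pipeline step could run `target=$(deploy_target "$GITHUB_REF")` and skip deployment when the result is `none`. Tag refs also map to `none` here, because releases go through the release pipeline.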
# docker-compose.yml
services:
  postgres:    # Database with health checks
  redis:       # Cache with persistence
  prometheus:  # Metrics collection
  grafana:     # Visualization
# docker-compose.dev.yml
services:
  api:     # Hot reload, debugging
  web:     # Development server
  worker:  # Development worker
  proxy:   # NGINX reverse proxy
- Health Checks: All services include comprehensive health checks
- Resource Limits: Memory and CPU limits configured
- Security: Non-root users, minimal base images
- Logging: Structured JSON logging
- Networking: Isolated bridge network
- Volumes: Persistent data and development mounts
Scripts
- scripts/deploy.sh: Main deployment orchestrator
- scripts/migrate.sh: Database migration automation
- scripts/seed-data.sh: Development data seeding
- scripts/health-check.sh: Service health monitoring
- scripts/db/init/: Database initialization SQL
- scripts/seed/: Seed data generators
# Full development deployment
./scripts/deploy.sh --environment development --seed-data --monitoring
# Production deployment with specific tag
./scripts/deploy.sh --environment production --tag v1.2.3
# Database migrations only
./scripts/migrate.sh
# Health check all services
./scripts/health-check.sh comprehensive
# Seed test data
./scripts/seed-data.sh test
Monitoring
- Prometheus: Application and infrastructure metrics
- Grafana: Dashboard and alerting
- Custom Metrics: Business logic metrics
- FlakeGuard Overview: Key performance indicators
- Infrastructure: System resource monitoring
- Business Metrics: Application-specific metrics
- Service Health: HTTP endpoints for all services
- Database Health: Connection and query performance
- Cache Health: Redis connectivity and performance
- Structured Logging: JSON format for all services
- Log Aggregation: Centralized logging (ready for ELK stack)
- Error Tracking: Comprehensive error reporting
Security
- Non-root Users: All containers run as non-root
- Minimal Images: Alpine-based images
- Security Scanning: Trivy vulnerability scanning
- Secrets Management: Environment-based secrets
- Isolated Networks: Services communicate via Docker networks
- Rate Limiting: NGINX-based rate limiting
- HTTPS Ready: TLS configuration templates
- Environment Separation: Clear environment boundaries
- Service Authentication: API keys and JWT tokens
- Database Security: Connection pooling and access controls
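Rotating `JWT_SECRET` and `API_KEY` requires fresh random values. Below is a minimal sketch that reads from the kernel's `/dev/urandom` and prints hex, assuming a Linux or macOS host. `gen_secret` is a hypothetical helper, and the 32-byte default is an assumption rather than a documented FlakeGuard requirement.

```bash
#!/usr/bin/env bash
# gen_secret [BYTES]: print BYTES (default 32) random bytes from /dev/urandom as
# lowercase hex, e.g. for JWT_SECRET or API_KEY when rotating secrets.
gen_secret() {
  local bytes=${1:-32}
  # -v stops od from collapsing repeated lines into "*"; tr strips od's spacing.
  head -c "$bytes" /dev/urandom | od -An -v -tx1 | tr -d ' \n'
  echo
}
```

For example, `JWT_SECRET=$(gen_secret)` produces a 64-character value. Changing the JWT signing secret invalidates tokens signed with the old one, so plan rotations accordingly.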
Environments
Development
- Purpose: Local development and testing
- Features: Hot reload, debugging, development tools
- Data: Seeded development data
- Monitoring: Optional monitoring stack
Staging
- Purpose: Pre-production testing
- Features: Production-like configuration
- Data: Sanitized production data or realistic test data
- Monitoring: Full monitoring stack
Production
- Purpose: Live application serving users
- Features: High availability, backups, monitoring
- Data: Live user data with backups
- Monitoring: Full observability stack with alerting
Environment Variables
# Database
DATABASE_URL=postgresql://user:pass@host:5432/db
REDIS_URL=redis://host:6379
# Application
NODE_ENV=production
API_KEY=secure-api-key
JWT_SECRET=jwt-signing-secret
# Monitoring
PROMETHEUS_URL=http://prometheus:9090
GRAFANA_URL=http://grafana:3000
Configuration Files
- config/redis.conf: Redis optimization
- config/prometheus.yml: Metrics scraping
- config/grafana/: Dashboard provisioning
- config/nginx/: Reverse proxy configuration
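A missing variable from the list above usually surfaces only as a connection error at runtime. A startup check can fail early and name every missing variable at once. Below is a minimal sketch. `require_env` is a hypothetical helper, and the set of variables to require is up to each deployment.

```bash
#!/usr/bin/env bash
# require_env NAME...: succeed only if every named variable is set and non-empty;
# otherwise list all missing names on stderr and return 1.
require_env() {
  local name
  local -a missing=()
  for name in "$@"; do
    # ${!name} is indirect expansion: the value of the variable called $name.
    [[ -n ${!name:-} ]] || missing+=("$name")
  done
  if ((${#missing[@]} > 0)); then
    echo "missing required environment variables: ${missing[*]}" >&2
    return 1
  fi
  return 0
}
```

One way to use it is `set -a; . ./.env; set +a` followed by `require_env DATABASE_URL REDIS_URL NODE_ENV API_KEY JWT_SECRET || exit 1`. Sourcing `.env` this way only works when every value is valid shell syntax, for example when values containing spaces are quoted.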
Backup & Recovery
- Database Backups: Automated PostgreSQL dumps
- Retention Policy: 30 days for production, 7 days for staging
- Verification: Backup integrity checks
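The backup policy above could be scripted roughly as follows. This is a sketch under stated assumptions, not FlakeGuard's actual backup job. The helper names, the `.dump` file naming, and the use of pg_dump's custom format are hypothetical. The compose service `postgres`, the user `postgres`, and the database `flakeguard` match the commands used elsewhere in this document. The retention values (30 days for production, 7 for staging) come from the policy.

```bash
#!/usr/bin/env bash
# Hypothetical backup helpers; retention values come from the policy above.

# retention_days ENV: print how many days of backups to keep.
retention_days() {
  case "$1" in
    production) echo 30 ;;
    staging) echo 7 ;;
    *)
      echo "no retention policy defined for: $1" >&2
      return 1
      ;;
  esac
}

# backup_db DIR: write a custom-format dump (restorable with pg_restore) into DIR.
backup_db() {
  local dir=$1 file
  mkdir -p "$dir"
  file="$dir/flakeguard-$(date -u +%Y%m%dT%H%M%SZ).dump"
  # -T disables the TTY so the binary dump streams cleanly to the host file.
  if docker-compose exec -T postgres pg_dump -U postgres -Fc flakeguard > "$file"; then
    echo "$file"
  else
    rm -f "$file"
    return 1
  fi
}

# prune_backups DIR DAYS: delete *.dump files in DIR whose age in whole days
# exceeds DAYS (find -mtime semantics).
prune_backups() {
  find "$1" -maxdepth 1 -type f -name '*.dump' -mtime "+$2" -exec rm -f {} +
}
```

A nightly production job might run `backup_db backups` and then `prune_backups backups "$(retention_days production)"`. For a lightweight integrity check, `docker-compose exec -T postgres pg_restore --list < "$file" > /dev/null` fails when the archive's table of contents cannot be read. It does not prove that every row restores, so periodic test restores are still worthwhile.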
# List available backups
ls -la backups/
# Restore from backup
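# Restore from backup (run from the repository root; -T lets the host file be piped in on stdin)
# Plain SQL dump: pg_restore cannot read these, so use psql
docker-compose exec -T postgres psql -U postgres -d flakeguard < backups/backup.sql
# Custom-format dump (pg_dump -Fc): pg_restore reads it from stdin
docker-compose exec -T postgres pg_restore -U postgres -d flakeguard < backups/backup.dump
# Rollback deployment
./scripts/deploy.sh --rollback
Performance
- Connection Pooling: Optimized connection management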
- Query Optimization: Indexed queries and performance monitoring
- Caching: Redis-based application caching
- Resource Limits: Configured memory and CPU limits
- Horizontal Scaling: Docker Compose scaling support
- Load Balancing: NGINX upstream configuration
- Performance Metrics: Response time, throughput, error rates
- Resource Monitoring: CPU, memory, disk, network usage
- Alerting: Proactive issue notification
Troubleshooting
Service Won't Start
docker-compose logs [service]
./scripts/health-check.sh [service]
Database Connection Issues
./scripts/health-check.sh postgres
docker-compose exec postgres pg_isready
Performance Issues
- Check Grafana dashboards
- Review application logs
- Monitor resource usage
Useful Commands
# Service status
docker-compose ps
# Service logs
docker-compose logs -f [service]
# Open a shell in a service (Alpine-based images ship sh, not bash)
docker-compose exec [service] sh
# Database shell
docker-compose exec postgres psql -U postgres flakeguard
# Redis CLI
docker-compose exec redis redis-cli
Development Workflow
Start Infrastructure
docker-compose up -d postgres redis
Run Migrations
./scripts/migrate.sh
Start Applications
# Option 1: Docker development
docker-compose -f docker-compose.yml -f docker-compose.dev.yml --profile dev up -d
# Option 2: Local development
pnpm dev
Testing
# Unit tests
pnpm test
# Integration tests
pnpm test:integration
# E2E tests
pnpm test:e2e
# All tests with coverage
pnpm test:coverage
Code Quality
# Linting
pnpm lint
# Type checking
pnpm typecheck
# Formatting
pnpm format
# All quality checks
pnpm quality
Scaling
# Scale API service
docker-compose up -d --scale api=3
# Scale worker service
docker-compose up -d --scale worker=2
Scaling Support
- NGINX: Upstream load balancing configuration
- Health Checks: Automatic unhealthy instance removal
- Session Affinity: Sticky sessions if needed
- Read Replicas: PostgreSQL read replica configuration
- Connection Pooling: PgBouncer integration
- Caching: Redis caching layer
Best Practices
- Use feature branches with descriptive names
- Write comprehensive tests for new features
- Follow conventional commit message format
- Update documentation with changes
- Test in staging before production deployment
- Use semantic versioning for releases
- Maintain deployment rollback capability
- Monitor deployments with health checks
- Apply security updates to base images regularly
- Rotate secrets and API keys regularly
- Follow the principle of least privilege
- Monitor for security vulnerabilities
- Monitor application and infrastructure metrics
- Implement comprehensive logging
- Maintain up-to-date documentation
- Practice disaster recovery procedures
Maintenance Schedule
- Weekly: Review monitoring dashboards and logs
- Monthly: Update dependencies and security patches
- Quarterly: Disaster recovery testing
- Annually: Full security audit
Key Metrics
- Uptime: Service availability monitoring
- Performance: Response time and throughput tracking
- Errors: Error rate and pattern analysis
- Resources: Infrastructure utilization monitoring
Keep this documentation updated with:
- Infrastructure changes
- New deployment procedures
- Configuration updates
- Troubleshooting solutions
For additional support, refer to the project README or contact the development team.