DevOps Operations

This document covers infrastructure, deployment, and operational procedures for the Maricusco trading system.

Infrastructure Overview

Infrastructure Architecture

graph TB
    subgraph "External Access"
        USER[Users/Developers]
        LB[Load Balancer<br/>Optional]
    end

    subgraph "Application Layer"
        APP[Maricusco Application<br/>FastAPI + Uvicorn<br/>Port: 8000]
    end

    subgraph "Data Services"
        PG[(PostgreSQL<br/>TimescaleDB<br/>Port: 5432)]
        REDIS[(Redis Cache<br/>Port: 6379)]
        CHR[(ChromaDB<br/>Vector Store<br/>Port: 8000)]
    end

    subgraph "Monitoring Stack"
        PROM[Prometheus<br/>Metrics Collection<br/>Port: 9090]
        GRAF[Grafana<br/>Dashboards<br/>Port: 3000]
    end

    subgraph "Storage"
        VOL1[(postgres_data)]
        VOL2[(redis_data)]
        VOL3[(chromadb_data)]
        VOL4[(prometheus_data)]
        VOL5[(grafana_data)]
    end

    USER --> LB
    LB --> APP
    USER --> APP

    APP --> PG
    APP --> REDIS
    APP --> CHR
    APP -->|/metrics| PROM
    APP -->|/health| PROM

    PROM --> GRAF

    PG --> VOL1
    REDIS --> VOL2
    CHR --> VOL3
    PROM --> VOL4
    GRAF --> VOL5

    style APP fill:#7A9FB3,stroke:#6B8FA3,color:#fff
    style PG fill:#9B8AAB,stroke:#8B7A9B,color:#fff
    style REDIS fill:#9B8AAB,stroke:#8B7A9B,color:#fff
    style CHR fill:#9B8AAB,stroke:#8B7A9B,color:#fff
    style PROM fill:#7A9A7A,stroke:#6B8E6B,color:#fff
    style GRAF fill:#7A9A7A,stroke:#6B8E6B,color:#fff
    style LB fill:#C4A484,stroke:#B49474,color:#fff
    style VOL1 fill:#C4A484,stroke:#B49474,color:#fff
    style VOL2 fill:#C4A484,stroke:#B49474,color:#fff
    style VOL3 fill:#C4A484,stroke:#B49474,color:#fff
    style VOL4 fill:#C4A484,stroke:#B49474,color:#fff
    style VOL5 fill:#C4A484,stroke:#B49474,color:#fff

Service Architecture

The system runs as a containerized application with the following services:

Note: All services are automatically deployed and monitored through the CI/CD pipeline.

Service      Container             Port  Purpose
Application  maricusco-app         8000  Main FastAPI application
PostgreSQL   maricusco-postgres    5432  Time-series database (TimescaleDB)
Redis        maricusco-redis       6379  Caching layer
ChromaDB     maricusco-chromadb    8000  Vector memory storage
Prometheus   maricusco-prometheus  9090  Metrics collection
Grafana      maricusco-grafana     3000  Metrics visualization

Network Configuration

All services communicate via the maricusco-network bridge network. Services use internal DNS names (e.g., postgres, redis, chromadb) for inter-service communication.
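
As an illustration of how application code might address these services, the sketch below builds connection URLs from the internal DNS names. The hostnames and ports come from the service list above; the `service_urls` helper, the URL schemes, and the fallback values are assumptions for illustration, not the project's actual configuration code.

```python
import os
from urllib.parse import quote


def service_urls(env=None):
    """Build connection URLs that use Docker's internal DNS names (hypothetical helper)."""
    env = os.environ if env is None else env
    user = env.get("POSTGRES_USER", "maricusco")
    # Percent-encode the password so characters such as '@' or '/' do not break the URL
    password = quote(env.get("POSTGRES_PASSWORD", ""), safe="")
    database = env.get("POSTGRES_DB", "maricusco")
    return {
        "postgres": f"postgresql://{user}:{password}@postgres:5432/{database}",
        "redis": "redis://redis:6379/0",
        "chromadb": "http://chromadb:8000",
    }
```

These names resolve only inside maricusco-network; from the host, use localhost and whichever ports are published.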

Resource Limits

Default resource constraints per service:

app:
  limits: { cpus: '2.0', memory: 4G }
  reservations: { cpus: '0.5', memory: 512M }

postgres:
  limits: { cpus: '2.0', memory: 2G }
  reservations: { cpus: '0.5', memory: 512M }

redis:
  limits: { cpus: '1.0', memory: 1G }
  reservations: { cpus: '0.25', memory: 256M }

chromadb:
  limits: { cpus: '2.0', memory: 2G }
  reservations: { cpus: '0.5', memory: 512M }

grafana:
  limits: { cpus: '1.0', memory: 512M }
  reservations: { cpus: '0.25', memory: 128M }

Adjust these values in docker-compose.yml based on workload requirements.
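
When adjusting these values, it can help to compare their totals against the host's capacity. The sketch below sums the limits and reservations for the five services listed above; the `totals` helper is illustrative, and memory is expressed in MiB on the assumption that Docker treats `G` and `M` as binary units.

```python
# Values copied from the resource block above: (cpus, memory in MiB).
LIMITS = {
    "app": (2.0, 4096),
    "postgres": (2.0, 2048),
    "redis": (1.0, 1024),
    "chromadb": (2.0, 2048),
    "grafana": (1.0, 512),
}
RESERVATIONS = {
    "app": (0.5, 512),
    "postgres": (0.5, 512),
    "redis": (0.25, 256),
    "chromadb": (0.5, 512),
    "grafana": (0.25, 128),
}


def totals(table):
    """Return (total CPUs, total memory in MiB) across all services."""
    cpus = sum(cpu for cpu, _ in table.values())
    memory = sum(mem for _, mem in table.values())
    return cpus, memory


limit_cpus, limit_mem = totals(LIMITS)
reserved_cpus, reserved_mem = totals(RESERVATIONS)
print(f"limits:       {limit_cpus} CPUs, {limit_mem / 1024:.2f} GiB")
print(f"reservations: {reserved_cpus} CPUs, {reserved_mem / 1024:.2f} GiB")
```

The limits add up to 8 CPUs and 9.5 GiB, more than the 8GB and 4-core minimum in Prerequisites. Limits are ceilings rather than allocations, so this matters only if every service peaks at once; the reservations total 2 CPUs and under 2 GiB.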

Deployment

Prerequisites

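  • Docker 20.10+ and Docker Compose 2.0+ (commands in this document use docker-compose; with the Compose v2 plugin, run docker compose instead)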
  • Minimum 8GB RAM, 4 CPU cores
  • 20GB free disk space for volumes

Initial Deployment

  1. Clone repository:

    git clone https://github.com/Maricusco/multi-agent-trading.git
    cd multi-agent-trading
    

  2. Configure environment:

    cp .env.example .env
    # Edit .env with production values
    

  3. Set required environment variables:

    # API Keys
    export OPENAI_API_KEY=sk-...
    export ALPHA_VANTAGE_API_KEY=...
    
    # Database credentials
    export POSTGRES_PASSWORD=<secure-password>
    export POSTGRES_USER=maricusco
    export POSTGRES_DB=maricusco
    
    # Grafana credentials
    export GF_SECURITY_ADMIN_PASSWORD=<secure-password>
    

  4. Start services:

    docker-compose up -d
    

  5. Verify deployment:

    docker-compose ps
    # All services should show "healthy" status
    
    # Check application health
    curl http://localhost:8000/health
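
Services can take a while to become healthy after startup, so a single curl may fail on a fresh deployment. The following is a minimal sketch of a polling check, assuming the /health endpoint returns HTTP 200 when the application is ready; the function names, attempt count, and delay are illustrative choices, not part of the project.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url, timeout=2.0):
    """Return the HTTP status code for url, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (e.g. 503)
        return exc.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar
        return None


def wait_until_healthy(url, attempts=10, delay=3.0, fetch=fetch_status, sleep=time.sleep):
    """Poll url until it answers 200; return True on success, False after all attempts."""
    for attempt in range(attempts):
        if fetch(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Calling `wait_until_healthy("http://localhost:8000/health")` after `docker-compose up -d` returns True as soon as the endpoint responds with 200, or False after ten attempts.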
    

Application Configuration

The application entrypoint supports environment variable overrides:

# Uvicorn configuration
export APP_MODULE=maricusco.api.app:app
export APP_HOST=0.0.0.0
export APP_PORT=8000
export UVICORN_WORKERS=4  # Scale workers
export UVICORN_RELOAD=false
export EXTRA_UVICORN_ARGS="--log-level info"
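
To show how such overrides could map onto a Uvicorn command line, here is a hedged sketch. The variable names come from the block above; the `build_uvicorn_command` function and the fallback defaults are assumptions, and the real entrypoint script may behave differently.

```python
import shlex


def build_uvicorn_command(env):
    """Translate the documented variables into a uvicorn argv list (hypothetical logic)."""
    command = [
        "uvicorn",
        env.get("APP_MODULE", "maricusco.api.app:app"),
        "--host", env.get("APP_HOST", "0.0.0.0"),
        "--port", env.get("APP_PORT", "8000"),
    ]
    if env.get("UVICORN_RELOAD", "false").lower() == "true":
        # Uvicorn documents --workers as not valid with --reload, so pass only one
        command.append("--reload")
    else:
        command += ["--workers", env.get("UVICORN_WORKERS", "1")]
    # Split extra arguments the way a shell would, honoring quotes
    command += shlex.split(env.get("EXTRA_UVICORN_ARGS", ""))
    return command
```

Passing `os.environ` with the values shown above would yield `uvicorn maricusco.api.app:app --host 0.0.0.0 --port 8000 --workers 4 --log-level info`.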

Scaling

Horizontal scaling (multiple app instances):

  1. Update docker-compose.yml:

    app:
      deploy:
        replicas: 3
    

  2. Use a load balancer (nginx, traefik) in front of multiple instances.

Vertical scaling (resource limits):

Update resource limits in docker-compose.yml based on monitoring metrics.

Health Checks

sequenceDiagram
    participant DC as Docker Compose
    participant APP as Application
    participant PG as PostgreSQL
    participant RD as Redis
    participant CH as ChromaDB

    Note over DC: Health check interval: 30s

    DC->>APP: Health check request<br/>(/usr/local/bin/healthcheck)
    APP->>APP: Check /health endpoint
    APP->>PG: Test connection<br/>(pg_isready)
    PG-->>APP: Connection status
    APP->>RD: Test connection<br/>(redis-cli ping)
    RD-->>APP: PONG
    APP->>CH: HTTP heartbeat<br/>(/api/v1/heartbeat)
    CH-->>APP: 200 OK
    APP-->>DC: Health status<br/>(healthy/unhealthy)

    Note over DC,CH: All dependencies must be healthy<br/>for container to be marked healthy

Health checks run every 30 seconds with a 2-second timeout:

  • Application: /health endpoint (required)
  • PostgreSQL: pg_isready command
  • Redis: redis-cli ping
  • ChromaDB: HTTP heartbeat endpoint
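
The rule that every dependency must pass before the container is marked healthy can be sketched as a small aggregation function. This is an illustration only: `aggregate_health` and the response shape are hypothetical, and the callables stand in for the real pg_isready, redis-cli ping, and heartbeat checks.

```python
def aggregate_health(checks):
    """Run each dependency check; the overall status is healthy only if all pass."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A check that raises (e.g. connection refused) counts as unhealthy
            results[name] = False
    status = "healthy" if all(results.values()) else "unhealthy"
    return {"status": status, "dependencies": results}


# Stub checks; a real implementation would contact each service.
report = aggregate_health({
    "postgres": lambda: True,
    "redis": lambda: True,
    "chromadb": lambda: False,
})
print(report["status"])
```

Reporting per-dependency results alongside the overall status makes it easier to see which service caused a failed check.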

Verify health status:

docker-compose ps
# Check individual service
docker exec maricusco-app curl -f http://localhost:8000/health

Environment Management

Environment Variables

Configuration priority (highest to lowest):

  1. Environment variables
  2. .env file
  3. Default values in code
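
The priority order can be sketched as a lookup that tries each source in turn. This is a simplified illustration: `parse_dotenv` handles only plain KEY=VALUE lines, and both function names are hypothetical rather than the project's actual loader.

```python
import os


def parse_dotenv(text):
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def resolve(key, dotenv, defaults, environ=None):
    """Look up key in the environment first, then .env values, then code defaults."""
    environ = os.environ if environ is None else environ
    for source in (environ, dotenv, defaults):
        if key in source:
            return source[key]
    raise KeyError(key)
```

With `APP_PORT=9000` in .env and a code default of 8000, `resolve("APP_PORT", ...)` returns 9000 unless APP_PORT is also exported in the environment, in which case the exported value wins.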

Security:

  • Never commit .env files to version control
  • Use secrets management (AWS Secrets Manager, HashiCorp Vault) in production
  • Rotate credentials regularly

See Configuration Reference for complete environment variable documentation.

CI/CD Pipeline

The project uses GitHub Actions for continuous integration and deployment. See CI/CD Pipeline for complete documentation.

Key stages:

  1. Lock file validation
  2. Code quality (lint, type check, security)
  3. Test suite execution
  4. Docker image build and scan
  5. Deployment (manual or automated)

Local CI simulation:

make ci-check  # Run all CI checks locally

Monitoring and Observability

Monitoring Flow

flowchart LR
    subgraph "Application"
        APP[Maricusco App]
        METRICS[/metrics endpoint]
        HEALTH[/health endpoint]
    end

    subgraph "Collection"
        PROM[Prometheus<br/>Scrapes every 15s]
    end

    subgraph "Visualization"
        GRAF[Grafana<br/>Dashboards]
    end

    subgraph "Storage"
        TSDB[(Time Series DB<br/>15 day retention)]
    end

    APP --> METRICS
    APP --> HEALTH
    METRICS -->|HTTP GET| PROM
    HEALTH -->|HTTP GET| PROM
    PROM --> TSDB
    PROM -->|Query| GRAF
    TSDB -->|Query| GRAF
    GRAF -->|Display| USER[Users/Operators]

    style APP fill:#7A9FB3,stroke:#6B8FA3,color:#fff
    style PROM fill:#7A9A7A,stroke:#6B8E6B,color:#fff
    style GRAF fill:#7A9A7A,stroke:#6B8E6B,color:#fff
    style TSDB fill:#9B8AAB,stroke:#8B7A9B,color:#fff
    style METRICS fill:#C4A484,stroke:#B49474,color:#fff
    style HEALTH fill:#C4A484,stroke:#B49474,color:#fff

See Monitoring and Metrics for complete documentation on metrics, logging, and Grafana dashboards.
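
The 15-second scrape interval and 15-day retention shown in the diagram imply a fixed number of samples per time series. The sketch below works through that arithmetic. The series count and bytes-per-sample figure are assumptions chosen for illustration (Prometheus's documentation cites roughly 1-2 bytes per sample on average), not measurements from this deployment.

```python
SCRAPE_INTERVAL_S = 15   # from the diagram
RETENTION_DAYS = 15      # from the diagram
ACTIVE_SERIES = 1_000    # assumption: number of time series being scraped
BYTES_PER_SAMPLE = 2     # assumption: upper end of the documented average


def samples_per_series(retention_days, scrape_interval_s):
    """Number of samples one series accumulates over the retention window."""
    return retention_days * 86_400 // scrape_interval_s


def estimated_disk_bytes(series, retention_days, scrape_interval_s, bytes_per_sample):
    """Rough disk estimate: series x samples per series x bytes per sample."""
    return series * samples_per_series(retention_days, scrape_interval_s) * bytes_per_sample


total = estimated_disk_bytes(ACTIVE_SERIES, RETENTION_DAYS, SCRAPE_INTERVAL_S, BYTES_PER_SAMPLE)
print(f"{samples_per_series(RETENTION_DAYS, SCRAPE_INTERVAL_S)} samples per series")
print(f"about {total / 1e6:.0f} MB under the stated assumptions")
```

Each series holds 86,400 samples over the retention window. Under the assumed 1,000 series at 2 bytes per sample, that comes to roughly 173 MB for the prometheus_data volume.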

Security

Container Security

  • Non-root user (appuser, UID 1000) in application container
  • Minimal base images (Python slim)
  • Multi-stage builds to reduce image size
  • Regular security scans via Trivy in CI/CD

Network Security

  • Services communicate via internal Docker network
  • Expose only necessary ports (8000 for app, 3000 for Grafana)
  • Use reverse proxy (nginx, traefik) for TLS termination

Secrets Management

Production recommendations:

  • Use secrets management service (AWS Secrets Manager, HashiCorp Vault)
  • Never hardcode credentials
  • Rotate API keys and passwords regularly
  • Use least-privilege access principles

Security Scanning

Docker image scanning:

docker run --rm \
  -v /var/run/docker.sock:/var/run/docker.sock \
  aquasec/trivy image maricusco:latest

Dependency scanning:

uv run pip-audit --desc

Maintenance

Log Management

View logs:

# All services
docker-compose logs -f

# Specific service
docker-compose logs -f app

# Last 100 lines
docker-compose logs --tail=100 app

Log cleanup:

# Clean old logs (if file logging enabled)
find maricusco/logs -name "*.log.*" -mtime +30 -delete
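
Because `-delete` is irreversible, a dry run can be useful before cleaning up. The sketch below approximates the find command in Python with a dry-run default. The function names are hypothetical, and the age cutoff is a plain 30 x 24 hours, which differs slightly from find's whole-day rounding for `-mtime +30`.

```python
import time
from pathlib import Path


def old_rotated_logs(log_dir, max_age_days=30, now=None):
    """Return rotated log files (*.log.*) last modified more than max_age_days ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86_400
    return sorted(
        path for path in Path(log_dir).rglob("*.log.*")
        if path.is_file() and path.stat().st_mtime < cutoff
    )


def clean_logs(log_dir, max_age_days=30, dry_run=True):
    """Delete old rotated logs; with dry_run=True only report what would be removed."""
    targets = old_rotated_logs(log_dir, max_age_days)
    if not dry_run:
        for path in targets:
            path.unlink()
    return targets
```

Calling `clean_logs("maricusco/logs")` lists the candidates without touching them; pass `dry_run=False` to delete.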

Database Maintenance

PostgreSQL maintenance:

# Vacuum database
docker exec maricusco-postgres psql -U maricusco maricusco -c "VACUUM ANALYZE;"

# Check database size
docker exec maricusco-postgres psql -U maricusco maricusco -c "SELECT pg_size_pretty(pg_database_size('maricusco'));"

Redis maintenance:

# Check memory usage
docker exec maricusco-redis redis-cli INFO memory

# Clear cache (use with caution)
docker exec maricusco-redis redis-cli FLUSHDB

Updates and Upgrades

Application update:

# Pull latest code
git pull origin main

# Rebuild and restart
docker-compose build app
docker-compose up -d app

Dependency updates:

  • Dependabot automatically creates PRs for dependency updates
  • Review and merge PRs after CI passes
  • See CI/CD Pipeline for details

Troubleshooting

Service Won't Start

Check logs:

docker-compose logs app

Common issues:

  • Missing environment variables → Check .env file
  • Port conflicts → Change port mappings in docker-compose.yml
  • Insufficient resources → Increase resource limits
  • Database connection failure → Verify POSTGRES_* environment variables
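
The first two causes can be checked before starting the stack. The following is a minimal pre-flight sketch: the variable list mirrors the required variables from Initial Deployment, while the function names are hypothetical and not part of the project.

```python
import socket

REQUIRED_VARS = (
    "OPENAI_API_KEY",
    "ALPHA_VANTAGE_API_KEY",
    "POSTGRES_PASSWORD",
    "POSTGRES_USER",
    "POSTGRES_DB",
    "GF_SECURITY_ADMIN_PASSWORD",
)


def missing_variables(env, required=REQUIRED_VARS):
    """Return the required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]


def port_in_use(port, host="127.0.0.1"):
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        return sock.connect_ex((host, port)) == 0
```

Running `missing_variables(os.environ)` and `port_in_use(8000)` before `docker-compose up -d` surfaces both problems without having to read container logs.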

Health Check Failures

Application unhealthy:

# Check health endpoint directly
docker exec maricusco-app curl -v http://localhost:8000/health

# Check dependency connectivity (requires ping in the image; slim base images may not include it)
docker exec maricusco-app ping postgres
docker exec maricusco-app ping redis

Database connection issues:

# Test PostgreSQL connection
docker exec maricusco-postgres pg_isready -U maricusco

# Check PostgreSQL logs
docker-compose logs postgres

Performance Issues

High memory usage:

# Check container resource usage
docker stats

# Adjust resource limits in docker-compose.yml

Slow queries:

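# Log statements slower than 1 second, then reload the configuration
docker exec maricusco-postgres psql -U maricusco maricusco -c "ALTER SYSTEM SET log_min_duration_statement = '1s';"
docker exec maricusco-postgres psql -U maricusco maricusco -c "SELECT pg_reload_conf();"
# PostgreSQL logs slow statements with a "duration:" prefix
docker-compose logs postgres | grep "duration:"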
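
When `log_min_duration_statement` is set, PostgreSQL writes each statement that exceeds it as a log line containing `duration: <n> ms` followed by the statement text. The sketch below ranks such lines by duration; the regular expression covers the common `statement:` and `execute <name>:` forms, and the function name and threshold are illustrative.

```python
import re

DURATION_RE = re.compile(
    r"duration: (?P<ms>\d+(?:\.\d+)?) ms\s+(?:statement|execute [^:]*): (?P<sql>.*)"
)


def slow_statements(log_lines, threshold_ms=1000.0):
    """Return (duration_ms, sql) pairs at or above threshold_ms, slowest first."""
    found = []
    for line in log_lines:
        match = DURATION_RE.search(line)
        if match and float(match["ms"]) >= threshold_ms:
            found.append((float(match["ms"]), match["sql"].strip()))
    return sorted(found, reverse=True)
```

Feeding it the output of `docker-compose logs postgres`, one line at a time, gives a quick list of the worst offenders.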

Data Issues

Corrupted or missing data:

  1. Check volume mounts: docker volume inspect maricusco_postgres_data
  2. Check service logs: docker-compose logs <service>
  3. Verify database connectivity and permissions

Operational Procedures

Service Restart

Restart all services:

docker-compose restart

Restart specific service:

docker-compose restart app

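Restart with a rebuilt image:

# Rebuild and recreate only the app container; dependencies keep running.
# A single instance is briefly unavailable while its container is replaced,
# so zero downtime requires multiple instances behind a load balancer.
docker-compose up -d --no-deps --build app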

Service Scaling

Scale application:

# Scale to 3 instances
docker-compose up -d --scale app=3

Note: Use a load balancer in front of multiple instances. Docker Compose scaling is limited for production use; replicas also collide if the service pins a container_name or a single host port.

Maintenance Windows

Scheduled maintenance:

  1. Notify users of maintenance window
  2. Stop services: docker-compose down
  3. Perform maintenance (updates, backups, etc.)
  4. Start services: docker-compose up -d
  5. Verify health: curl http://localhost:8000/health

References