
DevOps Operations

This document covers infrastructure, deployment, and operational procedures for the Redhound trading system.

Infrastructure Overview

Infrastructure Architecture

graph TB
    subgraph "External Access"
        USER[Users/Developers]
        LB[Load Balancer<br/>Optional]
    end

    subgraph "Application Layer"
        APP[Redhound Application<br/>FastAPI + Uvicorn<br/>Port: 8000]
    end

    subgraph "Data Services"
        PG[(PostgreSQL<br/>TimescaleDB<br/>pgvector<br/>Port: 5432)]
        REDIS[(Redis Cache<br/>Port: 6379)]
    end

    subgraph "Monitoring Stack"
        PROM[Prometheus<br/>Metrics Collection<br/>Port: 9090]
        GRAF[Grafana<br/>Dashboards<br/>Port: 3000]
    end

    subgraph "Storage"
        VOL1[(postgres_data)]
        VOL2[(redis_data)]
        VOL3[(prometheus_data)]
        VOL4[(grafana_data)]
    end

    USER --> LB
    LB --> APP
    USER --> APP

    APP --> PG
    APP --> REDIS
    APP -->|/metrics| PROM
    APP -->|/health| PROM

    PROM --> GRAF

    PG --> VOL1
    REDIS --> VOL2
    PROM --> VOL3
    GRAF --> VOL4

    style APP fill:#7A9FB3,stroke:#6B8FA3,color:#fff
    style PG fill:#9B8AAB,stroke:#8B7A9B,color:#fff
    style REDIS fill:#9B8AAB,stroke:#8B7A9B,color:#fff
    style PROM fill:#7A9A7A,stroke:#6B8E6B,color:#fff
    style GRAF fill:#7A9A7A,stroke:#6B8E6B,color:#fff
    style LB fill:#C4A484,stroke:#B49474,color:#fff
    style VOL1 fill:#C4A484,stroke:#B49474,color:#fff
    style VOL2 fill:#C4A484,stroke:#B49474,color:#fff
    style VOL3 fill:#C4A484,stroke:#B49474,color:#fff
    style VOL4 fill:#C4A484,stroke:#B49474,color:#fff

Service Architecture

The system runs as a containerized application with the following services:

Note: All services are automatically deployed and monitored through the CI/CD pipeline.

| Service     | Container           | Port | Purpose                                            |
|-------------|---------------------|------|----------------------------------------------------|
| Application | redhound-app        | 8000 | Main FastAPI application                           |
| PostgreSQL  | redhound-postgres   | 5432 | Time-series database (TimescaleDB) with pgvector   |
| Redis       | redhound-redis      | 6379 | Caching layer                                      |
| Prometheus  | redhound-prometheus | 9090 | Metrics collection                                 |
| Grafana     | redhound-grafana    | 3000 | Metrics visualization                              |

Network Configuration

All services communicate via the redhound-network bridge network. Services use internal DNS names (e.g., postgres, redis) for inter-service communication.

Resource Limits

Default resource constraints per service:

app:
  limits: { cpus: '2.0', memory: 4G }
  reservations: { cpus: '0.5', memory: 512M }

postgres:
  limits: { cpus: '2.0', memory: 2G }
  reservations: { cpus: '0.5', memory: 512M }

redis:
  limits: { cpus: '1.0', memory: 1G }
  reservations: { cpus: '0.25', memory: 256M }

grafana:
  limits: { cpus: '1.0', memory: 512M }
  reservations: { cpus: '0.25', memory: 128M }

Adjust these values in docker-compose.yml based on workload requirements.
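In full docker-compose.yml syntax, the shorthand above expands to a deploy.resources block. A sketch for the app service, assuming the values listed above (the actual file may differ):

```yaml
services:
  app:
    deploy:
      resources:
        limits:
          cpus: '2.0'      # hard cap: throttled beyond this
          memory: 4G       # hard cap: OOM-killed beyond this
        reservations:
          cpus: '0.5'      # guaranteed minimum scheduling share
          memory: 512M     # guaranteed minimum memory
```

Limits bound the worst case; reservations express the baseline the scheduler should guarantee.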

Deployment

Prerequisites

  • Docker 20.10+ and Docker Compose 2.0+
  • Minimum 8GB RAM, 4 CPU cores
  • 20GB free disk space for volumes

Initial Deployment

  1. Clone repository:

    git clone https://github.com/redhound-labs/redhound.git
    cd redhound
    

  2. Configure environment:

    cp .env.example .env
    # Edit .env with production values
    

  3. Set required environment variables:

    # API Keys
    export OPENAI_API_KEY=sk-...
    export ALPHA_VANTAGE_API_KEY=...
    
    # Database credentials
    export POSTGRES_PASSWORD=<secure-password>
    export POSTGRES_USER=redhound
    export POSTGRES_DB=redhound
    
    # Grafana credentials
    export GF_SECURITY_ADMIN_PASSWORD=<secure-password>
    

  4. Start services:

    docker-compose up -d
    

  5. Verify deployment:

    docker-compose ps
    # All services should show "healthy" status
    
    # Check application health
    curl http://localhost:8000/health
    

Application Configuration

The application entrypoint supports environment variable overrides:

# Uvicorn configuration
export APP_MODULE=redhound.api.app:app
export APP_HOST=0.0.0.0
export APP_PORT=8000
export UVICORN_WORKERS=4  # Scale workers
export UVICORN_RELOAD=false
export EXTRA_UVICORN_ARGS="--log-level info"
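A hedged sketch of how an entrypoint script might assemble the uvicorn command from these variables, using the documented defaults; the real entrypoint may differ:

```shell
# Fall back to documented defaults when a variable is unset
APP_MODULE="${APP_MODULE:-redhound.api.app:app}"
APP_HOST="${APP_HOST:-0.0.0.0}"
APP_PORT="${APP_PORT:-8000}"
UVICORN_WORKERS="${UVICORN_WORKERS:-1}"
EXTRA_UVICORN_ARGS="${EXTRA_UVICORN_ARGS:-}"

# Build the final command line (printed here instead of exec'd)
CMD="uvicorn ${APP_MODULE} --host ${APP_HOST} --port ${APP_PORT} --workers ${UVICORN_WORKERS} ${EXTRA_UVICORN_ARGS}"
echo "$CMD"
```

A real entrypoint would end with `exec $CMD` so uvicorn becomes PID 1 and receives container stop signals directly.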

Scaling

Horizontal scaling (multiple app instances):

  1. Update docker-compose.yml:

    app:
      deploy:
        replicas: 3
    

  2. Use a load balancer (nginx, traefik) in front of multiple instances.
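When fronting replicas with nginx, the upstream block distributes requests across instances. A minimal sketch, assuming three instances reachable as app_1..app_3 on the internal network (names and ports are illustrative):

```nginx
upstream redhound_app {
    # Compose-scaled containers, resolved via the internal Docker network
    server app_1:8000;
    server app_2:8000;
    server app_3:8000;
}

server {
    listen 80;
    location / {
        proxy_pass http://redhound_app;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```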

Vertical scaling (resource limits):

Update resource limits in docker-compose.yml based on monitoring metrics.

Health Checks

sequenceDiagram
    participant DC as Docker Compose
    participant APP as Application
    participant PG as PostgreSQL
    participant RD as Redis

    Note over DC: Health check interval: 30s

    DC->>APP: Health check request<br/>(/usr/local/bin/healthcheck)
    APP->>APP: Check /health endpoint
    APP->>PG: Test connection<br/>(pg_isready)
    PG-->>APP: Connection status
    APP->>RD: Test connection<br/>(redis-cli ping)
    RD-->>APP: PONG
    APP-->>DC: Health status<br/>(healthy/unhealthy)

    Note over DC: All dependencies must be healthy<br/>for container to be marked healthy

Health checks run every 30 seconds with a 2-second timeout:

  • Application: /health endpoint (required)
  • PostgreSQL: pg_isready command
  • Redis: redis-cli ping
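The interval and timeout above map onto Compose healthcheck blocks. A sketch for the app service (the test command path follows the diagram; retries and start_period are assumptions):

```yaml
services:
  app:
    healthcheck:
      test: ["CMD", "/usr/local/bin/healthcheck"]
      interval: 30s      # check every 30 seconds
      timeout: 2s        # fail the check if it takes longer than 2s
      retries: 3         # consecutive failures before "unhealthy"
      start_period: 10s  # grace period during startup
```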

Verify health status:

docker-compose ps
# Check individual service
docker exec redhound-app curl -f http://localhost:8000/health

Database Migrations

Schema changes are managed with Alembic. The database must be PostgreSQL with the TimescaleDB and pgvector extensions installed (e.g. the Docker Compose Postgres service, initialized via docker/postgres/init-timescaledb.sql).

Migration commands

  • Apply all migrations: ./scripts/db_migrate.sh or alembic upgrade head
  • Rollback one revision: ./scripts/db_rollback.sh or alembic downgrade -1
  • Reset database (dev only): ./scripts/db_reset.sh (downgrade to base, then upgrade head)
  • Seed test data (dev): ./scripts/db_seed.sh

Connection settings come from POSTGRES_* / REDHOUND_DATABASE_* environment variables. See Configuration Reference.
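A hedged sketch of how a SQLAlchemy-style DSN could be assembled from the POSTGRES_* variables; the POSTGRES_HOST default of "postgres" (the Compose service name) and the exact variable names used by the application are assumptions:

```shell
# Fall back to illustrative defaults when a variable is unset
POSTGRES_USER="${POSTGRES_USER:-redhound}"
POSTGRES_PASSWORD="${POSTGRES_PASSWORD:-changeme}"
POSTGRES_HOST="${POSTGRES_HOST:-postgres}"
POSTGRES_PORT="${POSTGRES_PORT:-5432}"
POSTGRES_DB="${POSTGRES_DB:-redhound}"

# Assemble a standard libpq/SQLAlchemy connection URL
DATABASE_URL="postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@${POSTGRES_HOST}:${POSTGRES_PORT}/${POSTGRES_DB}"
echo "$DATABASE_URL"
```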

Creating a new migration

  1. Change SQLAlchemy models in redhound/database/models/ as needed.
  2. Generate a revision: alembic revision --autogenerate -m "description of change".
  3. Edit the generated file in alembic/versions/: add TimescaleDB or pgvector operations if required (see below).
  4. Test: alembic upgrade head then alembic downgrade -1 then alembic upgrade head.
  5. Commit the new version file.

TimescaleDB-specific migrations

For new time-series tables that should be hypertables:

  1. Create the table in the migration (or via autogenerate).
  2. Convert it to a hypertable using the helpers in redhound/database/migration_utils/timescale.py, e.g. create_hypertable(connection, table_name, time_column_name), where the time column is typically created_at or timestamp.
  3. Optionally add compression and retention policies in the same or a later migration:
    • add_compression_policy(connection, table_name, "7 days")
    • add_retention_policy(connection, table_name, "2 years")

In downgrade, remove policies (e.g. remove_retention_policy, remove_compression_policy) then drop the table if applicable.
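The steps above can be sketched as the SQL that the create_hypertable helper is assumed to issue. The table name "prices" and time column "timestamp" are hypothetical; the block only prints the statement, with the psql invocation shown as a comment:

```shell
# Hypothetical table and time column for illustration
TABLE="prices"
TIME_COL="timestamp"

# SQL the TimescaleDB helper is assumed to run for the conversion
SQL="SELECT create_hypertable('${TABLE}', '${TIME_COL}', if_not_exists => TRUE);"
echo "$SQL"

# To execute against the Compose database:
#   docker exec redhound-postgres psql -U redhound redhound -c "$SQL"
```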

pgvector migrations

  • Enable the extension once (migration 001_enable_pgvector.py).
  • For vector columns, create the table then add an HNSW index, e.g.: CREATE INDEX ... ON table_name USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64);
  • Downgrade: drop the index then drop the table; drop the extension only if no tables use vector.

Production migration runbook

  1. Pre-migration: Backup the database. Put the application in maintenance mode or ensure no schema-dependent deploys run during the migration.
  2. Apply: Run alembic upgrade head against the production database (using production credentials). Use the same virtualenv and code version that includes the new migration.
  3. Verify: Run alembic current and spot-check application health and key queries.
  4. Rollback (if needed): Run alembic downgrade -1 (or to a specific revision). Fix any application or data issues before re-running upgrade.

Rollback procedures

  • One revision: alembic downgrade -1
  • To a specific revision: alembic downgrade <revision> (e.g. 002, 001)
  • To empty schema: alembic downgrade base (drops all managed objects)

Downgrade order matters: remove retention/compression policies before dropping hypertables; drop tables that use pgvector before dropping the vector extension.

Environment Management

Environment Variables

Configuration priority (highest to lowest):

  1. Environment variables
  2. .env file
  3. Default values in code
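The precedence can be sketched with shell parameter defaults; LOG_LEVEL here is a hypothetical setting, not a documented variable:

```shell
# Start from a clean slate so only the simulated sources apply
unset LOG_LEVEL

LOG_LEVEL_DEFAULT="info"       # 3) default baked into code (lowest priority)
DOTENV_LOG_LEVEL="warning"     # 2) simulated .env entry
# 1) an exported LOG_LEVEL would win; none is set in this sketch

# Resolve in priority order: env var, then .env value, then code default
LOG_LEVEL="${LOG_LEVEL:-${DOTENV_LOG_LEVEL:-$LOG_LEVEL_DEFAULT}}"
echo "$LOG_LEVEL"   # "warning": the .env value beats the code default
```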

Security:

  • Never commit .env files to version control
  • Use secrets management (AWS Secrets Manager, HashiCorp Vault) in production
  • Rotate credentials regularly

See Configuration Reference for complete environment variable documentation.

CI/CD Pipeline

The project uses GitHub Actions for continuous integration and deployment. See CI/CD Pipeline for complete documentation.

Key stages:

  1. Lock file validation
  2. Code quality (lint, type check, security)
  3. Test suite execution
  4. Docker image build and scan
  5. Deployment (manual or automated)

Local CI simulation:

make ci-check  # Run all CI checks locally

Monitoring and Observability

Monitoring Flow

flowchart LR
    subgraph "Application"
        APP[Redhound App]
        METRICS[/metrics endpoint]
        HEALTH[/health endpoint]
    end

    subgraph "Collection"
        PROM[Prometheus<br/>Scrapes every 15s]
    end

    subgraph "Visualization"
        GRAF[Grafana<br/>Dashboards]
    end

    subgraph "Storage"
        TSDB[(Time Series DB<br/>15 day retention)]
    end

    APP --> METRICS
    APP --> HEALTH
    METRICS -->|HTTP GET| PROM
    HEALTH -->|HTTP GET| PROM
    PROM --> TSDB
    PROM -->|Query| GRAF
    TSDB -->|Query| GRAF
    GRAF -->|Display| USER[Users/Operators]

    style APP fill:#7A9FB3,stroke:#6B8FA3,color:#fff
    style PROM fill:#7A9A7A,stroke:#6B8E6B,color:#fff
    style GRAF fill:#7A9A7A,stroke:#6B8E6B,color:#fff
    style TSDB fill:#9B8AAB,stroke:#8B7A9B,color:#fff
    style METRICS fill:#C4A484,stroke:#B49474,color:#fff
    style HEALTH fill:#C4A484,stroke:#B49474,color:#fff

See Monitoring and Metrics for complete documentation on metrics, logging, and Grafana dashboards.

Security

Container Security

  • Non-root user (appuser, UID 1000) in application container
  • Minimal base images (Python slim)
  • Multi-stage builds to reduce image size
  • Regular security scans via Trivy in CI/CD

Network Security

  • Services communicate via internal Docker network
  • Expose only necessary ports (8000 for app, 3000 for Grafana)
  • Use reverse proxy (nginx, traefik) for TLS termination

Secrets Management

Production recommendations:

  • Use a secrets management service (AWS Secrets Manager, HashiCorp Vault)
  • Never hardcode credentials
  • Rotate API keys and passwords regularly
  • Use least-privilege access principles

Security Scanning

Docker image scanning:

docker run --rm \
  -v /var/run/docker.sock:/var/run/docker.sock \
  aquasec/trivy image redhound:latest

Dependency scanning:

uv run pip-audit --desc

Fixing vulnerabilities

Trivy and pip-audit report OS (Debian) and Python issues. Address them as follows.

  1. List findings
    • Image (OS + Python in image): docker build -t redhound:test ., then docker run --rm -v /var/run/docker.sock:/var/run/docker.sock aquasec/trivy:0.68.1 image --scanners vuln redhound:test
    • Python deps only: uv run pip-audit --format json --desc or uv run pip-audit --desc
  2. OS (Debian) packages
    • The Dockerfile runs apt-get update && apt-get upgrade -y before apt-get install to apply security fixes for the base image (e.g. curl, libc). Rebuild the image to pick up fixes.
    • If a fix is not yet in the base image, use a newer python:3.12.12-slim (or python:3.12-slim) once the official image is refreshed, or pin to a digest after verifying it includes the fix.
  3. Python packages
    • pip-audit (and Trivy) name the vulnerable package and the fixed version. In pyproject.toml, set the package to the fixed version (or a compatible newer one), then run uv lock and uv sync. Re-run pip-audit and Trivy to confirm.
  4. Re-scan
    • Rebuild the image and run Trivy again, and run uv run pip-audit --desc after uv lock/uv sync.

Forcing base-image security updates

The Dockerfile uses a build-arg SECURITY_UPDATES. When it changes, the apt-get upgrade layer is invalidated and re-runs, so the image gets the latest Debian security updates. In CI, SECURITY_UPDATES is set to the run date (UTC) so the apt layer refreshes at least once per calendar day; same-day runs share the cache. For local builds, pass --build-arg SECURITY_UPDATES=$(date +%Y%m%d) to force a refresh.
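A hedged sketch of the pattern (the actual Dockerfile may differ): referencing the build-arg inside the RUN layer ties that layer's cache key to the arg's value, so changing SECURITY_UPDATES forces apt-get upgrade to re-run:

```dockerfile
ARG SECURITY_UPDATES=unset
# Using the arg in this RUN invalidates the layer cache whenever its value changes
RUN echo "security-updates: ${SECURITY_UPDATES}" \
    && apt-get update \
    && apt-get upgrade -y \
    && rm -rf /var/lib/apt/lists/*
```

Local build forcing a refresh: docker build --build-arg SECURITY_UPDATES=$(date +%Y%m%d) -t redhound:latest .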

Temporarily accepted vulnerabilities

Vulnerabilities with no fix yet (e.g. upstream not released) may be listed in .trivyignore with an expiry date. Re-evaluate before the expiry and remove the entry once a patched version is available. Current entries include CVE-2026-0994 (protobuf; no patch yet) and CVE-2026-0861 (glibc; Debian Trixie has no fix; re-evaluate when Debian security provides an update). The Dockerfile upgrades pip to ≥25.3 to address CVE-2025-8869.
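A .trivyignore sketch matching the entries above; Trivy reads one vulnerability ID per line and skips lines starting with #, so the rationale and re-evaluation reminder live in comments:

```
# protobuf: no upstream patch yet; re-evaluate before expiry
CVE-2026-0994
# glibc: no fix in Debian Trixie; re-check when Debian security updates ship
CVE-2026-0861
```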

Maintenance

Log Management

View logs:

# All services
docker-compose logs -f

# Specific service
docker-compose logs -f app

# Last 100 lines
docker-compose logs --tail=100 app

Log cleanup:

# Clean old logs (if file logging enabled)
find redhound/logs -name "*.log.*" -mtime +30 -delete

Database Maintenance

PostgreSQL maintenance:

# Vacuum database
docker exec redhound-postgres psql -U redhound redhound -c "VACUUM ANALYZE;"

# Check database size
docker exec redhound-postgres psql -U redhound redhound -c "SELECT pg_size_pretty(pg_database_size('redhound'));"

Redis maintenance:

# Check memory usage
docker exec redhound-redis redis-cli INFO memory

# Clear cache (use with caution)
docker exec redhound-redis redis-cli FLUSHDB

Updates and Upgrades

Application update:

# Pull latest code
git pull origin main

# Rebuild and restart
docker-compose build app
docker-compose up -d app

Dependency updates:

  • Dependabot automatically creates PRs for dependency updates
  • Review and merge PRs after CI passes
  • See CI/CD Pipeline for details

Troubleshooting

Service Won't Start

Check logs:

docker-compose logs app

Common issues:

  • Missing environment variables → Check .env file
  • Port conflicts → Change port mappings in docker-compose.yml
  • Insufficient resources → Increase resource limits
  • Database connection failure → Verify POSTGRES_* environment variables

Health Check Failures

Application unhealthy:

# Check health endpoint directly
docker exec redhound-app curl -v http://localhost:8000/health

# Check dependency connectivity
docker exec redhound-app ping postgres
docker exec redhound-app ping redis

Database connection issues:

# Test PostgreSQL connection
docker exec redhound-postgres pg_isready -U redhound

# Check PostgreSQL logs
docker-compose logs postgres

Performance Issues

High memory usage:

# Check container resource usage
docker stats

# Adjust resource limits in docker-compose.yml

Slow queries:

# Enable query logging in PostgreSQL
# Check slow query logs
docker-compose logs postgres | grep "slow query"

Data Issues

Corrupted or missing data:

  1. Check volume mounts: docker volume inspect redhound_postgres_data
  2. Check service logs: docker-compose logs <service>
  3. Verify database connectivity and permissions

Operational Procedures

Service Restart

Restart all services:

docker-compose restart

Restart specific service:

docker-compose restart app

Graceful restart (zero downtime):

# Rolling restart with new image
docker-compose up -d --no-deps --build app

Service Scaling

Scale application:

# Scale to 3 instances
docker-compose up -d --scale app=3

Note: Use a load balancer in front of multiple instances; Docker Compose's built-in scaling is limited and not intended for production use.

Maintenance Windows

Scheduled maintenance:

  1. Notify users of the maintenance window
  2. Stop services: docker-compose down
  3. Perform maintenance (updates, backups, etc.)
  4. Start services: docker-compose up -d
  5. Verify health: curl http://localhost:8000/health

References