
CI/CD Pipeline

This document describes the continuous integration and deployment pipeline for the Redhound trading system.

Overview

The project uses GitHub Actions for CI/CD with an optimized two-tier pipeline:

Pull Request workflow (fast validation):
- Runs essential validation only (lint, type-check, security, tests excluding slow/integration)
- Typically completes in 5-10 minutes
- No Docker build (saves time and disk space)
- Ensures PRs are clean before merge

Push workflow after merge (Docker build only):
- Detects merge commits and skips redundant validation (already passed in the PR)
- Runs the Docker build + Trivy security scan once
- Prepares an optimized image for deployment (~2.5GB)
- Typically completes in 8-12 minutes (Docker build only)

Direct push workflow (no Docker build):
- Runs full validation including all tests
- Does NOT run the Docker build (use workflow_dispatch for manual builds)
- Use for hotfixes or direct commits to protected branches

Pipeline Architecture

graph TD
    Start[Push/PR] --> ShouldRun{Should Run?}
    ShouldRun -->|Yes| SetupVenv[Setup Virtual Env]
    ShouldRun -->|Skip| Skip[Skip Validation]
    ShouldRun -->|Yes| CheckLock[Check uv.lock Sync]

    CheckLock -->|In Sync| Wait[Wait for Setup]
    CheckLock -->|Out of Sync| AutoFix{Same Repo PR?}
    AutoFix -->|Yes| UpdateLock[Auto-update Lock]
    AutoFix -->|No| Fail[Fail: Run uv lock]
    UpdateLock --> NewRun[Trigger New Run]

    SetupVenv -->|Venv Ready| Wait
    Wait --> Parallel{Parallel Jobs}

    Parallel --> Lint[Lint & Format]
    Parallel --> TypeCheck[Type Check]
    Parallel --> Security[Security Scan]
    Parallel --> Secrets[Detect Secrets]
    Parallel --> PreCommit[Pre-commit Hooks]

    Lint --> Test[Test Suite]
    TypeCheck --> Test
    Security --> Test
    Secrets --> Test
    PreCommit --> Test

    Test --> MergeVal{PR Event?}
    MergeVal -->|Yes| ValidateMerge[Merge Validation]
    MergeVal -->|No| DockerBuild[Docker Build]

    ValidateMerge --> Notify[Notification]
    DockerBuild --> Notify

    Notify --> Success[✓ Success]

    style SetupVenv fill:#9C27B0
    style CheckLock fill:#4CAF50
    style Parallel fill:#2196F3
    style Test fill:#FF9800
    style Notify fill:#9C27B0
    style Fail fill:#F44336

Pipeline Stages

1. Setup Virtual Environment

Purpose: Prepare a shared virtual environment for all validation jobs

Actions:
- Checkout code and set up Python/UV
- Check for a cached venv using a hash of pyproject.toml and uv.lock
- Detect whether the match is exact or via a restore-key (partial)
- On exact match: use the cached venv as-is (~30s to download 3.5GB)
- On partial match or miss: run uv sync --locked --extra dev to update/create the venv
- Save the updated cache for future runs

Benefits:
- Eliminates redundancy: Only one uv sync per cache key instead of 6
- Faster CI: Saves 1-2+ minutes on cache misses
- Consistency: All jobs use an identical environment
- Smart caching: Reuses partial matches but updates to current dependencies

Cache behavior:

key: venv-${{ runner.os }}-${{ hashFiles('pyproject.toml', 'uv.lock') }}
restore-keys: venv-${{ runner.os }}-  # Fallback to any OS-matching cache

Performance:
- Exact cache hit: ~30 seconds (download 3.5GB at ~135 MB/s + extraction)
- Partial match: ~1-2 minutes (restore old cache + incremental sync)
- Cache miss: ~2-3 minutes (full sync from scratch)

Note: The 3.5GB cache size is due to ML/AI dependencies (PyTorch, CUDA, transformers). This is expected and optimal for projects with heavy dependencies. Download speed of ~135 MB/s is near the maximum for GitHub Actions cache infrastructure.
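
The key scheme above can be sketched in Python. This is a hypothetical re-implementation of GitHub's hashFiles() expression for illustration only, not the action's actual code:

```python
import hashlib
from pathlib import Path

def venv_cache_key(runner_os: str, *manifests: str) -> str:
    """Derive a cache key in the shape of
    venv-${{ runner.os }}-${{ hashFiles('pyproject.toml', 'uv.lock') }}."""
    digest = hashlib.sha256()
    for name in manifests:
        # Any byte change in either manifest produces a new key.
        digest.update(Path(name).read_bytes())
    return f"venv-{runner_os}-{digest.hexdigest()}"
```

Any edit to either file changes the key, which is why a lock update invalidates the shared venv cache until the first run after the change warms it again.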

2. Check Lock

Purpose: Verify uv.lock is in sync with pyproject.toml

Actions:
- Generate the lock file with uv lock
- Compare with the committed lock file
- Auto-update for Dependabot PRs
- Fail for manual PRs with an out-of-sync lock

Dependabot Auto-fix: When Dependabot creates a PR updating dependencies in pyproject.toml, the pipeline automatically:
1. Detects it's a Dependabot PR
2. Runs uv lock to update the lock file
3. Commits and pushes the updated uv.lock
4. Triggers a new workflow run with the updated lock file

Manual Fix:

# If check fails for your PR
make lock-sync
git add uv.lock
git commit -m "chore: update uv.lock"
git push

3. Lint and Format Check

Purpose: Enforce code style and quality standards

Tools: - ruff: Linting and formatting (replaces flake8, black, isort)

Checks:
- Code formatting (PEP 8 compliance)
- Import sorting
- Unused imports and variables
- Code complexity
- Common code smells

Local Execution:

# Check formatting
uv run ruff format --check .

# Check linting
uv run ruff check .

# Auto-fix issues
uv run ruff check --fix .
uv run ruff format .

4. Type Check

Purpose: Validate type hints and catch type-related bugs

Tools: - pyright: Static type checker

Optimization:
- PR workflow: Incremental type checking (only changed Python files)
- Push workflow: Full type check (entire codebase)

Why incremental?
- PRs: Fast feedback (~30-60s instead of 2-3 min)
- Push: Comprehensive validation ensures no type regressions

Checks:
- Type hint correctness
- Type compatibility
- Missing type annotations
- Type inference issues

Local Execution:

# Run type checker (full)
npx pyright backend/ cli/

# Run on specific files (like CI does on PRs)
npx pyright path/to/changed/file.py

# Or via pre-commit
uv run pre-commit run pyright --all-files
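
The incremental PR run boils down to selecting changed files and passing them to pyright. A rough sketch of the idea (hypothetical; the actual workflow step may compute the diff differently):

```python
import subprocess

def python_files(paths: list[str]) -> list[str]:
    """Keep only the Python sources worth type checking."""
    return [p for p in paths if p.endswith(".py")]

def changed_python_files(base: str = "origin/main") -> list[str]:
    """List .py files changed on this branch relative to the base branch."""
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    return python_files(out.splitlines())
```

The resulting list would then be handed to npx pyright; an empty list means there is nothing to type-check on the PR.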

5. Security Scan

Purpose: Identify security vulnerabilities

Tools:
- bandit: Python security linter (always runs)
- pip-audit: Dependency vulnerability scanner (conditional)

Optimization:
- bandit: Always runs (fast, scans code)
- pip-audit: Runs on push events or dependency PRs only (saves ~2 min on code-only PRs)

When pip-audit runs:
- All push events (comprehensive validation)
- PRs with "dep", "bump", or "upgrade" in the title
- PRs with the "dependencies" label
- Dependabot PRs
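
These conditions amount to a simple predicate. A hypothetical sketch (the names and signature are illustrative, not the workflow's actual expression):

```python
def should_run_pip_audit(event: str, pr_title: str,
                         labels: list[str], actor: str) -> bool:
    """Decide whether the dependency audit is worth the ~2 extra minutes."""
    if event == "push":
        return True  # pushes always get the comprehensive scan
    title = pr_title.lower()
    if any(word in title for word in ("dep", "bump", "upgrade")):
        return True
    return "dependencies" in labels or actor == "dependabot[bot]"
```

Code-only PRs fall through every branch and skip the audit.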

Checks:
- Common security issues (SQL injection, hardcoded passwords, etc.)
- Known CVEs in dependencies
- Insecure code patterns

Local Execution:

# Run bandit
uv run bandit -r backend/ cli/ -ll

# Run pip-audit
uv run pip-audit --desc

Note: Bandit failures block merge. pip-audit results are reported and block on CRITICAL/HIGH vulnerabilities.

6. Detect Secrets

Purpose: Prevent accidental commit of secrets

Tools: - detect-secrets: Secret detection

Checks:
- API keys
- Passwords
- Private keys
- Tokens
- Other sensitive data

Baseline: .secrets.baseline file contains known false positives

Local Execution:

# Scan for secrets
uv run detect-secrets scan --baseline .secrets.baseline

# Update baseline (after verifying false positives)
uv run detect-secrets scan --update .secrets.baseline

7. Pre-commit Hooks

Purpose: Run additional quality checks

Hooks:
- trailing-whitespace: Remove trailing whitespace
- end-of-file-fixer: Ensure files end with a newline
- check-yaml: Validate YAML syntax
- check-added-large-files: Prevent large file commits
- check-merge-conflict: Detect merge conflict markers

Local Execution:

# Run all hooks
uv run pre-commit run --all-files

# Run specific hook
uv run pre-commit run trailing-whitespace --all-files

Note: Linting, type checking, and security scans run in dedicated CI jobs. Locally, the same hooks run at pre-push, so a push is blocked until they pass; this keeps CI from failing on them.

8. Test Suite

Purpose: Validate functionality and maintain code quality

Configuration:
- Runs in mock mode (no API costs)
- Parallel execution with pytest-xdist
- Coverage threshold: 10% (interim, will increase)
- PR workflow: Excludes slow, integration, api, and talib tests (fast feedback)
- Push workflow: Runs the full suite including integration and slow tests (comprehensive)

Test Execution:

# PR tests (fast - unit only; typically 2-4 minutes)
pytest -v --tb=short -n auto -m "not slow and not integration and not api and not talib" \
  --cov=backend --cov-report=xml --cov-fail-under=10

# Push tests (comprehensive - typically 6-10 minutes)
pytest -v --tb=short -n auto \
  --cov=backend --cov-report=xml --cov-fail-under=10

Why this split:
- PRs get fast feedback on core functionality
- The push workflow (after merge) validates everything including slow/integration tests
- Avoids running comprehensive tests twice (once in the PR, once after merge)

Outputs:
- JUnit XML for test results
- Coverage XML for Codecov
- Test result comments on PRs

Local Execution:

# Run tests like CI does
REDHOUND_MOCK_MODE=true make test

# Run with coverage
REDHOUND_MOCK_MODE=true pytest --cov=backend --cov-report=html

9. Merge Validation (PRs Only)

Purpose: Validate PR is ready for merge

Checks:
- No merge conflicts
- Required labels present (if configured)
- All required checks passed

Required Labels: - dependencies: Required for dependency update PRs

Configuration:

# .github/workflows/cicd.yml
env:
  REQUIRED_PR_LABELS: "dependencies"  # Comma-separated

Override: Set to empty string to disable label requirement
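
The label check can be sketched as follows (hypothetical helper; the workflow's actual step may differ). Note how an empty string naturally disables the requirement:

```python
def missing_labels(required_csv: str, pr_labels: set[str]) -> set[str]:
    """Parse the comma-separated REQUIRED_PR_LABELS value and report
    which required labels are absent from the PR."""
    required = {label.strip() for label in required_csv.split(",") if label.strip()}
    return required - pr_labels
```

A non-empty result would fail merge validation; an empty REQUIRED_PR_LABELS yields no requirements at all.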

10. Docker Build (Merge Commits Only)

Purpose: Build and scan Docker image for deployment

Actions:
1. Free up disk space
   - Removes unnecessary tooling (~40GB freed): .NET, Android SDK, GHC, CodeQL, Node modules, Azure CLI, Gradle, PowerShell
   - Prunes Docker images, volumes, and build cache
   - Prevents "no space left on device" errors during export
2. Build the Docker image with BuildKit (uses UV_LINK_MODE=copy to suppress hardlink warnings)
3. Test the image (validate imports and CLI)
4. Scan with Trivy for vulnerabilities
5. Upload scan results as an artifact

When it runs:
- Only on merge commits to main/master/dev (not on every push)
- Skipped when only dependencies changed (no code changes)
- Rationale: PR validation already passed, so the image is built once after merge
- For manual Docker builds on direct pushes, use workflow_dispatch
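
Merge commits are distinguished from direct commits by their parent count: `git rev-list --parents -n 1 HEAD` prints the commit hash followed by one hash per parent. A sketch of the detection logic (illustrative; the workflow's actual step may differ):

```python
def parent_count(rev_list_line: str) -> int:
    """Everything after the first token of a `git rev-list --parents`
    line is a parent hash."""
    return max(len(rev_list_line.split()) - 1, 0)

def is_merge_commit(rev_list_line: str) -> bool:
    """Merge commits have two or more parents."""
    return parent_count(rev_list_line) >= 2
```

A direct commit produces two tokens (commit + one parent); a merge produces three or more.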

Optimizations:
- Only copies the site-packages directory (not the entire Python lib), reducing image size from ~4.5GB to ~2.5GB
- Export time reduced from ~4 minutes to ~2 minutes
- UV uses copy mode to avoid hardlink warnings across filesystems

Vulnerability Thresholds:
- CRITICAL: Fail build
- HIGH: Fail build
- MEDIUM: Report but pass
- LOW: Report but pass
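
The thresholds translate into a small gate over Trivy's per-severity counts. A hypothetical sketch of that decision (names are illustrative):

```python
BLOCKING_SEVERITIES = ("CRITICAL", "HIGH")

def build_should_fail(severity_counts: dict[str, int]) -> bool:
    """Fail the build if any blocking-severity vulnerability was found;
    MEDIUM and LOW findings are reported but do not block."""
    return any(severity_counts.get(level, 0) > 0 for level in BLOCKING_SEVERITIES)
```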

Local Execution:

# Build image
docker build -t redhound:test .

# Test image
docker run --rm --entrypoint python redhound:test -c "import backend; from backend.api.app import app"

# Scan image
docker run --rm -v /var/run/docker.sock:/var/run/docker.sock \
  aquasec/trivy:latest image redhound:test

11. Notification

Purpose: Send pipeline status to Slack

Information Sent:
- Repository and branch
- Commit message and author
- Pipeline status (success/failure)
- Job results (passed/failed)
- Vulnerability summary (if Docker build ran)
- Link to workflow run

Configuration:

Required GitHub secrets:
- SLACK_BOT_TOKEN: Bot User OAuth Token (starts with xoxb-)
- SLACK_CHANNEL_ID: Channel ID where notifications will be sent (starts with C)

Setting Up Slack App:

  1. Create a Slack App:
     • Go to https://api.slack.com/apps
     • Click "Create New App" → "From scratch"
     • Enter an app name (e.g., "CI/CD Notifications")
     • Select your Slack workspace
     • Click "Create App"

  2. Configure Bot Token Scopes:
     • In the app settings, go to "OAuth & Permissions" (left sidebar)
     • Scroll to "Scopes" → "Bot Token Scopes"
     • Add the following scopes:
       • chat:write - Send messages to channels
       • chat:write.public - Send messages to public channels (if posting to public channels)

  3. Install App to Workspace:
     • Scroll to the top of the "OAuth & Permissions" page
     • Click "Install to Workspace"
     • Review permissions and click "Allow"
     • Copy the "Bot User OAuth Token" (starts with xoxb-) - this is your SLACK_BOT_TOKEN

  4. Get Channel ID:
     • Open Slack in your browser
     • Navigate to the channel where you want notifications
     • Look at the URL: https://yourworkspace.slack.com/archives/C1234567890 - the part after /archives/ is the Channel ID (starts with C)
     • Alternatively, right-click the channel → "View channel details" - the Channel ID is at the bottom

  5. Add Secrets to GitHub:
     • Go to your repository → Settings → Secrets and variables → Actions
     • Click "New repository secret"
     • Add SLACK_BOT_TOKEN with the bot token from step 3
     • Add SLACK_CHANNEL_ID with the channel ID from step 4

Note: Notification is optional and only runs if secrets are configured. If the bot is deleted or deactivated, you'll see account_inactive errors in workflow logs.

Additional Notification Workflows

Dependabot Notifications

Workflow: .github/workflows/dependabot-notifications.yml

Triggers: Dependabot PR opened or closed

Notifies: Update type (major/minor/patch) and PR status (opened/merged/closed)

Workflow Triggers

Pull Request Events

Triggers: PR to any branch when relevant files change

Runs:
- Fast validation (lint, type-check, security, basic tests)
- Skips Docker build (faster, saves disk space)
- Typically completes in 5-10 minutes

Files monitored:
- **.py (Python files)
- pyproject.toml (dependencies)
- uv.lock (lock file)
- Dockerfile (container image)
- docker-compose*.yml (compose files)
- .pre-commit-config.yaml (pre-commit config)
- .secrets.baseline (secrets baseline)
- .github/workflows/** (workflow files)

Push Events (After Merge)

Triggers: Push to main, master, or dev branches

Behavior:
- Merge commits (2+ parent commits): Runs the Docker build only (no validation jobs)
  - Validation already passed in the PR
  - Only the Docker build job runs; the Slack notification is sent by the "Notification (merge)" job
  - Typically completes in 3-5 minutes
- Direct commits (1 parent): Runs full validation including all tests
  - Use for hotfixes or direct commits to protected branches
  - Does not build the Docker image (use workflow_dispatch for a manual build); the main "Notification" job reports results
  - Typically completes in 10-15 minutes

Why different job sets: PRs run validation jobs only (no Docker build), while merge pushes run the Docker build only. Each event runs a distinct job set; a merge does not re-run the PR's jobs with Docker merely skipped.

Manual Trigger

Can be manually triggered via GitHub Actions UI:

Actions → CI/CD → Run workflow

Runs comprehensive validation + Docker build regardless of commit type.

Concurrency Control

Strategy: Cancel in-progress runs when new commits are pushed

concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.head.ref || github.ref }}
  cancel-in-progress: true

Benefits:
- Saves CI minutes
- Faster feedback on latest changes
- Prevents queue buildup

Caching Strategy

Shared Virtual Environment

Optimization: All validation jobs share a single virtual environment prepared by the setup-venv job.

How it works:
1. The setup-venv job runs first, after should-run
2. Checks for a cached venv with a key based on the pyproject.toml and uv.lock hashes
3. On cache miss: runs uv sync --locked --extra dev once and caches the result
4. On cache hit: skips the sync entirely
5. All downstream jobs (lint, type-check, security, test) restore the same cached venv
6. Each job uses actions/cache/restore@v5 with fail-on-cache-miss: true for reliability
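
A downstream job's restore step might look like this (illustrative fragment; the step name and venv path are assumptions, not copied from the workflow):

```yaml
- name: Restore shared venv
  uses: actions/cache/restore@v5
  with:
    path: .venv
    key: venv-${{ runner.os }}-${{ hashFiles('pyproject.toml', 'uv.lock') }}
    fail-on-cache-miss: true  # fail fast instead of silently re-syncing
```

fail-on-cache-miss makes a missing cache an explicit error rather than letting each job quietly fall back to its own uv sync.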

Why one venv (dev) for all jobs: Cache size (~3.5GB) is dominated by production dependencies (torch, transformers, etc.). Using a minimal ci extra does not reduce cache size meaningfully, and uv lock does not change when adding optional groups, so the cache key stays the same. One full dev venv keeps behavior simple and cache consistent. The ci extra in pyproject.toml remains available for optional local use (e.g. uv sync --extra ci for quick lint/type-check only).

Benefits:
- Eliminates redundant uv sync calls: Only one sync per cache key (vs. 6 syncs previously)
- Faster on cache miss: ~2-3 minutes saved when dependencies change
- Faster on cache hit: All jobs restore the cache (~1-2 min) and start work immediately
- More reliable: A shared cache means all jobs use an identical environment

Cache performance:
- Cache size: ~295 MB (compressed)
- Cache restore: ~1-2 minutes (download + decompress)
- Cache save: ~30-60 seconds (compress + upload)
- Note: Cache restore time is acceptable given it eliminates 5 redundant uv sync operations (each ~1-3 min)

Cache key structure:

key: venv-${{ runner.os }}-${{ hashFiles('pyproject.toml', 'uv.lock') }}

Cache invalidation: The cache refreshes automatically when:
- Dependencies change in pyproject.toml
- The lock file updates via uv lock
- GitHub Actions evicts the cache (7 days of inactivity or storage limits)

Pre-commit Cache

- uses: actions/cache@v5
  with:
    path: ~/.cache/pre-commit
    key: pre-commit-${{ runner.os }}-${{ hashFiles('.pre-commit-config.yaml') }}

Cache Hit: Reuse pre-commit environments
Cache Miss: Install pre-commit hooks

Docker Build Cache

cache-from: type=gha
cache-to: type=gha,mode=min

Mode: min caches only final image layers (faster cache upload, good cache hit rate)
Alternative: max caches all intermediate layers (slower upload, more cache hits)

UV Cache

The setup-uv action includes built-in caching via enable-cache: true:

- uses: astral-sh/setup-uv@v7
  with:
    version: ${{ inputs.uv-version }}
    enable-cache: true

What it caches:
- UV binary downloads
- Python installations
- Package downloads and wheels

Benefits: Faster setup across all jobs using UV

Environment Variables

Version Pinning

All tool versions are pinned in workflow environment:

env:
  PYTHON_VERSION: "3.12.12"
  UV_VERSION: "0.9.13"
  RUFF_VERSION: "0.14.8"
  PYRIGHT_VERSION: "1.1.407"
  DETECT_SECRETS_VERSION: "1.5.0"
  # ... more versions

Benefits:
- Reproducible builds
- Explicit version control
- Easy version updates

Configuration Variables

env:
  REQUIRED_PR_LABELS: "dependencies"  # Comma-separated labels

Reusable Actions

Setup Python and UV

Custom composite action for consistent setup:

# .github/actions/setup-python-uv/action.yml
- name: Setup Python and UV
  uses: ./.github/actions/setup-python-uv
  with:
    python-version: ${{ env.PYTHON_VERSION }}
    uv-version: ${{ env.UV_VERSION }}

Actions:
1. Install Python
2. Install uv
3. Add uv to PATH

PR Comment

Custom action for posting comments on PRs:

- name: Comment PR
  uses: ./.github/actions/pr-comment
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}
    message: |
      ## Security Scan Results
      ...

Permissions

Workflow uses minimal required permissions:

permissions:
  contents: read          # Read repository contents
  pull-requests: write    # Comment on PRs
  checks: write           # Write check results
  actions: read           # Read workflow runs
  security-events: write  # Write security events
  statuses: write         # Write commit statuses

Timeouts

Each job has a timeout to prevent hanging:

  • should-run: 1 minute
  • setup-venv: 5 minutes
  • check-lock: 3 minutes
  • lint: 5 minutes
  • type-check: 8 minutes
  • security: 8 minutes
  • detect-secrets: 3 minutes
  • pre-commit: 5 minutes
  • test: 10 minutes
  • merge-validation: 2 minutes
  • docker-build: 15 minutes
  • notification: 2 minutes

Failure Handling

Job Dependencies

Jobs run in parallel when possible, with dependencies:

should-run (determines workflow scope: PR vs Push vs Merge)
  ├── setup-venv (prepares shared venv - runs once, skipped on merge commits)
  └── check-lock (verifies lock sync, skipped on merge commits)

Validation jobs (skipped on merge commits):
  ├── lint (uses shared venv)
  ├── type-check (uses shared venv)
  ├── security (uses shared venv)
  ├── detect-secrets (uses shared venv)
  ├── pre-commit (uses shared venv)
  └── test (uses shared venv, faster on PRs)

Docker (merge-commit pushes and manual runs only):
  └── docker-build (runs on merge commits unless only dependencies changed)
      └── notification (always runs)

Workflow behavior by event type:

| Event Type | Validation | Docker | Duration |
| --- | --- | --- | --- |
| PR (code-only) | ✓ Fast (incremental type-check, pip-audit skipped, no slow tests) | ✗ Skipped | 5-7 min |
| PR (with deps) | ✓ Fast (incremental type-check, pip-audit runs, no slow tests) | ✗ Skipped | 7-10 min |
| Merge commit push | ✗ Skipped | ✓ Build only | 3-5 min |
| Direct push | ✓ Full (full type-check, pip-audit, all tests) | ✗ Skipped (use manual trigger) | 10-15 min |
| Manual | ✓ Full (all tests) | ✓ Build | 10-15 min |

Note: setup-venv and check-lock run in parallel as they have no interdependency.

Continue on Error

Some jobs continue on error:
- security: Reports findings but doesn't fail
- docker-build scan: Reports vulnerabilities as artifact

Conditional Execution

Jobs skip when not needed:
- merge-validation: Only on PRs
- docker-build: Only on merge-commit pushes (or manual runs)
- notification: Only when core jobs succeed

Local CI Simulation

Run Full CI Locally

# 1. Check lock sync
make check-lock-sync

# 2. Run linting
uv run ruff format --check .
uv run ruff check .

# 3. Run type checking
npx pyright backend/ cli/

# 4. Run security scans
uv run bandit -r backend/ cli/ -ll
uv run pip-audit --desc

# 5. Run secret detection
uv run detect-secrets scan --baseline .secrets.baseline

# 6. Run pre-commit hooks
uv run pre-commit run --all-files

# 7. Run tests
REDHOUND_MOCK_MODE=true pytest -v -n auto --cov=backend

# 8. Build Docker image
docker build -t redhound:test .

Act (Run GitHub Actions Locally)

# Install act
brew install act  # macOS
# or
curl https://raw.githubusercontent.com/nektos/act/master/install.sh | sudo bash

# Run workflow locally
act push

# Run specific job
act -j test

# Run with secrets
act -s GITHUB_TOKEN=<token>

Troubleshooting

Lock File Out of Sync

Error: uv.lock is out of sync with pyproject.toml

Solution:

make lock-sync
git add uv.lock
git commit -m "chore: update uv.lock"
git push

Linting Failures

Error: Ruff linting or formatting errors

Solution:

# Auto-fix issues
uv run ruff check --fix .
uv run ruff format .

git add .
git commit -m "style: fix linting issues"
git push

Type Check Failures

Error: Pyright type errors

Solution:

# Run locally to see errors
npx pyright backend/ cli/

# Fix type errors in code
# Add type hints, fix type mismatches, etc.

git add .
git commit -m "fix: resolve type errors"
git push

Test Failures

Error: Pytest test failures

Solution:

# Run tests locally
REDHOUND_MOCK_MODE=true pytest -v

# Debug specific test
REDHOUND_MOCK_MODE=true pytest -v tests/path/to/test.py::test_name

# Fix failing tests
git add .
git commit -m "fix: resolve test failures"
git push

Docker Build Failures

Error: Docker image build or scan failures

Solution:

# Build locally
docker build -t redhound:test .

# Check for errors in Dockerfile
# Fix dependency issues, etc.

git add Dockerfile
git commit -m "fix: resolve Docker build issues"
git push

Cache Issues

Symptoms: Unexpected failures, stale dependencies

Solution:
1. Go to GitHub Actions
2. Click on the workflow run
3. Click "Re-run jobs" → "Re-run all jobs"
4. Check the "Clear cache" option

Or manually clear cache:

# Via GitHub CLI
gh cache delete <cache-key>

# List caches
gh cache list

Docker Build "No Space Left" Errors

Error: failed to copy files: copy file range failed: no space left on device

Cause: GitHub Actions runners have limited disk space (~14GB free by default). Large Docker builds with heavy dependencies (PyTorch, CUDA libs) can exhaust this during the image export phase.

Solution (already implemented): The workflow automatically frees ~40GB before building by removing:
- .NET tools (/usr/share/dotnet)
- Android SDK (/usr/local/lib/android)
- GHC (/opt/ghc)
- CodeQL (/opt/hostedtoolcache/CodeQL)
- Node modules (/usr/local/lib/node_modules)
- Azure CLI (/opt/az)
- Gradle (/usr/share/gradle*)
- PowerShell (/usr/local/share/powershell)
- Unused Docker images, volumes, and build cache

Additional optimizations:
- The Docker image only copies site-packages (not the entire Python lib), reducing size from ~4.5GB to ~2.5GB
- The export phase completes faster, reducing disk pressure

Manual fix (if still failing):

# Check disk space
df -h

# Clean up more aggressively
sudo apt-get clean
sudo apt-get autoremove -y
sudo rm -rf /usr/share/swift
sudo rm -rf /usr/local/share/boost

Performance Optimization

Why CI Can Be Slow

CI performance depends on several factors:

Common slowdown causes:

  1. Cache misses - When pyproject.toml or uv.lock changes, the venv cache key changes. The setup-venv job then runs uv sync --locked --extra dev once (~2-3 minutes), and all other jobs restore the freshly cached venv.

  2. GitHub Actions cache eviction - GitHub evicts caches after 7 days of inactivity or when storage limits are reached. This forces a fresh uv sync on the next run.

  3. Test suite - Tests (5-6 min) scale with codebase size. Already optimized: PRs exclude slow/integration/api tests.

  4. Type checking (optimized) - On PRs: incremental (30-60s). On push: full (2-3 min).

  5. Docker build - Full image build + Trivy scan on cache miss can take several minutes.

  6. Runner availability - During busy times, slow I/O or CPU wait can add latency.

How to Keep CI Fast

1. Prevent unnecessary lock changes:

Run uv lock locally when changing pyproject.toml to avoid "lock updated in CI → new run → cache miss" cycles:

# After editing pyproject.toml
uv lock
git add uv.lock
git commit -m "chore: update dependencies"

The uv-lock pre-commit hook now runs on both pre-commit and pre-push to catch this automatically.

2. Leverage optimizations already in place:

The workflow is heavily optimized for fast PR feedback:

✅ Incremental type checking - Only changed files on PRs (saves 1-2 min)
✅ Conditional pip-audit - Skips on code-only PRs (saves ~2 min)
✅ Test filtering - PRs skip slow/integration tests (saves 2-3 min)
✅ Docker skip on PRs - No image build during PR review (saves 8-10 min)

3. Minimize dependency changes:

Group dependency updates when possible. Each lock change invalidates the venv cache for all subsequent runs until the cache is warmed.

4. Understand when pip-audit runs:

The workflow skips pip-audit on code-only PRs to save ~2 minutes. To trigger it: - Include "dep", "bump", or "upgrade" in PR title - Add "dependencies" label to PR - Push directly to main/master/dev (always runs)

5. Mark slow tests correctly:

Use pytest markers so CI skips them on PRs:

@pytest.mark.slow
def test_heavy_operation():
    ...

@pytest.mark.integration
def test_with_real_api():
    ...

6. Profile and optimize heavy operations:

Monitor job durations in GitHub Actions. If type-check or tests grow beyond reasonable limits: - Type-check: Already optimized with incremental checking on PRs - Tests: Already parallelized with pytest-xdist, consider splitting into more tiers if needed

Best Practices

1. Keep Branches Green

  • Run tests locally before pushing
  • Fix CI failures immediately
  • Don't merge PRs with failing checks

2. Update Dependencies Automatically

The project uses fully automated dependency management with GitHub's native auto-merge:

Zero-Touch Workflow:
- Dependabot creates PRs weekly (Sunday 9am)
- CI automatically updates uv.lock for Dependabot PRs
- Patch and minor updates enable auto-merge via the dependabot/fetch-metadata action
- PRs merge instantly when CI passes (no polling)
- Major updates get labeled requires-review for manual approval

Efficiency Benefits:
- 85% faster than polling-based solutions
- ~25 CI minutes saved per week (no wait actions)
- 100% accurate version detection using Dependabot metadata
- Zero manual work for safe updates

How It Works:
1. The Dependabot metadata action reads the exact update type from the PR
2. The workflow enables GitHub's native auto-merge for patch/minor updates
3. When CI passes, GitHub merges automatically (instant)
4. Major updates wait for manual review with the requires-review label

What You Do:
- Nothing for 90% of PRs (auto-merge handles them)
- Review PRs labeled requires-review (1-2/month) and click "Merge" on GitHub
- All updates are tracked at: https://github.com/your-org/redhound/pulls?q=author:dependabot

3. Monitor Pipeline Performance

  • Check workflow run times
  • Optimize slow jobs
  • Use caching effectively

4. Security First

  • Never commit secrets
  • Review security scan results
  • Update vulnerable dependencies promptly

5. Write Good Commit Messages

  • Follow Conventional Commits
  • Be descriptive
  • Reference issues when applicable

Viewing Pipeline Runs

# Via GitHub CLI
gh run list --limit 50

# Via GitHub UI
Actions → CI/CD → View workflow runs

Cloudflare Pages (Documentation)

The documentation site is deployed to redhound.pages.dev via Cloudflare Pages. The frontend app is deployed to redhound.vercel.app (Vercel); Pages is for docs only. No GitHub Actions are used for docs deploy (to avoid consuming Actions minutes).

The built site is committed to the repo. Cloudflare runs no build (clone + upload only), so each deploy finishes in ~30 seconds.

1. Cloudflare dashboard

  • Root directory: docs (or leave default if you use the path below).
  • Build command: leave empty, or set to exit 0.
  • Build output directory: mkdocs/site (when the root directory is docs) or docs/mkdocs/site (when it is the repo root).

2. Publishing doc changes

After editing files under docs/content/ or docs/mkdocs/:

cd docs/mkdocs
uv run mkdocs build -f mkdocs.yml
cp _headers site/
cp _redirects site/
git add site/
git commit -m "docs: rebuild site"
git push

Cloudflare will pick up the push and deploy the updated site/ in ~30 s.

3. One-time setup

The directory docs/mkdocs/site/ is tracked in the repo so Cloudflare can deploy it. If the directory is missing (e.g. new clone), run the build and copy steps above, then commit and push site/.

Alternative: Build on Cloudflare (no Actions, but slow)

If you prefer not to commit the built site, you can let Cloudflare build on each deploy. Each run reinstalls Python and pip (~2+ minutes). In the dashboard:

| Setting | Value |
| --- | --- |
| Root directory | docs |
| Build command | cd mkdocs && pip install -r requirements.txt && mkdocs build -f mkdocs.yml && cp _headers site/ && cp _redirects site/ |
| Build output directory | mkdocs/site |

Troubleshooting

Error: Failed to publish your Function. Got error: Unknown internal error occurred.

Known Cloudflare backend issue. Ensure the deployed output contains only static assets (no _worker.js, _routes.json, or functions directory). Check Cloudflare Status and workers-sdk issues; re-run the deploy (often transient).
