CI/CD Pipeline¶
This document describes the continuous integration and deployment pipeline for the Redhound trading system.
Overview¶
The project uses GitHub Actions for CI/CD with an optimized two-tier pipeline:
Pull Request workflow (fast validation):
- Runs essential validation only (lint, type-check, security, tests excluding slow/integration)
- Typically completes in 5-10 minutes
- No Docker build (saves time and disk space)
- Ensures PRs are clean before merge
Push workflow after merge (Docker build only):
- Detects merge commits and skips redundant validation (already passed in PR)
- Runs Docker build + Trivy security scan once
- Prepares optimized image for deployment (~2.5GB)
- Typically completes in 8-12 minutes (Docker build only)
Direct push workflow (no Docker build):
- Runs full validation including all tests
- Does NOT run Docker build (use workflow_dispatch for manual builds)
- Use for hotfixes or direct commits to protected branches
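The merge-vs-direct routing above hinges on a commit's parent count. A minimal sketch of that classification (the helper name is hypothetical; the workflow's actual expression may differ):

```shell
# Classify a commit by its parent count, mirroring the routing described above.
# $1 holds the space-separated parent SHAs, e.g. from: git log -1 --format=%P
classify_commit() {
  set -- $1
  if [ "$#" -ge 2 ]; then
    echo "merge commit: Docker build only"
  else
    echo "direct commit: full validation"
  fi
}

classify_commit "abc123"          # one parent: direct push path
classify_commit "abc123 def456"   # two parents: merge path
```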
Pipeline Architecture¶
graph TD
Start[Push/PR] --> ShouldRun{Should Run?}
ShouldRun -->|Yes| SetupVenv[Setup Virtual Env]
ShouldRun -->|Skip| Skip[Skip Validation]
ShouldRun -->|Yes| CheckLock[Check uv.lock Sync]
CheckLock -->|In Sync| Wait[Wait for Setup]
CheckLock -->|Out of Sync| AutoFix{Same Repo PR?}
AutoFix -->|Yes| UpdateLock[Auto-update Lock]
AutoFix -->|No| Fail[Fail: Run uv lock]
UpdateLock --> NewRun[Trigger New Run]
SetupVenv -->|Venv Ready| Wait
Wait --> Parallel{Parallel Jobs}
Parallel --> Lint[Lint & Format]
Parallel --> TypeCheck[Type Check]
Parallel --> Security[Security Scan]
Parallel --> Secrets[Detect Secrets]
Parallel --> PreCommit[Pre-commit Hooks]
Lint --> Test[Test Suite]
TypeCheck --> Test
Security --> Test
Secrets --> Test
PreCommit --> Test
Test --> MergeVal{PR Event?}
MergeVal -->|Yes| ValidateMerge[Merge Validation]
MergeVal -->|No| DockerBuild[Docker Build]
ValidateMerge --> Notify[Notification]
DockerBuild --> Notify
Notify --> Success[✓ Success]
style SetupVenv fill:#9C27B0
style CheckLock fill:#4CAF50
style Parallel fill:#2196F3
style Test fill:#FF9800
style Notify fill:#9C27B0
style Fail fill:#F44336
Pipeline Stages¶
1. Setup Virtual Environment¶
Purpose: Prepare a shared virtual environment for all validation jobs
Actions:
- Checkout code and setup Python/UV
- Check for cached venv using hash of pyproject.toml and uv.lock
- Detect if exact cache match or restore-key (partial) match
- On exact match: use cached venv as-is (~30s to download 3.5GB)
- On partial match or miss: run uv sync --locked --extra dev to update/create venv
- Save updated cache for future runs
Benefits:
- Eliminates redundancy: Only one uv sync per cache key instead of 6
- Faster CI: Saves 1-2+ minutes when cache misses
- Consistency: All jobs use identical environment
- Smart caching: Reuses partial matches but updates to current dependencies
Cache behavior:
key: venv-${{ runner.os }}-${{ hashFiles('pyproject.toml', 'uv.lock') }}
restore-keys: venv-${{ runner.os }}- # Fallback to any OS-matching cache
Performance:
- Exact cache hit: ~30 seconds (download 3.5GB at ~135 MB/s + extraction)
- Partial match: ~1-2 minutes (restore old cache + incremental sync)
- Cache miss: ~2-3 minutes (full sync from scratch)
Note: The 3.5GB cache size is due to ML/AI dependencies (PyTorch, CUDA, transformers). This is expected and optimal for projects with heavy dependencies. Download speed of ~135 MB/s is near the maximum for GitHub Actions cache infrastructure.
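The cache-key mechanics can be illustrated offline. A small simulation, using sha256sum as a stand-in for GitHub's hashFiles (whose exact algorithm differs):

```shell
# Any change to pyproject.toml or uv.lock produces a new key, forcing a re-sync.
tmp=$(mktemp -d)
printf '[project]\nname = "demo"\n' > "$tmp/pyproject.toml"
printf 'version = 1\n' > "$tmp/uv.lock"
key_before=$(cat "$tmp/pyproject.toml" "$tmp/uv.lock" | sha256sum | cut -d' ' -f1)

printf 'requests==2.32.0\n' >> "$tmp/uv.lock"   # simulate a lock update
key_after=$(cat "$tmp/pyproject.toml" "$tmp/uv.lock" | sha256sum | cut -d' ' -f1)

[ "$key_before" != "$key_after" ] && echo "cache key changed: venv will re-sync"
```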
2. Check Lock¶
Purpose: Verify uv.lock is in sync with pyproject.toml
Actions:
- Generate lock file with uv lock
- Compare with committed lock file
- Auto-update for Dependabot PRs
- Fail for manual PRs with out-of-sync lock
Dependabot Auto-fix:
When Dependabot creates a PR updating dependencies in pyproject.toml, the pipeline automatically:
1. Detects it's a Dependabot PR
2. Runs uv lock to update the lock file
3. Commits and pushes the updated uv.lock
4. Triggers a new workflow run with the updated lock file
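A sketch of how such an auto-fix step might look in the workflow (the step name, condition, and git handling here are assumptions; the real workflow may differ):

```yaml
- name: Auto-update uv.lock for Dependabot PRs
  if: github.actor == 'dependabot[bot]'
  run: |
    uv lock
    if ! git diff --quiet uv.lock; then
      git add uv.lock
      git commit -m "chore: update uv.lock"
      git push   # triggers a fresh workflow run with the synced lock
    fi
```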
Manual Fix:
# If check fails for your PR
make lock-sync
git add uv.lock
git commit -m "chore: update uv.lock"
git push
3. Lint and Format Check¶
Purpose: Enforce code style and quality standards
Tools:
- ruff: Linting and formatting (replaces flake8, black, isort)
Checks:
- Code formatting (PEP 8 compliance)
- Import sorting
- Unused imports and variables
- Code complexity
- Common code smells
Local Execution:
# Check formatting
uv run ruff format --check .
# Check linting
uv run ruff check .
# Auto-fix issues
uv run ruff check --fix .
uv run ruff format .
4. Type Check¶
Purpose: Validate type hints and catch type-related bugs
Tools:
- pyright: Static type checker
Optimization:
- PR workflow: Incremental type checking (only changed Python files)
- Push workflow: Full type check (entire codebase)
Why incremental?
- PRs: Fast feedback (~30-60s instead of 2-3 min)
- Push: Comprehensive validation ensures no type regressions
Checks:
- Type hint correctness
- Type compatibility
- Missing type annotations
- Type inference issues
Local Execution:
# Run type checker (full)
npx pyright backend/ cli/
# Run on specific files (like CI does on PRs)
npx pyright path/to/changed/file.py
# Or via pre-commit
uv run pre-commit run pyright --all-files
5. Security Scan¶
Purpose: Identify security vulnerabilities
Tools:
- bandit: Python security linter (always runs)
- pip-audit: Dependency vulnerability scanner (conditional)
Optimization:
- bandit: Always runs (fast, scans code)
- pip-audit: Runs on push events or dependency PRs only (saves ~2 min on code-only PRs)
When pip-audit runs:
- All push events (comprehensive validation)
- PRs with "dep", "bump", "upgrade" in title
- PRs with "dependencies" label
- Dependabot PRs
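The gating above amounts to a small predicate. A sketch (hypothetical helper; the workflow's actual `if:` expression may differ):

```shell
# Returns success (0) when pip-audit should run for the given event/PR metadata.
should_run_pip_audit() {
  event="$1"; title="$2"; labels="$3"; actor="$4"
  [ "$event" = "push" ] && return 0
  case "$title" in *dep*|*bump*|*upgrade*) return 0 ;; esac
  case "$labels" in *dependencies*) return 0 ;; esac
  [ "$actor" = "dependabot[bot]" ] && return 0
  return 1
}

should_run_pip_audit pull_request "fix: typo" "" alice || echo "skipped on code-only PR"
should_run_pip_audit pull_request "chore: bump requests" "" alice && echo "runs on dependency PR"
```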
Checks:
- Common security issues (SQL injection, hardcoded passwords, etc.)
- Known CVEs in dependencies
- Insecure code patterns
Local Execution:
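Both scans can be run locally; these mirror the commands used in the Local CI Simulation section below:

```shell
# Python security linter (medium severity and above)
uv run bandit -r backend/ cli/ -ll
# Dependency vulnerability scan with descriptions
uv run pip-audit --desc
```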
Note: Bandit failures block merge. pip-audit results are reported and block on CRITICAL/HIGH vulnerabilities.
6. Detect Secrets¶
Purpose: Prevent accidental commit of secrets
Tools:
- detect-secrets: Secret detection
Checks:
- API keys
- Passwords
- Private keys
- Tokens
- Other sensitive data
Baseline: .secrets.baseline file contains known false positives
Local Execution:
# Scan for secrets
uv run detect-secrets scan --baseline .secrets.baseline
# Update baseline (after verifying false positives)
uv run detect-secrets scan --update .secrets.baseline
7. Pre-commit Hooks¶
Purpose: Run additional quality checks
Hooks:
- trailing-whitespace: Remove trailing whitespace
- end-of-file-fixer: Ensure files end with newline
- check-yaml: Validate YAML syntax
- check-added-large-files: Prevent large file commits
- check-merge-conflict: Detect merge conflict markers
Local Execution:
# Run all hooks
uv run pre-commit run --all-files
# Run specific hook
uv run pre-commit run trailing-whitespace --all-files
Note: Linting, type checking, and security scans run in dedicated CI jobs. Locally, the same hooks run at pre-push, so a push is blocked until they pass and CI will not fail on them.
8. Test Suite¶
Purpose: Validate functionality and maintain code quality
Configuration:
- Runs in mock mode (no API costs)
- Parallel execution with pytest-xdist
- Coverage threshold: 10% (interim, will increase)
- PR workflow: Excludes slow, integration, api, and talib tests (fast feedback)
- Push workflow: Runs full suite including integration and slow tests (comprehensive)
Test Execution:
# PR tests (fast - unit only; typically 2-4 minutes)
pytest -v --tb=short -n auto -m "not slow and not integration and not api and not talib" \
--cov=backend --cov-report=xml --cov-fail-under=10
# Push tests (comprehensive - typically 6-10 minutes)
pytest -v --tb=short -n auto \
--cov=backend --cov-report=xml --cov-fail-under=10
Why this split:
- PRs get fast feedback on core functionality
- Push workflow (after merge) validates everything including slow/integration tests
- Avoids running comprehensive tests twice (once in PR, once after merge)
Outputs:
- JUnit XML for test results
- Coverage XML for Codecov
- Test result comments on PRs
Local Execution:
# Run tests like CI does
REDHOUND_MOCK_MODE=true make test
# Run with coverage
REDHOUND_MOCK_MODE=true pytest --cov=backend --cov-report=html
9. Merge Validation (PRs Only)¶
Purpose: Validate PR is ready for merge
Checks:
- No merge conflicts
- Required labels present (if configured)
- All required checks passed
Required Labels:
- dependencies: Required for dependency update PRs
Configuration:
Override: Set to empty string to disable label requirement
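The label requirement is presumably driven by a workflow variable along these lines (the variable name here is an assumption):

```yaml
env:
  REQUIRED_LABELS: "dependencies"   # set to "" to disable the label requirement
```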
10. Docker Build (Merge Commits Only)¶
Purpose: Build and scan Docker image for deployment
Actions:
1. Free up disk space - Remove unnecessary tooling (~40GB freed)
- Removes .NET, Android SDK, GHC, CodeQL, Node modules, Azure CLI, Gradle, PowerShell
- Prunes Docker images, volumes, and build cache
- Prevents "no space left on device" errors during export
2. Build Docker image with BuildKit (uses UV_LINK_MODE=copy to suppress hardlink warnings)
3. Test image (validate imports and CLI)
4. Scan with Trivy for vulnerabilities
5. Upload scan results as artifact
When it runs:
- Only on merge commits to main/master/dev (not on every push)
- Skipped when only dependencies changed (no code changes)
- Rationale: PR validation already passed, so only build once after merge
- For manual Docker builds on direct pushes, use workflow_dispatch
Optimizations:
- Only copies site-packages directory (not entire Python lib) - reduces image size from ~4.5GB to ~2.5GB
- Export time reduced from ~4 minutes to ~2 minutes
- UV uses copy mode to avoid hardlink warnings across filesystems
Vulnerability Thresholds:
- CRITICAL: Fail build
- HIGH: Fail build
- MEDIUM: Report but pass
- LOW: Report but pass
Local Execution:
# Build image
docker build -t redhound:test .
# Test image
docker run --rm --entrypoint python redhound:test -c "import backend; from backend.api.app import app"
# Scan image
docker run --rm -v /var/run/docker.sock:/var/run/docker.sock \
aquasec/trivy:latest image redhound:test
11. Notification¶
Purpose: Send pipeline status to Slack
Information Sent:
- Repository and branch
- Commit message and author
- Pipeline status (success/failure)
- Job results (passed/failed)
- Vulnerability summary (if Docker build ran)
- Link to workflow run
Configuration:
Required GitHub secrets:
- SLACK_BOT_TOKEN: Bot User OAuth Token (starts with xoxb-)
- SLACK_CHANNEL_ID: Channel ID where notifications will be sent (starts with C)
Setting Up Slack App:
1. Create a Slack App:
   - Go to https://api.slack.com/apps
   - Click "Create New App" → "From scratch"
   - Enter app name (e.g., "CI/CD Notifications")
   - Select your Slack workspace
   - Click "Create App"
2. Configure Bot Token Scopes:
   - In the app settings, go to "OAuth & Permissions" (left sidebar)
   - Scroll to "Scopes" → "Bot Token Scopes"
   - Add the following scopes:
     - chat:write: Send messages to channels
     - chat:write.public: Send messages to public channels (if posting to public channels)
3. Install App to Workspace:
   - Scroll to top of "OAuth & Permissions" page
   - Click "Install to Workspace"
   - Review permissions and click "Allow"
   - Copy the "Bot User OAuth Token" (starts with xoxb-); this is your SLACK_BOT_TOKEN
4. Get Channel ID:
   - Open Slack in your browser
   - Navigate to the channel where you want notifications
   - Look at the URL: https://yourworkspace.slack.com/archives/C1234567890; the part after /archives/ is the Channel ID (starts with C)
   - Alternatively, right-click the channel → "View channel details"; the Channel ID is at the bottom
5. Add Secrets to GitHub:
   - Go to your repository → Settings → Secrets and variables → Actions
   - Click "New repository secret"
   - Add SLACK_BOT_TOKEN with the bot token from step 3
   - Add SLACK_CHANNEL_ID with the channel ID from step 4
Note: Notification is optional and only runs if secrets are configured. If the bot is deleted or deactivated, you'll see account_inactive errors in workflow logs.
Additional Notification Workflows¶
Dependabot Notifications¶
Workflow: .github/workflows/dependabot-notifications.yml
Triggers: Dependabot PR opened or closed
Notifies: Update type (major/minor/patch) and PR status (opened/merged/closed)
Workflow Triggers¶
Pull Request Events¶
Triggers: PR to any branch when relevant files change
Runs:
- Fast validation (lint, type-check, security, basic tests)
- Skips Docker build (faster, saves disk space)
- Typically completes in 5-10 minutes
Files monitored:
- **.py (Python files)
- pyproject.toml (dependencies)
- uv.lock (lock file)
- Dockerfile (container image)
- docker-compose*.yml (compose files)
- .pre-commit-config.yaml (pre-commit config)
- .secrets.baseline (secrets baseline)
- .github/workflows/** (workflow files)
Push Events (After Merge)¶
Triggers: Push to main, master, or dev branches
Behavior:
- Merge commits (2+ parent commits): Runs Docker build only (no validation jobs)
  - Validation already passed in the PR
  - Only the Docker build job runs; Slack notification is sent by the "Notification (merge)" job
  - Typically completes in 3-5 minutes
- Direct commits (1 parent): Runs full validation and Docker build
  - Use for hotfixes or direct commits
  - All validation jobs plus Docker build; main "Notification" job reports results
  - Typically completes in 10-15 minutes
Why different job sets: PR runs validation jobs only (no Docker). Merge runs Docker only. They do not run the same jobs with Docker skipped; each event runs a distinct set.
Manual Trigger¶
Can be manually triggered via GitHub Actions UI:
Runs comprehensive validation + Docker build regardless of commit type.
Concurrency Control¶
Strategy: Cancel in-progress runs when new commits are pushed
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.head.ref || github.ref }}
cancel-in-progress: true
Benefits: - Saves CI minutes - Faster feedback on latest changes - Prevents queue buildup
Caching Strategy¶
Shared Virtual Environment¶
Optimization: All validation jobs share a single virtual environment prepared by the setup-venv job.
How it works:
1. setup-venv job runs first after should-run
2. Checks for cached venv with key based on pyproject.toml and uv.lock hashes
3. On cache miss: runs uv sync --locked --extra dev once and caches result
4. On cache hit: skips sync entirely
5. All downstream jobs (lint, type-check, security, test) restore the same cached venv
6. Each job uses actions/cache/restore@v5 with fail-on-cache-miss: true for reliability
Why one venv (dev) for all jobs: Cache size (~3.5GB) is dominated by production dependencies (torch, transformers, etc.). Using a minimal ci extra does not reduce cache size meaningfully, and uv lock does not change when adding optional groups, so the cache key stays the same. One full dev venv keeps behavior simple and cache consistent. The ci extra in pyproject.toml remains available for optional local use (e.g. uv sync --extra ci for quick lint/type-check only).
Benefits:
- Eliminates redundant uv sync calls: Only one sync per cache key (vs. 6 syncs previously)
- Faster on cache miss: ~2-3 minutes saved when dependencies change
- Faster on cache hit: All jobs restore cache (~1-2 min) and start work immediately
- More reliable: Shared cache means all jobs use identical environment
Cache performance:
- Cache size: ~295 MB (compressed)
- Cache restore: ~1-2 minutes (download + decompress)
- Cache save: ~30-60 seconds (compress + upload)
- Note: Cache restore time is acceptable given it eliminates 5 redundant uv sync operations (each ~1-3 min)
Cache key structure: the key shown in the Setup Virtual Environment section (venv-${{ runner.os }}-${{ hashFiles('pyproject.toml', 'uv.lock') }}, with restore-keys falling back to venv-${{ runner.os }}-).
Cache invalidation: Cache refreshes automatically when:
Cache invalidation: Cache refreshes automatically when:
- Dependencies change in pyproject.toml
- Lock file updates via uv lock
- GitHub Actions cache eviction (7 days inactivity or storage limits)
Pre-commit Cache¶
- uses: actions/cache@v5
with:
path: ~/.cache/pre-commit
key: pre-commit-${{ runner.os }}-${{ hashFiles('.pre-commit-config.yaml') }}
Cache Hit: Reuse pre-commit environments
Cache Miss: Install pre-commit hooks
Docker Build Cache¶
Mode: min caches only final image layers (faster cache upload, good cache hit rate)
Alternative: max caches all intermediate layers (slower upload, more cache hits)
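With docker/build-push-action and the GitHub Actions cache backend, the two modes look roughly like this (a sketch; the workflow's actual build step may differ):

```yaml
- uses: docker/build-push-action@v6
  with:
    cache-from: type=gha
    # mode=min caches only final image layers; mode=max also caches intermediates
    cache-to: type=gha,mode=min
```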
UV Cache¶
The setup-uv action includes built-in caching via enable-cache: true:
What it caches:
- UV binary downloads
- Python installations
- Package downloads and wheels
Benefits: Faster setup across all jobs using UV
Environment Variables¶
Version Pinning¶
All tool versions are pinned in workflow environment:
env:
PYTHON_VERSION: "3.12.12"
UV_VERSION: "0.9.13"
RUFF_VERSION: "0.14.8"
PYRIGHT_VERSION: "1.1.407"
DETECT_SECRETS_VERSION: "1.5.0"
# ... more versions
Benefits:
- Reproducible builds
- Explicit version control
- Easy version updates
Configuration Variables¶
Reusable Actions¶
Setup Python and UV¶
Custom composite action for consistent setup:
# .github/actions/setup-python-uv/action.yml
- name: Setup Python and UV
uses: ./.github/actions/setup-python-uv
with:
python-version: ${{ env.PYTHON_VERSION }}
uv-version: ${{ env.UV_VERSION }}
Actions:
1. Install Python
2. Install uv
3. Add uv to PATH
PR Comment¶
Custom action for posting comments on PRs:
- name: Comment PR
uses: ./.github/actions/pr-comment
with:
github-token: ${{ secrets.GITHUB_TOKEN }}
message: |
## Security Scan Results
...
Permissions¶
Workflow uses minimal required permissions:
permissions:
contents: read # Read repository contents
pull-requests: write # Comment on PRs
checks: write # Write check results
actions: read # Read workflow runs
security-events: write # Write security events
statuses: write # Write commit statuses
Timeouts¶
Each job has a timeout to prevent hanging:
- should-run: 1 minute
- setup-venv: 5 minutes
- check-lock: 3 minutes
- lint: 5 minutes
- type-check: 8 minutes
- security: 8 minutes
- detect-secrets: 3 minutes
- pre-commit: 5 minutes
- test: 10 minutes
- merge-validation: 2 minutes
- docker-build: 15 minutes
- notification: 2 minutes
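Per-job timeouts are set with GitHub Actions' timeout-minutes, e.g.:

```yaml
jobs:
  test:
    timeout-minutes: 10   # runner cancels the job if it exceeds this
```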
Failure Handling¶
Job Dependencies¶
Jobs run in parallel when possible, with dependencies:
should-run (determines workflow scope: PR vs Push vs Merge)
├── setup-venv (prepares shared venv - runs once, skipped on merge commits)
└── check-lock (verifies lock sync, skipped on merge commits)
Validation jobs (skipped on merge commits):
├── lint (uses shared venv)
├── type-check (uses shared venv)
├── security (uses shared venv)
├── detect-secrets (uses shared venv)
├── pre-commit (uses shared venv)
└── test (uses shared venv, faster on PRs)
Docker (push events only):
└── docker-build (always runs on pushes unless deps-only change)
└── notification (always runs)
Workflow behavior by event type:
| Event Type | Validation | Docker | Duration |
|---|---|---|---|
| PR (code-only) | ✓ Fast (incremental type-check, pip-audit skipped, no slow tests) | ✗ Skipped | 5-7 min |
| PR (with deps) | ✓ Fast (incremental type-check, pip-audit runs, no slow tests) | ✗ Skipped | 7-10 min |
| Merge commit push | ✗ Skipped | ✓ Build only | 3-5 min |
| Direct push | ✓ Full (full type-check, pip-audit, all tests) | ✓ Build | 10-15 min |
| Manual | ✓ Full (all tests) | ✓ Build | 10-15 min |
Note: setup-venv and check-lock run in parallel as they have no interdependency.
Continue on Error¶
Some jobs continue on error:
- security: Reports findings but doesn't fail
- docker-build scan: Reports vulnerabilities as artifact
Conditional Execution¶
Jobs skip when not needed:
- merge-validation: Only on PRs
- docker-build: Only on pushes
- notification: Only when core jobs succeed
Local CI Simulation¶
Run Full CI Locally¶
# 1. Check lock sync
make check-lock-sync
# 2. Run linting
uv run ruff format --check .
uv run ruff check .
# 3. Run type checking
npx pyright backend/ cli/
# 4. Run security scans
uv run bandit -r backend/ cli/ -ll
uv run pip-audit --desc
# 5. Run secret detection
uv run detect-secrets scan --baseline .secrets.baseline
# 6. Run pre-commit hooks
uv run pre-commit run --all-files
# 7. Run tests
REDHOUND_MOCK_MODE=true pytest -v -n auto --cov=backend
# 8. Build Docker image
docker build -t redhound:test .
Act (Run GitHub Actions Locally)¶
# Install act
brew install act # macOS
# or
curl https://raw.githubusercontent.com/nektos/act/master/install.sh | sudo bash
# Run workflow locally
act push
# Run specific job
act -j test
# Run with secrets
act -s GITHUB_TOKEN=<token>
Troubleshooting¶
Lock File Out of Sync¶
Error: uv.lock is out of sync with pyproject.toml
Solution:
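The fix is the same as in the Check Lock section:

```shell
make lock-sync   # regenerates uv.lock (runs uv lock)
git add uv.lock
git commit -m "chore: update uv.lock"
git push
```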
Linting Failures¶
Error: Ruff linting or formatting errors
Solution:
# Auto-fix issues
uv run ruff check --fix .
uv run ruff format .
git add .
git commit -m "style: fix linting issues"
git push
Type Check Failures¶
Error: Pyright type errors
Solution:
# Run locally to see errors
npx pyright backend/ cli/
# Fix type errors in code
# Add type hints, fix type mismatches, etc.
git add .
git commit -m "fix: resolve type errors"
git push
Test Failures¶
Error: Pytest test failures
Solution:
# Run tests locally
REDHOUND_MOCK_MODE=true pytest -v
# Debug specific test
REDHOUND_MOCK_MODE=true pytest -v tests/path/to/test.py::test_name
# Fix failing tests
git add .
git commit -m "fix: resolve test failures"
git push
Docker Build Failures¶
Error: Docker image build or scan failures
Solution:
# Build locally
docker build -t redhound:test .
# Check for errors in Dockerfile
# Fix dependency issues, etc.
git add Dockerfile
git commit -m "fix: resolve Docker build issues"
git push
Cache Issues¶
Symptoms: Unexpected failures, stale dependencies
Solution:
1. Go to GitHub Actions
2. Click on workflow run
3. Click "Re-run jobs" → "Re-run all jobs"
4. Check "Clear cache" option
Or manually clear cache:
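Caches can also be cleared from the command line with the GitHub CLI (gh 2.32+; the cache key below is a placeholder):

```shell
# List caches for the repository
gh cache list
# Delete a specific cache by key, or everything at once
gh cache delete "venv-Linux-<hash>"
gh cache delete --all
```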
Docker Build "No Space Left" Errors¶
Error: failed to copy files: copy file range failed: no space left on device
Cause: GitHub Actions runners have limited disk space (~14GB free by default). Large Docker builds with heavy dependencies (PyTorch, CUDA libs) can exhaust this during the image export phase.
Solution (already implemented):
The workflow automatically frees up ~40GB before building by removing:
- .NET tools (/usr/share/dotnet)
- Android SDK (/usr/local/lib/android)
- GHC (/opt/ghc)
- CodeQL (/opt/hostedtoolcache/CodeQL)
- Node modules (/usr/local/lib/node_modules)
- Azure CLI (/opt/az)
- Gradle (/usr/share/gradle*)
- PowerShell (/usr/local/share/powershell)
- Unused Docker images, volumes, and build cache
Additional optimizations:
- Docker image only copies site-packages (not entire Python lib), reducing size from 4.5GB → 2.5GB
- Export phase completes faster, reducing disk pressure
Manual fix (if still failing):
# Check disk space
df -h
# Clean up more aggressively
sudo apt-get clean
sudo apt-get autoremove -y
sudo rm -rf /usr/share/swift
sudo rm -rf /usr/local/share/boost
Performance Optimization¶
Why CI Can Be Slow¶
CI performance depends on several factors:
Common slowdown causes:
1. Cache misses: When pyproject.toml or uv.lock changes, the venv cache key changes. The setup-venv job runs uv sync --locked --extra dev once (~2-3 minutes), then all other jobs restore the cached venv.
2. GitHub Actions cache eviction: GitHub evicts caches after 7 days of inactivity or when storage limits are reached. This forces a fresh uv sync on the next run.
3. Test suite: Tests (5-6 min) scale with codebase size. Already optimized: PRs exclude slow/integration/api tests.
4. Type checking (optimized): On PRs: incremental (30-60s). On push: full (2-3 min).
5. Docker build: Full image build + Trivy scan on cache miss can take several minutes.
6. Runner availability: During busy times, slow I/O or CPU wait can add latency.
How to Keep CI Fast¶
1. Prevent unnecessary lock changes:
Run uv lock locally when changing pyproject.toml to avoid "lock updated in CI → new run → cache miss" cycles.
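For example, when bumping a dependency:

```shell
# Edit pyproject.toml, then regenerate the lock before committing
uv lock
git add pyproject.toml uv.lock
git commit -m "chore: bump dependency and update uv.lock"
```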
The uv-lock pre-commit hook now runs on both pre-commit and pre-push to catch this automatically.
2. Leverage optimizations already in place:
The workflow is heavily optimized for fast PR feedback:
✅ Incremental type checking - Only changed files on PRs (saves 1-2 min)
✅ Conditional pip-audit - Skips on code-only PRs (saves ~2 min)
✅ Test filtering - PRs skip slow/integration tests (saves 2-3 min)
✅ Docker skip on PRs - No image build during PR review (saves 8-10 min)
3. Minimize dependency changes:
Group dependency updates when possible. Each lock change invalidates the venv cache for all subsequent runs until the cache is warmed.
4. Understand when pip-audit runs:
The workflow skips pip-audit on code-only PRs to save ~2 minutes. To trigger it:
- Include "dep", "bump", or "upgrade" in PR title
- Add "dependencies" label to PR
- Push directly to main/master/dev (always runs)
5. Mark slow tests correctly:
Use pytest markers so CI skips them on PRs:
@pytest.mark.slow
def test_heavy_operation():
...
@pytest.mark.integration
def test_with_real_api():
...
6. Profile and optimize heavy operations:
Monitor job durations in GitHub Actions. If type-check or tests grow beyond reasonable limits:
- Type-check: Already optimized with incremental checking on PRs
- Tests: Already parallelized with pytest-xdist, consider splitting into more tiers if needed
Best Practices¶
1. Keep Branches Green¶
- Run tests locally before pushing
- Fix CI failures immediately
- Don't merge PRs with failing checks
2. Update Dependencies Automatically¶
The project uses fully automated dependency management with GitHub's native auto-merge:
Zero-Touch Workflow:
- Dependabot creates PRs weekly (Sunday 9am)
- CI automatically updates uv.lock for Dependabot PRs
- Patch and minor updates enable auto-merge via dependabot/fetch-metadata action
- PRs merge instantly when CI passes (no polling)
- Major updates get labeled requires-review for manual approval
Efficiency Benefits:
- 85% faster than polling-based solutions
- ~25 CI minutes saved per week (no wait actions)
- 100% accurate version detection using Dependabot metadata
- Zero manual work for safe updates
How It Works:
1. Dependabot metadata action reads exact update type from PR
2. Workflow enables GitHub's native auto-merge for patch/minor
3. When CI passes, GitHub merges automatically (instant)
4. Major updates wait for manual review with requires-review label
What You Do:
- Nothing for 90% of PRs (auto-merge handles them)
- Review PRs labeled requires-review (1-2/month) and click "Merge" on GitHub
- All updates tracked at: https://github.com/your-org/redhound/pulls?q=author:dependabot
3. Monitor Pipeline Performance¶
- Check workflow run times
- Optimize slow jobs
- Use caching effectively
4. Security First¶
- Never commit secrets
- Review security scan results
- Update vulnerable dependencies promptly
5. Write Good Commit Messages¶
- Follow Conventional Commits
- Be descriptive
- Reference issues when applicable
Viewing Pipeline Runs¶
Cloudflare Pages (Documentation)¶
The documentation site is deployed to redhound.pages.dev via Cloudflare Pages. The frontend app is deployed to redhound.vercel.app (Vercel); Pages is for docs only. No GitHub Actions are used for docs deploy (to avoid consuming Actions minutes).
Recommended: Commit built site, no build on Cloudflare (~30 s deploy)¶
The built site is committed to the repo. Cloudflare runs no build (clone + upload only), so each deploy finishes in ~30 seconds.
1. Cloudflare dashboard
- Root directory: docs (or leave default if you use the path below).
- Build command: leave empty, or set to exit 0.
- Build output directory: mkdocs/site (with root docs) or docs/mkdocs/site (with repo root).
2. Publishing doc changes
After editing files under docs/content/ or docs/mkdocs/:
cd docs/mkdocs
uv run mkdocs build -f mkdocs.yml
cp _headers site/
cp _redirects site/
git add site/
git commit -m "docs: rebuild site"
git push
Cloudflare will pick up the push and deploy the updated site/ in ~30 s.
3. One-time setup
The directory docs/mkdocs/site/ is tracked in the repo so Cloudflare can deploy it. If the directory is missing (e.g. new clone), run the build and copy steps above, then commit and push site/.
Alternative: Build on Cloudflare (no Actions, but slow)¶
If you prefer not to commit the built site, you can let Cloudflare build on each deploy. Each run reinstalls Python and pip (~2+ minutes). In the dashboard:
| Setting | Value |
|---|---|
| Root directory | docs |
| Build command | cd mkdocs && pip install -r requirements.txt && mkdocs build -f mkdocs.yml && cp _headers site/ && cp _redirects site/ |
| Build output directory | mkdocs/site |
Troubleshooting¶
Error: Failed to publish your Function. Got error: Unknown internal error occurred.
Known Cloudflare backend issue. Ensure the deployed output contains only static assets (no _worker.js, _routes.json, or functions directory). Check Cloudflare Status and workers-sdk issues; re-run the deploy (often transient).
References¶
- GitHub Actions Documentation
- Cloudflare Pages Documentation
- uv Documentation
- Ruff Documentation
- Pyright Documentation
- Trivy Documentation
Next Steps¶
- Read Developer Onboarding for development setup
- Read Mock Mode Guide for cost-free testing
- Read Docker Setup for local services configuration