Testing Guide

This guide covers testing strategies, best practices, and procedures for the Redhound trading system.

Testing Philosophy

The project follows a comprehensive testing strategy:

  • Unit Tests: Test individual functions and classes in isolation
  • Integration Tests: Test interactions between components
  • Performance Tests: Validate system performance and scalability
  • Mock Mode: Enable fast, cost-free testing without API calls

Coverage Target: >80% code coverage for core business logic

Test Suite Overview

The full test suite contains 1,454 tests. Tests are organized by layer:

Test Structure

tests/
├── agents/analysts/        # Analyst agent unit tests
│   ├── test_technical_analyst.py
│   ├── test_sentiment_analyst.py
│   ├── test_news_analyst.py
│   ├── test_market_context_analyst.py
│   └── test_candlestick_patterns.py
├── api/                    # API endpoint tests
│   ├── test_app.py
│   ├── test_health.py
│   ├── test_signals.py
│   ├── test_session_endpoints.py
│   ├── test_analytics_endpoints.py
│   ├── test_market_data_endpoints.py
│   ├── test_market_intelligence_endpoints.py
│   ├── test_scanner_endpoints.py
│   ├── test_callbacks.py
│   ├── test_error_handling.py
│   └── test_validation_models.py
├── data/                   # Data layer tests
│   ├── test_cache.py
│   ├── test_validation.py
│   └── utils/              # Utility tests (batch indicators, earnings, etc.)
├── database/               # Database layer tests
│   ├── models/             # SQLAlchemy model tests
│   └── repositories/       # Repository tests (base, stock profile, memory, debate, etc.)
├── integration/            # Cross-component integration tests
│   ├── test_parallel_workflow.py
│   ├── test_agent_database.py
│   ├── test_cache_integration.py
│   ├── test_data_quality_pipeline.py
│   ├── test_fundamentals_analyst.py
│   └── ...
├── orchestration/          # Orchestration layer tests
│   ├── test_conditional_logic.py
│   ├── test_signal_aggregator.py
│   ├── test_metrics_validator.py
│   ├── test_synchronization.py
│   └── test_setup.py
├── performance/            # Performance benchmarks
│   ├── test_cache_performance.py
│   ├── test_concurrent_vectors.py
│   └── test_hnsw_performance.py
├── risk/                   # Risk system tests
│   ├── test_metrics.py
│   ├── test_scorer.py
│   └── test_validator.py
├── services/               # Service layer tests
│   ├── test_signal_service.py
│   ├── test_market_data_service.py
│   ├── test_analytics_service.py
│   ├── test_session_service.py
│   ├── test_scanner_service.py
│   ├── test_screener_engine.py
│   ├── test_opportunity_service.py
│   ├── test_stock_profile_cache.py
│   └── ...
├── unit/                   # Pure unit tests (DCF, financial ratios, health)
├── utils/                  # Utility tests (validation, logging, metrics)
├── validation/             # Validation pipeline tests
│   └── validators/         # Individual validator tests
└── conftest.py             # Shared fixtures

Test Categories

Tests are marked with pytest markers for selective execution:

  • @pytest.mark.unit: Unit tests (fast, isolated)
  • @pytest.mark.integration: Integration tests (slower, cross-component; all tests in tests/integration/ are marked)
  • @pytest.mark.slow: Slow-running tests
  • @pytest.mark.api: Tests that hit external APIs (requires API keys)
  • @pytest.mark.talib: Tests that require TA-Lib (optional / heavier)

CI behaviour: On pull requests, the pipeline runs only tests that are not slow, integration, api, or talib, for fast feedback. On push (e.g. to main), the full suite runs.
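
The blanket marking mentioned above (every test under tests/integration/ carrying the integration marker) is usually handled with a collection hook in conftest.py. A minimal sketch of that pattern, hypothetical and not necessarily how this repo's conftest.py does it:

# Hypothetical conftest.py hook that auto-marks tests by directory (sketch only).
import pytest

def pytest_collection_modifyitems(config, items):
    for item in items:
        # Anything collected from tests/integration/ gets the integration marker
        if "tests/integration/" in str(item.fspath):
            item.add_marker(pytest.mark.integration)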

Known dependency warnings (pytest)

The test suite filters several upstream warnings via the filterwarnings setting in pyproject.toml. They are worth noting, but the causes live in dependencies, not in this repo.

  • Warning: DeprecationWarning: torch.jit.script
  • Source: torch.jit._script
  • Cause: PyTorch deprecated torch.jit.script in favour of torch.compile / torch.export. Triggered when the sentiment analyst (or transformers) loads the model.
  • What to do: No change needed in this repo. Remove the filter when upgrading to a transformers/torch release that no longer uses torch.jit.script.

Recommendation: When upgrading transformers, torch, or pandas, temporarily disable the filterwarnings entries, run the full test suite, and check whether any of these warnings still appear; if they no longer do, remove the corresponding filter from pyproject.toml.
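
For reference, the same kind of filter can also be applied per test module instead of globally; a minimal sketch of that alternative (the repo itself keeps its filters in pyproject.toml):

# Per-module alternative to a global pyproject.toml filter (sketch only).
import pytest

pytestmark = pytest.mark.filterwarnings(
    "ignore:.*torch\\.jit\\.script.*:DeprecationWarning"
)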

Running Tests

Quick Start

# Run all tests in mock mode (fast, no API costs)
REDHOUND_MOCK_MODE=true make test

# Run specific test file
REDHOUND_MOCK_MODE=true make test TEST_ARGS='-k test_technical_analyst'

# Run tests with coverage
REDHOUND_MOCK_MODE=true make test-coverage

# View coverage report
open htmlcov/index.html  # macOS
xdg-open htmlcov/index.html  # Linux

Test Execution Modes

1. Mock Mode

# Enable mock mode globally
export REDHOUND_MOCK_MODE=true

# Run all tests
make test

# Run specific test category
pytest -m unit
pytest -m integration

Benefits:

  • No API costs
  • Fast execution (seconds vs minutes)
  • Deterministic results
  • No network dependencies
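
If a test needs the flag programmatically rather than via the command line, it can be read from the environment; a minimal sketch, assuming the REDHOUND_MOCK_MODE variable and the DEFAULT_CONFIG dict shown elsewhere in this guide (a hypothetical fixture, not part of the repo's conftest):

# Sketch: derive mock_mode from the environment inside a fixture.
import os

import pytest

from backend.config.settings import DEFAULT_CONFIG

@pytest.fixture
def runtime_config():
    config = DEFAULT_CONFIG.copy()
    # REDHOUND_MOCK_MODE=true keeps tests offline and deterministic
    config["mock_mode"] = os.getenv("REDHOUND_MOCK_MODE", "true").lower() == "true"
    return config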

2. Real API Mode (Integration Validation)

# Disable mock mode
export REDHOUND_MOCK_MODE=false

# Set required API keys
export OPENAI_API_KEY=sk-...
export ALPHA_VANTAGE_API_KEY=...

# Run integration tests only
pytest -m integration

# Run API tests
pytest -m api

Use Cases:

  • Validate API integrations
  • Test real LLM responses
  • Verify data vendor connectivity
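
Tests marked @pytest.mark.api should also skip cleanly when credentials are missing instead of failing; a small sketch of such a guard (illustrative only, not necessarily how this repo gates its API tests):

# Sketch: skip API-dependent tests when the required key is not configured.
import os

import pytest

requires_openai = pytest.mark.skipif(
    not os.getenv("OPENAI_API_KEY"),
    reason="OPENAI_API_KEY not set; skipping real-API test",
)

@pytest.mark.api
@requires_openai
def test_llm_roundtrip():
    ...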

3. Smoke test: full graph run with real API

To confirm the app runs end-to-end with real LLMs (e.g. after reverting a feature or making large changes), run a single propagate with one analyst to limit cost and time.

Prerequisites:

  • OPENAI_API_KEY set (in .env or export OPENAI_API_KEY=sk-...)
  • Mock mode off: export REDHOUND_MOCK_MODE=false, or ensure it is unset or set to false in .env

Option A – Python one-liner (single propagate):

export REDHOUND_MOCK_MODE=false
export OPENAI_API_KEY=sk-your-key-here

uv run python -c "
from backend.config.settings import DEFAULT_CONFIG
from backend.orchestration.trading_graph import RedhoundGraph

config = DEFAULT_CONFIG.copy()
config['mock_mode'] = False
config['max_debate_rounds'] = 1
config['max_risk_discuss_rounds'] = 1

graph = RedhoundGraph(selected_analysts=['technical'], config=config, debug=True)
state, decision = graph.propagate('AAPL', '2024-12-01')

assert 'final_trade_decision' in state, 'missing final_trade_decision'
print('OK: decision=', decision, '| final_trade_decision:', state.get('final_trade_decision', '')[:80])
"

If this finishes without errors and prints OK: decision=..., the graph (including sequential analysts and the post-revert code paths) runs correctly with the real API.

Option B – Interactive CLI:

export REDHOUND_MOCK_MODE=false
export OPENAI_API_KEY=sk-your-key-here

redhound

Then: pick one analyst (e.g. Technical), one ticker (e.g. AAPL), one date, and run. A successful run with a final decision indicates the full CLI and graph path work with the API.

Option C – App validation (no propagate):

export REDHOUND_MOCK_MODE=false
export OPENAI_API_KEY=sk-your-key-here

make validate-app

This checks imports, config, and graph initialization (and state creation), but does not run propagate. Use Option A or B to fully validate execution with the API.

Selective Test Execution

# Run tests by marker
pytest -m unit                    # Unit tests only
pytest -m integration             # Integration tests only
pytest -m "not slow"              # Exclude slow tests
pytest -m "unit and not api"      # Unit tests without API calls

# Run tests by name pattern
pytest -k "test_technical"        # All tests with "technical" in name
pytest -k "test_analyst_"         # All analyst tests

# Run specific test file
pytest tests/orchestration/test_trading_graph.py

# Run specific test function
pytest tests/orchestration/test_trading_graph.py::test_graph_initialization

# Run tests in parallel (faster)
pytest -n auto                    # Auto-detect CPU cores
pytest -n 4                       # Use 4 workers

Coverage Reports

# Recommended: Use Makefile target (generates HTML, XML, and terminal reports)
make test-coverage

# Or use pytest directly for more control:
# Generate coverage report
pytest --cov=backend --cov-report=term-missing

# Generate HTML coverage report
pytest --cov=backend --cov-report=html

# Generate XML coverage report (for CI)
pytest --cov=backend --cov-report=xml

# Fail if coverage below threshold
pytest --cov=backend --cov-fail-under=80

Writing Tests

Test Structure

Follow the Arrange-Act-Assert (AAA) pattern:

def test_example():
    # Arrange: Set up test data and dependencies
    config = DEFAULT_CONFIG.copy()
    config["mock_mode"] = True

    # Act: Execute the code under test
    result = function_under_test(config)

    # Assert: Verify the expected outcome
    assert result == expected_value

Unit Test Example

import pytest
from backend.orchestration.conditional_logic import should_continue_debate

def test_should_continue_debate_within_limit():
    """Test debate continuation when within round limit."""
    # Arrange
    state = {
        "investment_debate_state": {
            "count": 1
        },
        "config": {
            "max_debate_rounds": 3
        }
    }

    # Act
    result = should_continue_debate(state)

    # Assert
    assert result is True

def test_should_continue_debate_at_limit():
    """Test debate termination when at round limit."""
    # Arrange
    state = {
        "investment_debate_state": {
            "count": 3
        },
        "config": {
            "max_debate_rounds": 3
        }
    }

    # Act
    result = should_continue_debate(state)

    # Assert
    assert result is False

Integration Test Example

import pytest
from backend.orchestration.trading_graph import RedhoundGraph
from backend.config.settings import DEFAULT_CONFIG

@pytest.mark.integration
def test_full_trading_workflow():
    """Test complete trading workflow from start to finish."""
    # Arrange
    config = DEFAULT_CONFIG.copy()
    config["mock_mode"] = True
    config["max_debate_rounds"] = 1
    config["max_risk_discuss_rounds"] = 1

    graph = RedhoundGraph(
        selected_analysts=["technical", "fundamentals"],
        config=config
    )

    # Act
    final_state, decision = graph.propagate("AAPL", "2024-12-01")

    # Assert
    assert decision in ["BUY", "SELL", "HOLD"]
    assert final_state["technical_report"] is not None
    assert final_state["fundamentals_report"] is not None
    assert final_state["final_trade_decision"] is not None

Async Test Example

import pytest
from backend.data.cache import CacheClient

@pytest.mark.asyncio
async def test_cache_get_set():
    """Test async cache operations."""
    # Arrange
    cache = CacheClient()
    key = "test_key"
    value = "test_value"

    # Act
    await cache.set(key, value, ttl=60)
    result = await cache.get(key)

    # Assert
    assert result == value

Parametrized Test Example

import pytest

@pytest.mark.parametrize("ticker,expected_valid", [
    ("AAPL", True),
    ("MSFT", True),
    ("INVALID", False),
    ("", False),
    (None, False),
])
def test_ticker_validation(ticker, expected_valid):
    """Test ticker validation with various inputs."""
    result = validate_ticker(ticker)
    assert result == expected_valid

Test Fixtures

Common Fixtures

# tests/utils/fixtures.py
import pytest
from backend.config.settings import DEFAULT_CONFIG
from backend.orchestration.trading_graph import RedhoundGraph

@pytest.fixture
def mock_config():
    """Fixture providing a config with mock mode enabled."""
    config = DEFAULT_CONFIG.copy()
    config["mock_mode"] = True
    config["max_debate_rounds"] = 1
    config["max_risk_discuss_rounds"] = 1
    return config

@pytest.fixture
def mock_graph(mock_config):
    """Fixture providing a RedhoundGraph in mock mode."""
    return RedhoundGraph(
        selected_analysts=["technical", "fundamentals"],
        config=mock_config
    )

@pytest.fixture
def sample_state():
    """Fixture providing a sample agent state."""
    return {
        "ticker": "AAPL",
        "date": "2024-12-01",
        "technical_report": "Sample technical report",
        "fundamentals_report": "Sample fundamentals report",
        "config": DEFAULT_CONFIG.copy(),
    }

Using Fixtures

def test_with_fixtures(mock_config, sample_state):
    """Test using pre-configured fixtures."""
    # Fixtures are automatically injected
    assert mock_config["mock_mode"] is True
    assert sample_state["ticker"] == "AAPL"

Mocking

Mocking LLM Calls

from backend.utils.mock_llm import FakeLLM

def test_agent_with_mock_llm():
    """Test agent with mocked LLM."""
    # Arrange
    mock_llm = FakeLLM(agent_type="technical_analyst")

    # Act
    response = mock_llm.invoke("Analyze AAPL")

    # Assert
    assert "Technical Analysis" in response.content
    assert mock_llm.call_count == 1

Mocking Memory

from backend.utils.mock_memory import create_mock_memory

def test_agent_with_mock_memory():
    """Test agent with mocked memory."""
    # Arrange
    config = {"data_cache_dir": "/tmp/test_cache"}
    memory = create_mock_memory("test_memory", config, preloaded=True)

    # Act
    memories = memory.get_memories("bullish breakout", n_matches=2)

    # Assert
    assert len(memories) > 0

Mocking External APIs

import pandas as pd
from unittest.mock import patch

@patch('backend.data.vendors.fmp.FMPClient.get_historical_data')
def test_fmp_data_fetch(mock_get_data):
    """Test FMP data fetching with mocked API."""
    # Arrange: the patched client method returns a small, fixed DataFrame
    mock_get_data.return_value = pd.DataFrame({
        'Close': [100, 101, 102],
        'Volume': [1000, 1100, 1200],
    })

    # Act
    data = fetch_stock_data("AAPL", "2024-01-01", "2024-01-03")

    # Assert
    assert len(data) == 3
    assert data['Close'].iloc[0] == 100

Performance Testing

Cache Performance

import time

import pytest

from backend.data.cache import CacheClient

@pytest.mark.performance
@pytest.mark.asyncio
async def test_cache_performance():
    """Test cache hit/miss performance."""
    cache = CacheClient()
    key = "test_key"
    value = "test_value"

    # Measure set performance
    start = time.monotonic()
    await cache.set(key, value, ttl=60)
    set_time = time.monotonic() - start

    # Measure get performance (cache hit)
    start = time.monotonic()
    result = await cache.get(key)
    get_time = time.monotonic() - start

    # Assert performance thresholds
    assert set_time < 0.1, f"Cache set too slow: {set_time:.3f}s"
    assert get_time < 0.01, f"Cache get too slow: {get_time:.3f}s"
    assert result == value

Test Best Practices

1. Test Naming

Use descriptive test names that explain what is being tested:

# Good
def test_should_continue_debate_returns_true_when_within_limit():
    pass

# Bad
def test_debate():
    pass

2. Test Independence

Each test should be independent and not rely on other tests:

# Good
def test_feature_a():
    data = setup_data()
    result = feature_a(data)
    assert result == expected

def test_feature_b():
    data = setup_data()
    result = feature_b(data)
    assert result == expected

# Bad
def test_feature_a():
    global shared_state
    shared_state = setup_data()
    result = feature_a(shared_state)
    assert result == expected

def test_feature_b():
    # Relies on test_feature_a running first
    result = feature_b(shared_state)
    assert result == expected

3. Test One Thing

Each test should verify one specific behavior:

# Good
def test_technical_analyst_returns_report():
    result = technical_analyst(state)
    assert result["technical_report"] is not None

def test_technical_analyst_includes_macd():
    result = technical_analyst(state)
    assert "MACD" in result["technical_report"]

# Bad
def test_technical_analyst():
    result = technical_analyst(state)
    assert result["technical_report"] is not None
    assert "MACD" in result["technical_report"]
    assert "RSI" in result["technical_report"]
    assert len(result["technical_report"]) > 100

4. Use Assertions Effectively

# Good
assert result == expected, f"Expected {expected}, got {result}"
assert len(items) > 0, "Items list should not be empty"

# Bad
assert result  # What are we checking?

5. Test Edge Cases

def test_divide_by_zero():
    with pytest.raises(ZeroDivisionError):
        divide(10, 0)

def test_empty_list():
    result = process_list([])
    assert result == []

def test_none_input():
    result = process_value(None)
    assert result is None

Continuous Integration

GitHub Actions

Tests run automatically on:

  • Push to any branch
  • Pull request creation/update
  • Manual workflow dispatch

# .github/workflows/cicd.yml (excerpt)
test:
  runs-on: ubuntu-latest
  steps:
    - name: Run tests
      run: |
        pytest -v --tb=short -n auto \
          --cov=backend --cov-report=xml \
          --cov-fail-under=10 --junitxml=junit.xml

Local Pre-commit

Run tests before committing:

#!/bin/bash
# Save as .git/hooks/pre-commit (and make it executable)
REDHOUND_MOCK_MODE=true pytest -m "not slow and not integration"

Debugging Tests

Run Tests with Debugging

# Run with verbose output
pytest -v

# Run with detailed output
pytest -vv

# Show print statements
pytest -s

# Drop into debugger on failure
pytest --pdb

# Drop into debugger on first failure
pytest -x --pdb

Use ipdb for Debugging

def test_with_debugging():
    # Add breakpoint
    import ipdb; ipdb.set_trace()

    result = function_under_test()
    assert result == expected

View Test Logs

# Run with log output
pytest --log-cli-level=DEBUG

# Save logs to file
pytest --log-file=test.log --log-file-level=DEBUG

Test Coverage Goals

Coverage Targets

  • Overall Coverage: >80%
  • Core Business Logic: >90%
  • Orchestration Layer: >85%
  • Agent Layer: >80%
  • Data Layer: >75%
  • API Layer: >70%

Excluded from Coverage

  • CLI interface (integration-heavy, tested via E2E)
  • Docker configuration files
  • Scripts and utilities
  • Test files themselves

Improving Coverage

# Recommended: Use Makefile target
make test-coverage
open htmlcov/index.html

# Or use pytest directly:
# Identify uncovered lines
pytest --cov=backend --cov-report=term-missing

# Generate HTML report for detailed analysis
pytest --cov=backend --cov-report=html
open htmlcov/index.html

Troubleshooting

Tests Failing in CI but Passing Locally

Possible Causes:

  • Environment differences
  • Missing dependencies
  • Timing issues

Solutions:

# Run tests in the same environment as CI (mount the repo into the container)
docker run -it -v "$PWD":/app -w /app python:3.12 /bin/bash
# Inside the container:
pip install uv
uv sync --locked --extra dev
pytest

Slow Test Execution

Solutions:

# Run tests in parallel
pytest -n auto

# Skip slow tests
pytest -m "not slow"

# Use mock mode
export REDHOUND_MOCK_MODE=true

Flaky Tests

Symptoms: Tests pass sometimes, fail other times

Solutions:

  • Use deterministic data (avoid random values)
  • Mock external dependencies
  • Use proper synchronization for async tests
  • Increase timeouts for timing-sensitive tests
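
As an illustration of the first two points, pin any randomness and replace the network-bound call with a mock; a minimal sketch (the function and test names here are hypothetical):

# Sketch: make a flaky test deterministic by seeding randomness and
# mocking the external dependency.
import random
from unittest.mock import patch

import pytest

@pytest.fixture(autouse=True)
def fixed_seed():
    random.seed(42)  # identical pseudo-random values on every run

@patch("backend.data.vendors.fmp.FMPClient.get_historical_data")
def test_signal_is_stable(mock_get_data):
    mock_get_data.return_value = {"Close": [100, 101, 102]}  # fixed fixture data
    # ... invoke the code under test and assert against a fixed expectation ...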

Next Steps