Testing Guide¶
This guide covers testing strategies, best practices, and procedures for the Redhound trading system.
Testing Philosophy¶
The project follows a comprehensive testing strategy:
- Unit Tests: Test individual functions and classes in isolation
- Integration Tests: Test interactions between components
- Performance Tests: Validate system performance and scalability
- Mock Mode: Enable fast, cost-free testing without API calls
Coverage Target: >80% code coverage for core business logic
Test Suite Overview¶
The full test suite contains 1,454 tests. Tests are organized by layer:
Test Structure¶
tests/
├── agents/analysts/ # Analyst agent unit tests
│ ├── test_technical_analyst.py
│ ├── test_sentiment_analyst.py
│ ├── test_news_analyst.py
│ ├── test_market_context_analyst.py
│ └── test_candlestick_patterns.py
├── api/ # API endpoint tests
│ ├── test_app.py
│ ├── test_health.py
│ ├── test_signals.py
│ ├── test_session_endpoints.py
│ ├── test_analytics_endpoints.py
│ ├── test_market_data_endpoints.py
│ ├── test_market_intelligence_endpoints.py
│ ├── test_scanner_endpoints.py
│ ├── test_callbacks.py
│ ├── test_error_handling.py
│ └── test_validation_models.py
├── data/ # Data layer tests
│ ├── test_cache.py
│ ├── test_validation.py
│ └── utils/ # Utility tests (batch indicators, earnings, etc.)
├── database/ # Database layer tests
│ ├── models/ # SQLAlchemy model tests
│ └── repositories/ # Repository tests (base, stock profile, memory, debate, etc.)
├── integration/ # Cross-component integration tests
│ ├── test_parallel_workflow.py
│ ├── test_agent_database.py
│ ├── test_cache_integration.py
│ ├── test_data_quality_pipeline.py
│ ├── test_fundamentals_analyst.py
│ └── ...
├── orchestration/ # Orchestration layer tests
│ ├── test_conditional_logic.py
│ ├── test_signal_aggregator.py
│ ├── test_metrics_validator.py
│ ├── test_synchronization.py
│ └── test_setup.py
├── performance/ # Performance benchmarks
│ ├── test_cache_performance.py
│ ├── test_concurrent_vectors.py
│ └── test_hnsw_performance.py
├── risk/ # Risk system tests
│ ├── test_metrics.py
│ ├── test_scorer.py
│ └── test_validator.py
├── services/ # Service layer tests
│ ├── test_signal_service.py
│ ├── test_market_data_service.py
│ ├── test_analytics_service.py
│ ├── test_session_service.py
│ ├── test_scanner_service.py
│ ├── test_screener_engine.py
│ ├── test_opportunity_service.py
│ ├── test_stock_profile_cache.py
│ └── ...
├── unit/ # Pure unit tests (DCF, financial ratios, health)
├── utils/ # Utility tests (validation, logging, metrics)
├── validation/ # Validation pipeline tests
│ └── validators/ # Individual validator tests
└── conftest.py # Shared fixtures
Test Categories¶
Tests are marked with pytest markers for selective execution:
- @pytest.mark.unit: Unit tests (fast, isolated)
- @pytest.mark.integration: Integration tests (slower, cross-component; all tests in tests/integration/ are marked)
- @pytest.mark.slow: Slow-running tests
- @pytest.mark.api: Tests that hit external APIs (requires API keys)
- @pytest.mark.talib: Tests that require TA-Lib (optional / heavier)
CI behaviour: On pull requests, the pipeline runs only tests that are not slow, integration, api, or talib, for fast feedback. On push (e.g. to main), the full suite runs.
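In test code, these markers are applied as plain pytest decorators. A minimal sketch (the test names and bodies below are illustrative, not taken from the suite):
import pytest

@pytest.mark.integration
@pytest.mark.slow
def test_full_pipeline_round_trip():
    """Selected only when slow/integration tests run (e.g. on push, not on PRs)."""
    ...

@pytest.mark.api
def test_live_vendor_connectivity():
    """Excluded from the fast PR run; needs real API keys."""
    ...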
Known dependency warnings (pytest)¶
The test suite filters several upstream warnings in pyproject.toml via filterwarnings. They are worth noting, but their causes live in the dependencies, not in this repo.
| Warning | Source | Cause | What to do |
|---|---|---|---|
| DeprecationWarning: torch.jit.script | torch.jit._script | PyTorch deprecated torch.jit.script in favour of torch.compile / torch.export. Triggered when the sentiment analyst (or transformers) loads the model. | No change in this repo. Remove the filter when upgrading to a transformers/torch release that no longer uses torch.jit.script. |
Recommendation: When upgrading transformers, torch, or pandas, temporarily disable the filterwarnings entries, run the full test suite, and check whether any of these warnings still appear; if they no longer do, remove the corresponding filter from pyproject.toml.
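To check a single warning in isolation rather than rerunning the whole suite, a short script can surface it directly. This is only a minimal sketch, not a repo utility; it assumes torch is installed and that scripting a trivial module exercises the deprecated torch.jit.script path:
# Minimal sketch (assumption: scripting a trivial module hits the torch.jit.script path).
import warnings

import torch

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # surface all warnings, including normally hidden ones
    torch.jit.script(torch.nn.ReLU())
deprecations = [w for w in caught if issubclass(w.category, DeprecationWarning)]
print("still emitted" if deprecations else "not emitted; the filter can likely be removed")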
Running Tests¶
Quick Start¶
# Run all tests in mock mode (fast, no API costs)
REDHOUND_MOCK_MODE=true make test
# Run specific test file
REDHOUND_MOCK_MODE=true make test TEST_ARGS='-k test_technical_analyst'
# Run tests with coverage
REDHOUND_MOCK_MODE=true make test-coverage
# View coverage report
open htmlcov/index.html # macOS
xdg-open htmlcov/index.html # Linux
Test Execution Modes¶
1. Mock Mode (Recommended for Development)¶
# Enable mock mode globally
export REDHOUND_MOCK_MODE=true
# Run all tests
make test
# Run specific test category
pytest -m unit
pytest -m integration
Benefits:
- No API costs
- Fast execution (seconds vs minutes)
- Deterministic results
- No network dependencies
2. Real API Mode (Integration Validation)¶
# Disable mock mode
export REDHOUND_MOCK_MODE=false
# Set required API keys
export OPENAI_API_KEY=sk-...
export ALPHA_VANTAGE_API_KEY=...
# Run integration tests only
pytest -m integration
# Run API tests
pytest -m api
Use Cases:
- Validate API integrations
- Test real LLM responses
- Verify data vendor connectivity
3. Smoke test: full graph run with real API¶
To confirm the app runs end-to-end with real LLMs (e.g. after reverting a feature or making large changes), run a single propagate with one analyst to limit cost and time.
Prerequisites:
- OPENAI_API_KEY set (in .env or via export OPENAI_API_KEY=sk-...)
- Mock mode off: export REDHOUND_MOCK_MODE=false, or ensure it is unset / false in .env
Option A – Python one-liner (single propagate):
export REDHOUND_MOCK_MODE=false
export OPENAI_API_KEY=sk-your-key-here
uv run python -c "
from backend.config.settings import DEFAULT_CONFIG
from backend.orchestration.trading_graph import RedhoundGraph
config = DEFAULT_CONFIG.copy()
config['mock_mode'] = False
config['max_debate_rounds'] = 1
config['max_risk_discuss_rounds'] = 1
graph = RedhoundGraph(selected_analysts=['technical'], config=config, debug=True)
state, decision = graph.propagate('AAPL', '2024-12-01')
assert 'final_trade_decision' in state, 'missing final_trade_decision'
print('OK: decision=', decision, '| final_trade_decision:', state.get('final_trade_decision', '')[:80])
"
If this finishes without errors and prints OK: decision=..., the graph (including sequential analysts and the post-revert code paths) runs correctly with the real API.
Option B – Interactive CLI:
Launch the interactive CLI, then pick one analyst (e.g. Technical), one ticker (e.g. AAPL), one date, and run. A successful run with a final decision indicates the full CLI and graph path work with the API.
Option C – App validation (no propagate):
This checks imports, config, and graph initialization (and state creation), but does not run propagate. Use Option A or B to fully validate execution with the API.
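If the validation command is not to hand, an equivalent check can be sketched in a few lines of Python. This is only a sketch reusing the imports from Option A, not a script shipped with the repo:
# Sketch: exercise imports, config, and graph construction without calling propagate().
from backend.config.settings import DEFAULT_CONFIG
from backend.orchestration.trading_graph import RedhoundGraph

config = DEFAULT_CONFIG.copy()
config["mock_mode"] = False

graph = RedhoundGraph(selected_analysts=["technical"], config=config)
print("OK: graph initialized without running propagate")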
Selective Test Execution¶
# Run tests by marker
pytest -m unit # Unit tests only
pytest -m integration # Integration tests only
pytest -m "not slow" # Exclude slow tests
pytest -m "unit and not api" # Unit tests without API calls
# Run tests by name pattern
pytest -k "test_technical" # All tests with "technical" in name
pytest -k "test_analyst_" # All analyst tests
# Run specific test file
pytest tests/orchestration/test_trading_graph.py
# Run specific test function
pytest tests/orchestration/test_trading_graph.py::test_graph_initialization
# Run tests in parallel (faster)
pytest -n auto # Auto-detect CPU cores
pytest -n 4 # Use 4 workers
Coverage Reports¶
# Recommended: Use Makefile target (generates HTML, XML, and terminal reports)
make test-coverage
# Or use pytest directly for more control:
# Generate coverage report
pytest --cov=backend --cov-report=term-missing
# Generate HTML coverage report
pytest --cov=backend --cov-report=html
# Generate XML coverage report (for CI)
pytest --cov=backend --cov-report=xml
# Fail if coverage below threshold
pytest --cov=backend --cov-fail-under=80
Writing Tests¶
Test Structure¶
Follow the Arrange-Act-Assert (AAA) pattern:
def test_example():
# Arrange: Set up test data and dependencies
config = DEFAULT_CONFIG.copy()
config["mock_mode"] = True
# Act: Execute the code under test
result = function_under_test(config)
# Assert: Verify the expected outcome
assert result == expected_value
Unit Test Example¶
import pytest
from backend.orchestration.conditional_logic import should_continue_debate
def test_should_continue_debate_within_limit():
"""Test debate continuation when within round limit."""
# Arrange
state = {
"investment_debate_state": {
"count": 1
},
"config": {
"max_debate_rounds": 3
}
}
# Act
result = should_continue_debate(state)
# Assert
assert result is True
def test_should_continue_debate_at_limit():
"""Test debate termination when at round limit."""
# Arrange
state = {
"investment_debate_state": {
"count": 3
},
"config": {
"max_debate_rounds": 3
}
}
# Act
result = should_continue_debate(state)
# Assert
assert result is False
Integration Test Example¶
import pytest
from backend.orchestration.trading_graph import RedhoundGraph
from backend.config.settings import DEFAULT_CONFIG
@pytest.mark.integration
def test_full_trading_workflow():
"""Test complete trading workflow from start to finish."""
# Arrange
config = DEFAULT_CONFIG.copy()
config["mock_mode"] = True
config["max_debate_rounds"] = 1
config["max_risk_discuss_rounds"] = 1
graph = RedhoundGraph(
selected_analysts=["technical", "fundamentals"],
config=config
)
# Act
final_state, decision = graph.propagate("AAPL", "2024-12-01")
# Assert
assert decision in ["BUY", "SELL", "HOLD"]
assert final_state["technical_report"] is not None
assert final_state["fundamentals_report"] is not None
assert final_state["final_trade_decision"] is not None
Async Test Example¶
import pytest
from backend.data.cache import CacheClient
@pytest.mark.asyncio
async def test_cache_get_set():
"""Test async cache operations."""
# Arrange
cache = CacheClient()
key = "test_key"
value = "test_value"
# Act
await cache.set(key, value, ttl=60)
result = await cache.get(key)
# Assert
assert result == value
Parametrized Test Example¶
import pytest
@pytest.mark.parametrize("ticker,expected_valid", [
("AAPL", True),
("MSFT", True),
("INVALID", False),
("", False),
(None, False),
])
def test_ticker_validation(ticker, expected_valid):
"""Test ticker validation with various inputs."""
result = validate_ticker(ticker)
assert result == expected_valid
Test Fixtures¶
Common Fixtures¶
# tests/utils/fixtures.py
import pytest
from backend.config.settings import DEFAULT_CONFIG
from backend.orchestration.trading_graph import RedhoundGraph
@pytest.fixture
def mock_config():
"""Fixture providing a config with mock mode enabled."""
config = DEFAULT_CONFIG.copy()
config["mock_mode"] = True
config["max_debate_rounds"] = 1
config["max_risk_discuss_rounds"] = 1
return config
@pytest.fixture
def mock_graph(mock_config):
"""Fixture providing a RedhoundGraph in mock mode."""
return RedhoundGraph(
selected_analysts=["technical", "fundamentals"],
config=mock_config
)
@pytest.fixture
def sample_state():
"""Fixture providing a sample agent state."""
return {
"ticker": "AAPL",
"date": "2024-12-01",
"technical_report": "Sample technical report",
"fundamentals_report": "Sample fundamentals report",
"config": DEFAULT_CONFIG.copy(),
}
Using Fixtures¶
def test_with_fixtures(mock_config, sample_state):
"""Test using pre-configured fixtures."""
# Fixtures are automatically injected
assert mock_config["mock_mode"] is True
assert sample_state["ticker"] == "AAPL"
Mocking¶
Mocking LLM Calls¶
from backend.utils.mock_llm import FakeLLM
def test_agent_with_mock_llm():
"""Test agent with mocked LLM."""
# Arrange
mock_llm = FakeLLM(agent_type="technical_analyst")
# Act
response = mock_llm.invoke("Analyze AAPL")
# Assert
assert "Technical Analysis" in response.content
assert mock_llm.call_count == 1
Mocking Memory¶
from backend.utils.mock_memory import create_mock_memory
def test_agent_with_mock_memory():
"""Test agent with mocked memory."""
# Arrange
config = {"data_cache_dir": "/tmp/test_cache"}
memory = create_mock_memory("test_memory", config, preloaded=True)
# Act
memories = memory.get_memories("bullish breakout", n_matches=2)
# Assert
assert len(memories) > 0
Mocking External APIs¶
import pandas as pd
from unittest.mock import patch

@patch('backend.data.vendors.fmp.FMPClient.get_historical_data')
def test_fmp_data_fetch(mock_get_data):
    """Test FMP data fetching with mocked API."""
    # Arrange: the patched client method returns a canned DataFrame
    mock_get_data.return_value = pd.DataFrame({
        'Close': [100, 101, 102],
        'Volume': [1000, 1100, 1200],
    })
    # Act
    data = fetch_stock_data("AAPL", "2024-01-01", "2024-01-03")
    # Assert
    assert len(data) == 3
    assert data['Close'].iloc[0] == 100
Performance Testing¶
Cache Performance¶
import time

import pytest

from backend.data.cache import CacheClient

@pytest.mark.performance
@pytest.mark.asyncio
async def test_cache_performance():
    """Test cache hit/miss performance."""
    cache = CacheClient()
    key = "test_key"
    value = "test_value"
    # Measure set performance
    start = time.monotonic()
    await cache.set(key, value, ttl=60)
    set_time = time.monotonic() - start
    # Measure get performance (cache hit)
    start = time.monotonic()
    result = await cache.get(key)
    get_time = time.monotonic() - start
    # Assert performance thresholds
    assert set_time < 0.1, f"Cache set too slow: {set_time:.3f}s"
    assert get_time < 0.01, f"Cache get too slow: {get_time:.3f}s"
    assert result == value
Test Best Practices¶
1. Test Naming¶
Use descriptive test names that explain what is being tested:
# Good
def test_should_continue_debate_returns_true_when_within_limit():
pass
# Bad
def test_debate():
pass
2. Test Independence¶
Each test should be independent and not rely on other tests:
# Good
def test_feature_a():
    setup_data()
    result = feature_a()
    assert result == expected

def test_feature_b():
    setup_data()
    result = feature_b()
    assert result == expected

# Bad
def test_feature_a():
    global shared_state
    shared_state = setup_data()
    result = feature_a()
    assert result == expected

def test_feature_b():
    # Relies on test_feature_a running first
    result = feature_b(shared_state)
    assert result == expected
3. Test One Thing¶
Each test should verify one specific behavior:
# Good
def test_technical_analyst_returns_report():
result = technical_analyst(state)
assert result["technical_report"] is not None
def test_technical_analyst_includes_macd():
result = technical_analyst(state)
assert "MACD" in result["technical_report"]
# Bad
def test_technical_analyst():
result = technical_analyst(state)
assert result["technical_report"] is not None
assert "MACD" in result["technical_report"]
assert "RSI" in result["technical_report"]
assert len(result["technical_report"]) > 100
4. Use Assertions Effectively¶
# Good
assert result == expected, f"Expected {expected}, got {result}"
assert len(items) > 0, "Items list should not be empty"
# Bad
assert result # What are we checking?
5. Test Edge Cases¶
def test_divide_by_zero():
with pytest.raises(ZeroDivisionError):
divide(10, 0)
def test_empty_list():
result = process_list([])
assert result == []
def test_none_input():
result = process_value(None)
assert result is None
Continuous Integration¶
GitHub Actions¶
Tests run automatically on:
- Push to any branch
- Pull request creation/update
- Manual workflow dispatch
# .github/workflows/cicd.yml (excerpt)
test:
runs-on: ubuntu-latest
steps:
- name: Run tests
run: |
pytest -v --tb=short -n auto \
--cov=backend --cov-report=xml \
--cov-fail-under=10 --junitxml=junit.xml
Local Pre-commit¶
Run tests before committing:
# Add to .git/hooks/pre-commit
#!/bin/bash
REDHOUND_MOCK_MODE=true pytest -m "not slow and not integration"
Debugging Tests¶
Run Tests with Debugging¶
# Run with verbose output
pytest -v
# Run with detailed output
pytest -vv
# Show print statements
pytest -s
# Drop into debugger on failure
pytest --pdb
# Drop into debugger on first failure
pytest -x --pdb
Use ipdb for Debugging¶
def test_with_debugging():
# Add breakpoint
import ipdb; ipdb.set_trace()
result = function_under_test()
assert result == expected
View Test Logs¶
# Run with log output
pytest --log-cli-level=DEBUG
# Save logs to file
pytest --log-file=test.log --log-file-level=DEBUG
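Beyond viewing logs on the console, pytest's built-in caplog fixture lets a test capture and assert on log records. A generic sketch, not tied to any repo module:
import logging

def test_warns_when_data_is_missing(caplog):
    # caplog captures log records emitted while the test runs.
    with caplog.at_level(logging.WARNING):
        logging.getLogger("backend").warning("no data for ticker")
    assert "no data for ticker" in caplog.text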
Test Coverage Goals¶
Coverage Targets¶
- Overall Coverage: >80%
- Core Business Logic: >90%
- Orchestration Layer: >85%
- Agent Layer: >80%
- Data Layer: >75%
- API Layer: >70%
Excluded from Coverage¶
- CLI interface (integration-heavy, tested via E2E)
- Docker configuration files
- Scripts and utilities
- Test files themselves
Improving Coverage¶
# Recommended: Use Makefile target
make test-coverage
open htmlcov/index.html
# Or use pytest directly:
# Identify uncovered lines
pytest --cov=backend --cov-report=term-missing
# Generate HTML report for detailed analysis
pytest --cov=backend --cov-report=html
open htmlcov/index.html
Troubleshooting¶
Tests Failing in CI but Passing Locally¶
Possible Causes:
- Environment differences
- Missing dependencies
- Timing issues
Solutions:
# Run tests in same environment as CI
docker run -it python:3.12 /bin/bash
# Inside container:
pip install uv
uv sync --locked --extra dev
pytest
Slow Test Execution¶
Solutions:
# Run tests in parallel
pytest -n auto
# Skip slow tests
pytest -m "not slow"
# Use mock mode
export REDHOUND_MOCK_MODE=true
Flaky Tests¶
Symptoms: Tests pass sometimes, fail other times
Solutions:
- Use deterministic data (avoid random values; see the sketch below)
- Mock external dependencies
- Use proper synchronization for async tests
- Increase timeouts for timing-sensitive tests
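For the first two points, a minimal sketch of pinning down nondeterminism (compute_signal is a hypothetical stand-in, not a repo function):
import random
from unittest.mock import patch

def test_signal_is_deterministic():
    # Fix the seed so "random" inputs are reproducible across runs.
    random.seed(42)
    # Freeze the wall clock the code under test reads instead of sleeping/retrying.
    with patch("time.time", return_value=1_700_000_000.0):
        first = compute_signal("AAPL")   # hypothetical function under test
        second = compute_signal("AAPL")
    assert first == second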
Next Steps¶
- Read CI/CD Documentation for pipeline details
- Read Mock Mode for cost-free testing
- Read Developer Onboarding for setup
- Read Architecture to understand system design