Data Backfill
The backfill_history.py script performs a one-time backfill of historical OHLCV data from Alpha Vantage for the S&P 500 universe or a custom symbol list.
Purpose
- Populate TimescaleDB with 2+ years of daily OHLCV bars
- Enable backtesting and strategy development
- Required before running live strategies (historical data for indicators)
Source: Alpha Vantage Daily Adjusted endpoint (20+ years available)
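To give a rough idea of what a request against this endpoint involves, the sketch below builds a Daily Adjusted query URL and trims a parsed response to the requested window. The helper names (`build_url`, `parse_daily_adjusted`) are hypothetical and are not taken from backfill_history.py. The query parameters and JSON field names follow Alpha Vantage's public documentation, and `outputsize=full` is an assumption, since the compact output only covers the most recent 100 bars.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

BASE_URL = "https://www.alphavantage.co/query"


def build_url(symbol: str, api_key: str) -> str:
    """Build a Daily Adjusted request URL (no network call is made here)."""
    params = {
        "function": "TIME_SERIES_DAILY_ADJUSTED",
        "symbol": symbol,
        "outputsize": "full",  # assumption: full history is needed for 2+ years
        "apikey": api_key,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def parse_daily_adjusted(payload: dict, years: int, today: date) -> list[tuple]:
    """Return (day, open, high, low, close, volume) rows on or after the cutoff."""
    cutoff = today - timedelta(days=365 * years)
    rows = []
    for day_str, bar in payload.get("Time Series (Daily)", {}).items():
        day = date.fromisoformat(day_str)
        if day < cutoff:
            continue  # older than the requested window
        rows.append((
            day,
            float(bar["1. open"]),
            float(bar["2. high"]),
            float(bar["3. low"]),
            float(bar["4. close"]),
            int(bar["6. volume"]),
        ))
    return sorted(rows)  # oldest bar first
```

Filtering on the client side keeps the request itself simple: the endpoint returns the full history, and the `--years` window is applied after parsing.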
Prerequisites
- Docker infrastructure running:
docker compose up -d timescaledb redis
- Alpha Vantage API key:
- Free tier: https://www.alphavantage.co/support/#api-key
- Set in environment:
SA_ALPHA_VANTAGE_API_KEY
- Rate limit: 5 requests/minute (free tier)
- Python environment:
pip install -e ".[dev]"
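A small preflight check can catch a missing API key before the run starts. This is only a sketch: the `missing_env` helper is hypothetical, and the only name taken from this document is the SA_ALPHA_VANTAGE_API_KEY variable.

```python
import os

# Only this variable name comes from the prerequisites above.
REQUIRED_ENV = ("SA_ALPHA_VANTAGE_API_KEY",)


def missing_env(env=None) -> list[str]:
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env()
    if missing:
        raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
```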
Usage
Full S&P 500 backfill (2 years)
python scripts/backfill_history.py --years 2 --symbols sp500
Output:
Backfill: 2 year(s) of OHLCV history
------------------------------------------------------------
Universe loaded: 503 S&P 500 symbols
Total: 503
Completed: 0
Remaining: 503
Estimated time at 5 req/min: 1h 40m 36s
[AAPL ] 1/503 ( 0.2%) ETA: 1h 40m 36s
[MSFT ] 2/503 ( 0.4%) ETA: 1h 39m 12s
...
Duration: ~1 hour 40 minutes on free tier (5 req/min rate limit)
Quick smoke test (3 symbols)
python scripts/backfill_history.py --years 2 --symbols AAPL,MSFT,GOOG
Output:
Symbol list: 3 symbols from command line
Total: 3
Completed: 0
Remaining: 3
Estimated time at 5 req/min: 36s
Duration: ~36 seconds
Resume interrupted run
The script tracks completed symbols in Redis (backfill:completed set). If interrupted (Ctrl+C, network error), restart to continue:
python scripts/backfill_history.py --resume
# or just re-run without --reset (resume is default behavior)
Output:
Total: 503
Completed: 127
Remaining: 376
Estimated time at 5 req/min: 1h 15m 12s
Only symbols not in the backfill:completed set will be fetched.
Reset and start over
python scripts/backfill_history.py --reset --years 2 --symbols sp500
Warning: Clears the backfill:completed Redis key. All symbols will be re-fetched; rows that already exist are skipped by the TimescaleDB primary key.
Options
| Flag | Default | Description |
|---|---|---|
| --years | 2 | Number of years of history to fetch |
| --symbols | sp500 | Symbol list: sp500 or comma-separated tickers |
| --resume | (default) | Skip symbols already in progress set (implicit) |
| --reset | (off) | Clear progress set and start from scratch |
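The flags in the table could be parsed along the following lines. This is a hedged sketch rather than the script's actual parser: the function names `parse_args` and `resolve_symbols` are hypothetical, and only the flag names and defaults come from the table.

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the four documented flags; defaults mirror the options table."""
    parser = argparse.ArgumentParser(description="Backfill OHLCV history")
    parser.add_argument("--years", type=int, default=2,
                        help="Number of years of history to fetch")
    parser.add_argument("--symbols", default="sp500",
                        help="'sp500' or comma-separated tickers")
    parser.add_argument("--resume", action="store_true",
                        help="Skip completed symbols (already the default)")
    parser.add_argument("--reset", action="store_true",
                        help="Clear the progress set and start from scratch")
    return parser.parse_args(argv)


def resolve_symbols(value: str):
    """Return None for the built-in universe, else a list of upper-cased tickers."""
    if value == "sp500":
        return None  # the caller is expected to load the S&P 500 universe
    return [s.strip().upper() for s in value.split(",") if s.strip()]
```

For example, `resolve_symbols(parse_args(["--symbols", "AAPL,MSFT,GOOG"]).symbols)` yields the three tickers as a list.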
Rate Limiting
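Alpha Vantage free tier: 5 requests per minute, 500 API calls per day. A full 503-symbol run needs slightly more than the daily cap, so the last few symbols may hit the limit and be picked up on the next run.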
Script behavior:
- Respects Alpha Vantage rate limits via adaptive pacing
- Displays dynamic ETA based on actual throughput
- Safe to interrupt (Ctrl+C) — progress is saved
Estimated times:
| Symbols | Duration (5 req/min) |
|---|---|
| 10 | ~2 minutes |
| 100 | ~20 minutes |
| 503 (S&P 500) | ~1 hour 40 minutes |
Premium tier: 75 req/min → ~7 minutes for full S&P 500
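The durations above follow from one request per symbol at a fixed rate. The sketch below reproduces that arithmetic. The function names are illustrative, and the "1h 40m 36s" format is modeled on the sample output in this document rather than on the script's source.

```python
def estimate_seconds(symbols: int, requests_per_minute: float) -> float:
    """One request per symbol, paced evenly at the given rate."""
    return symbols * 60.0 / requests_per_minute


def format_duration(seconds: float) -> str:
    """Format seconds as e.g. '1h 40m 36s', dropping leading zero units."""
    total = int(round(seconds))
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    parts = []
    if hours:
        parts.append(f"{hours}h")
    if hours or minutes:
        parts.append(f"{minutes}m")
    parts.append(f"{secs}s")
    return " ".join(parts)


print(format_duration(estimate_seconds(503, 5)))   # 1h 40m 36s
print(format_duration(estimate_seconds(3, 5)))     # 36s
print(format_duration(estimate_seconds(503, 75)))  # 6m 42s
```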
Idempotency
Redis Progress Key
- Key: backfill:completed
- Type: Set
- Members: Stock symbols (e.g., "AAPL", "MSFT")
- TTL: None (persistent)
Behavior:
- Symbol added to set after successful storage in TimescaleDB
- Failed symbols are not added → will be retried on next run
- Check membership before fetching:
redis.sismember("backfill:completed", "AAPL")
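The membership check and the post-store update can be wrapped in two small helpers. This is a minimal sketch under stated assumptions: `remaining_symbols` and `mark_completed` are hypothetical names, and the client is typed as a protocol so that anything exposing the Redis `SISMEMBER` and `SADD` commands (redis-py's client does) can be passed in.

```python
from typing import Iterable, Protocol

PROGRESS_KEY = "backfill:completed"


class SetClient(Protocol):
    """The two Redis set commands used here."""

    def sismember(self, name: str, value: str): ...

    def sadd(self, name: str, *values: str): ...


def remaining_symbols(client: SetClient, symbols: Iterable[str]) -> list[str]:
    """Keep only symbols that are not yet in the progress set, in input order."""
    return [s for s in symbols if not client.sismember(PROGRESS_KEY, s)]


def mark_completed(client: SetClient, symbol: str) -> None:
    """Record a symbol after its bars have been stored successfully."""
    client.sadd(PROGRESS_KEY, symbol)
```

Calling `mark_completed` only after the database write succeeds is what makes a failed symbol reappear in `remaining_symbols` on the next run.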
Database Deduplication
OHLCV table has a primary key on (symbol, timestamp). Duplicate inserts are ignored (ON CONFLICT DO NOTHING behavior).
Result: Safe to re-run the script multiple times — duplicates will be skipped.
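The effect of the composite primary key can be demonstrated without a running database. In the sketch below, sqlite3 stands in for TimescaleDB so the example runs anywhere (it needs SQLite 3.24 or later for the `ON CONFLICT` clause). The table is reduced to three columns and the price values are made up; only the (symbol, timestamp) key comes from this document.

```python
import sqlite3

# sqlite3 is a stand-in for TimescaleDB; the schema is a simplified assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE ohlcv (
        symbol    TEXT NOT NULL,
        timestamp TEXT NOT NULL,
        close     REAL,
        PRIMARY KEY (symbol, timestamp)
    )
    """
)

INSERT_SQL = """
    INSERT INTO ohlcv (symbol, timestamp, close)
    VALUES (?, ?, ?)
    ON CONFLICT (symbol, timestamp) DO NOTHING
"""


def store_bars(bars) -> int:
    """Insert bars, silently skipping existing keys; return the table row count."""
    conn.executemany(INSERT_SQL, bars)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM ohlcv").fetchone()[0]


bars = [("AAPL", "2024-01-02", 100.0), ("AAPL", "2024-01-03", 101.0)]
first = store_bars(bars)   # two new rows
second = store_bars(bars)  # same keys again: nothing is added
print(first, second)       # 2 2
```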
Error Handling
Symbol-level errors
If a single symbol fails (network error, API limit, invalid ticker), the script logs a warning and continues:
[WARN] INVALID: API returned no data
Behavior:
- Symbol is not added to backfill:completed
- Will be retried on next run
- Successfully fetched symbols are recorded before the script moves on to the next symbol
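The per-symbol behavior described above amounts to a loop that isolates each failure. The following is a sketch of that pattern, not the script's actual code: `run_backfill` and `fetch_and_store` are hypothetical names, a plain Python set stands in for the Redis progress set, and `print` stands in for the logger.

```python
from typing import Callable, Iterable


def run_backfill(
    symbols: Iterable[str],
    completed: set[str],
    fetch_and_store: Callable[[str], None],
) -> dict[str, int]:
    """Process each symbol independently and return fetched/errors/skipped counts."""
    counts = {"fetched": 0, "errors": 0, "skipped": 0}
    for symbol in symbols:
        if symbol in completed:
            counts["skipped"] += 1
            continue
        try:
            fetch_and_store(symbol)
        except Exception as exc:  # broad on purpose: one bad symbol must not stop the run
            print(f"[WARN] {symbol}: {exc}")
            counts["errors"] += 1
            continue  # not marked completed, so the next run retries it
        completed.add(symbol)  # recorded only after a successful store
        counts["fetched"] += 1
    return counts
```

Because `KeyboardInterrupt` does not derive from `Exception`, Ctrl+C still propagates out of this loop, while every symbol finished before the interrupt has already been recorded.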
Keyboard interrupt (Ctrl+C)
Interrupted after 42 symbols. Progress saved.
Run with --resume (or just re-run) to continue.
Behavior:
- Completed symbols are already in backfill:completed
- Re-run the script to continue from where it left off
Network failure
- Transient errors: logged and skipped; retried on the next run
- Persistent errors: check the API key, internet connectivity, and Alpha Vantage status
Output
Console
Live progress line with dynamic ETA:
[AAPL ] 42/503 ( 8.3%) ETA: 1h 32m 15s
Fields:
- [SYMBOL]: current symbol being processed
- 42/503: completed count / total
- (8.3%): percentage complete
- ETA: estimated time remaining (1h 32m 15s in the sample), updated from the actual rate
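One way to derive an ETA from actual throughput is to multiply the average time per completed symbol by the number of symbols left. The sketch below assumes that approach; the script may compute its ETA differently. The function names and the five-character symbol padding are guesses based on the sample line, not on the script's source.

```python
def format_duration(seconds: float) -> str:
    """Format seconds as e.g. '1h 32m 12s', dropping leading zero units."""
    total = int(round(seconds))
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    parts = []
    if hours:
        parts.append(f"{hours}h")
    if hours or minutes:
        parts.append(f"{minutes}m")
    parts.append(f"{secs}s")
    return " ".join(parts)


def progress_line(symbol: str, done: int, total: int, elapsed_s: float) -> str:
    """Build one progress line from the counts and the elapsed wall-clock time."""
    pct = 100.0 * done / total
    # Average seconds per finished symbol, times the symbols still to fetch.
    eta_s = (elapsed_s / done) * (total - done) if done else 0.0
    return f"[{symbol:<5}] {done}/{total} ({pct:4.1f}%) ETA: {format_duration(eta_s)}"


# 42 symbols in 504 seconds is 12 s per symbol, so 461 remaining take 5532 s.
print(progress_line("AAPL", 42, 503, 504.0))
# [AAPL ] 42/503 ( 8.3%) ETA: 1h 32m 12s
```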
Final Summary
============================================================
Backfill complete in 1h 38m 42s
Fetched: 503 Errors: 0 Skipped: 0
============================================================
Logs
Warnings and errors are logged to structlog (console by default):
logger.warning("backfill.symbol_error", symbol=symbol, error=str(exc))
Verification
Check database
# Connect to TimescaleDB
docker exec -it alpha-oracle-timescaledb-1 psql -U trader -d stock_analysis
-- Count symbols
SELECT COUNT(DISTINCT symbol) FROM ohlcv;
-- Expected: 503
-- Check date range for AAPL
SELECT MIN(timestamp), MAX(timestamp) FROM ohlcv WHERE symbol = 'AAPL';
-- Sample data
SELECT * FROM ohlcv WHERE symbol = 'AAPL' ORDER BY timestamp DESC LIMIT 5;
Check Redis progress
# Count completed symbols
docker exec -it alpha-oracle-redis-1 redis-cli SCARD backfill:completed
# Expected: 503
# Check specific symbol
docker exec -it alpha-oracle-redis-1 redis-cli SISMEMBER backfill:completed AAPL
# Expected: 1 (true)
Troubleshooting
“Connection refused” (TimescaleDB)
docker compose up -d timescaledb
docker compose logs timescaledb
“Connection refused” (Redis)
docker compose up -d redis
“Alpha Vantage API key invalid”
- Check environment variable:
echo $SA_ALPHA_VANTAGE_API_KEY
- Verify key at https://www.alphavantage.co/support/#api-key
- Set in .env file or export:
export SA_ALPHA_VANTAGE_API_KEY=your_key_here
“Rate limit exceeded”
- Free tier: 5 req/min, 500 req/day
- Wait 24 hours for daily limit reset
- Upgrade to premium: https://www.alphavantage.co/premium/
“No data returned for symbol”
- Symbol may be delisted or invalid ticker
- Alpha Vantage does not have data for all symbols
- Check ticker spelling (use primary exchange symbol)
Slow progress
- Expected on free tier (5 req/min)
- Script displays accurate ETA based on actual rate
- Premium tier: 75 req/min → 15x faster
Next Steps
After backfill completes:
- Run backtest:
from src.strategy.backtest import BacktestEngine
# ...
- Start daily bar job: System will automatically fetch latest bars daily via scheduler (6:00 AM ET)
- Enable strategies: Strategies require historical data for indicator calculations