ML Signal Intelligence

The system uses machine learning (specifically XGBoost, a gradient boosting algorithm) to predict stock price movements and generate buy/sell signals. This is more sophisticated than rule-based strategies because it learns patterns from data.

What Are ML Signals?

ML signals are predictions about whether a stock will go up, down, or stay flat over the next 5 trading days. The system:

Collects 50+ features about each stock (technical indicators, fundamentals, market data)
Trains an XGBoost model to predict 5-day forward returns
Classifies movements: UP (>1%), DOWN (<-1%), or FLAT (-1% to +1%)
Generates signals: Only acts on predictions with high confidence (≥55%)

The model learns from historical data and continuously retrains to adapt to changing market conditions.

The 50+ Features

Features are organized into five categories:

Technical Features (20+ features)

Price and volume-based indicators:

Moving averages (10-day, 50-day, 200-day)
RSI (Relative Strength Index)
MACD (Moving Average Convergence Divergence)
Bollinger Bands
Volume ratios (volume relative to 20-day average)
Price momentum (5-day, 10-day, 20-day returns)

Fundamental Features (10+ features)

Financial statement data:

P/E ratio (price-to-earnings)
P/B ratio (price-to-book)
P/S ratio (price-to-sales)
EV/EBITDA (enterprise value to earnings)
Debt-to-equity ratio
ROE (return on equity)
Revenue growth
Earnings growth
Sector and industry classification

Cross-Asset Features (8 features)

Market context:

SPY correlation (correlation with S&P 500)
Sector correlation (correlation with sector ETF)
SPY beta (sensitivity to market movements)
VIX level (market volatility index)
Relative strength vs. SPY
Relative strength vs. sector

Alternative Features (6 features)

Non-traditional data:

Insider buying/selling (Form 4 filings)
Net insider transaction value
Short interest ratio
Days to cover short positions
Short interest change rate

Temporal Features (4 features)

Time-based patterns:

Day of week
Week of month
Month of year
Quarter

All features are engineered with point-in-time correctness, meaning they only use data that would have been available at prediction time. No future leaking.

How Predictions Work

Training Phase

Historical data: Collect 2+ years of daily bars and fundamentals
Feature computation: Calculate all 50+ features for each stock-date
Label creation: Tag each date with 5-day forward return (UP/DOWN/FLAT)
Model training: XGBoost learns patterns that predict forward returns
Validation: Test on out-of-sample data to measure accuracy

Minimum 500 training samples required before model is considered valid.

Prediction Phase

Latest data: Fetch today’s price, volume, fundamentals
Compute features: Calculate all 50+ features using latest data
Model inference: XGBoost predicts probability of UP/DOWN/FLAT
Confidence check: Only act if max probability ≥ 55%
Signal generation: Create BUY signal for UP, SELL signal for DOWN

Example:

Stock: AAPL
Features: [RSI=45.2, MA_10/50_cross=True, PE=28.4, ...]
Prediction: [DOWN: 15%, FLAT: 25%, UP: 60%]
Confidence: 60% (≥ 55% threshold)
→ Signal: BUY (LONG) with strength 0.60

Confidence Calibration

Raw probabilities from XGBoost can be overconfident. The system uses confidence calibration to adjust probabilities to match empirical accuracy.

If the model says “60% confident,” calibration ensures that historically, it was correct ~60% of the time at that confidence level.

This prevents overtrading on false confidence.

Model Retraining

The market changes constantly, so the model must adapt:

Retraining schedule: Every Sunday at 2am (weekly)
Incremental learning: Uses latest data from the past week
Staleness detection: Model is considered stale after 14 days without retraining
Automatic fallback: If model is stale or missing, system uses rule-based strategies instead

Retraining job runs automatically via APScheduler cron. No manual intervention needed.

Drift Monitoring

Data drift occurs when the statistical properties of features change over time. The system monitors drift using PSI (Population Stability Index):

PSI score: Measures how much feature distributions have shifted
Thresholds: PSI > 0.1 = minor drift, PSI > 0.2 = major drift
Alerts: Dashboard shows drift warnings when detected
Response: Major drift triggers model retraining

Drift monitoring runs daily and is displayed on the Model Health dashboard.

MLSignalStrategy Details

The ML strategy operates like other strategies but with ML-generated signals:

Name: ml_xgboost_signal
Min hold days: 3 (PDT compliant)
Signal generation: Only high-confidence predictions (≥55%)
Feature importance: Tracks which features drive predictions
Model path: models/xgboost_signal.joblib

If the model is missing or stale, the strategy returns FLAT signals (no trades) until a valid model is available.

Model Performance Tracking

The dashboard displays:

Rolling accuracy: 30-day and 90-day accuracy rates
Precision/recall: For UP and DOWN predictions
Feature importance: Top 10 features driving predictions
Drift heatmap: PSI scores for all features
Model version history: Track deployed models over time

Configuration

ML settings in config/settings.yaml:

ml:
  prediction_horizon: 5              # Predict 5-day forward returns
  up_threshold: 0.01                 # >1% = UP
  down_threshold: -0.01              # <-1% = DOWN
  min_training_samples: 500          # Min data points for training
  retrain_interval_days: 7           # Weekly retraining
  model_staleness_days: 14           # Model expires after 14 days
  confidence_threshold: 0.55         # Min 55% confidence to act

  xgb_params:                        # XGBoost hyperparameters
    n_estimators: 300
    max_depth: 6
    learning_rate: 0.05
    subsample: 0.8
    colsample_bytree: 0.8

When to Use ML Signals

Advantages:

Learns complex, non-linear patterns
Adapts to changing market conditions
Integrates diverse data sources (technical, fundamental, alternative)
Feature importance shows what drives predictions

Disadvantages:

Requires more data (500+ samples)
Computationally expensive (retraining takes time)
Less interpretable than rule-based strategies
Risk of overfitting to historical patterns

Recommended for:

Users comfortable with machine learning concepts
Portfolios with access to rich fundamental and alternative data
Long-term trading (5+ day horizons)

Trading Strategies — Rule-based strategies (Swing Momentum, Mean Reversion, Value Factor)
Risk Management — How ML signals are constrained by risk limits
PDT Rule — Why MLSignalStrategy uses min_hold_days = 3
Monitoring & Alerts — Model health notifications