CORE: Sharpe Ratio Decay
Splits the backtest in half and compares in-sample vs out-of-sample Sharpe ratio. Also checks mean return decay independently. Uses the worse of the two as the effective decay score.
If a strategy only works on the data it was built on, it was fitted to noise. Real edge doesn't evaporate the moment you look at new data — it decays gradually. A cliff-edge drop between in-sample and out-of-sample Sharpe is one of the clearest red flags in quantitative trading.
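A minimal sketch of the decay score. It assumes trade-level returns in chronological order and defines decay as the fraction of the in-sample value lost out of sample. The function names, that relative-decay definition, and the convention of treating a non-positive in-sample value as total decay are all hypothetical; the tool's exact formula is not given here.

```python
import numpy as np


def sharpe(returns: np.ndarray) -> float:
    """Per-trade Sharpe ratio (mean / sample std), not annualised."""
    sd = returns.std(ddof=1)
    return float(returns.mean() / sd) if sd > 0 else 0.0


def relative_decay(in_sample: float, out_of_sample: float) -> float:
    """Fraction of the in-sample value lost out of sample (hypothetical definition).

    0.0 = no decay, 1.0 = edge fully gone, > 1.0 = sign flipped out of sample.
    """
    if in_sample <= 0:
        return 1.0  # hypothetical convention: no in-sample edge counts as total decay
    return (in_sample - out_of_sample) / in_sample


def sharpe_decay_score(returns) -> float:
    """Split the backtest in half; return the worse of Sharpe decay and mean-return decay."""
    r = np.asarray(returns, dtype=float)
    half = len(r) // 2
    first, second = r[:half], r[half:]
    sr_decay = relative_decay(sharpe(first), sharpe(second))
    mean_decay = relative_decay(float(first.mean()), float(second.mean()))
    return max(sr_decay, mean_decay)  # the worse of the two
```

For example, `sharpe_decay_score([1, 3] * 10 + [0, 2] * 10)` scores about 0.5: the second half has the same volatility but half the mean, so both Sharpe and mean return lose half their in-sample value.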
CORE: Monte Carlo Robustness
Three-component test: (1) bootstrap mean stability across 300 resamples, (2) sequence independence across 200 permutations, (3) worst-5th-percentile equity curve across 500 permutations.
The order your trades happened in history was partly random. If the strategy only works with that exact sequence, and shuffling the order wrecks the equity curve, the edge isn't real. This test resamples and reshuffles the trades hundreds of times and checks whether the edge survives.
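A minimal sketch of component (3) only. It assumes trade P&L adds up (no compounding) and reads "worst-5th-percentile equity curve" as the 95th percentile of maximum drawdown across permutations. Shuffling never changes the final total, so the path, measured here by drawdown, is what a permutation can reveal. Function names are hypothetical; components (1) and (2) are not shown.

```python
import numpy as np


def max_drawdown(pnl: np.ndarray) -> float:
    """Largest peak-to-trough drop of the cumulative P&L curve, starting from 0."""
    equity = np.concatenate(([0.0], np.cumsum(pnl)))
    running_peak = np.maximum.accumulate(equity)
    return float(np.max(running_peak - equity))


def permutation_drawdown(pnl, n_perm: int = 500, seed: int = 0) -> tuple[float, float]:
    """Return (historical max drawdown, 95th-percentile max drawdown over shuffles)."""
    rng = np.random.default_rng(seed)
    pnl = np.asarray(pnl, dtype=float)
    # Each permutation keeps the same trades but replays them in a random order.
    drawdowns = [max_drawdown(rng.permutation(pnl)) for _ in range(n_perm)]
    return max_drawdown(pnl), float(np.percentile(drawdowns, 95))
```

If the historical drawdown sits far below the shuffled 95th percentile, the favourable ordering of the backtest was doing much of the work.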
CORE: Profit / Loss Concentration
Measures what percentage of total profit comes from the top 1% of trades and the top 2 individual trades. Removes the top trades and recalculates Sharpe.
Remove your two best trades. Is the strategy still profitable? If those two trades are holding up the entire P&L, you don't have a strategy — you got lucky twice. This is surprisingly common and surprisingly easy to miss in standard backtesting.
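A sketch of the concentration check, assuming "total profit" means net P&L and that the top 1% is rounded up to at least one trade. The function name and the returned dictionary keys are hypothetical.

```python
import numpy as np


def concentration_report(pnl) -> dict:
    pnl = np.asarray(pnl, dtype=float)
    total = pnl.sum()
    ranked = np.sort(pnl)[::-1]  # largest trades first
    n_top = max(1, int(np.ceil(0.01 * len(pnl))))  # top 1%, at least one trade

    # Shares are only meaningful when the strategy made money overall.
    share_top_1pct = ranked[:n_top].sum() / total if total > 0 else float("nan")
    share_top_2 = ranked[:2].sum() / total if total > 0 else float("nan")

    # Drop the two largest trades and recompute the per-trade Sharpe ratio.
    rest = np.delete(pnl, np.argsort(pnl)[-2:])
    sd = rest.std(ddof=1)
    sharpe_without_top_2 = float(rest.mean() / sd) if sd > 0 else 0.0

    return {
        "share_top_1pct": float(share_top_1pct),
        "share_top_2": float(share_top_2),
        "sharpe_without_top_2": sharpe_without_top_2,
        "profitable_without_top_2": bool(rest.sum() > 0),
    }
```

With two winners of 100 and 80 plus 98 trades that net to zero, the two best trades account for all of the profit and the Sharpe ratio without them is zero.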
ADVANCED: CPCV Path Stability
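Combinatorial Purged Cross-Validation with the data divided into N = 6 groups and k = 2 test groups per split, giving C(6,2) = 15 train/test splits that recombine into 5 out-of-sample backtest paths. Purging and embargo gaps prevent data leakage between training and test data. Measures the standard deviation of the Sharpe ratio across paths. Thresholds: path_std < 1.5 = robust, < 2.5 = acceptable, ≥ 2.5 = fails.
Standard train/test splits leak information at the boundary and yield only a single out-of-sample path. CPCV with purging and embargo prevents the leakage, and its multiple paths show how much the result depends on which stretch of history was held out.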
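A sketch of how 6 groups with 2 test groups per split produce 15 train/test splits and 5 backtest paths. It assumes one observation per trade with non-overlapping labels, so purging reduces to dropping the test observations from training, and the embargo drops the next `embargo` observations after each test group. `oos_returns` is a hypothetical callback standing in for whatever refits and evaluates the strategy on each split. The path Sharpe ratios here are per trade, so they are not directly comparable to the thresholds above.

```python
from itertools import combinations
from math import comb
from typing import Callable

import numpy as np


def cpcv_splits(n_obs: int, n_groups: int = 6, k_test: int = 2, embargo: int = 0):
    """Return the groups and a list of (test_group_ids, train_idx, test_idx) splits."""
    groups = np.array_split(np.arange(n_obs), n_groups)
    splits = []
    for test_ids in combinations(range(n_groups), k_test):
        test_idx = np.concatenate([groups[g] for g in test_ids])
        excluded = set(test_idx.tolist())  # purge: test observations never train
        for g in test_ids:
            end = int(groups[g][-1])
            # Embargo: also drop the observations right after each test group.
            excluded.update(range(end + 1, min(end + 1 + embargo, n_obs)))
        train_idx = np.array([i for i in range(n_obs) if i not in excluded], dtype=int)
        splits.append((test_ids, train_idx, test_idx))
    return groups, splits


def cpcv_paths(n_groups: int = 6, k_test: int = 2) -> list[list[int]]:
    """paths[p][g] = index of the split whose test set supplies group g on path p."""
    n_paths = comb(n_groups - 1, k_test - 1)  # each group is tested this many times
    paths = [[-1] * n_groups for _ in range(n_paths)]
    seen = [0] * n_groups
    for split_id, test_ids in enumerate(combinations(range(n_groups), k_test)):
        for g in test_ids:
            paths[seen[g]][g] = split_id
            seen[g] += 1
    return paths


def path_sharpe_std(paths, oos_returns: Callable[[int, int], np.ndarray]) -> float:
    """Std of per-trade Sharpe across paths; oos_returns(split_id, group_id) is hypothetical."""
    sharpes = []
    for path in paths:
        r = np.concatenate([oos_returns(split_id, g) for g, split_id in enumerate(path)])
        sd = r.std(ddof=1)
        sharpes.append(r.mean() / sd if sd > 0 else 0.0)
    return float(np.std(sharpes, ddof=1))
```

Here `cpcv_splits(60, 6, 2, embargo=2)` returns 15 splits and `cpcv_paths()` returns 5 paths, matching the path count k/N · C(N, k) = 2/6 · 15 = 5.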
Academic Reference
López de Prado — "Advances in Financial Machine Learning" (2018), Ch.12
ADVANCED: Deflated Sharpe Ratio (DSR)
The raw Sharpe ratio is almost always overstated. DSR corrects it for three things most people ignore: the skew and fat tails of real trading returns (your losses are bigger than a normal distribution predicts), the noise in Sharpe ratios estimated from short backtests (a high value is easy to get by chance), and how many strategies you tested before finding this one. The result is the probability that the strategy's true Sharpe ratio beats the best one you would expect to find among that many zero-edge strategies.
Most people don't realise how much selection bias inflates their results. Test 20 strategies and the best one will look great even if all 20 are pure noise. DSR is the only metric I know of that directly accounts for this. It's why the same strategy can score very differently depending on how many alternatives you tested first.
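A small simulation of that selection effect, under assumed settings: 20 strategies with zero true edge, one year (252 days) of normally distributed daily returns each, annualised by √252. The function name and parameters are illustrative.

```python
import numpy as np


def best_of_n_noise_sharpe(n_strategies: int = 20, n_days: int = 252,
                           n_sims: int = 500, seed: int = 0) -> float:
    """Average annualised Sharpe of the best of n zero-edge strategies."""
    rng = np.random.default_rng(seed)
    # Shape (simulations, strategies, days); the true mean return is exactly zero.
    r = rng.normal(0.0, 0.01, size=(n_sims, n_strategies, n_days))
    sr_daily = r.mean(axis=2) / r.std(axis=2, ddof=1)
    sr_annual = sr_daily * np.sqrt(252)
    # Pick the best-looking strategy in each simulation, then average.
    return float(sr_annual.max(axis=1).mean())
```

Across simulations, the best of 20 should average an annualised Sharpe of roughly 1.9, close to the expected maximum of 20 standard normal draws (about 1.87), even though every strategy is pure noise.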
// DSR Formula
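SR₀ = E[max Z | n_trials] / √(T-1)
E[max Z | n] ≈ (1 - γ_EM)·Φ⁻¹(1 - 1/n) + γ_EM·Φ⁻¹(1 - 1/(n·e))
DSR = Φ( (SR - SR₀) / σ_SR )
Where:
SR = observed per-trade Sharpe ratio (not annualised)
SR₀ = expected best Sharpe among n_trials zero-edge strategies (SR₀ = 0 when n_trials = 1)
T = number of trades
γ = skewness of returns
κ = excess kurtosis of returns
n_trials = strategies tested (our n parameter)
γ_EM = Euler-Mascheroni constant ≈ 0.5772
Φ, Φ⁻¹ = standard normal CDF and its inverse
σ_SR = √((1 - γ·SR + ((κ+2)/4)·SR²) / (T-1)), which reduces to √((1 + 0.5·SR²) / (T-1)) for normal returns (γ = κ = 0)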
DSR is a probability (0–1); we scale it to 0–100. A DSR of 95/100 means that, under the test's assumptions, there is a 95% probability that the Sharpe ratio reflects a real edge rather than the best pick from noise. DSR is capped at 99.8 in portfolio reports, because 100/100 would imply absolute certainty, which is never warranted.
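A sketch of the DSR formula in Python using SciPy, assuming per-trade returns. `scipy.stats.kurtosis` returns excess (Fisher) kurtosis by default, matching κ. The function names and the handling of n_trials = 1 are assumptions; the 0–100 scaling with a 99.8 cap follows the description above.

```python
import numpy as np
from scipy.stats import kurtosis, norm, skew


def expected_max_z(n_trials: int) -> float:
    """Approximate expected maximum of n_trials independent standard normal draws."""
    if n_trials <= 1:
        return 0.0  # a single trial has no selection to correct for
    g = np.euler_gamma  # Euler-Mascheroni constant, about 0.5772
    return float((1 - g) * norm.ppf(1 - 1 / n_trials)
                 + g * norm.ppf(1 - 1 / (n_trials * np.e)))


def deflated_sharpe(returns, n_trials: int) -> float:
    """Probability (0-1) that the true Sharpe beats the best of n_trials noise strategies."""
    r = np.asarray(returns, dtype=float)
    T = len(r)
    sr = r.mean() / r.std(ddof=1)  # per-trade, not annualised
    g3 = skew(r)
    k = kurtosis(r)  # excess kurtosis (fisher=True is the default)
    sigma_sr = np.sqrt((1 - g3 * sr + (k + 2) / 4 * sr**2) / (T - 1))
    sr0 = expected_max_z(n_trials) / np.sqrt(T - 1)
    return float(norm.cdf((sr - sr0) / sigma_sr))


def dsr_score(dsr: float, cap: float = 99.8) -> float:
    """Scale to 0-100 and cap, since 100/100 would claim certainty."""
    return min(100.0 * dsr, cap)
```

Holding the returns fixed and raising `n_trials` raises SR₀, so the same backtest earns a lower DSR the more alternatives were tried first.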
Academic Reference
Bailey & López de Prado — "The Deflated Sharpe Ratio" (2014), J. Portfolio Management
Harvey, Liu & Zhu — "…and the Cross-Section of Expected Returns" (2016), Review of Financial Studies
CORE: Sharpe Confidence Interval
Computes the 95% CI for the annualised Sharpe using the Mertens (2002) standard error formula, corrected for skewness and kurtosis. Reports the lower CI bound — the pessimistic estimate of the real Sharpe.
An annualised Sharpe of 1.5 earned from 30 trades in one year has a 95% CI so wide it includes zero.
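A worked check of that example, assuming the 30 trades span one year (so 30 trades per year), normal returns (zero skew and excess kurtosis), and a per-trade Sharpe of 1.5 / √30. The standard error uses the skewness- and kurtosis-adjusted form with T − 1, which for normal returns reduces to √((1 + 0.5·SR²)/(T − 1)); the function name is hypothetical.

```python
import numpy as np
from scipy.stats import norm


def sharpe_ci(sr_annual: float, n_trades: int, trades_per_year: float,
              skewness: float = 0.0, excess_kurt: float = 0.0,
              level: float = 0.95) -> tuple[float, float]:
    """Annualised confidence interval for the Sharpe ratio."""
    scale = np.sqrt(trades_per_year)
    sr = sr_annual / scale  # per-trade Sharpe
    se = np.sqrt((1 - skewness * sr + (excess_kurt + 2) / 4 * sr**2) / (n_trades - 1))
    z = norm.ppf(0.5 + level / 2)  # 1.96 for a 95% interval
    return float((sr - z * se) * scale), float((sr + z * se) * scale)
```

`sharpe_ci(1.5, 30, 30)` gives roughly (-0.53, 3.53) annualised: the lower bound is below zero, so 30 trades cannot rule out a strategy with no edge at all.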
ADVANCED: Autocorrelation (Ljung-Box + Newey-West)
Tests for serial correlation in returns using the Ljung-Box Q-statistic at lags 1, 5, and 10. If significant autocorrelation is detected, computes Newey-West HAC standard error and quantifies the inflation factor vs the IID assumption.
Serial correlation, common with look-ahead bias or return smoothing, invalidates the DSR, CI, and Bootstrap tests, which all assume independent returns; positive autocorrelation makes them overstate significance.
// Ljung-Box Q-Statistic
Q(h) = T(T+2) × Σₖ₌₁ʰ [ ρ̂²(k) / (T-k) ]
Where:
T = number of observations
h = number of lags tested (we use h = 1, 5, 10)
ρ̂(k) = sample autocorrelation at lag k
H₀: returns are independently distributed
Reject H₀ if Q(h) > χ²(h, 0.05)
If rejected, Newey-West HAC correction:
σ²_NW = (1/T)·[ Σₜ εₜ² + 2·Σₖ₌₁ᴸ w(k)·Σₜ εₜ·εₜ₋ₖ ]
where εₜ = rₜ - r̄ (demeaned returns), w(k) = 1 - k/(L+1) (Bartlett weights), L = floor(4·(T/100)^(2/9))
A p-value below 0.05 at any of the tested lags (1, 5, 10) means the returns are not independently distributed: the strategy may have look-ahead bias, or the other tests' statistics are inflated.
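A sketch of both steps with NumPy and SciPy, assuming returns are passed in time order. The inflation factor is taken here as the ratio of the Newey-West standard error of the mean to the IID one, which is an assumption about how the tool defines it; the function names are hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def ljung_box(returns, lags=(1, 5, 10)) -> dict:
    """Return {h: (Q(h), p-value)} using Q(h) = T(T+2) * sum_k rho_k^2 / (T - k)."""
    x = np.asarray(returns, dtype=float)
    x = x - x.mean()
    T = len(x)
    denom = np.sum(x**2)
    # Sample autocorrelations rho_1 .. rho_max(lags).
    rho = np.array([np.sum(x[k:] * x[:-k]) / denom for k in range(1, max(lags) + 1)])
    out = {}
    for h in lags:
        k = np.arange(1, h + 1)
        q = T * (T + 2) * np.sum(rho[:h] ** 2 / (T - k))
        out[h] = (float(q), float(chi2.sf(q, h)))  # p-value from chi-squared, h dof
    return out


def newey_west_inflation(returns) -> float:
    """Ratio of the Newey-West (Bartlett) standard error of the mean to the IID one."""
    e = np.asarray(returns, dtype=float)
    e = e - e.mean()
    T = len(e)
    L = int(np.floor(4 * (T / 100) ** (2 / 9)))
    gamma0 = np.sum(e**2)
    s = gamma0
    for k in range(1, L + 1):
        w = 1 - k / (L + 1)  # Bartlett weight
        s += 2 * w * np.sum(e[k:] * e[:-k])
    var_nw = s / T  # long-run variance
    var_iid = gamma0 / T
    return float(np.sqrt(var_nw / var_iid))
```

On an AR(1) series with coefficient 0.6, Q(1) lands far above the χ² critical value and the inflation factor comes out well above 1, meaning IID-based tests would understate the true uncertainty.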
Academic Reference
Ljung & Box — "On a Measure of Lack of Fit in Time Series Models" (1978), Biometrika 65(2)
Newey & West — "A Simple... Heteroskedasticity and Autocorrelation Consistent Covariance Matrix" (1987), Econometrica 55(3)
CORE: Bootstrap Stability
500 bootstrap resamples of the trade sequence. Reports the percentage with positive mean return.
A robust edge shows positive expectancy in the vast majority of resampled universes.
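A minimal sketch, assuming i.i.d. resampling of individual trades with replacement (a block bootstrap would preserve serial dependence; this sketch does not). The function name and seed are illustrative.

```python
import numpy as np


def bootstrap_positive_share(pnl, n_boot: int = 500, seed: int = 0) -> float:
    """Percentage of bootstrap resamples whose mean trade P&L is positive."""
    rng = np.random.default_rng(seed)
    pnl = np.asarray(pnl, dtype=float)
    # Each row is one resampled trade history, drawn with replacement,
    # with the same length as the original.
    idx = rng.integers(0, len(pnl), size=(n_boot, len(pnl)))
    means = pnl[idx].mean(axis=1)
    return float(100.0 * np.mean(means > 0))
```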
CORE: Minimum Backtest Length
Computes the Harvey-Liu-Zhu minimum observations required for statistical significance at the strategy's Sharpe level, adjusted for n_trials.
Short backtests are easy to overfit. This check estimates how many trades you need before the results are credible.
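The exact Harvey-Liu-Zhu calculation is not spelled out here, so this sketch swaps in a related formula: the Bailey & López de Prado minimum track record length, with the significance level divided by n_trials (a Bonferroni adjustment, one of the multiple-testing corrections Harvey, Liu & Zhu discuss). It takes a per-trade Sharpe ratio and returns a trade count; treat it as an illustration, not the tool's formula.

```python
import math

from scipy.stats import norm


def min_track_record_length(sr: float, n_trials: int = 1, alpha: float = 0.05,
                            skewness: float = 0.0, excess_kurt: float = 0.0) -> float:
    """Minimum number of trades for a per-trade Sharpe `sr` to be significant (one-sided)."""
    if sr <= 0:
        return math.inf  # no positive edge to confirm
    z = norm.ppf(1 - alpha / n_trials)  # Bonferroni-adjusted critical value
    var_term = 1 - skewness * sr + (excess_kurt + 2) / 4 * sr**2
    return float(math.ceil(1 + var_term * (z / sr) ** 2))
```

With a per-trade Sharpe of 0.1 and normal returns, `min_track_record_length(0.1)` asks for 273 trades; raising `n_trials` to 20 pushes the requirement much higher.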
INFORMATIONAL: Information Ratio (vs Benchmark)
When dates are present, SPY is automatically fetched and aligned as the benchmark. Computes OLS beta, annualised Jensen's alpha, tracking error, and IR. IR > 0.5 = strong alpha. Without dates, the test returns a neutral informational score of 50.
A strategy backtested in a bull market with high beta looks good because the market went up, not because of skill.
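A sketch of those statistics, assuming the strategy and SPY returns are already date-aligned daily series, a zero risk-free rate (so Jensen's alpha is just the regression intercept), and 252 trading days for annualisation. IR is taken here as annualised active return over tracking error; some definitions use alpha over residual volatility instead. Fetching SPY is not shown, and the function name and dictionary keys are hypothetical.

```python
import numpy as np


def benchmark_stats(strategy, benchmark, periods_per_year: int = 252) -> dict:
    """OLS beta, annualised Jensen's alpha (risk-free rate assumed 0), tracking error, IR."""
    s = np.asarray(strategy, dtype=float)
    b = np.asarray(benchmark, dtype=float)
    beta, alpha = np.polyfit(b, s, 1)  # slope and intercept of s = alpha + beta * b
    active = s - b
    tracking_error = active.std(ddof=1) * np.sqrt(periods_per_year)
    ir = active.mean() * periods_per_year / tracking_error if tracking_error > 0 else 0.0
    return {
        "beta": float(beta),
        "alpha_annual": float(alpha * periods_per_year),
        "tracking_error": float(tracking_error),
        "ir": float(ir),
    }
```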
INFORMATIONAL: Beta Exposure (vs Benchmark)
When dates are present, SPY is fetched automatically as the benchmark. Beta > 0.6 = high market exposure; beta > 0.8 triggers Manual Audit Required. Informational only without dates.
High beta means returns are explained by market direction, not strategy skill.
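The beta thresholds amount to a small decision rule. A sketch, where `None` stands in for "no dates, so no benchmark" and the returned labels are hypothetical strings:

```python
from typing import Optional


def beta_flag(beta: Optional[float]) -> str:
    """Map benchmark beta to an exposure flag (labels are hypothetical)."""
    if beta is None:
        return "informational"  # no dates: benchmark not fetched
    if beta > 0.8:
        return "manual_audit_required"
    if beta > 0.6:
        return "high_market_exposure"
    return "ok"
```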