// Validation Methodology

How QuantProof scores your strategy

I built QuantProof because I got tired of backtests that looked great and fell apart in live trading. This page explains every check we run, why it exists, and the research behind it. If you're going to trust a score with real money, you deserve to know exactly how it's calculated.

Engine v6.2  ·  Updated 2026-03-23  ·  44 scored checks + 5 crash simulations + portfolio analyser
// How the Fundable Score is calculated
Risk Management · 38%
Overfitting Detection · 22%
Compliance · 15%
Regime Robustness · 14%
Execution Reality · 11%
The 44 scored checks feed into five weighted categories. Two operational diagnostics (Alpha Decay, Symbolic Overfit) are appended to every report but do not affect category weights. One conditional check (Single-Regime Warning) appears only when detected. Information Ratio and Beta Exposure sit in Overfitting (22%); without dates, they return an informational score of 50.

Score caps: no date column → cap 60 · single-regime backtest → cap 55 · institutional compliance failure → cap 60 · Sharpe >10 → cap 30. Plausibility checks (4) flag data integrity issues and can trigger Manual Audit Required — they do not contribute to weighted score.
00
Validation Depth (n_trials)
Controls Monte Carlo and DSR rigour — not a check category
This is one of the most important settings, and one that most people miss. n_trials tells the engine how many strategies you tested before submitting this one. The more strategies you tested, the higher the chance of finding a good-looking one by luck. A Sharpe of 1.5 means something very different if it's your first strategy versus your 50th — this parameter captures that difference.
Tier | n_trials | Who uses it | DSR correction factor
Quick | n = 1 | Fast scan — no selection bias correction | E[max SR] ≈ 0 (uncorrected)
Standard (free) | n = 10 | Default — typical retail trader tests ~10 strategies | E[max SR] ≈ 0.8–1.2σ
Institutional (₹299) | n = 20 | Forced for PDF reports — more realistic selection bias | E[max SR] ≈ 1.2–1.6σ
Deep (₹999 Pro) | n = 20–50 | Slider — matches prop firm / fund pre-screening depth | E[max SR] ≈ 1.6–2.2σ
The same strategy, same data, can score 80/100 at n=1 and 30/100 at n=50. Both numbers are correct — they're answering different questions. n=1 asks: does this strategy have genuine edge? n=50 asks: would this edge hold up if a fund manager was screening it against 50 alternatives? If you're serious about deploying, run both.
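To make the correction concrete, here is a minimal sketch of the expected-maximum-Sharpe term, using the e^(−π/2) mixing weight exactly as written in the DSR formula box further down this page (the function name and the use of Python's `statistics.NormalDist` are my own; the engine's implementation may differ):

```python
from math import e, pi
from statistics import NormalDist

_ND = NormalDist()

def expected_max_sharpe(n_trials: int) -> float:
    """Expected maximum Sharpe of n_trials pure-noise strategies,
    in units of the Sharpe estimator's standard deviation."""
    if n_trials <= 1:
        return 0.0  # a single trial carries no selection bias to correct
    w = e ** (-pi / 2)  # mixing weight as stated in this page's formula box
    return ((1 - w) * _ND.inv_cdf(1 - 1 / n_trials)
            + w * _ND.inv_cdf(1 - 1 / (n_trials * e)))
```

The hurdle grows with n_trials: the more strategies screened, the higher the Sharpe a noise strategy is expected to reach by luck alone.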
01
Overfitting Detection
22% of Fundable Score · 11 checks
This is where most backtests die. Optimise any strategy on historical data and it will look brilliant — that's just mathematics. The question is whether it has real predictive power or just memorised noise. Every check in this category is designed to catch the difference.
CORE · Sharpe Ratio Decay
Splits the backtest in half and compares in-sample vs out-of-sample Sharpe ratio. Also checks mean return decay independently. Uses the worse of the two as the effective decay score.
If a strategy only works on the data it was built on, it was fitted to noise. Real edge doesn't evaporate the moment you look at new data — it decays gradually. A cliff-edge drop between in-sample and out-of-sample Sharpe is one of the clearest red flags in quantitative trading.
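A minimal sketch of the half-split comparison described above, assuming a plain array of per-trade returns and 252-period annualisation (both are assumptions; the engine also checks mean return decay independently and takes the worse of the two):

```python
import numpy as np

def sharpe(returns, periods_per_year=252):
    """Annualised Sharpe from per-period returns (252 is an assumption here)."""
    r = np.asarray(returns, dtype=float)
    sd = r.std(ddof=1)
    return 0.0 if sd == 0 else float(r.mean() / sd * np.sqrt(periods_per_year))

def sharpe_decay(returns):
    """First half acts as the in-sample proxy, second half as out-of-sample."""
    r = np.asarray(returns, dtype=float)
    mid = len(r) // 2
    sr_in, sr_out = sharpe(r[:mid]), sharpe(r[mid:])
    decay = 1.0 - sr_out / sr_in if sr_in > 0 else float("nan")
    return sr_in, sr_out, decay
```

A decay near 0 means the edge carried over; a decay near 1 means the out-of-sample half earned nothing.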
CORE · Monte Carlo Robustness
Three-component test: (1) bootstrap mean stability across 300 resamples, (2) sequence independence across 200 permutations, (3) worst-5th-percentile equity curve across 500 permutations.
The order your trades happened in history was partly random. If the strategy only works with that exact sequence — if shuffling the order destroys the returns — the edge isn't real. This test reshuffles thousands of times and checks whether the edge survives.
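One facet of this idea can be sketched as a permutation test on drawdown: shuffle the trade order many times and ask how often a random ordering produces a drawdown at least as bad as the one observed (function names are mine; the engine's three-component test is more elaborate):

```python
import numpy as np

def max_drawdown(pnl):
    """Peak-to-trough decline of the cumulative-PnL equity curve."""
    eq = np.cumsum(pnl)
    return float(np.max(np.maximum.accumulate(eq) - eq))

def permutation_drawdown_test(pnl, n_perm=500, seed=0):
    """Fraction of random trade orderings whose max drawdown is at least
    as bad as the observed ordering."""
    rng = np.random.default_rng(seed)
    observed = max_drawdown(pnl)
    worse = sum(max_drawdown(rng.permutation(pnl)) >= observed
                for _ in range(n_perm))
    return observed, worse / n_perm
```

If almost no shuffle is as bad as the observed path, the historical sequence was unusually kind; if almost every shuffle is worse, the observed drawdown was unusually lucky.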
CORE · Profit / Loss Concentration
Measures what percentage of total profit comes from the top 1% of trades and the top 2 individual trades. Removes the top trades and recalculates Sharpe.
Remove your two best trades. Is the strategy still profitable? If those two trades are holding up the entire P&L, you don't have a strategy — you got lucky twice. This is surprisingly common and surprisingly easy to miss in standard backtesting.
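The remove-the-best-trades test above can be sketched in a few lines (252-period annualisation and the function name are assumptions of this sketch):

```python
import numpy as np

def profit_concentration(pnl, top_n=2, periods_per_year=252):
    """Share of total profit carried by the top_n trades, plus the
    annualised Sharpe recomputed with those trades removed."""
    pnl = np.asarray(pnl, dtype=float)
    top_idx = np.argsort(pnl)[::-1][:top_n]          # indices of the best trades
    total = pnl.sum()
    top_share = pnl[top_idx].sum() / total if total > 0 else float("nan")
    rest = np.delete(pnl, top_idx)
    sd = rest.std(ddof=1)
    sr_rest = float(rest.mean() / sd * np.sqrt(periods_per_year)) if sd > 0 else 0.0
    return top_share, sr_rest
```

A strategy whose top two trades carry most of the P&L will show a collapsed `sr_rest`.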
ADVANCED · CPCV Path Stability
Combinatorial Purged Cross-Validation with 6 splits and 2 test folds. Applies embargo gaps to prevent data leakage. Generates multiple walk-forward paths and measures path Sharpe standard deviation. Thresholds: path_std < 1.5 = robust, < 2.5 = acceptable, ≥ 2.5 = fails.
Standard train/test splits leak information at the boundary. CPCV with purging and embargo prevents this.
Academic Reference
López de Prado — "Advances in Financial Machine Learning" (2018), Ch.12
ADVANCED · Deflated Sharpe Ratio (DSR)
The raw Sharpe ratio is almost always overstated. DSR corrects it for three things most people ignore: the fat tails in real trading returns (your losses are bigger than a normal distribution predicts), the fact that short backtests always look better by chance, and how many strategies you tested before finding this one. The result is the Sharpe ratio you'd need to see before being statistically confident the edge is real.
Most people don't realise how much selection bias inflates their results. Test 20 strategies and the best one will look great even if all 20 are pure noise. DSR is the only metric I know of that directly accounts for this. It's why the same strategy can score very differently depending on how many alternatives you tested first.
// DSR Formula
SR* = SR × √(T-1) × (1 - γ·SR + (κ/4)·SR²)^(-½)

E[max SR | n] = (1 - e^(-π/2))·Φ⁻¹(1 - 1/n) + e^(-π/2)·Φ⁻¹(1 - 1/(n·e))

DSR = Φ( (SR* - E[max SR | n_trials]) / σ_SR )

Where:
  T        = number of trades
  γ        = skewness of returns
  κ        = excess kurtosis of returns
  n_trials = strategies tested (our n parameter)
  Φ        = standard normal CDF
  σ_SR     = √((1 + 0.5·SR²) / (T-1))
DSR is a probability (0–1). We scale it to 0–100. A DSR of 95/100 means there is a 95% chance the Sharpe ratio is not the result of selection from noise. DSR is capped at 99.8 in portfolio reports — 100/100 would imply absolute certainty, which is never warranted.
Academic Reference
Bailey, Borwein, López de Prado & Zhu — "The Deflated Sharpe Ratio" (2014), J. Portfolio Management
Harvey, Liu & Zhu — "…and the Cross-Section of Expected Returns" (2016), Review of Financial Studies
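A compact sketch of the Bailey-de Prado recipe summarised above: per-period Sharpe in, probability out. The e^(−π/2) expected-max weight follows this page's formula box, and the skew/kurtosis denominator follows the original paper; the engine's exact scaling may differ:

```python
from math import sqrt, e, pi
from statistics import NormalDist

_ND = NormalDist()

def expected_max_sr(n_trials: int) -> float:
    """E[max SR] of n_trials noise strategies, in σ_SR units."""
    if n_trials <= 1:
        return 0.0
    w = e ** (-pi / 2)
    return ((1 - w) * _ND.inv_cdf(1 - 1 / n_trials)
            + w * _ND.inv_cdf(1 - 1 / (n_trials * e)))

def deflated_sharpe(sr, t, skew, ex_kurt, n_trials):
    """Probability that the observed per-period Sharpe beats the expected
    maximum of n_trials pure-noise strategies (sketch, not the engine)."""
    sigma_sr = sqrt((1 + 0.5 * sr * sr) / (t - 1))
    sr0 = sigma_sr * expected_max_sr(n_trials)       # selection-bias benchmark
    stat = (sr - sr0) * sqrt(t - 1) / sqrt(1 - skew * sr + (ex_kurt / 4) * sr * sr)
    return _ND.cdf(stat)
```

The same observed Sharpe deflates as n_trials rises, which is exactly the n=1 vs n=50 effect described in the Validation Depth section.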
CORE · Sharpe Confidence Interval
Computes the 95% CI for the annualised Sharpe using the Mertens (2002) standard error formula, corrected for skewness and kurtosis. Reports the lower CI bound — the pessimistic estimate of the real Sharpe.
A Sharpe of 1.5 on 30 trades has a CI so wide it includes zero.
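A minimal sketch of the Mertens-style interval described above (the T−1 denominator and the function name are assumptions of this sketch; the engine may normalise by T):

```python
from math import sqrt
import numpy as np

def sharpe_ci(returns, z=1.96):
    """95% CI for the per-period Sharpe, with skewness and
    excess-kurtosis corrections to the standard error."""
    r = np.asarray(returns, dtype=float)
    t = len(r)
    sd = r.std(ddof=1)
    sr = r.mean() / sd
    zs = (r - r.mean()) / sd
    skew = (zs ** 3).mean()
    ex_kurt = (zs ** 4).mean() - 3.0
    se = sqrt((1 + 0.5 * sr ** 2 - skew * sr + (ex_kurt / 4) * sr ** 2) / (t - 1))
    return float(sr - z * se), float(sr + z * se)
```

On 30 trades the standard error is large regardless of the point estimate, which is why short backtests produce intervals that straddle zero.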
ADVANCED · Autocorrelation (Ljung-Box + Newey-West)
Tests for serial correlation in returns using the Ljung-Box Q-statistic at lags 1, 5, and 10. If significant autocorrelation is detected, computes Newey-West HAC standard error and quantifies the inflation factor vs the IID assumption.
Serial correlation — common with look-ahead bias or return smoothing — invalidates DSR, CI, and Bootstrap tests by overstating significance.
// Ljung-Box Q-Statistic
Q(h) = T(T+2) × Σₖ₌₁ʰ [ ρ̂²(k) / (T-k) ]

Where:
  T    = number of observations
  h    = number of lags tested (we use h = 1, 5, 10)
  ρ̂(k) = sample autocorrelation at lag k

H₀: returns are independently distributed
Reject H₀ if Q(h) > χ²(h, 0.05)

If rejected, Newey-West HAC correction:
  σ²_NW = Σₜ εₜ² + 2·Σₖ₌₁ᴸ w(k)·Σₜ εₜ·εₜ₋ₖ
  where w(k) = 1 - k/(L+1), L = floor(4·(T/100)^(2/9))
We test at lags 1, 5, and 10. A p-value below 0.05 at any lag indicates the returns are not independently distributed — the strategy may have look-ahead bias or the test statistics are inflated.
Academic Reference
Ljung & Box — "On a Measure of Lack of Fit in Time Series Models" (1978), Biometrika 65(2)
Newey & West — "A Simple... Heteroskedasticity and Autocorrelation Consistent Covariance Matrix" (1987), Econometrica 55(3)
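The Q-statistic itself is straightforward to compute. A pure-NumPy sketch with the 5% χ² critical values hard-coded for h = 1, 5, 10 (3.841, 11.070, 18.307); the Newey-West correction step is omitted here:

```python
import numpy as np

CHI2_05 = {1: 3.841, 5: 11.070, 10: 18.307}  # 5% critical values of χ²(h)

def ljung_box_q(returns, h):
    """Ljung-Box Q(h) = T(T+2) · Σ ρ̂²(k)/(T-k) for k = 1..h."""
    r = np.asarray(returns, dtype=float)
    r = r - r.mean()
    t = len(r)
    denom = np.sum(r * r)
    q = 0.0
    for k in range(1, h + 1):
        rho_k = np.sum(r[k:] * r[:-k]) / denom   # sample autocorrelation at lag k
        q += rho_k ** 2 / (t - k)
    return t * (t + 2) * q

def has_serial_correlation(returns):
    """Reject independence if Q(h) exceeds the 5% critical value at any tested lag."""
    return any(ljung_box_q(returns, h) > CHI2_05[h] for h in (1, 5, 10))
```

A strongly autocorrelated series (for example an AR(1) process with φ = 0.9) is rejected immediately, which is the signature that look-ahead bias or smoothing often leaves.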
CORE · Bootstrap Stability
500 bootstrap resamples of the trade sequence. Reports the percentage with positive mean return.
A robust edge shows positive expectancy in the vast majority of resampled universes.
CORE · Minimum Backtest Length
Computes the Harvey-Liu-Zhu minimum observations required for statistical significance at the strategy's Sharpe level, adjusted for n_trials.
Short backtests always overfit. This check quantifies exactly how many trades you need before the results are credible.
INFORMATIONAL · Information Ratio (vs Benchmark)
When dates are present, SPY is automatically fetched and aligned. Computes OLS beta, Jensen's alpha (annualised), tracking error, and IR. IR > 0.5 = strong alpha. Without dates, returns score=50 informational.
A strategy backtested in a bull market with high beta looks good because the market went up, not because of skill.
INFORMATIONAL · Beta Exposure (vs Benchmark)
SPY inferred automatically when dates present. Beta > 0.8 triggers Manual Audit Required. Beta > 0.6 = high market exposure. Informational only without dates.
High beta means returns are explained by market direction, not strategy skill.
02
Risk Management
38% of Fundable Score · 11 checks
Risk management gets the highest weight because it's where live accounts actually die. I've seen strategies with beautiful Sharpe ratios blow up accounts in three months because nobody stress-tested the drawdown. Genuine edge isn't enough if you can't survive the variance to collect it.
Max Drawdown
Maximum peak-to-trough equity curve decline as a percentage. Institutional threshold: <20%. Prop firm threshold: <10%.
Every prop firm and fund has a hard number here. Exceed it and the account is closed — doesn't matter how good your average returns are. Knowing your worst case before you go live is not optional.
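The computation is simple enough to state exactly. A minimal sketch on an equity-curve array (the function name is mine):

```python
import numpy as np

def max_drawdown_pct(equity):
    """Largest peak-to-trough decline of an equity curve, as a fraction of the peak."""
    eq = np.asarray(equity, dtype=float)
    peaks = np.maximum.accumulate(eq)     # running high-water mark
    return float(np.max((peaks - eq) / peaks))
```

For example, an equity curve that runs 100 → 120 → 90 → 130 has a max drawdown of 25% (the 120 → 90 leg), even though it ends at an all-time high.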
Calmar Ratio
CAGR divided by Maximum Drawdown. Uses geometric annualisation. Threshold: > 1.5.
Measures return per unit of drawdown risk — a strategy with high returns but equal drawdown is not efficient.
Value at Risk (VaR 99%)
99th percentile loss expressed as a ratio to strategy volatility (VaR/σ). Normal distribution gives VaR/σ ≈ 2.33. Ratios above 4.0σ signal extreme fat tails.
Absolute VaR without context is meaningless. Normalising by volatility reveals whether tail risk is proportional to normal behaviour.
CVaR / Expected Shortfall
Average loss in the worst 5% of outcomes. Uses kernel density estimation for small samples (<100 trades). CVaR is the risk measure used by Basel III for bank capital requirements.
VaR tells you the door. CVaR tells you what's on the other side. Knowing the threshold is only half the picture — the average of your worst outcomes matters just as much for position sizing and risk of ruin.
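A minimal historical-estimation sketch of both measures on a per-trade PnL array (the engine uses VaR at 99%, CVaR on the worst 5%, and kernel density estimation for small samples; this sketch uses plain historical quantiles with a configurable level):

```python
import numpy as np

def var_cvar(pnl, level=0.95):
    """Historical VaR and CVaR (expected shortfall); losses reported as positive."""
    losses = -np.asarray(pnl, dtype=float)        # positive number = loss
    var = float(np.quantile(losses, level))       # threshold loss at the level
    cvar = float(losses[losses >= var].mean())    # average loss beyond VaR
    return var, cvar
```

CVaR is always at least as large as VaR at the same level; the gap between the two is a direct read on tail thickness.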
Sortino Ratio
Return divided by downside deviation only, using the correct Sortino & van der Meer (1991) formula.
Sharpe penalises upside volatility equally with downside. Sortino only penalises harmful volatility.
Max Losing Streak
Longest consecutive losing trades. Also flags zero losing streaks over 20+ trades as implausible — a signature of look-ahead bias.
A run of consecutive losses will test you psychologically and financially. Most prop firms pull accounts after 5-8 in a row. More importantly: if you don't know your worst historical streak before going live, you'll be genuinely shocked when it happens — and that shock leads to bad decisions.
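Counting the worst historical streak is a one-pass scan over the trade list (sketch; the engine additionally flags implausible zero-loss runs):

```python
def max_losing_streak(pnl):
    """Longest run of consecutive losing trades (pnl < 0)."""
    worst = run = 0
    for p in pnl:
        run = run + 1 if p < 0 else 0   # extend the run on a loss, reset on a win
        worst = max(worst, run)
    return worst
```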
Recovery Factor
Ratio of total gross profits to total gross losses. Threshold: > 1.5.
A strategy where wins barely cover losses has no margin for the inevitable variance.
Absolute Sharpe Ratio
Annualised Sharpe using sample standard deviation (ddof=1) with correct annualisation for the strategy's actual trade frequency. Institutional minimum: 1.0.
The baseline risk-adjusted return measure. Required by every CTA, hedge fund, and prop firm.
ADVANCED · Probability of Ruin
Generalised gambler's ruin at institutional position sizing (N=50 capital units, 2% risk per trade). Uses reward-to-risk ratio in the ruin calculation, not just win rate — the asymmetric formula accounts for unequal win/loss sizes.
At 2% risk per trade, a genuinely profitable strategy should have near-zero ruin probability.
Academic Reference
Vince — "The Mathematics of Money Management" (1992)
Drawdown Duration & Recovery
Time in maximum drawdown, scaled to backtest length. Threshold scales from 180 days up to 730 days for long backtests. Checks whether the strategy ever recovered from its worst drawdown.
Fixed thresholds penalise long backtests unfairly — thresholds scale with backtest length.
Ulcer Index
Root mean square of all drawdown percentages throughout the backtest. Also reports the Martin Ratio (Sharpe / Ulcer Index).
Max drawdown is one number from one bad period. The Ulcer Index captures how painful the entire journey was — a strategy that spends 18 months in a slow grind down is different from one that had one sharp drop and recovered. The Ulcer Index reflects that difference.
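The Ulcer Index is the root mean square of the percentage drawdown at every point in the backtest. A minimal sketch (the function name is mine; the Martin Ratio reported alongside it is simply Sharpe divided by this value):

```python
import numpy as np

def ulcer_index(equity):
    """Root mean square of percentage drawdowns over the whole backtest."""
    eq = np.asarray(equity, dtype=float)
    peaks = np.maximum.accumulate(eq)
    dd_pct = 100.0 * (peaks - eq) / peaks   # drawdown in percent at each step
    return float(np.sqrt(np.mean(dd_pct ** 2)))
```

Because every timestep contributes, a long slow grind raises the index far more than a single sharp dip of the same depth.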
03
Regime Robustness
14% of Fundable Score · 6 checks
Most retail backtests run from 2019 to 2024. That's almost entirely a bull market interrupted by a 33-day crash that recovered in months. A strategy that looks great over that period hasn't been tested — it's been lucky. These checks look for what happens when the market isn't cooperating.
Bull / Bear / Consolidation Performance
When date data is present, classifies trades using actual market regime periods (S&P 500 bull/bear cycles from 2000–2026). Without dates, uses rolling return proxy. Computes Sharpe and win rate for each regime separately.
If your strategy only has data from 2019-2024, you've essentially backtested one market regime. The strategy hasn't seen a real bear market, a rate hiking cycle, or an extended sideways grind. This check uses actual historical regime periods to find out.
ADVANCED · Volatility Spike Stress Test
Simulates a 3x volatility spike using Student's t-distribution (df=3, fat tails) with AR(1) autocorrelation to model crisis momentum. Strategy-specific seed ensures different strategies get different stress paths.
Volatility doesn't change gradually — it spikes. A strategy calibrated to a 15 VIX environment can be completely destroyed when VIX hits 45. This stress test simulates that jump and measures how badly the strategy breaks.
Performance Consistency
Percentage of rolling windows with positive mean return throughout the backtest.
A strategy profitable on average but with extended unprofitable periods is difficult to trade live.
CONDITIONAL · Single-Regime Backtest Warning
Injected only when detected: if > 85% of dated trades fall within a single market regime, this check appears with a Manual Audit Required flag and triggers a score cap at 55.
A strategy backtested entirely in one regime has never been tested in other conditions. The score cap prevents these strategies from appearing fundable.
Regime Coverage
Checks whether the backtest includes a regime column covering multiple market conditions (BULL, BEAR, CONSOLIDATION, TRANSITION).
Explicit regime labelling enables regime-conditional position sizing in live trading.
04
Execution Reality
11% of Fundable Score · 6 checks
Backtests lie about execution. They assume you always get filled at the price you wanted, pay no commissions, and never move the market. None of that is true. The gap between backtest and live performance is almost always explained by execution costs — and these checks quantify exactly how much of your edge survives them.
Slippage Impact (0.1% and 0.3%)
Two checks. 0.1% = institutional / large-cap. 0.3% = retail / small-cap / illiquid. Both scale with trade frequency.
This is the most common way apparently profitable strategies fail in live trading. The edge was real — but it was smaller than the cost of executing it. 0.1% slippage on every trade adds up faster than most people think, especially at high frequency.
Commission Drag
Three tiers by frequency: 2bps institutional (≤252/yr), 10bps retail equity (≤2520/yr), 20bps HFT (>2520/yr).
Commission cost scales with frequency — high-frequency strategies pay proportionally more.
Partial Fill Simulation
Models 80% fill rate — common in illiquid markets and fast sessions.
Most backtests assume 100% fills. In reality, limit orders miss and fast markets leave orders partially unfilled.
Live Trading Gap Estimate
Applies a 40% Sharpe degradation from backtest to live, based on empirical research on retail strategy performance.
Every strategy loses some edge when deployed live. A strategy needs sufficient backtest Sharpe to still be viable after this degradation.
ADVANCED · Impact-Adjusted Capacity (Almgren-Chriss)
Computes strategy Sharpe at $100k, $1M, $10M, and $100M AUM using the Almgren-Chriss optimal execution model. Uses the strategy's actual volatility and trade frequency to estimate market impact at each scale.
This is what nobody tells you when you start scaling up. A strategy that works beautifully with ₹50k starts to break at ₹5L and is completely destroyed at ₹5Cr — because your own orders move the market against you. The Almgren-Chriss model estimates exactly where your specific strategy starts to degrade.
// Almgren-Chriss Market Impact
Cost(X) = τ·γ·X²/2·σ² + η·X²/T

Where:
  X = total shares to trade
  T = execution time horizon
  σ = daily volatility of the asset
  γ = permanent impact coefficient
  η = temporary impact coefficient
  τ = risk aversion parameter

Impact-adjusted Sharpe:
  SR_adjusted = (μ - Cost(X)/Capital) / σ_portfolio

We compute at 4 AUM levels:
  $100k → retail / prop firm challenge scale
  $1M   → small fund / serious retail
  $10M  → institutional small allocation
  $100M → fund-level — most retail strategies fail here
Strategies with high trade frequency and high AUM face multiplicative impact costs. A scalping strategy at $10M AUM can have 3–5x the effective costs of the same strategy at $100k.
Academic Reference
Almgren & Chriss — "Optimal Execution of Portfolio Transactions" (2000), J. Risk 3(2)
05
Compliance
15% of Fundable Score · 3 checks
Two very different sets of rules for two very different audiences. Prop firm rules are strict on drawdown because the firm is risking its capital on a 30-day challenge. Institutional rules care more about long-term Sharpe and Calmar because a fund allocator thinks in years, not months. A strategy can pass one and fail the other — knowing which you would pass is useful before you pay for a challenge.
Prop Firm Compliance Gate (5-gate quick check)
Fast binary gate: positive mean return, Sharpe ≥ 1.0, max drawdown < 20%, win rate > 45%, trades > 50. Requires 4/5 gates in standard mode, all 5 in strict mode.
Immediately identifies strategies far from prop firm eligible before running detailed per-firm analysis.
Prop Firm Compliance (FTMO / Topstep / The5ers)
Checks all three major prop firm challenge rules: max drawdown ≤10%, profit target ≥6–10%, Sharpe ≥1.0. Reports which firms the strategy is eligible for.
Each firm has slightly different rules. Knowing which violations to fix saves time before paying for a challenge.
Institutional Compliance (CTA / Fund Standard)
Sharpe ≥1.0, max drawdown <25%, Calmar ≥1.0. More appropriate than prop firm rules for long multi-year backtests.
A 5-year institutional backtest should not be judged by 30-day prop firm challenge rules.
06
Historical Crash Simulations
Informational — do not affect Fundable Score
These don't affect your score — they're here because I think every trader should see how their strategy would have behaved in the worst markets of the last 30 years. We simulate each crash using fat-tailed distributions because real crises have fatter tails than normal markets. Each strategy gets a unique path based on its own volatility profile — the same crash hits a high-vol strategy much harder than a low-vol one.
2008 Global Financial Crisis
Sep 2008 – Mar 2009
S&P 500 −56%. VIX hit 80. Correlations spiked to 1. Leverage unwinding forced selling across all asset classes simultaneously.
2020 COVID Crash
Feb 2020 – Mar 2020
Fastest 30%+ drop in history — 34% in 33 days. Circuit breakers triggered 4 times. Then one of the fastest recoveries ever.
2022 Rate Hike Bear Market
Jan 2022 – Dec 2022
Fed raised rates 425bps in 12 months. Momentum strategies destroyed. Growth stocks fell 60–90%.
2010 Flash Crash
May 6, 2010
Dow dropped 1,000 points in minutes in an algorithmic cascade. Liquidity vanished — many stocks traded at $0.01.
1998 LTCM / Russia Default
Aug–Sep 1998
Russia defaulted; LTCM collapsed. Correlation assumptions broke simultaneously. Strategies with similar factor exposures destroyed together.
Crash simulations are illustrative — they show stress exposure, not predictions. Strategies without date data receive [UNCALIBRATED] labels because market regime cannot be verified. Strategies with high AR(1) autocorrelation receive a 30% exposure multiplier.
07
Portfolio Analyser
Separate product — ₹199 · 2–3 strategies · institutional-grade portfolio construction
Running three strategies that look different is not the same as running a diversified portfolio. Two momentum strategies on different assets will still crash together in a bear market. This analyser runs the full 44-check validation on each strategy, then measures whether they genuinely diversify each other — using the same methods a quantitative fund would use before allocating capital across multiple strategies.
// CORRELATION METHODOLOGY

We use normalised Pearson correlation on unit-variance PnL returns, not raw PnL values. This removes scale-dependence — a ₹100-average-trade Kite strategy and a ₹10,000-average-trade IBKR strategy are compared on signal shape, not magnitude.

// Normalised Correlation
r̂ᵢ = rᵢ / σᵢ   (normalise each series to unit variance)

ρ(i,j) = Σₜ r̂ᵢ(t)·r̂ⱼ(t) / √[Σ r̂ᵢ² · Σ r̂ⱼ²]

Date-aligned when dates available (inner join).
Positional alignment as fallback (trade 1 vs trade 1).

⚠ Positional alignment is meaningless if strategies trade at different frequencies. Upload dated CSVs for accurate correlation.
We use Pearson (linear) correlation. For options-heavy portfolios, non-linear tail dependence (copulas) would be more accurate — Pearson is appropriate for equity and futures strategies. All correlation matrices are verified positive semi-definite.
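The scale-independence claim is easy to verify directly. A minimal sketch of normalised Pearson correlation on two PnL series (function name is mine; alignment by date is omitted here):

```python
import numpy as np

def normalised_corr(pnl_a, pnl_b):
    """Pearson correlation on unit-variance PnL series (scale-free by construction)."""
    a = np.asarray(pnl_a, dtype=float)
    b = np.asarray(pnl_b, dtype=float)
    a = (a - a.mean()) / a.std()   # standardise: zero mean, unit variance
    b = (b - b.mean()) / b.std()
    return float(np.mean(a * b))
```

Multiplying one strategy's PnL by 100 (a ₹100-per-trade vs ₹10,000-per-trade book) leaves the correlation unchanged, which is the point of normalising before comparing.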
// RISK PARITY WEIGHTS

Portfolio is weighted by inverse volatility (risk parity), not equal weight. Each strategy is allocated in proportion to 1/vol so that all strategies contribute equally to total portfolio variance. This is the institutional standard for multi-strategy allocation.

// Risk Parity (Inverse Volatility)
wᵢ = (1/σᵢ) / Σⱼ(1/σⱼ)

Verification:
  wᵢ·σᵢ = constant for all i (equal risk contribution)
  Σᵢ wᵢ = 1.0

Combined portfolio return:
  rₚ(t) = Σᵢ wᵢ·rᵢ(t)   (padded with 0 for missing trades)

Combined Sharpe:
  SR_p = E[rₚ] / σ(rₚ) · √252

Where σᵢ is computed over the full historical PnL series.
Note: assumes stationarity — vol regime shifts are not captured.
Risk parity weights assume historical volatility reflects future volatility. For live strategies, vol estimates should be recomputed periodically. For backtests, full-period vol is the appropriate baseline.
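The weighting rule above is a one-liner, and the equal-risk-contribution property can be checked mechanically (sketch; the function name is mine):

```python
import numpy as np

def risk_parity_weights(vols):
    """Inverse-volatility weights: wᵢ ∝ 1/σᵢ, so each strategy
    contributes equal risk (wᵢ·σᵢ is constant across strategies)."""
    inv = 1.0 / np.asarray(vols, dtype=float)
    return inv / inv.sum()
```

For example, with vols of 0.1 and 0.2 the weights are 2/3 and 1/3, and both products wᵢ·σᵢ equal 1/15: the low-vol strategy gets twice the capital so that both contribute the same variance.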
// DRAWDOWN COINCIDENCE

The diversification score can look good while strategies still crash together. Drawdown coincidence measures what fraction of drawdown-period timesteps have 2+ strategies simultaneously in drawdown — the actual stress correlation that matters for live trading.

// Drawdown Coincidence Formula
DD_score(t) = 1 if DDᵢ(t) < threshold, 0 otherwise
  where DDᵢ(t) = (EQᵢ(t) - max EQᵢ(s≤t)) / max EQᵢ(s≤t)
  threshold = -2% (configurable)

C = |{t : Σᵢ DD_score(t) ≥ 2}| / |{t : Σᵢ DD_score(t) ≥ 1}|

Interpretation:
  C < 0.30      → Strategies fall independently (good)
  C = 0.30–0.60 → Moderate simultaneous drawdowns
  C > 0.60      → High coincidence — portfolio drawdown protection is largely illusory
Drawdown coincidence is independent of correlation. Two strategies can have low linear correlation but high coincidence if they both fail during the same macro regimes. This is a more honest measure of real diversification than correlation alone.
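The coincidence ratio above can be sketched directly on aligned equity curves (function name is mine; the engine's date alignment is omitted):

```python
import numpy as np

def drawdown_coincidence(equity_curves, threshold=-0.02):
    """Fraction of drawdown timesteps where 2+ strategies are in drawdown together."""
    in_dd = []
    for eq in equity_curves:
        eq = np.asarray(eq, dtype=float)
        peak = np.maximum.accumulate(eq)
        in_dd.append((eq - peak) / peak < threshold)   # True = in drawdown
    counts = np.sum(in_dd, axis=0)                     # strategies in DD per timestep
    any_dd = np.sum(counts >= 1)
    return 0.0 if any_dd == 0 else float(np.sum(counts >= 2) / any_dd)
```

Two curves that each dip once but overlap for only half their drawdown timesteps score C = 0.5, squarely in the "moderate" band.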
// PORTFOLIO METRICS AT A GLANCE
Diversification Score
0–100 score. Per-pair scoring: near-zero corr → 82–90. Negative corr (hedge) → 96–100. Corr 0.3–0.6 → 10–40. Corr > 0.6 → near 0. Sharpe quality adds up to +8 bonus.
Combined Sharpe
Risk-parity weighted combined portfolio Sharpe. Not the average of individual Sharpes — the actual portfolio return divided by portfolio volatility. Should exceed average individual Sharpe for uncorrelated strategies (Markowitz benefit).
Combined Max Drawdown
Maximum drawdown computed from the actual risk-parity weighted combined equity curve — not the average of individual drawdowns. A well-diversified portfolio has a combined DD far below any individual strategy.
Portfolio DSR
Deflated Sharpe applied to the combined strategy, correcting for n strategies tested. Capped at 99.8 — 100/100 would imply absolute certainty, which is never warranted. Same methodology as single-strategy DSR.
⚠ Known limitations: (1) Pearson is linear — options portfolios need copula-based tail dependence. (2) Risk parity weights assume stationary volatility — recompute periodically for live strategies. (3) Positional alignment (no dates) is unreliable for strategies at different frequencies. (4) n=10 per strategy in portfolio mode for latency reasons — use single-strategy validation at n=50 before adding a strategy to a portfolio.
08
Operational Checks
Not in weighted score — always appended, always visible
These two checks don't affect your score but I always read them first. Alpha Decay tells you if the strategy is gradually losing its edge over the backtest period — which is often the most honest signal of whether it will work going forward. Symbolic Overfit catches the strategies that look statistically clean but are suspiciously too perfect compared to real noise.
Alpha Decay (Half-Life)
Compares first-half vs second-half Sharpe. Fits an exponential decay curve to the ACF to estimate the half-life of the alpha signal.
A strategy where performance deteriorates significantly in the second half may be losing its edge or was overfitted to the first period.
Symbolic Overfit Detection
Compares equity curve characteristics (smoothness, entropy, kurtosis) against 50 noise baselines with the same volatility. Reports an overfit score 0–100 and adjusted Sharpe penalised for model complexity.
Complements statistical overfit checks (CPCV, DSR) with a pattern-based detector. A strategy that looks suspiciously smooth may be structurally overfitted even if statistical tests pass.
09
Plausibility Checks
4 checks — flag data integrity issues, can trigger Manual Audit Required
These checks exist because people sometimes upload data that can't possibly reflect real trading — whether by accident or on purpose. A Sharpe of 15 has never been documented in any institutional strategy over a multi-year period. An equity curve that never draws down more than 2% over 500 trades is almost certainly look-ahead bias. These flags don't punish you — they tell you the data needs a closer look before any score means anything.
Sharpe Plausibility
Sharpe > 10 triggers Manual Audit Required. Sharpe > 5 triggers Review Recommended. No documented institutional strategy has sustained Sharpe > 3.0 over 5+ years.
Frequency-Return Plausibility
HFT strategies (>20,000 trades/year) with extreme annual returns (>500%) require impossible liquidity and are flagged as implausible.
Equity Curve Smoothness
Suspiciously smooth curves with very low max drawdown (<2%) are a known signature of look-ahead bias or data errors.
Kelly Plausibility
Kelly fraction > 1.0 (bet more than your entire bankroll every trade) is mathematically impossible in a real strategy.
⚠   QuantProof validates the statistical properties of backtests — it doesn't predict the future. A high score means your strategy has the right foundations. What happens in live markets depends on a hundred things this tool can't see. Not financial advice — do your own due diligence before risking real capital.

Validate your strategy now

Upload your CSV and get a full score in under 30 seconds. Free to start — no account needed.
3 validations per day on the free tier.

← Back to Validator