The Harrying: Methods and Findings

Dataset

4,451 finished Premier League fixtures across 12 seasons:

Season   Fixtures  SM Weather   Backfilled  Coverage  Notes
2014/15  380       0%           378         99%       Open-Meteo only
2015/16  380       0%           380         100%      Open-Meteo only
2016/17  380       0%           378         99%       Open-Meteo only
2017/18  380       0%           380         100%      Open-Meteo only
2018/19  380       0%           380         100%      Open-Meteo only
2019/20  380       0%           380         100%      Open-Meteo only
2020/21  380       24% (92)     288         100%      Empty/restricted crowds
2021/22  380       97% (370)    10          100%      Full crowds returned
2022/23  380       82% (312)    68          100%
2023/24  380       99% (375)    5           100%
2024/25  380       100% (380)   0           100%
2025/26  271       100% (271)   0           100%      Season in progress (GW27)

Total: 4,451 fixtures, 4,447 with weather data (99.9%). 35 unique teams. 45 unique venues.

Data sources

SportMonks Football API

  • Fixtures fetched with the includes participants;scores;state;weatherReport;venue
  • Season IDs: 12, 10, 13, 6397, 12962, 16036, 17420, 18378, 19734, 21646, 23614, 25583
  • Weather data from SportMonks weatherReport include: temperature, humidity, wind speed/direction, pressure, cloud cover, description text
  • Weather category derived from description text: clear, cloudy, light_rain, heavy_rain, snow, fog

Open-Meteo Historical Weather API

  • Monthly climate normals per venue (temperature, humidity, precipitation, wind speed)
  • Used to compute temperature deviation from seasonal expectation
  • Backfill source for 2,647 fixtures: all of 2014/15–2019/20 (SportMonks weather not available for those seasons), plus 371 fixtures in 2020/21–2023/24 missing match-day reports
  • Backfill uses hourly archive endpoint with WMO weather codes mapped to the same 6 categories
  • Backfilled entries marked with "(open-meteo backfill)" in description field
  • Free tier, no API key. Aggressive rate limiting (~10 req/min)
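
The backfill step above maps WMO weather codes onto the six categories. A minimal sketch of such a mapping follows. The category names and the backfill tag come from the text; the assignment of each code to a category, the helper names, and the description format are assumptions for illustration, not the mapping actually used.

```python
from typing import Optional

# Hypothetical mapping from WMO weather codes (as returned by Open-Meteo)
# to the six categories named in the text. The exact assignment is an assumption.
WMO_TO_CATEGORY = {
    0: "clear", 1: "clear",                      # clear sky, mainly clear
    2: "cloudy", 3: "cloudy",                    # partly cloudy, overcast
    45: "fog", 48: "fog",                        # fog, depositing rime fog
    51: "light_rain", 53: "light_rain", 55: "light_rain",  # drizzle
    56: "light_rain", 57: "light_rain",          # freezing drizzle
    61: "light_rain", 63: "light_rain",          # slight / moderate rain
    65: "heavy_rain",                            # heavy rain
    66: "light_rain", 67: "heavy_rain",          # freezing rain
    71: "snow", 73: "snow", 75: "snow", 77: "snow",  # snowfall, snow grains
    80: "light_rain", 81: "light_rain",          # slight / moderate showers
    82: "heavy_rain",                            # violent showers
    85: "snow", 86: "snow",                      # snow showers
    95: "heavy_rain", 96: "heavy_rain", 99: "heavy_rain",  # thunderstorms
}

BACKFILL_TAG = "(open-meteo backfill)"  # marker string quoted in the text


def categorise_wmo(code: int) -> Optional[str]:
    """Return the weather category for a WMO code, or None if unmapped."""
    return WMO_TO_CATEGORY.get(code)


def backfill_description(code: int) -> Optional[str]:
    """Build a description string tagged as backfilled (format is assumed)."""
    category = categorise_wmo(code)
    if category is None:
        return None
    return f"{category} {BACKFILL_TAG}"
```

Returning None for unmapped codes keeps gaps visible rather than silently filing them under a mild category.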

Club Elo API

  • Full Elo history per team fetched as CSV from api.clubelo.com
  • Rating looked up for each fixture date
  • Win probability formula: 1 / (1 + 10^(-delta/400)), where delta = home Elo + HOME_ADV - away Elo and HOME_ADV = 65
  • Why 65? Club Elo’s modelling uses a fixed home-field advantage offset of 65 Elo points, and we keep that value for internal consistency rather than fitting a Premier League-specific parameter.
  • Draw probability: 0.26 * (1 - |homeWinProb - 0.5| * 2)
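
The two formulas above can be sketched in a few lines. The function names are hypothetical, and the definition of delta as home rating plus HOME_ADV minus away rating is an assumption based on the standard Elo form. How the home-win and draw figures are combined into the final three-way percentages is not stated here, so the sketch stops short of that step.

```python
HOME_ADV = 65      # home advantage offset in Elo points, as stated in the text
DRAW_SCALE = 0.26  # coefficient from the draw probability formula in the text


def home_win_prob(home_elo: float, away_elo: float) -> float:
    """Elo win expectancy for the home side.

    Assumption: delta = home_elo + HOME_ADV - away_elo.
    """
    delta = home_elo + HOME_ADV - away_elo
    return 1.0 / (1.0 + 10 ** (-delta / 400))


def draw_prob(home_win_probability: float) -> float:
    """Draw probability: largest for an even match, zero for a certain result."""
    return DRAW_SCALE * (1.0 - abs(home_win_probability - 0.5) * 2.0)


if __name__ == "__main__":
    p_home = home_win_prob(1850.0, 1800.0)
    print(f"home win expectancy: {p_home:.3f}, draw: {draw_prob(p_home):.3f}")
```

Under this formula the draw probability peaks at 0.26 when the adjusted ratings are level and falls linearly to zero as the match becomes one-sided.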

Pitch dimensions

  • CSV source of truth: data/premier-league-pitch-dimensions-all-seasons.csv
  • Covers all 12 seasons and accounts for stadium changes

Travel distance

  • Haversine distance from away team's most common home venue to fixture venue
  • One venue correction: SportMonks had Tottenham's pre-2022 venue at coordinates in Ontario, Canada (42.98N). Corrected to 51.60N
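
As an illustration of the travel calculation, here is a minimal haversine sketch. The Earth radius constant, the fixture field names (`home_team`, `venue_id`), and the helper for picking a team's most common home venue are assumptions, not the project's actual code.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp to 1.0 to guard against rounding error before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def most_common_home_venue(fixtures, team):
    """Venue id at which `team` most often played at home.

    `fixtures` is a list of dicts with hypothetical keys "home_team" and
    "venue_id". Raises IndexError if the team has no home fixtures.
    """
    venues = Counter(f["venue_id"] for f in fixtures if f["home_team"] == team)
    return venues.most_common(1)[0][0]
```

A sanity check on latitude, such as rejecting English venues outside roughly 50N to 56N, would have flagged the Ontario coordinates before any distance was computed.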

Match ball

  • One ball per season. Nike Ordem 2014/15–2019/20, Nike Flight 2020/21 through 2024/25, Puma Orbita 2025/26
  • Perfectly confounded with season; cannot isolate ball effect

Elo adjustment methodology

For each fixture, compute:

  • Actual outcome: H = +1, D = 0, A = -1
  • Elo-expected outcome: (expectedHomeWinPct - expectedAwayWinPct) / 100
  • Raw residual: actual - expected (per fixture)
  • Group residual: mean of raw residuals within group

The raw baseline residual across all 4,447 fixtures is -19.9%, meaning the Elo model (with HOME_ADV = 65) systematically overestimates home advantage relative to actual PL outcomes. All reported "Elo Adj" figures are expressed relative to this baseline: +0.0% is average, positive values mean conditions favour the home side beyond what team quality explains, and negative values mean conditions favour the away side.

This controls for the fact that different groups contain different teams. For example, "snow" fixtures disproportionately feature northern clubs who happen to be strong (Liverpool, Man City). Without Elo adjustment, snow might appear neutral because the strong home teams mask the weather effect.
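
The residual steps listed above can be sketched as follows. The dictionary keys and function names are hypothetical, and treating "relative to this baseline" as subtracting the overall mean residual from each group mean is an assumed reading of the method.

```python
from collections import defaultdict
from statistics import mean

# Actual outcome coding from the text: H = +1, D = 0, A = -1
OUTCOME_VALUE = {"H": 1.0, "D": 0.0, "A": -1.0}


def raw_residual(fixture: dict) -> float:
    """Actual outcome minus Elo-expected outcome for one fixture."""
    expected = (fixture["expected_home_win_pct"]
                - fixture["expected_away_win_pct"]) / 100.0
    return OUTCOME_VALUE[fixture["result"]] - expected


def elo_adjusted_by_group(fixtures: list, group_key: str) -> dict:
    """Mean residual per group, expressed relative to the overall baseline.

    Assumption: the baseline is the mean raw residual over all fixtures and
    is subtracted from each group's mean residual.
    """
    residuals = [raw_residual(f) for f in fixtures]
    baseline = mean(residuals)
    grouped = defaultdict(list)
    for fixture, residual in zip(fixtures, residuals):
        grouped[fixture[group_key]].append(residual)
    return {group: mean(values) - baseline for group, values in grouped.items()}
```

Multiplying the returned values by 100 gives figures on the same percentage scale as the "Elo Adj" columns.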


Findings

Weather conditions (n=4,447, 99.9% coverage)

Severe weather is associated with a marked erosion of home advantage, although the samples are small:

Condition   N      Home win%  Elo Adj  Signal
Clear       839    42.9%      -2.0%    Neutral
Cloudy      2,280  45.9%      +2.2%    Neutral
Light rain  1,167  44.0%      -0.5%    Neutral
Heavy rain  41     36.6%      -19.9%   Strong away
Snow        93     36.6%      -15.5%   Strong away
Fog         27     25.9%      -18.6%   Strong away

Mild conditions (clear, cloudy, light rain) all sit within roughly 2 points of the baseline. Severe conditions (heavy rain, snow, fog) are associated with home advantage 15-20 percentage points lower after controlling for team quality.

High humidity (90%+) shows a modest away lean: -1.7% Elo-adjusted (n=727).

Travel distance (n=4,447)

Monotonic gradient:

Distance    N      Home win%  Elo Adj
<50 km      677    42.1%      -3.3%
50-100 km   446    42.4%      -2.6%
100-200 km  1,430  44.8%      -0.2%
200-300 km  1,301  44.8%      +1.5%
300+ km     593    46.9%      +2.9%

The Elo-adjusted figure swings 6.2pp from the shortest band (-3.3%) to the longest (+2.9%), rising at every step. Short-distance fixtures (derbies) reduce home advantage; long-distance away trips amplify it. The effect is directionally consistent across all bands but modest in absolute terms.
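
The distance bands in this section can be reproduced with a small lookup. This is a sketch under stated assumptions: the field names `travel_km` and `result` are hypothetical, and the handling of exact boundary values (a 50 km trip falling in the 50-100 km band) is a guess.

```python
from bisect import bisect_right

# Band edges (km) and labels as printed in the travel distance table.
BAND_EDGES = [50, 100, 200, 300]
BAND_LABELS = ["<50 km", "50-100 km", "100-200 km", "200-300 km", "300+ km"]


def travel_band(distance_km: float) -> str:
    """Assign a distance to its band.

    Assumption: each band includes its lower edge and excludes its upper edge.
    """
    return BAND_LABELS[bisect_right(BAND_EDGES, distance_km)]


def home_win_pct_by_band(fixtures: list) -> dict:
    """Home win percentage per distance band.

    `fixtures` is a list of dicts with hypothetical keys "travel_km" and
    "result" ("H", "D" or "A").
    """
    counts = {}
    for fixture in fixtures:
        band = travel_band(fixture["travel_km"])
        wins, total = counts.get(band, (0, 0))
        counts[band] = (wins + (fixture["result"] == "H"), total + 1)
    return {band: 100.0 * wins / total for band, (wins, total) in counts.items()}
```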

Day of week (n=4,447)

Day        N      Home win%  Elo Adj
Monday     255    44.7%      +2.9%
Tuesday    237    47.3%      +9.1%
Wednesday  339    46.0%      +2.7%
Thursday   122    41.0%      -2.4%
Friday     108    44.4%      -3.8%
Saturday   2,251  44.8%      -1.0%
Sunday     1,135  42.9%      -0.8%

Tuesday is by far the strongest home day (+9.1%, n=237), possibly reflecting midweek fixtures where short-notice travel disadvantages away fans and teams. Wednesday is also mildly positive (+2.7%). Thursday (-2.4%) and Friday (-3.8%) lean away, though both have small samples (n=122 and n=108). Sunday (-0.8%) is essentially neutral over 12 seasons; the stronger Sunday away effect visible in smaller samples does not persist.

Latitude (n=4,447)

Band              N      Home win%  Elo Adj
South (<51N)      595    35.1%      -1.6%
Low-Mid (51-52N)  1,480  46.9%      -1.2%
Mid (52-53N)      815    37.3%      -2.9%
North (53N+)      1,557  49.4%      +3.3%

Northern clubs have stronger home advantage even after Elo adjustment. The Mid band (52-53N) is the weakest, partly driven by relegation-quality teams (Norwich, Leicester, Wolves in poor form). The effect overlaps with travel distance.

Season comparison (n=4,447)

Season   Home win%  Elo Adj  Notes
2014/15  45.2%      +2.6%
2015/16  41.3%      -3.3%
2016/17  49.5%      +7.3%
2017/18  45.5%      +5.9%
2018/19  47.6%      +3.3%
2019/20  45.3%      +3.0%
2020/21  37.9%      -15.6%   Covid: restricted/empty crowds
2021/22  42.9%      -3.8%    Full crowds return
2022/23  48.4%      +7.1%    Post-Covid peak
2023/24  46.1%      +1.5%
2024/25  40.8%      -6.4%
2025/26  42.1%      -2.1%    In progress

2020/21 is a natural experiment: remove crowds and matchday atmosphere, and home advantage collapses (-15.6% Elo-adjusted). 2022/23 shows a strong rebound (+7.1%). The pre-Covid seasons show substantial variation — 2016/17 (+7.3%) and 2017/18 (+5.9%) were strong home years — with no clear long-term trend.

Excluding the Covid season, there is no meaningful trend in home advantage over time (r=-0.020, n=4,067).

Additivity (n=4,447)

Factors assigned per fixture:

  • Home factors (+1 each): long travel (300+ km), evening kickoff (18+)
  • Away factors (-1 each): bad weather (heavy rain/snow/fog), short travel (<50 km), Sunday, high humidity (90%+)

Net score    N      Home win%  Elo Adj
-2 or below  370    36.2%      -7.8%
-1           1,316  43.5%      -1.2%
0            1,928  46.1%      +1.4%
+1           787    45.6%      +2.1%
+2 or above  46     47.8%      +1.3%

The gradient from -2 to 0 is clear (~9pp swing). The +1 and +2 groups are directionally consistent but close together (+2.1% and +1.3%), and the +2 group is small (n=46), making the positive end of the composite ambiguous. As a set, the negative factors (bad weather, short travel, Sunday, high humidity) separate outcomes more reliably than the positive ones.
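
The net score described in this section is a simple tally. A minimal sketch follows, with hypothetical function and parameter names. The thresholds are taken from the factor list, but whether the kickoff hour is local time or UTC is not stated here and is left to the caller.

```python
BAD_WEATHER = {"heavy_rain", "snow", "fog"}


def net_score(travel_km: float, kickoff_hour: int, weather: str,
              weekday: str, humidity_pct: float) -> int:
    """Sum of home factors (+1 each) and away factors (-1 each)."""
    score = 0
    # Home factors
    if travel_km >= 300:
        score += 1
    if kickoff_hour >= 18:
        score += 1
    # Away factors
    if weather in BAD_WEATHER:
        score -= 1
    if travel_km < 50:
        score -= 1
    if weekday == "Sunday":
        score -= 1
    if humidity_pct >= 90:
        score -= 1
    return score


def score_band(score: int) -> str:
    """Collapse a net score into the five bands used in the table."""
    if score <= -2:
        return "-2 or below"
    if score >= 2:
        return "+2 or above"
    return f"{score:+d}" if score else "0"
```

With only two home factors against four away factors, the score ranges from -4 to +2, which helps explain why the top band is so thinly populated.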


Correlations and trends

Continuous variable correlations (n=4,447)

Pearson r between variable and per-fixture Elo residual:

Variable                   r       Partial r (ctrl Elo)  Sig
Away travel distance (km)  +0.004  +0.005
Home Elo                   +0.026  n/a
Away Elo                   +0.038  n/a                   *
Venue capacity             +0.041  +0.049                **/**
Humidity (%)               +0.004  +0.004
Wind speed (km/h)          +0.009  +0.009
Temperature (C)            -0.010  -0.009
Venue latitude (N)         -0.006  -0.005
Pitch area (m2)            +0.012  +0.011
Kickoff hour (UTC)         +0.011  +0.011
Temp deviation from normal -0.007  -0.007

Elo difference explains outcomes strongly (r=+0.409 with raw outcome). Across 12 seasons, venue capacity is the only non-Elo continuous variable with a statistically significant Pearson correlation with the Elo residual (r=+0.041, p<0.01), and it survives partial correlation controlling for Elo difference (partial r=+0.049, p<0.001). Larger grounds correlate with stronger home advantage independent of team quality, although the effect is small. Travel distance, which was the sole significant variable in the 6-season analysis, is not significant over 12 seasons (r=+0.004). The weather and scheduling effects do not show up as linear relationships: they appear at extremes (heavy rain, snow, fog) rather than as continuous gradients.
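
For reference, the two statistics reported in the table can be computed as below. This is a plain-Python sketch rather than the analysis code itself: the function names are hypothetical, and the partial correlation uses the standard first-order formula with a single control variable (here, Elo difference).

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_r(xs, ys, zs):
    """First-order partial correlation of x and y, controlling for z.

    r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
    """
    r_xy = pearson_r(xs, ys)
    r_xz = pearson_r(xs, zs)
    r_yz = pearson_r(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
```

When the control variable is nearly uncorrelated with both inputs, the partial coefficient stays close to the plain one, which matches how little most rows in the table move.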

Scheduling trends

Season   Sunday%  Midweek%  Evening%  Avg travel
2014/15  24.1%    11.6%     13.0%     191 km
2015/16  22.4%    11.3%     14.5%     191 km
2016/17  22.5%    12.4%     14.3%     189 km
2017/18  20.3%    15.0%     16.6%     182 km
2018/19  22.6%    16.1%     18.4%     180 km
2019/20  23.2%    21.8%     18.7%     186 km
2020/21  32.6%    20.3%     41.6%     247 km
2021/22  30.0%    18.2%     20.8%     181 km
2022/23  24.5%    16.6%     17.4%     177 km
2023/24  26.8%    14.7%     19.2%     175 km
2024/25  30.3%    15.0%     19.7%     177 km
2025/26  27.7%    15.1%     25.8%     188 km

2020/21 was a structural outlier: 41.6% evening fixtures (vs roughly 13-21% in the other completed seasons) and 247 km average travel, both consequences of compressed scheduling with empty stadiums. 2019/20 also shows elevated midweek share (21.8%) from the Covid-delayed end of season. Pre-2020 scheduling was broadly stable. No persistent trend in Sunday, midweek, or evening share outside these pandemic-affected seasons.


What did not survive

These findings stood out in earlier, smaller datasets but are not robust across 12 seasons:

  • Wednesday as a home fortress: +13.0% in the 3-season dataset, -1.1% in the 6-season dataset, and +2.7% in the 12-season dataset. The large swing reflects small-sample noise at shorter windows. Over 12 seasons (n=339) Wednesday is mildly home-positive but not a strong signal.
  • Sunday as a consistently away-friendly day: -3.2% over 6 seasons, -0.8% over 12 seasons (essentially neutral).
  • Travel distance as a statistically significant continuous predictor: was the sole significant variable (r=+0.044) in the 6-season analysis. Across 12 seasons the correlation is r=+0.004 — not significant. The binned gradient is still visible and monotonic, but the linear signal does not persist at scale.
  • Evening kickoffs favouring home: +8.0% in the 3-season dataset, +1.6% over 6 seasons, +2.4% over 12 seasons. Directionally positive but modest and inconsistent across windows.

Caveats

  1. 2,647 fixtures (59%) use Open-Meteo backfilled weather rather than SportMonks match-day reports. The first six seasons (2014/15–2019/20) are entirely backfilled. Open-Meteo provides hourly conditions at venue coordinates, matched to kickoff hour. This is actual historical weather data, not forecasts, but may differ slightly from pitch-side observations. Extreme weather events (heavy rain, fog, snow) should be broadly reliable; mild condition classification is less certain.
  2. 2020/21 had restricted/empty crowds. This is both a confounder (it lowers the home advantage baseline) and a feature (a natural experiment suggesting that crowd and atmosphere matter), although scheduling also changed sharply that season.
  3. Pitch dimensions are confounded with club identity. Chelsea always plays on 103x67.5m. Any "pitch effect" is inseparable from the club effect.
  4. Match ball is perfectly confounded with season.
  5. Small samples: heavy rain (41), fog (27), snow (93). Direction is consistent but magnitude uncertain.
  6. No injury/form/motivation control. Elo adjusts for overall team quality but not within-season fluctuations.
  7. Travel distance is haversine (straight line), not actual journey time. A 200 km trip to a well-connected city may be easier than a 150 km trip to an isolated ground.
  8. Multiple comparisons: with 13+ analysis groups, some findings may be spurious. The robust findings (weather categories, travel gradient) show consistent direction across multiple related cuts of the data.
  9. Continuous variable correlations are weak (|r| < 0.05). The effects appear at extremes but do not manifest as linear gradients, so binned analysis is more informative than correlation for these data.
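
To put the small-sample caveat (item 5) in numbers, one option is a Wilson score interval on the raw home win rate. This technique is not part of the analysis above and is offered only as an illustration. The win count of 7 for fog is inferred from the reported 25.9% of 27 fixtures and is an assumption, as is the function name.

```python
import math


def wilson_interval(wins: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (default about 95%)."""
    p = wins / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    # Fog: 27 fixtures; 7 home wins is inferred from 25.9%, not a reported count.
    low, high = wilson_interval(7, 27)
    print(f"fog home win rate interval: {low:.1%} to {high:.1%}")
```

For a sample this small the interval spans roughly 30 percentage points, consistent with the statement that the direction is clearer than the magnitude.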