Soccer value bets

This example illustrates how to estimate value bets for soccer fixtures by training a machine learning multi-output classifier.

# Author: Georgios Douzas <gdouzas@icloud.com>
# Licence: MIT

import numpy as np
import pandas as pd
from sportsbet.datasets import SoccerDataLoader
from sklearn.neighbors import KNeighborsClassifier

Extracting the training data

We extract the training data for the Spanish league. We also drop missing values (drop_na_thres=1.0) and select the market average odds (odds_type='market_average').

dataloader = SoccerDataLoader(param_grid={'league': ['Spain']})
X_train, Y_train, _ = dataloader.extract_train_data(
    drop_na_thres=1.0, odds_type='market_average'
)

Out:

Football-Data.co.uk: ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00

The input data:

X_train
home_team away_team league division year home_team_soccer_power_index away_team_soccer_power_index home_team_probability_win away_team_probability_win probability_draw home_team_projected_score away_team_projected_score match_quality
date
2016-08-19 La Coruna Eibar Spain 1 2017 66.52 62.29 0.5003 0.2260 0.2738 1.47 0.79 64.335545
2016-08-19 Malaga Osasuna Spain 1 2017 72.57 56.93 0.5475 0.1897 0.2628 1.56 0.70 63.805561
2016-08-19 La Coruna Eibar Spain 1 2017 66.52 62.29 0.5003 0.2260 0.2738 1.47 0.79 64.335545
2016-08-19 Malaga Osasuna Spain 1 2017 72.57 56.93 0.5475 0.1897 0.2628 1.56 0.70 63.805561
2016-08-20 Barcelona Betis Spain 1 2017 96.35 69.95 0.9591 0.0071 0.0337 3.40 0.42 81.054510
... ... ... ... ... ... ... ... ... ... ... ... ... ...
2021-10-28 Granada Getafe Spain 1 2022 58.45 65.57 0.3631 0.3127 0.3242 1.05 0.95 61.805620
2021-10-28 Celta Sociedad Spain 1 2022 71.68 78.29 0.3206 0.3957 0.2837 1.17 1.34 74.839331
2021-10-28 Levante Ath Madrid Spain 1 2022 63.23 84.94 0.1873 0.5664 0.2463 0.91 1.77 72.494516
2021-10-28 Granada Getafe Spain 1 2022 58.45 65.57 0.3631 0.3127 0.3242 1.05 0.95 61.805620
2021-12-20 Levante Valencia Spain 1 2022 60.85 72.40 0.3228 0.4059 0.2713 1.26 1.45 66.124428

18815 rows × 13 columns



The targets:

Y_train
away_win__full_time_goals draw__full_time_goals home_win__full_time_goals over_2.5__full_time_goals under_2.5__full_time_goals
0 False False True True False
1 False True False False True
2 False False True True False
3 False True False False True
4 False False True True False
... ... ... ... ... ...
18810 False True False False True
18811 False False True False True
18812 False False True True False
18813 False False True True False
18814 True False False True False

18815 rows × 5 columns
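The target column names suggest that each fixture is labelled with five boolean outcomes derived from the full-time score. The following is a minimal sketch of that mapping, assuming the targets depend only on the final goals. The helper full_time_targets is hypothetical and is not part of sportsbet.

```python
def full_time_targets(home_goals, away_goals):
    """Hypothetical helper: derive the five boolean targets from a final score."""
    total_goals = home_goals + away_goals
    return {
        'away_win__full_time_goals': away_goals > home_goals,
        'draw__full_time_goals': home_goals == away_goals,
        'home_win__full_time_goals': home_goals > away_goals,
        'over_2.5__full_time_goals': total_goals > 2.5,
        'under_2.5__full_time_goals': total_goals < 2.5,
    }


# A 2-1 home victory: home win and more than 2.5 goals in total.
targets = full_time_targets(2, 1)
print(targets)
```

Exactly one of the three match-result targets and exactly one of the two goal-total targets is True for any score, which matches the pattern of the rows shown above.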



Training a multi-output classifier

We train a KNeighborsClassifier on the numerical features of the input data, using the extracted targets as multi-output labels.

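# Keep only the integer and float columns; team names and league are strings.
num_features = [
    col
    for col in X_train.columns
    if X_train[col].dtype in (np.dtype(int), np.dtype(float))
]
# Default settings: 5 neighbours with uniform weights.
clf = KNeighborsClassifier()
clf.fit(X_train[num_features], Y_train)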

Out:

KNeighborsClassifier()
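With its default settings (5 neighbours, uniform weights), KNeighborsClassifier estimates the probability of a class as the share of the nearest training samples that belong to it. The following is a minimal pure-Python sketch of that idea for a single boolean target. The function knn_positive_probability and the toy data are hypothetical, and the code is an illustration rather than scikit-learn's implementation.

```python
import math


def knn_positive_probability(train_rows, train_labels, query, k=5):
    """Share of the k nearest training rows (Euclidean distance) labelled True."""
    by_distance = sorted(
        (math.dist(row, query), label)
        for row, label in zip(train_rows, train_labels)
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return sum(nearest_labels) / len(nearest_labels)


# Toy data: one numerical feature per fixture and one boolean target.
train_rows = [(0.50,), (0.55,), (0.60,), (0.20,), (0.25,), (0.90,)]
train_labels = [True, True, False, False, False, True]

# Two of the five nearest rows are labelled True.
prob = knn_positive_probability(train_rows, train_labels, query=(0.52,))
print(prob)
```

Because the estimate is a fraction of k neighbours, with k=5 it can only take the values 0, 0.2, 0.4, 0.6, 0.8 or 1.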

Extracting the fixtures data

We extract the fixtures data. By default, its columns match those of the training data.

X_fix, _, Odds_fix = dataloader.extract_fixtures_data()

The input data:

X_fix
home_team away_team league division year home_team_soccer_power_index away_team_soccer_power_index home_team_probability_win away_team_probability_win probability_draw home_team_projected_score away_team_projected_score match_quality
date
2021-12-22 Heracles Cambuur Netherlands 1 2022 46.71 40.26 0.5205 0.2328 0.2467 1.74 1.09 43.245823
2021-12-22 Willem II Nijmegen Netherlands 1 2022 38.74 44.15 0.4051 0.3270 0.2679 1.44 1.26 41.268452
2021-12-22 Heerenveen Feyenoord Netherlands 1 2022 46.16 70.41 0.1798 0.5946 0.2256 0.97 1.95 55.762642
2021-12-22 Napoli Spezia Italy 1 2022 78.97 48.96 0.7386 0.0900 0.1714 2.30 0.66 60.445106
2021-12-22 Empoli Milan Italy 1 2022 55.62 75.96 0.1797 0.6030 0.2173 1.00 2.01 64.217893
2021-12-22 Verona Fiorentina Italy 1 2022 63.03 69.60 0.3125 0.4226 0.2649 1.22 1.46 66.152273
2021-12-22 Ajax For Sittard Netherlands 1 2022 88.91 35.23 0.9383 0.0124 0.0493 3.76 0.36 50.463981
2021-12-22 Roma Sampdoria Italy 1 2022 73.22 58.23 0.6161 0.1725 0.2114 2.07 1.00 64.870302
2021-12-22 Inter Torino Italy 1 2022 86.52 65.47 0.7361 0.0884 0.1755 2.24 0.63 74.537330
2021-12-22 Venezia Lazio Italy 1 2022 48.78 67.64 0.2422 0.5123 0.2455 1.11 1.72 56.682343
2021-12-22 Sassuolo Bologna Italy 1 2022 66.22 62.26 0.4690 0.2872 0.2438 1.72 1.29 64.178973
2021-12-22 Troyes Brest France 1 2022 51.80 58.17 0.3913 0.3345 0.2742 1.34 1.22 54.800509
2021-12-22 St Etienne Nantes France 1 2022 47.29 58.99 0.3502 0.3690 0.2807 1.21 1.25 52.495994
2021-12-22 Clermont Strasbourg France 1 2022 53.40 66.23 0.3004 0.4410 0.2586 1.23 1.55 59.127008
2021-12-22 Montpellier Angers France 1 2022 58.37 57.60 0.4522 0.2760 0.2718 1.45 1.07 57.982444
2021-12-22 Monaco Rennes France 1 2022 72.15 71.16 0.4255 0.3071 0.2674 1.45 1.19 71.651580
2021-12-22 Marseille Reims France 1 2022 66.33 55.58 0.5956 0.1437 0.2608 1.55 0.61 60.481034
2021-12-22 Lyon Metz France 1 2022 67.70 46.64 0.6598 0.1421 0.1981 2.18 0.90 55.230506
2021-12-22 Lorient Paris SG France 1 2022 48.68 83.02 0.1287 0.6864 0.1849 0.90 2.32 61.373024
2021-12-22 Ath Bilbao Real Madrid Spain 1 2022 75.99 85.38 0.2709 0.4673 0.2618 1.16 1.61 80.411801
2021-12-22 Bordeaux Lille France 1 2022 50.54 72.56 0.2261 0.5287 0.2452 1.05 1.73 59.580543
2021-12-22 Gaziantep Alanyaspor Turkey 1 2022 37.78 39.46 0.4252 0.3291 0.2457 1.66 1.44 38.601730
2021-12-22 Sivasspor Rizespor Turkey 1 2022 49.52 27.33 0.6486 0.1383 0.2131 1.99 0.79 35.221382
2021-12-22 Hatayspor Konyaspor Turkey 1 2022 40.65 46.33 0.3272 0.3898 0.2829 1.15 1.29 43.304541
2021-12-22 Karagumruk Fenerbahce Turkey 1 2022 37.99 52.76 0.2553 0.5074 0.2373 1.24 1.84 44.173056
2021-12-22 Utrecht Twente Netherlands 1 2022 60.97 52.88 0.5129 0.2331 0.2541 1.67 1.04 56.637569
2021-12-22 Nice Lens France 1 2022 63.61 61.16 0.4798 0.2643 0.2560 1.61 1.13 62.360946
2021-12-22 Granada Ath Madrid Spain 1 2022 61.19 83.46 0.1711 0.6003 0.2286 0.93 1.94 70.610680
2021-12-23 Yeni Malatyaspor Kayserispor Turkey 1 2022 26.13 32.94 0.3460 0.3717 0.2823 1.20 1.26 29.142448
2021-12-23 Besiktas Goztep Turkey 1 2022 46.31 30.75 0.5722 0.2136 0.2141 2.13 1.24 36.959058


The market average odds:

Odds_fix
market_average__away_win__odds market_average__draw__odds market_average__home_win__odds market_average__over_2.5__odds market_average__under_2.5__odds
0 3.37 3.76 2.03 1.65 2.23
1 2.85 3.26 2.51 1.91 1.89
2 1.53 4.34 5.84 1.65 2.25
3 12.17 6.27 1.25 1.47 2.68
4 1.76 3.99 4.46 1.57 2.40
5 2.43 3.45 2.89 1.70 2.16
6 41.22 16.43 1.04 1.16 4.93
7 5.62 4.32 1.57 1.63 2.30
8 9.50 5.41 1.33 1.62 2.34
9 1.90 3.63 4.13 1.81 2.02
10 3.55 3.68 2.03 1.60 2.35
11 2.86 3.34 2.50 1.90 1.92
12 2.79 3.32 2.58 2.02 1.81
13 2.43 3.38 2.94 1.93 1.89
14 3.42 3.30 2.21 1.97 1.85
15 3.20 3.49 2.23 1.87 1.96
16 5.44 3.74 1.68 2.17 1.70
17 8.12 5.41 1.35 1.46 2.69
18 1.38 5.15 7.88 1.54 2.47
19 2.17 3.49 3.37 1.99 1.84
20 1.81 3.82 4.29 1.79 2.04
21 2.90 3.42 2.34 1.83 1.97
22 5.48 4.00 1.59 1.86 1.93
23 3.00 3.35 2.33 1.99 1.82
24 2.13 3.49 3.27 1.82 1.98
25 4.25 3.87 1.78 1.78 2.04
26 3.53 3.45 2.11 1.85 1.97
27 1.61 3.77 6.28 2.14 1.73
28 2.99 3.32 2.34 1.94 1.85
29 4.98 4.11 1.62 1.64 2.23
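To read these figures, it can help to convert odds into implied probabilities. The sketch below assumes the values are decimal odds and uses the match-result odds of the first fixture above (3.37, 3.76, 2.03). Normalising the implied probabilities proportionally is one simple convention among several, chosen here only for illustration.

```python
# Match-result odds of the first fixture in Odds_fix (assumed to be decimal odds).
odds = {'away_win': 3.37, 'draw': 3.76, 'home_win': 2.03}

# The implied probability of an outcome is the reciprocal of its decimal odds.
implied = {outcome: 1 / price for outcome, price in odds.items()}

# The implied probabilities sum to more than 1; the excess is the bookmaker margin.
overround = sum(implied.values()) - 1

# Rescale proportionally so that the probabilities sum to 1 (an assumed convention).
total = sum(implied.values())
fair = {outcome: p / total for outcome, p in implied.items()}

print(round(overround, 4))
print({outcome: round(p, 3) for outcome, p in fair.items()})
```

For a bet to have value, the model's probability has to exceed the implied probability 1 / odds, which is the same condition as probability times odds being greater than 1.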


Estimating the value bets

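We estimate the value bets with the fitted classifier. A bet is flagged as a value bet when the predicted probability of its outcome multiplied by the corresponding odds exceeds 1.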

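# predict_proba returns one array per target; column 1 is the probability of True.
Y_pred_prob = np.concatenate(
    [prob[:, 1].reshape(-1, 1) for prob in clf.predict_proba(X_fix[num_features])],
    axis=1,
)
X_fix_info = X_fix[['home_team', 'away_team']].reset_index()
# Flag the outcomes where probability * odds > 1.
value_bets = pd.concat([X_fix_info, Y_pred_prob * Odds_fix > 1], axis=1).set_index(
    'date'
)
# Shorten the odds column names, e.g. market_average__away_win__odds -> away_win.
value_bets.rename(
    columns={
        col: col.split('__')[1] for col in value_bets.columns if col.endswith('odds')
    }
)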
date        home_team         away_team    away_win  draw   home_win  over_2.5  under_2.5
2021-12-22  Heracles          Cambuur      True      False  True      True      False
2021-12-22  Willem II         Nijmegen     True      False  False     False     True
2021-12-22  Heerenveen        Feyenoord    True      False  True      False     True
2021-12-22  Napoli            Spezia       False     False  True      True      False
2021-12-22  Empoli            Milan        False     True   False     True      False
2021-12-22  Verona            Fiorentina   False     False  True      True      False
2021-12-22  Ajax              For Sittard  False     False  True      False     False
2021-12-22  Roma              Sampdoria    True      False  False     False     True
2021-12-22  Inter             Torino       True      False  False     True      False
2021-12-22  Venezia           Lazio        False     True   False     True      False
2021-12-22  Sassuolo          Bologna      False     True   False     False     True
2021-12-22  Troyes            Brest        False     True   False     True      False
2021-12-22  St Etienne        Nantes       True      False  True      False     True
2021-12-22  Clermont          Strasbourg   False     False  True      True      False
2021-12-22  Montpellier       Angers       True      True   False     False     True
2021-12-22  Monaco            Rennes       False     False  True      False     True
2021-12-22  Marseille         Reims        True      True   False     True      False
2021-12-22  Lyon              Metz         False     False  True      True      False
2021-12-22  Lorient           Paris SG     False     True   True      False     False
2021-12-22  Ath Bilbao        Real Madrid  False     True   True      False     True
2021-12-22  Bordeaux          Lille        False     True   False     False     True
2021-12-22  Gaziantep         Alanyaspor   False     True   False     False     True
2021-12-22  Sivasspor         Rizespor     False     True   False     False     True
2021-12-22  Hatayspor         Konyaspor    False     True   False     False     True
2021-12-22  Karagumruk        Fenerbahce   False     True   False     False     True
2021-12-22  Utrecht           Twente       True      True   False     False     True
2021-12-22  Nice              Lens         True      False  False     True      False
2021-12-22  Granada           Ath Madrid   False     False  True      False     True
2021-12-23  Yeni Malatyaspor  Kayserispor  True      False  False     True      False
2021-12-23  Besiktas          Goztep       False     False  True      True      False
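The criterion behind the table above, probability times odds greater than 1, is equivalent to a positive expected profit per unit staked. The following is a minimal sketch of that arithmetic, assuming decimal odds. The helper names and the probabilities are hypothetical and do not come from the fitted classifier.

```python
def expected_profit(probability, decimal_odds, stake=1.0):
    """Expected profit of a bet: win (odds - 1) * stake or lose the stake."""
    win = probability * (decimal_odds - 1) * stake
    loss = (1 - probability) * stake
    return win - loss  # simplifies to stake * (probability * decimal_odds - 1)


def is_value_bet(probability, decimal_odds):
    """A bet has value when its expected profit is positive."""
    return probability * decimal_odds > 1


# Hypothetical probabilities combined with two odds taken from the first fixture.
print(is_value_bet(0.6, 2.03), expected_profit(0.6, 2.03))
print(is_value_bet(0.2, 3.37), expected_profit(0.2, 3.37))
```

In the first case the expected profit is positive, so the bet is flagged; in the second it is negative, so the bet is not.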


