Soccer value bets
This example illustrates how to estimate value bets for soccer fixtures by training a machine learning multi-output classifier.
# Author: Georgios Douzas <gdouzas@icloud.com>
# Licence: MIT
import numpy as np
import pandas as pd
from sportsbet.datasets import SoccerDataLoader
from sklearn.neighbors import KNeighborsClassifier
Extracting the training data
We extract the training data for the Spanish league. We also remove any missing values and select the market average odds.
dataloader = SoccerDataLoader(param_grid={'league': ['Spain']})
X_train, Y_train, _ = dataloader.extract_train_data(
    drop_na_thres=1.0, odds_type='market_average'
)
Out:
Football-Data.co.uk: ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
The input data:
X_train
date | home_team | away_team | league | division | year | home_team_soccer_power_index | away_team_soccer_power_index | home_team_probability_win | away_team_probability_win | probability_draw | home_team_projected_score | away_team_projected_score | match_quality |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2016-08-19 | La Coruna | Eibar | Spain | 1 | 2017 | 66.52 | 62.29 | 0.5003 | 0.2260 | 0.2738 | 1.47 | 0.79 | 64.335545 |
2016-08-19 | Malaga | Osasuna | Spain | 1 | 2017 | 72.57 | 56.93 | 0.5475 | 0.1897 | 0.2628 | 1.56 | 0.70 | 63.805561 |
2016-08-19 | La Coruna | Eibar | Spain | 1 | 2017 | 66.52 | 62.29 | 0.5003 | 0.2260 | 0.2738 | 1.47 | 0.79 | 64.335545 |
2016-08-19 | Malaga | Osasuna | Spain | 1 | 2017 | 72.57 | 56.93 | 0.5475 | 0.1897 | 0.2628 | 1.56 | 0.70 | 63.805561 |
2016-08-20 | Barcelona | Betis | Spain | 1 | 2017 | 96.35 | 69.95 | 0.9591 | 0.0071 | 0.0337 | 3.40 | 0.42 | 81.054510 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2021-10-28 | Granada | Getafe | Spain | 1 | 2022 | 58.45 | 65.57 | 0.3631 | 0.3127 | 0.3242 | 1.05 | 0.95 | 61.805620 |
2021-10-28 | Celta | Sociedad | Spain | 1 | 2022 | 71.68 | 78.29 | 0.3206 | 0.3957 | 0.2837 | 1.17 | 1.34 | 74.839331 |
2021-10-28 | Levante | Ath Madrid | Spain | 1 | 2022 | 63.23 | 84.94 | 0.1873 | 0.5664 | 0.2463 | 0.91 | 1.77 | 72.494516 |
2021-10-28 | Granada | Getafe | Spain | 1 | 2022 | 58.45 | 65.57 | 0.3631 | 0.3127 | 0.3242 | 1.05 | 0.95 | 61.805620 |
2021-12-20 | Levante | Valencia | Spain | 1 | 2022 | 60.85 | 72.40 | 0.3228 | 0.4059 | 0.2713 | 1.26 | 1.45 | 66.124428 |
18815 rows × 13 columns
The targets:
Y_train
| | away_win__full_time_goals | draw__full_time_goals | home_win__full_time_goals | over_2.5__full_time_goals | under_2.5__full_time_goals |
---|---|---|---|---|---|
0 | False | False | True | True | False |
1 | False | True | False | False | True |
2 | False | False | True | True | False |
3 | False | True | False | False | True |
4 | False | False | True | True | False |
... | ... | ... | ... | ... | ... |
18810 | False | True | False | False | True |
18811 | False | False | True | False | True |
18812 | False | False | True | True | False |
18813 | False | False | True | True | False |
18814 | True | False | False | True | False |
18815 rows × 5 columns
Training a multi-output classifier
We train a KNeighborsClassifier using only the numerical features of the input data, with the extracted targets as labels.
num_features = [
    col
    for col in X_train.columns
    if X_train[col].dtype in (np.dtype(int), np.dtype(float))
]
clf = KNeighborsClassifier()
clf.fit(X_train[num_features], Y_train)
Out:
KNeighborsClassifier()
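To make the multi-output behaviour concrete, the following minimal sketch fits a KNeighborsClassifier on synthetic data with two binary targets. The names `X_toy`, `Y_toy` and `toy_clf` are illustrative assumptions and are not part of the example above. It shows that `predict_proba` returns one array per target, each with one column per class, which is why the positive-class column has to be picked out and stacked before use.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data (illustrative only): 40 samples, 3 numerical features.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(40, 3))

# Two binary targets stacked column-wise, mimicking a multi-output problem.
Y_toy = np.column_stack([X_toy[:, 0] > 0, X_toy[:, 1] > 0]).astype(int)

toy_clf = KNeighborsClassifier(n_neighbors=5).fit(X_toy, Y_toy)

# For multi-output targets, predict_proba returns a list with one
# (n_samples, n_classes) array per target.
probs = toy_clf.predict_proba(X_toy[:4])

# Keep the probability of the positive class (column 1) for each target.
positive = np.column_stack([p[:, 1] for p in probs])
print(len(probs), probs[0].shape, positive.shape)
```

With the default uniform weights and five neighbours, every probability is a multiple of 0.2, so the estimates are coarse; this is a property of the classifier choice rather than of the data.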
Extracting the fixtures data
We extract the fixtures data. The columns by default match the columns of the training data.
X_fix, _, Odds_fix = dataloader.extract_fixtures_data()
The input data:
X_fix
date | home_team | away_team | league | division | year | home_team_soccer_power_index | away_team_soccer_power_index | home_team_probability_win | away_team_probability_win | probability_draw | home_team_projected_score | away_team_projected_score | match_quality |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2021-12-22 | Heracles | Cambuur | Netherlands | 1 | 2022 | 46.71 | 40.26 | 0.5205 | 0.2328 | 0.2467 | 1.74 | 1.09 | 43.245823 |
2021-12-22 | Willem II | Nijmegen | Netherlands | 1 | 2022 | 38.74 | 44.15 | 0.4051 | 0.3270 | 0.2679 | 1.44 | 1.26 | 41.268452 |
2021-12-22 | Heerenveen | Feyenoord | Netherlands | 1 | 2022 | 46.16 | 70.41 | 0.1798 | 0.5946 | 0.2256 | 0.97 | 1.95 | 55.762642 |
2021-12-22 | Napoli | Spezia | Italy | 1 | 2022 | 78.97 | 48.96 | 0.7386 | 0.0900 | 0.1714 | 2.30 | 0.66 | 60.445106 |
2021-12-22 | Empoli | Milan | Italy | 1 | 2022 | 55.62 | 75.96 | 0.1797 | 0.6030 | 0.2173 | 1.00 | 2.01 | 64.217893 |
2021-12-22 | Verona | Fiorentina | Italy | 1 | 2022 | 63.03 | 69.60 | 0.3125 | 0.4226 | 0.2649 | 1.22 | 1.46 | 66.152273 |
2021-12-22 | Ajax | For Sittard | Netherlands | 1 | 2022 | 88.91 | 35.23 | 0.9383 | 0.0124 | 0.0493 | 3.76 | 0.36 | 50.463981 |
2021-12-22 | Roma | Sampdoria | Italy | 1 | 2022 | 73.22 | 58.23 | 0.6161 | 0.1725 | 0.2114 | 2.07 | 1.00 | 64.870302 |
2021-12-22 | Inter | Torino | Italy | 1 | 2022 | 86.52 | 65.47 | 0.7361 | 0.0884 | 0.1755 | 2.24 | 0.63 | 74.537330 |
2021-12-22 | Venezia | Lazio | Italy | 1 | 2022 | 48.78 | 67.64 | 0.2422 | 0.5123 | 0.2455 | 1.11 | 1.72 | 56.682343 |
2021-12-22 | Sassuolo | Bologna | Italy | 1 | 2022 | 66.22 | 62.26 | 0.4690 | 0.2872 | 0.2438 | 1.72 | 1.29 | 64.178973 |
2021-12-22 | Troyes | Brest | France | 1 | 2022 | 51.80 | 58.17 | 0.3913 | 0.3345 | 0.2742 | 1.34 | 1.22 | 54.800509 |
2021-12-22 | St Etienne | Nantes | France | 1 | 2022 | 47.29 | 58.99 | 0.3502 | 0.3690 | 0.2807 | 1.21 | 1.25 | 52.495994 |
2021-12-22 | Clermont | Strasbourg | France | 1 | 2022 | 53.40 | 66.23 | 0.3004 | 0.4410 | 0.2586 | 1.23 | 1.55 | 59.127008 |
2021-12-22 | Montpellier | Angers | France | 1 | 2022 | 58.37 | 57.60 | 0.4522 | 0.2760 | 0.2718 | 1.45 | 1.07 | 57.982444 |
2021-12-22 | Monaco | Rennes | France | 1 | 2022 | 72.15 | 71.16 | 0.4255 | 0.3071 | 0.2674 | 1.45 | 1.19 | 71.651580 |
2021-12-22 | Marseille | Reims | France | 1 | 2022 | 66.33 | 55.58 | 0.5956 | 0.1437 | 0.2608 | 1.55 | 0.61 | 60.481034 |
2021-12-22 | Lyon | Metz | France | 1 | 2022 | 67.70 | 46.64 | 0.6598 | 0.1421 | 0.1981 | 2.18 | 0.90 | 55.230506 |
2021-12-22 | Lorient | Paris SG | France | 1 | 2022 | 48.68 | 83.02 | 0.1287 | 0.6864 | 0.1849 | 0.90 | 2.32 | 61.373024 |
2021-12-22 | Ath Bilbao | Real Madrid | Spain | 1 | 2022 | 75.99 | 85.38 | 0.2709 | 0.4673 | 0.2618 | 1.16 | 1.61 | 80.411801 |
2021-12-22 | Bordeaux | Lille | France | 1 | 2022 | 50.54 | 72.56 | 0.2261 | 0.5287 | 0.2452 | 1.05 | 1.73 | 59.580543 |
2021-12-22 | Gaziantep | Alanyaspor | Turkey | 1 | 2022 | 37.78 | 39.46 | 0.4252 | 0.3291 | 0.2457 | 1.66 | 1.44 | 38.601730 |
2021-12-22 | Sivasspor | Rizespor | Turkey | 1 | 2022 | 49.52 | 27.33 | 0.6486 | 0.1383 | 0.2131 | 1.99 | 0.79 | 35.221382 |
2021-12-22 | Hatayspor | Konyaspor | Turkey | 1 | 2022 | 40.65 | 46.33 | 0.3272 | 0.3898 | 0.2829 | 1.15 | 1.29 | 43.304541 |
2021-12-22 | Karagumruk | Fenerbahce | Turkey | 1 | 2022 | 37.99 | 52.76 | 0.2553 | 0.5074 | 0.2373 | 1.24 | 1.84 | 44.173056 |
2021-12-22 | Utrecht | Twente | Netherlands | 1 | 2022 | 60.97 | 52.88 | 0.5129 | 0.2331 | 0.2541 | 1.67 | 1.04 | 56.637569 |
2021-12-22 | Nice | Lens | France | 1 | 2022 | 63.61 | 61.16 | 0.4798 | 0.2643 | 0.2560 | 1.61 | 1.13 | 62.360946 |
2021-12-22 | Granada | Ath Madrid | Spain | 1 | 2022 | 61.19 | 83.46 | 0.1711 | 0.6003 | 0.2286 | 0.93 | 1.94 | 70.610680 |
2021-12-23 | Yeni Malatyaspor | Kayserispor | Turkey | 1 | 2022 | 26.13 | 32.94 | 0.3460 | 0.3717 | 0.2823 | 1.20 | 1.26 | 29.142448 |
2021-12-23 | Besiktas | Goztep | Turkey | 1 | 2022 | 46.31 | 30.75 | 0.5722 | 0.2136 | 0.2141 | 2.13 | 1.24 | 36.959058 |
The market average odds:
Odds_fix
| | market_average__away_win__odds | market_average__draw__odds | market_average__home_win__odds | market_average__over_2.5__odds | market_average__under_2.5__odds |
---|---|---|---|---|---|
0 | 3.37 | 3.76 | 2.03 | 1.65 | 2.23 |
1 | 2.85 | 3.26 | 2.51 | 1.91 | 1.89 |
2 | 1.53 | 4.34 | 5.84 | 1.65 | 2.25 |
3 | 12.17 | 6.27 | 1.25 | 1.47 | 2.68 |
4 | 1.76 | 3.99 | 4.46 | 1.57 | 2.40 |
5 | 2.43 | 3.45 | 2.89 | 1.70 | 2.16 |
6 | 41.22 | 16.43 | 1.04 | 1.16 | 4.93 |
7 | 5.62 | 4.32 | 1.57 | 1.63 | 2.30 |
8 | 9.50 | 5.41 | 1.33 | 1.62 | 2.34 |
9 | 1.90 | 3.63 | 4.13 | 1.81 | 2.02 |
10 | 3.55 | 3.68 | 2.03 | 1.60 | 2.35 |
11 | 2.86 | 3.34 | 2.50 | 1.90 | 1.92 |
12 | 2.79 | 3.32 | 2.58 | 2.02 | 1.81 |
13 | 2.43 | 3.38 | 2.94 | 1.93 | 1.89 |
14 | 3.42 | 3.30 | 2.21 | 1.97 | 1.85 |
15 | 3.20 | 3.49 | 2.23 | 1.87 | 1.96 |
16 | 5.44 | 3.74 | 1.68 | 2.17 | 1.70 |
17 | 8.12 | 5.41 | 1.35 | 1.46 | 2.69 |
18 | 1.38 | 5.15 | 7.88 | 1.54 | 2.47 |
19 | 2.17 | 3.49 | 3.37 | 1.99 | 1.84 |
20 | 1.81 | 3.82 | 4.29 | 1.79 | 2.04 |
21 | 2.90 | 3.42 | 2.34 | 1.83 | 1.97 |
22 | 5.48 | 4.00 | 1.59 | 1.86 | 1.93 |
23 | 3.00 | 3.35 | 2.33 | 1.99 | 1.82 |
24 | 2.13 | 3.49 | 3.27 | 1.82 | 1.98 |
25 | 4.25 | 3.87 | 1.78 | 1.78 | 2.04 |
26 | 3.53 | 3.45 | 2.11 | 1.85 | 1.97 |
27 | 1.61 | 3.77 | 6.28 | 2.14 | 1.73 |
28 | 2.99 | 3.32 | 2.34 | 1.94 | 1.85 |
29 | 4.98 | 4.11 | 1.62 | 1.64 | 2.23 |
Estimating the value bets
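We can estimate the value bets by using the fitted classifier: a bet is flagged as a value bet when the predicted probability of the outcome multiplied by the corresponding odds is greater than 1.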
Y_pred_prob = np.concatenate(
    [prob[:, 1].reshape(-1, 1) for prob in clf.predict_proba(X_fix[num_features])],
    axis=1,
)
X_fix_info = X_fix[['home_team', 'away_team']].reset_index()
value_bets = pd.concat([X_fix_info, Y_pred_prob * Odds_fix > 1], axis=1).set_index(
    'date'
)
value_bets.rename(
    columns={
        col: col.split('__')[1] for col in value_bets.columns if col.endswith('odds')
    }
)
date | home_team | away_team | away_win | draw | home_win | over_2.5 | under_2.5 |
---|---|---|---|---|---|---|---|
2021-12-22 | Heracles | Cambuur | True | False | True | True | False |
2021-12-22 | Willem II | Nijmegen | True | False | False | False | True |
2021-12-22 | Heerenveen | Feyenoord | True | False | True | False | True |
2021-12-22 | Napoli | Spezia | False | False | True | True | False |
2021-12-22 | Empoli | Milan | False | True | False | True | False |
2021-12-22 | Verona | Fiorentina | False | False | True | True | False |
2021-12-22 | Ajax | For Sittard | False | False | True | False | False |
2021-12-22 | Roma | Sampdoria | True | False | False | False | True |
2021-12-22 | Inter | Torino | True | False | False | True | False |
2021-12-22 | Venezia | Lazio | False | True | False | True | False |
2021-12-22 | Sassuolo | Bologna | False | True | False | False | True |
2021-12-22 | Troyes | Brest | False | True | False | True | False |
2021-12-22 | St Etienne | Nantes | True | False | True | False | True |
2021-12-22 | Clermont | Strasbourg | False | False | True | True | False |
2021-12-22 | Montpellier | Angers | True | True | False | False | True |
2021-12-22 | Monaco | Rennes | False | False | True | False | True |
2021-12-22 | Marseille | Reims | True | True | False | True | False |
2021-12-22 | Lyon | Metz | False | False | True | True | False |
2021-12-22 | Lorient | Paris SG | False | True | True | False | False |
2021-12-22 | Ath Bilbao | Real Madrid | False | True | True | False | True |
2021-12-22 | Bordeaux | Lille | False | True | False | False | True |
2021-12-22 | Gaziantep | Alanyaspor | False | True | False | False | True |
2021-12-22 | Sivasspor | Rizespor | False | True | False | False | True |
2021-12-22 | Hatayspor | Konyaspor | False | True | False | False | True |
2021-12-22 | Karagumruk | Fenerbahce | False | True | False | False | True |
2021-12-22 | Utrecht | Twente | True | True | False | False | True |
2021-12-22 | Nice | Lens | True | False | False | True | False |
2021-12-22 | Granada | Ath Madrid | False | False | True | False | True |
2021-12-23 | Yeni Malatyaspor | Kayserispor | True | False | False | True | False |
2021-12-23 | Besiktas | Goztep | False | False | True | True | False |
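The rule behind the table can be checked by hand. The sketch below applies the same `probability * odds > 1` test to two fixtures, using the home win, draw and away win odds from the first and fourth rows of `Odds_fix`. The probabilities are hypothetical values chosen for illustration, not the classifier's actual output, and the names `odds`, `probs`, `expected_return` and `is_value` are assumptions of this sketch.

```python
import numpy as np
import pandas as pd

# Decimal odds copied from rows 0 and 3 of the market average odds above.
odds = pd.DataFrame(
    {
        'home_win': [2.03, 1.25],
        'draw': [3.76, 6.27],
        'away_win': [3.37, 12.17],
    }
)

# Hypothetical predicted probabilities (illustration only), in the same
# column order as the odds.
probs = np.array(
    [
        [0.6, 0.2, 0.2],
        [0.6, 0.2, 0.2],
    ]
)

# Expected return per unit staked; a bet has value when it exceeds 1,
# i.e. when the predicted probability is above the break-even 1 / odds.
expected_return = odds * probs
is_value = expected_return > 1
print(is_value)
```

For the first fixture only the home win qualifies (0.6 × 2.03 ≈ 1.22). For the second, the short home odds of 1.25 need a probability above 0.8 to break even, so the home win is not a value bet, while the long draw and away odds are flagged even at a probability of 0.2.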