Extracting data from Football-Data
This example illustrates how to use the Football-Data dataloader.
# Author: Georgios Douzas <gdouzas@icloud.com>
# Licence: MIT
import pandas as pd
from sportsbet.datasets import FDSoccerDataLoader
Getting the available parameters
We can use the get_all_params() class method to get the available parameters and select the training data to be extracted.
The available parameters can be presented as a DataFrame.
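# Get every available combination of league, division and year.
params = FDSoccerDataLoader.get_all_params()
params_df = pd.DataFrame(params).sort_values(['league', 'year', 'division'], ignore_index=True)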
params_df
| | division | league | year |
---|---|---|---|
0 | 1 | Argentina | 2013 |
1 | 1 | Argentina | 2014 |
2 | 1 | Argentina | 2015 |
3 | 1 | Argentina | 2016 |
4 | 1 | Argentina | 2017 |
... | ... | ... | ... |
753 | 1 | USA | 2018 |
754 | 1 | USA | 2019 |
755 | 1 | USA | 2020 |
756 | 1 | USA | 2021 |
757 | 1 | USA | 2022 |
758 rows × 3 columns
We choose to extract training data only for the year 2021 of the Spanish and Italian first divisions.
param_grid = {'league': ['Spain', 'Italy'], 'division': [1], 'year': [2021]}
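To make the selection concrete, the following sketch expands a parameter grid into its individual league, division and year combinations and checks them against a list of available parameters. It assumes that the grid is read as a Cartesian product of its values and that the available parameters are a list of dicts; neither is confirmed by this example. The helper `expand_grid` and the sample `available` list are hypothetical and not part of sportsbet.

```python
from itertools import product


def expand_grid(param_grid):
    """Return every combination of the grid values as a list of dicts."""
    keys = sorted(param_grid)
    return [dict(zip(keys, values)) for values in product(*(param_grid[key] for key in keys))]


# Hypothetical subset of the available parameters (assumed to be a list of dicts).
available = [
    {'division': 1, 'league': 'Italy', 'year': 2021},
    {'division': 1, 'league': 'Spain', 'year': 2021},
    {'division': 2, 'league': 'Spain', 'year': 2021},
]

param_grid = {'league': ['Spain', 'Italy'], 'division': [1], 'year': [2021]}
selected = expand_grid(param_grid)

# Combinations requested by the grid that are absent from the available list.
missing = [params for params in selected if params not in available]
print(selected)
print(missing)
```

An empty `missing` list would indicate that every requested combination is available.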
Getting the available odds types
We can use the get_odds_types() class method to get the available odds types and select odds that match the output of the training data.
FDSoccerDataLoader.get_odds_types()
Out:
['bet365', 'bet365_closing', 'betbrain', 'betbrain_average', 'betbrain_maximum', 'betwin', 'betwin_closing', 'bluesquare', 'gamebookers', 'interwetten', 'interwetten_closing', 'ladbrokes', 'market_average', 'market_average_closing', 'market_maximum', 'market_maximum_closing', 'pinnacle', 'pinnacle_closing', 'sporting', 'sportingbet', 'stanjames', 'stanleybet', 'vcbet', 'vcbet_closing', 'williamhill', 'williamhill_closing']
We select the market average as the odds type.
odds_type = 'market_average'
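A misspelled odds type is easy to miss, so it may help to validate the choice against the available list before extracting data. The sketch below uses a shortened copy of the list printed above; `check_odds_type` is a hypothetical helper and not part of sportsbet.

```python
# Shortened copy of the odds types listed in the output above.
available_odds_types = [
    'bet365',
    'bet365_closing',
    'market_average',
    'market_average_closing',
    'market_maximum',
    'pinnacle',
]


def check_odds_type(odds_type, available):
    """Return the odds type unchanged, or raise if it is not available."""
    if odds_type not in available:
        choices = ', '.join(sorted(available))
        raise ValueError(f'Unknown odds type {odds_type!r}. Choose one of: {choices}')
    return odds_type


odds_type = check_odds_type('market_average', available_odds_types)
print(odds_type)
```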
Extracting the training data
We extract the training data, keeping only the columns and rows without missing values by setting the drop_na_thres parameter equal to 1.0.
dataloader = FDSoccerDataLoader(param_grid=param_grid)
X_train, Y_train, Odds_train = dataloader.extract_train_data(drop_na_thres=1.0, odds_type=odds_type)
Out:
Football-Data.co.uk: ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
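The effect of a missing-value threshold can be illustrated with a small stand-alone sketch. It assumes that the threshold is the minimum share of non-missing values that a column, and then a row, must have in order to be kept. This is only one plausible reading: the actual logic behind drop_na_thres is not shown in this example and may differ. The function `drop_missing` and the toy data are hypothetical.

```python
def drop_missing(rows, thres):
    """Keep columns, then rows, whose share of non-missing values is at least `thres`."""
    columns = list(rows[0])
    kept_columns = [
        col for col in columns
        if sum(row[col] is not None for row in rows) / len(rows) >= thres
    ]
    kept_rows = []
    for row in rows:
        values = [row[col] for col in kept_columns]
        if values and sum(value is not None for value in values) / len(values) >= thres:
            kept_rows.append({col: row[col] for col in kept_columns})
    return kept_rows


# Toy data with None marking a missing value; not taken from Football-Data.
rows = [
    {'home_team': 'A', 'odds_x': 1.9, 'odds_y': None},
    {'home_team': 'B', 'odds_x': 2.1, 'odds_y': None},
    {'home_team': 'C', 'odds_x': None, 'odds_y': 3.0},
]

# A threshold of 1.0 keeps only the columns that are complete.
print(drop_missing(rows, thres=1.0))
# A lower threshold tolerates some missing values in columns and rows.
print(drop_missing(rows, thres=0.6))
```

Under this reading, a threshold of 1.0 is the strictest setting, while lower values trade completeness for more columns and rows.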
The input data:
X_train
| | date | home_team | away_team | market_maximum__home_win__odds | market_maximum__draw__odds | market_maximum__away_win__odds | market_average__home_win__odds | market_average__draw__odds | market_average__away_win__odds | bet365_closing__home_win__odds | bet365_closing__draw__odds | bet365_closing__away_win__odds | betwin_closing__home_win__odds | betwin_closing__draw__odds | betwin_closing__away_win__odds | interwetten_closing__home_win__odds | interwetten_closing__draw__odds | interwetten_closing__away_win__odds | pinnacle_closing__home_win__odds | pinnacle_closing__draw__odds | pinnacle_closing__away_win__odds | williamhill_closing__home_win__odds | williamhill_closing__draw__odds | williamhill_closing__away_win__odds | vcbet_closing__home_win__odds | vcbet_closing__draw__odds | vcbet_closing__away_win__odds | market_maximum_closing__home_win__odds | market_maximum_closing__draw__odds | market_maximum_closing__away_win__odds | market_average_closing__home_win__odds | market_average_closing__draw__odds | market_average_closing__away_win__odds | market_maximum_closing__over_2.5__odds | market_maximum_closing__under_2.5__odds | market_average_closing__over_2.5__odds | market_average_closing__under_2.5__odds | market_average_closing__size_of_asian_handicap_home_team__odds | bet365_closing__asian_handicap_home_team__odds | bet365_closing__asian_handicap_away_team__odds | pinnacle_closing__asian_handicap_home_team__odds | pinnacle_closing__asian_handicap_away_team__odds | market_maximum_closing__asian_handicap_home_team__odds | market_maximum_closing__asian_handicap_away_team__odds | market_average_closing__asian_handicap_home_team__odds | market_average_closing__asian_handicap_away_team__odds | league | division | year |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2020-09-19 | Fiorentina | Torino | 1.94 | 3.85 | 4.60 | 1.87 | 3.69 | 4.13 | 1.80 | 4.00 | 4.00 | 1.85 | 3.75 | 4.10 | 1.90 | 3.50 | 4.10 | 1.84 | 3.90 | 4.45 | 1.83 | 3.70 | 4.33 | 1.85 | 3.60 | 4.40 | 1.91 | 4.00 | 4.55 | 1.85 | 3.76 | 4.18 | 1.86 | 2.15 | 1.78 | 2.05 | -0.50 | 1.86 | 2.07 | 1.84 | 2.10 | 1.87 | 2.10 | 1.84 | 2.03 | Italy | 1 | 2021 |
1 | 2020-09-19 | Verona | Roma | 4.00 | 3.81 | 2.08 | 3.82 | 3.65 | 1.96 | 4.33 | 4.00 | 1.75 | 4.50 | 3.90 | 1.75 | 4.40 | 3.70 | 1.75 | 4.88 | 3.83 | 1.79 | 4.50 | 3.80 | 1.78 | 4.75 | 3.80 | 1.75 | 4.88 | 4.08 | 1.85 | 4.54 | 3.86 | 1.76 | 1.79 | 2.37 | 1.69 | 2.18 | 0.75 | 1.89 | 2.04 | 1.91 | 2.01 | 1.95 | 2.07 | 1.88 | 1.99 | Italy | 1 | 2021 |
2 | 2020-09-20 | Parma | Napoli | 5.80 | 4.52 | 1.67 | 5.13 | 4.16 | 1.63 | 6.50 | 4.75 | 1.44 | 6.25 | 4.40 | 1.50 | 6.75 | 4.20 | 1.50 | 7.77 | 4.83 | 1.45 | 7.50 | 4.50 | 1.44 | 6.50 | 4.40 | 1.50 | 7.77 | 4.87 | 1.50 | 6.83 | 4.56 | 1.47 | 1.82 | 2.26 | 1.74 | 2.10 | 1.00 | 2.05 | 1.75 | 2.15 | 1.79 | 2.18 | 1.86 | 2.09 | 1.79 | Italy | 1 | 2021 |
3 | 2020-09-20 | Genoa | Crotone | 1.98 | 3.80 | 4.35 | 1.91 | 3.59 | 4.03 | 2.37 | 3.40 | 3.00 | 2.35 | 3.30 | 3.10 | 2.50 | 3.00 | 3.00 | 2.53 | 3.26 | 3.13 | 2.45 | 3.25 | 3.00 | 2.50 | 3.20 | 3.00 | 2.60 | 3.53 | 3.29 | 2.46 | 3.23 | 3.04 | 2.14 | 1.92 | 2.03 | 1.79 | -0.25 | 2.13 | 1.81 | 2.15 | 1.80 | 2.16 | 1.84 | 2.10 | 1.79 | Italy | 1 | 2021 |
4 | 2020-09-20 | Sassuolo | Cagliari | 2.00 | 3.98 | 3.98 | 1.95 | 3.80 | 3.67 | 1.65 | 4.33 | 4.50 | 1.70 | 4.20 | 4.50 | 1.70 | 3.85 | 4.80 | 1.67 | 4.30 | 5.20 | 1.65 | 4.20 | 5.00 | 1.70 | 4.10 | 4.80 | 1.72 | 4.46 | 5.30 | 1.67 | 4.16 | 4.83 | 1.70 | 2.39 | 1.63 | 2.28 | -0.75 | 1.84 | 2.09 | 1.84 | 2.10 | 1.89 | 2.14 | 1.83 | 2.04 | Italy | 1 | 2021 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
755 | 2021-05-22 | Osasuna | Sociedad | 5.25 | 4.70 | 1.67 | 4.97 | 4.40 | 1.62 | 5.75 | 4.50 | 1.50 | 5.50 | 4.25 | 1.57 | 6.00 | 4.40 | 1.53 | 6.23 | 4.63 | 1.54 | 5.80 | 4.33 | 1.55 | 5.75 | 4.50 | 1.53 | 6.31 | 4.65 | 1.60 | 5.80 | 4.46 | 1.55 | 1.67 | 2.50 | 1.60 | 2.34 | 1.00 | 2.07 | 1.86 | 2.06 | 1.87 | 2.07 | 1.96 | 1.98 | 1.88 | Spain | 1 | 2021 |
756 | 2021-05-22 | Real Madrid | Villarreal | 1.47 | 5.50 | 7.70 | 1.42 | 5.05 | 6.92 | 1.50 | 4.50 | 5.50 | 1.55 | 4.50 | 5.50 | 1.53 | 4.70 | 5.50 | 1.60 | 4.65 | 5.43 | 1.50 | 4.75 | 5.80 | 1.55 | 4.50 | 5.50 | 1.61 | 4.80 | 6.14 | 1.56 | 4.58 | 5.45 | 1.55 | 2.72 | 1.50 | 2.57 | -1.00 | 2.01 | 1.92 | 2.00 | 1.93 | 2.02 | 2.02 | 1.95 | 1.92 | Spain | 1 | 2021 |
757 | 2021-05-22 | Valladolid | Ath Madrid | 10.75 | 5.75 | 1.37 | 9.20 | 5.27 | 1.34 | 9.50 | 5.50 | 1.30 | 9.00 | 6.00 | 1.30 | 11.00 | 5.50 | 1.30 | 9.26 | 5.94 | 1.33 | 9.50 | 5.25 | 1.32 | 10.00 | 5.50 | 1.30 | 11.00 | 6.00 | 1.35 | 9.37 | 5.64 | 1.32 | 1.69 | 2.50 | 1.63 | 2.30 | 1.50 | 1.90 | 2.03 | 1.90 | 2.02 | 1.95 | 2.06 | 1.90 | 1.97 | Spain | 1 | 2021 |
758 | 2021-05-23 | Granada | Getafe | 2.85 | 3.48 | 2.75 | 2.72 | 3.37 | 2.62 | 3.00 | 3.25 | 2.37 | 3.00 | 3.30 | 2.40 | 2.95 | 3.20 | 2.50 | 3.05 | 3.27 | 2.57 | 2.90 | 3.30 | 2.45 | 3.00 | 3.25 | 2.45 | 3.13 | 3.60 | 2.60 | 2.98 | 3.27 | 2.48 | 2.11 | 1.91 | 2.03 | 1.80 | 0.25 | 1.73 | 2.08 | 1.77 | 2.18 | 1.84 | 2.23 | 1.77 | 2.12 | Spain | 1 | 2021 |
759 | 2021-05-23 | Sevilla | Alaves | 1.60 | 4.80 | 6.60 | 1.55 | 4.31 | 6.01 | 1.36 | 5.25 | 8.00 | 1.36 | 5.25 | 7.75 | 1.35 | 5.25 | 8.50 | 1.34 | 5.70 | 9.69 | 1.36 | 5.00 | 8.50 | 1.33 | 5.25 | 9.00 | 1.42 | 5.70 | 9.69 | 1.36 | 5.27 | 8.48 | 1.63 | 2.50 | 1.57 | 2.40 | -1.50 | 1.97 | 1.96 | 1.93 | 1.99 | 2.13 | 2.03 | 1.99 | 1.88 | Spain | 1 | 2021 |
760 rows × 49 columns
The targets:
Y_train
| | away_win__full_time_goals | draw__full_time_goals | home_win__full_time_goals | over_2.5__full_time_goals | under_2.5__full_time_goals |
---|---|---|---|---|---|
0 | False | False | True | False | True |
1 | False | True | False | False | True |
2 | True | False | False | False | True |
3 | False | False | True | True | False |
4 | False | True | False | False | True |
... | ... | ... | ... | ... | ... |
755 | True | False | False | False | True |
756 | False | False | True | True | False |
757 | True | False | False | True | False |
758 | False | True | False | False | True |
759 | False | False | True | False | True |
760 rows × 5 columns
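The target names suggest how each boolean column relates to the full-time score. The sketch below derives the five targets from the goals of a single match, assuming the conventional definitions (for example, over 2.5 means at least three goals in total). The function `make_targets` and the score lines are hypothetical and are not taken from sportsbet or from the data above.

```python
def make_targets(home_goals, away_goals):
    """Derive boolean targets from the full-time goals of a single match."""
    total_goals = home_goals + away_goals
    return {
        'away_win__full_time_goals': away_goals > home_goals,
        'draw__full_time_goals': home_goals == away_goals,
        'home_win__full_time_goals': home_goals > away_goals,
        'over_2.5__full_time_goals': total_goals > 2.5,
        'under_2.5__full_time_goals': total_goals < 2.5,
    }


# Hypothetical score lines, not taken from the data above.
print(make_targets(home_goals=1, away_goals=0))
print(make_targets(home_goals=2, away_goals=2))
```

Exactly one of the three result targets and exactly one of the two goal-total targets is true for any match, which agrees with the rows shown above.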
The market average odds:
Odds_train
| | market_average__away_win__odds | market_average__draw__odds | market_average__home_win__odds | market_average__over_2.5__odds | market_average__under_2.5__odds |
---|---|---|---|---|---|
0 | 4.13 | 3.69 | 1.87 | 1.81 | 2.02 |
1 | 1.96 | 3.65 | 3.82 | 1.79 | 2.03 |
2 | 1.63 | 4.16 | 5.13 | 1.62 | 2.29 |
3 | 4.03 | 3.59 | 1.91 | 1.91 | 1.89 |
4 | 3.67 | 3.80 | 1.95 | 1.59 | 2.36 |
... | ... | ... | ... | ... | ... |
755 | 1.62 | 4.40 | 4.97 | 1.60 | 2.35 |
756 | 6.92 | 5.05 | 1.42 | 1.48 | 2.63 |
757 | 1.34 | 5.27 | 9.20 | 1.70 | 2.18 |
758 | 2.62 | 3.37 | 2.72 | 1.96 | 1.87 |
759 | 6.01 | 4.31 | 1.55 | 1.57 | 2.39 |
760 rows × 5 columns
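One common way to read such odds is to convert them into implied probabilities. The sketch below assumes the values are decimal odds, takes the inverse of each one and normalizes the result so that the probabilities of the three match outcomes sum to one; the sum of the raw inverses is the bookmaker margin, often called the overround. The function `implied_probabilities` is hypothetical and not part of sportsbet.

```python
def implied_probabilities(odds):
    """Convert decimal odds of mutually exclusive outcomes into normalized probabilities."""
    inverse = {outcome: 1 / value for outcome, value in odds.items()}
    overround = sum(inverse.values())
    probabilities = {outcome: value / overround for outcome, value in inverse.items()}
    return probabilities, overround


# Market average odds of the first training row shown above.
odds = {'away_win': 4.13, 'draw': 3.69, 'home_win': 1.87}
probabilities, overround = implied_probabilities(odds)
print(probabilities)
print(overround)
```

An overround above 1 means the raw inverse odds overstate the probabilities, which is why the normalization step is needed.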
Extracting the fixtures data
We extract the fixtures data with columns that match the columns of the training data. Unlike the training data, the fixtures data are not affected by the param_grid selection.
X_fix, _, Odds_fix = dataloader.extract_fixtures_data()
The input data:
X_fix
| | date | home_team | away_team | market_maximum__home_win__odds | market_maximum__draw__odds | market_maximum__away_win__odds | market_average__home_win__odds | market_average__draw__odds | market_average__away_win__odds | bet365_closing__home_win__odds | bet365_closing__draw__odds | bet365_closing__away_win__odds | betwin_closing__home_win__odds | betwin_closing__draw__odds | betwin_closing__away_win__odds | interwetten_closing__home_win__odds | interwetten_closing__draw__odds | interwetten_closing__away_win__odds | pinnacle_closing__home_win__odds | pinnacle_closing__draw__odds | pinnacle_closing__away_win__odds | williamhill_closing__home_win__odds | williamhill_closing__draw__odds | williamhill_closing__away_win__odds | vcbet_closing__home_win__odds | vcbet_closing__draw__odds | vcbet_closing__away_win__odds | market_maximum_closing__home_win__odds | market_maximum_closing__draw__odds | market_maximum_closing__away_win__odds | market_average_closing__home_win__odds | market_average_closing__draw__odds | market_average_closing__away_win__odds | market_maximum_closing__over_2.5__odds | market_maximum_closing__under_2.5__odds | market_average_closing__over_2.5__odds | market_average_closing__under_2.5__odds | market_average_closing__size_of_asian_handicap_home_team__odds | bet365_closing__asian_handicap_home_team__odds | bet365_closing__asian_handicap_away_team__odds | pinnacle_closing__asian_handicap_home_team__odds | pinnacle_closing__asian_handicap_away_team__odds | market_maximum_closing__asian_handicap_home_team__odds | market_maximum_closing__asian_handicap_away_team__odds | market_average_closing__asian_handicap_home_team__odds | market_average_closing__asian_handicap_away_team__odds | league | division | year |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2021-12-16 | Antwerp | Eupen | 1.66 | 4.50 | 5.50 | 1.59 | 4.28 | 4.99 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Belgium | 1 | 2022 |
1 | 2021-12-16 | Genk | Charleroi | 1.98 | 4.02 | 4.00 | 1.90 | 3.78 | 3.70 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Belgium | 1 | 2022 |
2 | 2021-12-16 | Leicester | Tottenham | 2.20 | 3.92 | 3.40 | 2.13 | 3.69 | 3.29 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | England | 4 | 2022 |
3 | 2021-12-16 | Chelsea | Everton | 1.27 | 6.75 | 15.00 | 1.24 | 6.17 | 13.27 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | England | 4 | 2022 |
4 | 2021-12-16 | Liverpool | Newcastle | 1.15 | 10.25 | 24.98 | 1.12 | 9.52 | 21.43 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | England | 4 | 2022 |
5 | 2021-12-16 | OFI Crete | Aris | 3.04 | 3.50 | 2.44 | 2.92 | 3.25 | 2.37 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Greece | 1 | 2022 |
6 | 2021-12-16 | Ionikos | AEK | 7.00 | 4.33 | 1.58 | 6.21 | 3.98 | 1.51 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Greece | 1 | 2022 |
The market average odds:
Odds_fix
| | market_average__away_win__odds | market_average__draw__odds | market_average__home_win__odds | market_average__over_2.5__odds | market_average__under_2.5__odds |
---|---|---|---|---|---|
0 | 4.99 | 4.28 | 1.59 | 1.56 | 2.39 |
1 | 3.70 | 3.78 | 1.90 | 1.57 | 2.37 |
2 | 3.29 | 3.69 | 2.13 | 1.77 | 2.07 |
3 | 13.27 | 6.17 | 1.24 | 1.71 | 2.16 |
4 | 21.43 | 9.52 | 1.12 | 1.33 | 3.34 |
5 | 2.37 | 3.25 | 2.92 | 2.24 | 1.63 |
6 | 1.51 | 3.98 | 6.21 | 2.02 | 1.78 |
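Matching columns matter because anything fitted on the training data expects the fixtures in the same layout. The sketch below shows the idea on a single record: it is restricted to a list of training columns, and absent values are filled with NaN, much like the closing odds in the fixtures table above. The helper `align_columns` and the shortened column list are hypothetical; this is not how sportsbet builds the fixtures data.

```python
def align_columns(record, columns, fill_value=float('nan')):
    """Return a copy of `record` restricted to `columns`, filling absent ones."""
    return {col: record.get(col, fill_value) for col in columns}


# Shortened, hypothetical column list standing in for the training columns.
train_columns = [
    'date',
    'home_team',
    'away_team',
    'market_average__home_win__odds',
    'bet365_closing__home_win__odds',
]

# A fixture record without closing odds, using values from the first fixtures row.
fixture = {
    'date': '2021-12-16',
    'home_team': 'Antwerp',
    'away_team': 'Eupen',
    'market_average__home_win__odds': 1.59,
}

aligned = align_columns(fixture, train_columns)
print(list(aligned) == train_columns)
print(aligned)
```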