FiveThirtyEight soccer data

This example illustrates the usage of the FiveThirtyEight soccer dataloader.

# Author: Georgios Douzas <gdouzas@icloud.com>
# Licence: MIT

import pandas as pd
from sportsbet.datasets import FTESoccerDataLoader

Getting the available parameters

We can get the available parameters in order to select the training data to be extracted, using the get_all_params() class method.

params = FTESoccerDataLoader.get_all_params()

The available parameters can be presented as a DataFrame.

params_df = pd.DataFrame(params).sort_values(
    ['league', 'year', 'division'], ignore_index=True
)
params_df
     division  league                year
0    1         Argentina             2018
1    1         Argentina             2019
2    1         Argentina             2020
3    1         Argentina             2022
4    1         Australia             2019
...  ...       ...                   ...
174  1         USA                   2022
175  1         United-Soccer-League  2019
176  1         United-Soccer-League  2020
177  1         United-Soccer-League  2021
178  1         United-Soccer-League  2022

179 rows × 3 columns



We choose to extract training data only for the year 2021, across all divisions of the English league.

param_grid = {'league': ['England'], 'year': [2021]}
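
To make the selection concrete, the sketch below expands a parameter grid into individual combinations and keeps only those that appear among the available parameters. It uses a handful of rows from the parameters table above. The helper `select_params` is hypothetical and written for illustration only; its matching logic is an assumption, not the dataloader's actual implementation.

```python
from itertools import product

# A few of the available parameter combinations listed in the table above.
available = [
    {'division': 1, 'league': 'Argentina', 'year': 2018},
    {'division': 1, 'league': 'Argentina', 'year': 2019},
    {'division': 1, 'league': 'Argentina', 'year': 2020},
    {'division': 1, 'league': 'Argentina', 'year': 2022},
    {'division': 1, 'league': 'Australia', 'year': 2019},
    {'division': 1, 'league': 'USA', 'year': 2022},
]


def select_params(param_grid, available):
    """Hypothetical helper: keep the available combinations matched by the grid."""
    keys = sorted(param_grid)
    # Expand the grid into every combination of the listed values.
    combos = [
        dict(zip(keys, values))
        for values in product(*(param_grid[key] for key in keys))
    ]
    # A parameter missing from the grid (here, division) is left unconstrained.
    return [
        params
        for params in available
        if any(
            all(params[key] == value for key, value in combo.items())
            for combo in combos
        )
    ]


selected = select_params({'league': ['Argentina'], 'year': [2019, 2020]}, available)
print(selected)
```

Under this reading, a grid such as {'league': ['England'], 'year': [2021]} leaves the division unconstrained, which is how a single grid can cover all divisions of a league.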

Getting the available odds types

We can get the available odds types in order to match the output of the training data, using the get_odds_types() class method.

FTESoccerDataLoader.get_odds_types()

Out:

[]

Therefore no odds data are available.

Extracting the training data

We extract the training data using the default values for the parameters odds_type and drop_na_thres.

The input data:

X_train
league home_team away_team home_team_soccer_power_index away_team_soccer_power_index home_team_probability_win away_team_probability_win probability_draw home_team_projected_score away_team_projected_score home_team_match_importance away_team_match_importance division year match_quality
date
2020-09-11 England Watford Middlesbrough 65.12 46.31 0.6387 0.1423 0.2190 2.06 0.85 53.0 16.5 2 2021 54.127384
2020-09-12 England Huddersfield Town Norwich City 47.84 60.45 0.2708 0.4612 0.2680 1.17 1.60 18.0 43.6 2 2021 53.410804
2020-09-12 England Cardiff City Sheffield Wednesday 51.14 45.38 0.4510 0.2754 0.2736 1.54 1.15 18.6 37.2 2 2021 48.088131
2020-09-12 England Millwall Stoke City 48.58 53.50 0.3535 0.3645 0.2820 1.31 1.34 16.3 24.1 2 2021 50.921434
2020-09-12 England Preston North End Swansea City 48.02 50.19 0.3799 0.3285 0.2916 1.30 1.19 16.2 20.0 2 2021 49.081026
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2020-11-13 England Bolton Salford City 11.98 23.41 0.2144 0.5392 0.2464 0.99 1.72 14.2 61.4 4 2021 15.849212
2021-01-05 England Mansfield Town Salford City 12.82 20.45 0.2246 0.5119 0.2635 0.94 1.56 15.7 65.8 4 2021 15.760084
2021-03-30 England Carlisle United Crawley Town 10.55 10.85 0.3937 0.3466 0.2596 1.47 1.36 7.4 8.5 4 2021 10.697897
2021-05-20 England Tranmere Rovers Morecambe 11.11 17.40 0.3403 0.3779 0.2818 1.19 1.27 100.0 100.0 4 2021 13.561136
2021-05-31 England Morecambe Newport County 17.71 12.81 0.5309 0.4691 0.0000 1.23 1.13 100.0 100.0 4 2021 14.866651

51309 rows × 15 columns
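
The match_quality values in the output above are consistent with the harmonic mean of the two teams' soccer power indices. The sketch below checks this against the first two rows shown. The function name `harmonic_mean_spi` is ours, and the formula is inferred from the displayed values rather than taken from the library's code.

```python
def harmonic_mean_spi(home_spi, away_spi):
    """Harmonic mean of the home and away soccer power indices."""
    return 2 * home_spi * away_spi / (home_spi + away_spi)


# (home SPI, away SPI, match_quality as displayed) for the first two rows above.
rows = [
    (65.12, 46.31, 54.127384),
    (47.84, 60.45, 53.410804),
]

# Difference between the recomputed value and the displayed one, per row.
differences = [
    abs(harmonic_mean_spi(home, away) - shown) for home, away, shown in rows
]
print(differences)
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a match between one strong and one weak team scores lower than the plain average of their indices would suggest.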



The targets:

Y_train
away_win__full_time_goals draw__full_time_goals home_win__full_time_goals over_1.5__full_time_goals over_2.5__full_time_goals over_3.5__full_time_goals over_4.5__full_time_goals under_1.5__full_time_goals under_2.5__full_time_goals under_3.5__full_time_goals under_4.5__full_time_goals
0 False False True False False False False True True True True
1 True False False False False False False True True True True
2 True False False True False False False False True True True
3 False True False False False False False True True True True
4 True False False False False False False True True True True
... ... ... ... ... ... ... ... ... ... ... ...
51304 True False False True False False False False True True True
51305 False False True True False False False False True True True
51306 False False True True True True True False False False False
51307 False True False True False False False False True True True
51308 False False True False False False False True True True True

51309 rows × 11 columns
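
The target column names follow a pattern: an outcome (home_win, draw, away_win, over_N or under_N) followed by the event it refers to (full_time_goals). As a sketch of how such boolean targets could be derived from a final score, assume that over_N means the total number of goals exceeds N and under_N means it falls below N. The helper `full_time_targets` and the example score are hypothetical and not part of the library.

```python
def full_time_targets(home_goals, away_goals):
    """Hypothetical helper: boolean targets for a full-time score."""
    total = home_goals + away_goals
    targets = {
        'home_win__full_time_goals': home_goals > away_goals,
        'draw__full_time_goals': home_goals == away_goals,
        'away_win__full_time_goals': home_goals < away_goals,
    }
    # Assumption: over/under lines compare against the total goals scored.
    for line in (1.5, 2.5, 3.5, 4.5):
        targets[f'over_{line}__full_time_goals'] = total > line
        targets[f'under_{line}__full_time_goals'] = total < line
    return targets


# Illustrative score: a 1-0 home win.
example = full_time_targets(1, 0)
print(example)
```

Under these assumptions, a 1-0 home win gives the same pattern as the first row of Y_train above: home_win is True, every over_N column is False and every under_N column is True.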



Extracting the fixtures data

We extract the fixtures data, whose columns match those of the training data. Note, however, that the fixtures data are not affected by the param_grid selection, which is why leagues other than England appear in the output.

The input data:

X_fix
league home_team away_team home_team_soccer_power_index away_team_soccer_power_index home_team_probability_win away_team_probability_win probability_draw home_team_projected_score away_team_projected_score home_team_match_importance away_team_match_importance division year match_quality
date
2021-12-22 France Clermont Foot Strasbourg 53.40 66.23 0.3004 0.4410 0.2586 1.23 1.55 30.5 33.2 1 2022 59.127008
2021-12-22 France Montpellier Angers 58.37 57.60 0.4522 0.2760 0.2718 1.45 1.07 20.0 9.5 1 2022 57.982444
2021-12-22 Scotland St Johnstone Ross County 27.82 31.14 0.3917 0.3261 0.2822 1.24 1.11 53.4 45.5 1 2022 29.386526
2021-12-22 Italy Empoli AC Milan 55.62 75.96 0.1797 0.6030 0.2173 1.00 2.01 3.8 61.5 1 2022 64.217893
2021-12-22 Italy Napoli Spezia 78.97 48.96 0.7386 0.0900 0.1714 2.30 0.66 58.2 47.3 1 2022 60.445106
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2022-05-29 Spain Real Valladolid SD Huesca 43.87 31.45 0.5315 0.2036 0.2649 1.66 0.93 NaN NaN 2 2022 36.635993
2022-05-29 Spain Real Oviedo UD Ibiza 34.04 27.78 0.4757 0.2155 0.3089 1.32 0.78 NaN NaN 2 2022 30.593051
2022-05-29 Spain Leganes Almeria 38.32 45.95 0.3400 0.3490 0.3110 1.12 1.13 NaN NaN 2 2022 41.789581
2022-05-29 Spain Real Sociedad II Real Zaragoza 25.54 30.82 0.3444 0.3365 0.3191 1.08 1.07 NaN NaN 2 2022 27.932676
2022-05-29 Spain Lugo Málaga 28.56 26.29 0.4287 0.2690 0.3023 1.31 0.98 NaN NaN 2 2022 27.378027

4128 rows × 15 columns


