Classification Metrics Example¶
This notebook shows how classification metrics can be applied to a dataset, how they can be used in other Iguanas modules, and how to create your own.
Requirements¶
To run, you’ll need the following:
A dataset containing binary predictor columns and a binary target column.
Import packages¶
[1]:
from iguanas.metrics.classification import Precision, Recall, FScore, Revenue
import pandas as pd
import numpy as np
from typing import Union
Create data¶
Let’s create some dummy predictor columns and a binary target column. For this example, let’s assume the dummy predictor columns represent rules that have been applied to a dataset.
[2]:
np.random.seed(0)
y_pred = pd.Series(np.random.randint(0, 2, 1000), name='A')
y_preds = pd.DataFrame(np.random.randint(0, 2, (1000, 5)), columns=list('ABCDE'))
y = pd.Series(np.random.randint(0, 2, 1000), name='label')
amounts = pd.Series(np.random.randint(0, 1000, 1000), name='amounts')
Apply optimisation functions¶
There are currently four classification metrics available:
Precision score
Recall score
Fbeta score
Revenue
Note that the FScore, Precision and Recall classes are ~100 times faster on larger datasets than the equivalent functions in Scikit-learn's metrics module. They also work with Koalas DataFrames, whereas the Scikit-learn functions do not.
Instantiate class and run fit method¶
We can run the .fit() method to calculate the optimisation metric for each column in the dataset.
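Before applying the classes below, it may help to see what the first three metrics compute. The following is a minimal NumPy sketch of the standard precision, recall and Fbeta formulas (these helper functions are illustrations only, not the Iguanas implementations):

```python
import numpy as np

def precision(y_true, y_pred):
    # Precision = TP / (TP + FP): of everything flagged, how much was truly positive
    tp = ((y_true == 1) & (y_pred == 1)).sum()
    fp = ((y_true == 0) & (y_pred == 1)).sum()
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0

def recall(y_true, y_pred):
    # Recall = TP / (TP + FN): of all true positives, how many were flagged
    tp = ((y_true == 1) & (y_pred == 1)).sum()
    fn = ((y_true == 1) & (y_pred == 0)).sum()
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0

def fbeta(y_true, y_pred, beta=1.0):
    # Fbeta = (1 + beta^2) * P * R / (beta^2 * P + R); beta=1 gives the F1 score
    p, r = precision(y_true, y_pred), recall(y_true, y_pred)
    if p == 0 and r == 0:
        return 0.0
    return (1 + beta**2) * p * r / (beta**2 * p + r)

y_true = np.array([1, 0, 1, 1, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 0])
print(precision(y_true, y_pred))  # 2/3: 2 TPs, 1 FP
print(recall(y_true, y_pred))     # 2/3: 2 TPs, 1 FN
print(fbeta(y_true, y_pred))      # 2/3: F1 when P == R
```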
Precision score¶
[3]:
precision = Precision()
# Single predictor
rule_precision = precision.fit(y_true=y, y_preds=y_pred, sample_weight=None)
# Multiple predictors
rule_precisions = precision.fit(y_true=y, y_preds=y_preds, sample_weight=None)
Recall score¶
[4]:
recall = Recall()
# Single predictor
rule_recall = recall.fit(y_true=y, y_preds=y_pred, sample_weight=None)
# Multiple predictors
rule_recalls = recall.fit(y_true=y, y_preds=y_preds, sample_weight=None)
Fbeta score (beta=1)¶
[6]:
f1 = FScore(beta=1)
# Single predictor
rule_f1 = f1.fit(y_true=y, y_preds=y_pred, sample_weight=None)
# Multiple predictors
rule_f1s = f1.fit(y_true=y, y_preds=y_preds, sample_weight=None)
Revenue¶
[8]:
rev = Revenue(y_type='Fraud', chargeback_multiplier=2)
# Single predictor
rule_rev = rev.fit(y_true=y, y_preds=y_pred, sample_weight=amounts)
# Multiple predictors
rule_revs = rev.fit(y_true=y, y_preds=y_preds, sample_weight=amounts)
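The Revenue metric scores rules in monetary terms rather than as a rate. As a rough sketch of one plausible formulation (an assumption for illustration, not necessarily the exact Iguanas formula): a rule targeting fraud recovers `chargeback_multiplier` times the amount of each fraudulent transaction it catches, and forfeits the amount of each genuine transaction it wrongly blocks.

```python
import numpy as np

def revenue_sketch(y_true, y_pred, amounts, chargeback_multiplier=2):
    # Hypothetical formulation: each caught fraud (TP) avoids a chargeback,
    # each blocked genuine transaction (FP) loses its amount.
    tp_gain = (y_true * y_pred * amounts).sum() * chargeback_multiplier
    fp_loss = ((1 - y_true) * y_pred * amounts).sum()
    return tp_gain - fp_loss

y_true = np.array([1, 0, 1, 0])
y_pred = np.array([1, 1, 0, 0])
amounts = np.array([100, 50, 200, 75])
print(revenue_sketch(y_true, y_pred, amounts))  # 2*100 - 50 = 150
```

Note how `sample_weight` carries the transaction amounts here, whereas for the rate-based metrics above it is simply left as `None`.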
Outputs¶
The .fit() method returns the optimisation metric defined by the class:
[9]:
rule_precision, rule_precisions
[9]:
(0.48214285714285715,
array([0.4875717 , 0.47109208, 0.47645951, 0.48850575, 0.4251497 ]))
[10]:
rule_recall, rule_recalls
[10]:
(0.5051975051975052,
array([0.53014553, 0.45738046, 0.52598753, 0.53014553, 0.44282744]))
[11]:
rule_f1, rule_f1s
[11]:
(0.4934010152284264,
array([0.50796813, 0.46413502, 0.5 , 0.50847458, 0.43380855]))
[12]:
rule_rev, rule_revs
[12]:
(1991,
A 15119
B -14481
C 11721
D 25063
E -74931
dtype: int64)
The .fit() method can be fed into various Iguanas modules as an argument (wherever the opt_func
parameter appears). For example, in the RuleGeneratorOpt module, you can set the metric used to optimise the rules by passing the .fit() method to its opt_func parameter.
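Any callable with the (y_true, y_preds, sample_weight) signature can be plugged in this way. As a sketch of the mechanism (both `precision_fit` and `pick_best_rule` below are hypothetical stand-ins, not part of Iguanas), a module receiving an opt_func argument might use it like this:

```python
import pandas as pd

def precision_fit(y_true, y_preds, sample_weight=None):
    # Stand-in for Precision().fit, with the same signature
    flagged = y_preds.sum()
    return (y_true * y_preds).sum() / flagged if flagged else 0.0

def pick_best_rule(X_rules, y, opt_func):
    # Hypothetical helper: score each rule column with opt_func
    # and return the name of the highest-scoring rule
    scores = {
        col: opt_func(y_true=y, y_preds=X_rules[col], sample_weight=None)
        for col in X_rules.columns
    }
    return max(scores, key=scores.get)

X_rules = pd.DataFrame({'A': [1, 1, 0, 0], 'B': [1, 0, 1, 0]})
y = pd.Series([1, 0, 1, 0])
print(pick_best_rule(X_rules, y, opt_func=precision_fit))  # 'B' (precision 1.0 vs 0.5)
```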
Creating your own optimisation function¶
Say we want to create a class which calculates the Positive Likelihood Ratio (the true positive rate divided by the false positive rate).
The main class structure involves having a .fit() method which takes three arguments: the binary target, the binary predictor(s) and any event-specific sample weights. This method should return a single numeric value for a single predictor column, or an array of values when multiple predictor columns are passed.
[15]:
class PositiveLikelihoodRatio:
    def fit(self,
            y_true: pd.Series,
            y_preds: Union[pd.Series, pd.DataFrame],
            sample_weight: pd.Series = None) -> Union[float, np.ndarray]:

        def _calc_plr(y_true, y_preds):
            # Calculate TPR
            tpr = (y_true * y_preds).sum() / y_true.sum()
            # Calculate FPR
            fpr = ((1 - y_true) * y_preds).sum() / (1 - y_true).sum()
            return 0 if tpr == 0 or fpr == 0 else tpr / fpr

        if y_preds.ndim == 1:
            # Single predictor column
            return _calc_plr(y_true, y_preds)
        else:
            # One PLR per predictor column
            plrs = np.empty(y_preds.shape[1])
            for i, col in enumerate(y_preds.columns):
                plrs[i] = _calc_plr(y_true, y_preds[col])
            return plrs
We can then apply the .fit() method to the dataset to check it works:
[16]:
plr = PositiveLikelihoodRatio()
# Single predictor
rule_plr = plr.fit(y_true=y, y_preds=y_pred, sample_weight=None)
# Multiple predictors
rule_plrs = plr.fit(y_true=y, y_preds=y_preds, sample_weight=None)
[17]:
rule_plr, rule_plrs
[17]:
(1.004588142519177,
array([1.02666243, 0.96105448, 0.98196952, 1.0305076 , 0.79801195]))
Finally, after instantiating the class, we can feed its .fit() method to a relevant Iguanas module (for example, to the opt_func parameter of the BayesianOptimiser class, so that rules are generated which maximise the Positive Likelihood Ratio).