Classification Metrics Example

This notebook shows how classification metrics can be applied to a dataset, how they can be used in other Iguanas modules, and how to create your own.

Requirements

To run, you’ll need the following:

  • A dataset containing binary predictor columns and a binary target column.


Import packages

[1]:
from iguanas.metrics.classification import Precision, Recall, FScore, Revenue

import pandas as pd
import numpy as np
from typing import Union

Create data

Let’s create some dummy predictor columns and a binary target column. For this example, let’s assume the dummy predictor columns represent rules that have been applied to a dataset.

[2]:
np.random.seed(0)

y_pred = pd.Series(np.random.randint(0, 2, 1000), name='A')
y_preds = pd.DataFrame(np.random.randint(0, 2, (1000, 5)), columns=list('ABCDE'))
y = pd.Series(np.random.randint(0, 2, 1000), name='label')
amounts = pd.Series(np.random.randint(0, 1000, 1000), name='amounts')

Apply optimisation functions

There are currently four classification metrics available:

  • Precision score

  • Recall score

  • Fbeta score

  • Revenue

Note that the FScore, Precision and Recall classes are ~100 times faster on larger datasets than the equivalent functions in Scikit-learn's metrics module. They also work with Koalas DataFrames, whereas the Scikit-learn functions do not.

Instantiate class and run fit method

We can run the .fit() method to calculate the optimisation metric for each column in the dataset.

Precision score

[3]:
precision = Precision()
# Single predictor
rule_precision = precision.fit(y_true=y, y_preds=y_pred, sample_weight=None)
# Multiple predictors
rule_precisions = precision.fit(y_true=y, y_preds=y_preds, sample_weight=None)

Recall score

[4]:
recall = Recall()
# Single predictor
rule_recall = recall.fit(y_true=y, y_preds=y_pred, sample_weight=None)
# Multiple predictors
rule_recalls = recall.fit(y_true=y, y_preds=y_preds, sample_weight=None)

Fbeta score (beta=1)

[6]:
f1 = FScore(beta=1)
# Single predictor
rule_f1 = f1.fit(y_true=y, y_preds=y_pred, sample_weight=None)
# Multiple predictors
rule_f1s = f1.fit(y_true=y, y_preds=y_preds, sample_weight=None)
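As a sanity check on these definitions, the Fbeta score with beta=1 is the harmonic mean of precision and recall. A minimal sketch using plain NumPy (independent of Iguanas), on a tiny hand-made example:

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 1, 0, 1])

tp = ((y_true == 1) & (y_pred == 1)).sum()  # true positives
fp = ((y_true == 0) & (y_pred == 1)).sum()  # false positives
fn = ((y_true == 1) & (y_pred == 0)).sum()  # false negatives

precision = tp / (tp + fp)  # 3 / 5 = 0.6
recall = tp / (tp + fn)     # 3 / 4 = 0.75
# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
```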

Revenue

[8]:
rev = Revenue(y_type='Fraud', chargeback_multiplier=2)
# Single predictor
rule_rev = rev.fit(y_true=y, y_preds=y_pred, sample_weight=amounts)
# Multiple predictors
rule_revs = rev.fit(y_true=y, y_preds=y_preds, sample_weight=amounts)
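The Revenue metric weighs each prediction by its transaction amount, which is why the outputs below can be negative. As a loose illustration only (the exact formula used by Iguanas' Revenue class is not reproduced here), one plausible chargeback-style formulation for y_type='Fraud' credits the multiplied amount of each blocked fraudulent transaction and debits the amount of each blocked legitimate one. All names below are hypothetical:

```python
import numpy as np

def sketch_revenue(y_true, y_pred, amounts, chargeback_multiplier=2):
    """Hypothetical chargeback-style revenue for a fraud rule (illustration
    only - not necessarily the formula used by Iguanas' Revenue class)."""
    # Fraud caught by the rule: saves the chargeback (multiplied amount)
    saved = chargeback_multiplier * (amounts * y_true * y_pred).sum()
    # Legitimate transactions blocked by the rule: lost revenue
    lost = (amounts * (1 - y_true) * y_pred).sum()
    return saved - lost

y_true = np.array([1, 0, 1, 0])
y_pred = np.array([1, 1, 0, 0])
amounts = np.array([100, 50, 80, 20])
sketch_revenue(y_true, y_pred, amounts)  # 2*100 - 50 = 150
```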

Outputs

The .fit() method returns the optimisation metric defined by the class:

[9]:
rule_precision, rule_precisions
[9]:
(0.48214285714285715,
 array([0.4875717 , 0.47109208, 0.47645951, 0.48850575, 0.4251497 ]))
[10]:
rule_recall, rule_recalls
[10]:
(0.5051975051975052,
 array([0.53014553, 0.45738046, 0.52598753, 0.53014553, 0.44282744]))
[11]:
rule_f1, rule_f1s
[11]:
(0.4934010152284264,
 array([0.50796813, 0.46413502, 0.5       , 0.50847458, 0.43380855]))
[12]:
rule_rev, rule_revs
[12]:
(1991,
 A    15119
 B   -14481
 C    11721
 D    25063
 E   -74931
 dtype: int64)

The .fit() method can be fed into various Iguanas modules as an argument (wherever the opt_func parameter appears). For example, in the RuleGeneratorOpt module, you can use this approach to set the metric that the rules are optimised against.
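Since opt_func simply expects a callable with the (y_true, y_preds, sample_weight) signature, a bound .fit method - or any plain function with that signature - can be passed. A minimal sketch of the idea, using a hypothetical stand-in helper rather than a real Iguanas module:

```python
import numpy as np
import pandas as pd

def pick_best_rule(y_preds: pd.DataFrame, y_true: pd.Series, opt_func) -> str:
    """Hypothetical helper: score each rule column with `opt_func` and
    return the name of the best-scoring rule."""
    scores = opt_func(y_true=y_true, y_preds=y_preds, sample_weight=None)
    return y_preds.columns[np.argmax(scores)]

# A plain function with the expected signature works just like
# Precision().fit - here, precision per rule column:
def precision_func(y_true, y_preds, sample_weight):
    # TP per column / predicted positives per column
    return y_preds.mul(y_true, axis=0).sum() / y_preds.sum()

y = pd.Series([1, 0, 1, 1, 0])
y_preds = pd.DataFrame({'A': [1, 1, 1, 1, 1],   # precision 3/5
                        'B': [1, 0, 1, 0, 0],   # precision 2/2
                        'C': [0, 1, 0, 0, 1]})  # precision 0/2
best = pick_best_rule(y_preds, y, precision_func)  # 'B'
```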


Creating your own optimisation function

Say we want to create a class that calculates the Positive Likelihood Ratio (true positive rate / false positive rate).

The main class structure involves having a .fit() method with three arguments - the binary predictor(s), the binary target and any event-specific sample weights to apply. This method should return a single numeric value for a single predictor, or an array of values (one per column) when multiple predictors are passed.

[15]:
class PositiveLikelihoodRatio:

    def fit(self,
            y_true: pd.Series,
            y_preds: Union[pd.Series, pd.DataFrame],
            sample_weight: pd.Series) -> Union[float, np.ndarray]:
        # `sample_weight` is accepted to match the standard optimisation
        # function signature, but is not used by this metric.

        def _calc_plr(y_true, y_preds):
            # True positive rate: TP / (TP + FN)
            tpr = (y_true * y_preds).sum() / y_true.sum()
            # False positive rate: FP / (FP + TN)
            fpr = ((1 - y_true) * y_preds).sum() / (1 - y_true).sum()
            return 0 if tpr == 0 or fpr == 0 else tpr / fpr

        if y_preds.ndim == 1:
            # Single predictor - return a single float
            return _calc_plr(y_true, y_preds)
        else:
            # Multiple predictors - return one PLR per column
            plrs = np.empty(y_preds.shape[1])
            for i, col in enumerate(y_preds.columns):
                plrs[i] = _calc_plr(y_true, y_preds[col])
            return plrs

We can then apply the .fit() method to the dataset to check it works:

[16]:
plr = PositiveLikelihoodRatio()
# Single predictor
rule_plr = plr.fit(y_true=y, y_preds=y_pred, sample_weight=None)
# Multiple predictors
rule_plrs = plr.fit(y_true=y, y_preds=y_preds, sample_weight=None)
[17]:
rule_plr, rule_plrs
[17]:
(1.004588142519177,
 array([1.02666243, 0.96105448, 0.98196952, 1.0305076 , 0.79801195]))

Finally, after instantiating the class, we can pass its .fit method to a relevant Iguanas module (for example, to the opt_func parameter of the BayesianOptimiser class, so that rules are generated which maximise the Positive Likelihood Ratio).
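As a design note, the column loop in PositiveLikelihoodRatio can also be vectorised with matrix operations, which scales better for a wide rule set. A minimal sketch (a hypothetical variant, not part of Iguanas):

```python
import numpy as np
import pandas as pd

def plr_vectorised(y_true: pd.Series, y_preds: pd.DataFrame) -> np.ndarray:
    """Positive likelihood ratio (TPR / FPR) for every rule column at once."""
    y = np.asarray(y_true)
    P = np.asarray(y_preds)              # shape (n_samples, n_rules)
    tpr = (y @ P) / y.sum()              # TP per rule / total positives
    fpr = ((1 - y) @ P) / (1 - y).sum()  # FP per rule / total negatives
    # Where FPR is zero, define the ratio as 0 (matching the class above)
    out = np.zeros_like(tpr, dtype=float)
    np.divide(tpr, fpr, out=out, where=fpr != 0)
    return out
```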