Unsupervised Metrics Example

This notebook shows how to apply unsupervised metrics to a dataset, how to use them in other Iguanas modules, and how to create your own.

Requirements

To run, you’ll need the following:

  • A dataset containing binary predictor columns


Import packages

[1]:
from iguanas.metrics.unsupervised import AlertsPerDay, PercVolume

import pandas as pd
import numpy as np

Create data

Let’s create some dummy predictor columns. For this example, let’s assume the dummy predictor columns represent rules that have been applied to a dataset.

[2]:
np.random.seed(0)

# Single binary predictor (one rule)
y_pred = pd.Series(np.random.randint(0, 2, 1000), name='A')
# Five binary predictors (one column per rule)
y_preds = pd.DataFrame(np.random.randint(0, 2, (1000, 5)), columns=list('ABCDE'))

Apply optimisation functions

There are currently two unsupervised metrics available:

  • Alerts per day (calculates the negative squared difference between the daily number of records a rule flags and the targeted daily number of records flagged)

  • Percentage of volume (calculates the negative squared difference between the percentage of the overall volume that the rule flags and the targeted percentage of volume flagged)
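The two definitions above can be sketched directly in NumPy. This is a minimal illustration of the formulas as described, not the Iguanas implementation; the function names alerts_per_day_score and perc_volume_score are hypothetical, and the sketch assumes that the daily number of flagged records is the total flagged divided by the number of days in the file.

```python
import numpy as np
import pandas as pd


def alerts_per_day_score(y_preds, n_alerts_expected_per_day, no_of_days_in_file):
    """Hypothetical helper: negative squared gap between the observed
    and targeted number of flagged records per day."""
    # Assumption: daily alerts = total flagged / number of days in the file
    daily_alerts = np.asarray(y_preds).sum(axis=0) / no_of_days_in_file
    return -((daily_alerts - n_alerts_expected_per_day) ** 2)


def perc_volume_score(y_preds, perc_vol_expected):
    """Hypothetical helper: negative squared gap between the observed
    and targeted share of records flagged."""
    flagged_share = np.asarray(y_preds).mean(axis=0)
    return -((flagged_share - perc_vol_expected) ** 2)


# A rule that flags 504 of 1,000 records, spread over 10 days
rule = pd.Series([1] * 504 + [0] * 496)

apd_score = alerts_per_day_score(rule, n_alerts_expected_per_day=5, no_of_days_in_file=10)
pv_score = perc_volume_score(rule, perc_vol_expected=0.02)

print(round(float(apd_score), 2))  # -2061.16, i.e. -(50.4 - 5) ** 2
print(round(float(pv_score), 6))   # -0.234256, i.e. -(0.504 - 0.02) ** 2

# With several columns, the same helpers return one score per column
two_rules = pd.DataFrame({'A': rule, 'B': 1 - rule})
apd_scores = alerts_per_day_score(two_rules, 5, 10)
```

Both scores are at most zero, and they reach zero only when the rule hits the target exactly, so a higher (less negative) score means the rule is closer to the target.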

Instantiate class and run fit method

We can run the .fit() method to calculate the optimisation metric for each column in the dataset.

Alerts per day

[3]:
apd = AlertsPerDay(n_alerts_expected_per_day=5, no_of_days_in_file=10)
# Single predictor
rule_apd = apd.fit(y_preds=y_pred)
# Multiple predictors
rule_apds = apd.fit(y_preds=y_preds)

Percentage of volume

[4]:
pv = PercVolume(perc_vol_expected=0.02)
# Single predictor
rule_pv = pv.fit(y_preds=y_pred)
# Multiple predictors
rule_pvs = pv.fit(y_preds=y_preds)

Outputs

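The .fit() method returns the optimisation metric defined by the class: a single value for a single predictor, or an array with one value per column for multiple predictors: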

[5]:
rule_apd, rule_apds
[5]:
(-2061.16, array([-2237.29, -1738.89, -2313.61, -2227.84, -2034.01]))
[6]:
rule_pv, rule_pvs
[6]:
(-0.234256, array([-0.253009, -0.199809, -0.261121, -0.252004, -0.231361]))

The .fit() method can be passed as an argument to various Iguanas modules (wherever the opt_func parameter appears). For example, in the RuleGeneratorOpt module, passing it as opt_func sets the metric used to optimise the rules.
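As a rough sketch of this pattern, and of what creating your own metric might look like, the example below defines a custom metric class with a .fit() method and passes that method to a function through an opt_func parameter. Both AbsVolumeGap and pick_best_rule are hypothetical names that are not part of Iguanas; the sketch assumes only that a consumer of opt_func calls it with the binary predictor columns and treats higher scores as better.

```python
import numpy as np
import pandas as pd


class AbsVolumeGap:
    """Hypothetical custom metric (not part of Iguanas): negative absolute
    gap between the share of records flagged and a targeted share."""

    def __init__(self, perc_vol_expected):
        self.perc_vol_expected = perc_vol_expected

    def fit(self, y_preds):
        # Share of records flagged, per column for a DataFrame
        flagged_share = np.asarray(y_preds).mean(axis=0)
        return -np.abs(flagged_share - self.perc_vol_expected)


def pick_best_rule(y_preds, opt_func):
    """Hypothetical stand-in for a module that accepts opt_func:
    scores every column and returns the name of the best one."""
    scores = opt_func(y_preds)
    return y_preds.columns[int(np.argmax(scores))]


rules = pd.DataFrame({
    'A': [1, 1, 1, 0],  # flags 75% of records
    'B': [1, 0, 0, 0],  # flags 25% of records
    'C': [0, 0, 0, 0],  # flags nothing
})

metric = AbsVolumeGap(perc_vol_expected=0.25)
# Pass the bound .fit method itself, not the result of calling it
best = pick_best_rule(rules, opt_func=metric.fit)
print(best)  # B
```

Passing the bound method keeps the target value inside the metric object, so the consuming code only needs to know that opt_func takes predictor columns and returns scores.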