Unsupervised Metrics Example
This notebook contains an example of how unsupervised metrics can be applied to a dataset, how they can be used in other Iguanas modules and how to create your own.
Requirements
To run, you’ll need the following:
A dataset containing binary predictor columns
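As a quick sanity check of this requirement, the sketch below verifies that every column holds only 0s and 1s. The helper name `is_binary` and the two example frames are illustrative and not part of Iguanas.

```python
import pandas as pd


def is_binary(y_preds: pd.DataFrame) -> bool:
    """Return True if every value in every column is 0 or 1."""
    return bool(y_preds.isin([0, 1]).all().all())


# Hypothetical rule outputs: `good` is binary, `bad` contains a 2
good = pd.DataFrame({'A': [0, 1, 1], 'B': [1, 0, 0]})
bad = pd.DataFrame({'A': [0, 2, 1], 'B': [1, 0, 0]})
print(is_binary(good), is_binary(bad))
```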
Import packages
[1]:
from iguanas.metrics.unsupervised import AlertsPerDay, PercVolume
import pandas as pd
import numpy as np
Create data
Let’s create some dummy predictor columns. For this example, let’s assume the dummy predictor columns represent rules that have been applied to a dataset.
[2]:
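np.random.seed(0)
# One binary rule output (1 = record flagged, 0 = not flagged)
y_pred = pd.Series(np.random.randint(0, 2, 1000), name='A')
# Five binary rule outputs, one column per rule
y_preds = pd.DataFrame(np.random.randint(0, 2, (1000, 5)), columns=list('ABCDE'))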
Apply optimisation functions
There are currently two unsupervised metrics available:
Alerts per day (calculates the negative squared difference between the daily number of records a rule flags and the targeted daily number of flagged records)
Percentage of volume (calculates the negative squared difference between the percentage of the overall volume that the rule flags and the targeted percentage of flagged volume)
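The two definitions above can be sketched in plain NumPy. This is one reading of the descriptions, not the Iguanas source code: the function names are hypothetical, and the parameter names are borrowed from the class arguments used in this notebook.

```python
import numpy as np
import pandas as pd


def alerts_per_day_sketch(y_pred, n_alerts_expected_per_day, no_of_days_in_file):
    """Negative squared difference between actual and targeted daily alerts."""
    # Records flagged per day (per column, if a DataFrame is given)
    daily_alerts = np.asarray(y_pred).sum(axis=0) / no_of_days_in_file
    return -(daily_alerts - n_alerts_expected_per_day) ** 2


def perc_volume_sketch(y_pred, perc_vol_expected):
    """Negative squared difference between actual and targeted flag rate."""
    # Fraction of records flagged (per column, if a DataFrame is given)
    perc_flagged = np.asarray(y_pred).mean(axis=0)
    return -(perc_flagged - perc_vol_expected) ** 2


# Toy rule: flags 2 of 8 records, assumed to span 2 days
toy = pd.Series([1, 0, 0, 1, 0, 0, 0, 0])
apd_score = alerts_per_day_sketch(toy, n_alerts_expected_per_day=3, no_of_days_in_file=2)
pv_score = perc_volume_sketch(toy, perc_vol_expected=0.5)
print(apd_score, pv_score)  # -4.0 -0.0625
```

In the toy case the rule raises 1 alert per day against a target of 3, giving -(1 - 3)² = -4, and flags 25% of records against a target of 50%, giving -(0.25 - 0.5)² = -0.0625. Both scores are at most zero, and zero means the rule hits its target exactly.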
Instantiate class and run fit method
We can run the .fit() method to calculate the optimisation metric for each column in the dataset.
Alerts per day
[3]:
apd = AlertsPerDay(n_alerts_expected_per_day=5, no_of_days_in_file=10)
# Single predictor
rule_apd = apd.fit(y_preds=y_pred)
# Multiple predictors
rule_apds = apd.fit(y_preds=y_preds)
Percentage of volume
[4]:
pv = PercVolume(perc_vol_expected=0.02)
# Single predictor
rule_pv = pv.fit(y_preds=y_pred)
# Multiple predictors
rule_pvs = pv.fit(y_preds=y_preds)
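Outputs
The .fit() method returns the optimisation metric defined by the class. A single predictor (Series) gives one value; multiple predictors (DataFrame) give an array with one value per column: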
[5]:
rule_apd, rule_apds
[5]:
(-2061.16, array([-2237.29, -1738.89, -2313.61, -2227.84, -2034.01]))
[6]:
rule_pv, rule_pvs
[6]:
(-0.234256, array([-0.253009, -0.199809, -0.261121, -0.252004, -0.231361]))
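Because both metrics are negative squared differences, values closer to zero indicate a rule nearer its target. A minimal sketch of ranking the five rules follows. It reuses the printed Alerts per day values as literals, and the column labels follow the DataFrame created earlier.

```python
import numpy as np
import pandas as pd

# Values copied from the Alerts per day output shown above
rule_apds = np.array([-2237.29, -1738.89, -2313.61, -2227.84, -2034.01])
scores = pd.Series(rule_apds, index=list('ABCDE'))

# Highest (least negative) score = closest to the targeted alerts per day
ranked = scores.sort_values(ascending=False)
best_rule = ranked.index[0]
print(best_rule)  # B
```

Here rule B scores highest, although every rule is still far from the target of 5 alerts per day.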
The .fit() method can be passed as an argument to various Iguanas modules (wherever the opt_func parameter appears). For example, in the RuleGeneratorOpt module, you can set the metric used to optimise the rules by passing a metric's .fit() method to opt_func.
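To illustrate the callable pattern without relying on the exact RuleGeneratorOpt signature, here is a sketch built from stand-ins. `PercVolumeSketch` mimics a metric class and `select_best_rule` is a hypothetical consumer that accepts an `opt_func` argument. Neither is part of Iguanas. Only the idea of passing the bound .fit method comes from the text above.

```python
from typing import Callable

import numpy as np
import pandas as pd


class PercVolumeSketch:
    """Stand-in metric (assumption): negative squared gap to a target flag rate."""

    def __init__(self, perc_vol_expected: float):
        self.perc_vol_expected = perc_vol_expected

    def fit(self, y_preds):
        # Fraction of records flagged per column, compared with the target
        return -(np.asarray(y_preds).mean(axis=0) - self.perc_vol_expected) ** 2


def select_best_rule(y_preds: pd.DataFrame, opt_func: Callable) -> str:
    """Hypothetical consumer: score each column with opt_func, return the best."""
    scores = opt_func(y_preds=y_preds)
    return y_preds.columns[int(np.argmax(scores))]


y_preds = pd.DataFrame({
    'A': [1, 1, 1, 1],  # flags 100% of records
    'B': [1, 0, 0, 0],  # flags 25% of records
    'C': [0, 0, 0, 0],  # flags 0% of records
})
pv = PercVolumeSketch(perc_vol_expected=0.25)
best = select_best_rule(y_preds, opt_func=pv.fit)
print(best)  # B
```

Passing the bound method `pv.fit`, rather than calling it, lets the consumer decide when and on which predictions the metric is evaluated, while the target stays stored on the instance.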