SimpleFilter Example

This notebook contains an example of how the SimpleFilter class can be used to filter out low-performing rules from a rule set.

Requirements

To run, you’ll need the following:

  • A rule set (specifically the binary columns of the rules as applied to a dataset).

  • The binary target column associated with the above dataset (or the standard rule_descriptions dataframe containing the rule performance metrics). A minimal sketch of the expected input format is shown below.
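
For reference, here's a minimal sketch of the expected input format, using hypothetical rule names and an 'eid' index (the names are illustrative only, not required by the class):

import pandas as pd

# Illustrative only: one binary column per rule, with the binary target
# sharing the same index. Column and index names here are hypothetical.
X_rules = pd.DataFrame(
    {
        'Rule1': [1, 0, 1, 0],
        'Rule2': [0, 0, 1, 1]
    },
    index=pd.Index([0, 1, 2, 3], name='eid')
)
y = pd.Series([1, 0, 1, 0], index=X_rules.index)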


Import packages

[1]:
from iguanas.rule_selection import SimpleFilter
from iguanas.metrics.classification import FScore

import pandas as pd

Read in data

Let’s read in some dummy rules (stored as binary columns) and the target column.

[2]:
X_rules_train = pd.read_csv(
    'dummy_data/X_rules_train.csv',
    index_col='eid'
)
y_train = pd.read_csv(
    'dummy_data/y_train.csv',
    index_col='eid'
).squeeze()
X_rules_test = pd.read_csv(
    'dummy_data/X_rules_test.csv',
    index_col='eid'
)
y_test = pd.read_csv(
    'dummy_data/y_test.csv',
    index_col='eid'
).squeeze()
[3]:
X_rules_train.columns.tolist()
[3]:
['Rule1', 'Rule2', 'Rule3', 'Rule4', 'Rule5']

Filter rules based on performance metrics

To filter rules based on performance metrics (e.g. precision or recall), you can use the SimpleFilter class.

Set up class parameters

Now we can set our class parameters for the SimpleFilter class. You can filter rules based on precision, recall or a custom metric (e.g. F1 score). Here, we’ll be filtering out rules with an F1 score < 0.30. To filter on F1 score, we’ll use the FScore class from the metrics.classification module.

Please see the class docstring for more information on each parameter.

[4]:
f1 = FScore(beta=1)
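
As a quick illustration (not part of the original notebook), the FScore instance’s fit method scores a binary prediction column against the target. The argument order shown below is an assumption, so check the FScore class docstring:

# Toy illustration only: score a single binary column against a binary target.
# Argument order (predictions first, then target) is an assumption - see the
# FScore class docstring.
import pandas as pd
y_toy = pd.Series([1, 0, 1, 0, 1])
preds_toy = pd.Series([1, 0, 0, 0, 1])
f1.fit(preds_toy, y_toy)  # precision = 1.0, recall = 2/3 -> F1 = 0.8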

Note that the dictionary key below is ‘OptMetric’, which is the standard column name for the custom metric (in this case, F1 score).

[5]:
filters = {
    'OptMetric': {
        'Operator': '>=',
        'Value': 0.30
    }
}
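
Conceptually, each entry in the filters dictionary keeps only the rules whose value in the named column satisfies the given comparison. Below is a rough sketch of what the specification above encodes; it’s an illustration, not the library’s internal implementation, and the full set of supported operator strings is an assumption to be checked against the SimpleFilter docstring:

import operator

# Illustration only: map the operator string to a comparison and apply it to a
# rule's metric value.
ops = {'>=': operator.ge, '>': operator.gt, '<=': operator.le, '<': operator.lt}

def keep_rule(metric_value, spec):
    return ops[spec['Operator']](metric_value, spec['Value'])

keep_rule(0.46, filters['OptMetric'])  # True -> the rule meets the criterion
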
[6]:
params = {
    'filters': filters,
    'opt_func': f1.fit
}

Instantiate class and run fit method

Once the parameters have been set, we can run the .fit() method to calculate which rules should be kept.

[7]:
fr = SimpleFilter(**params)
fr.fit(
    X_rules=X_rules_train,
    y=y_train
)

Outputs

The .fit() method does not return anything. However, it does create the following attribute:

  • rules_to_keep: The list of rules which meet the filtering criteria.

[8]:
fr.rules_to_keep
[8]:
['Rule1', 'Rule2', 'Rule3']

Drop filtered rules from another dataset

Use the .transform() method to drop the filtered rules from a given dataset.

[9]:
X_rules_test_filtered = fr.transform(X_rules=X_rules_test)

Outputs

The .transform() method returns a dataframe with the filtered rules dropped. It also filters the provided rule_descriptions dataframe and saves it as a class attribute of the same name:

[10]:
X_rules_test_filtered.head()
[10]:
     Rule1  Rule2  Rule3
eid
0        0      0      0
1        0      0      0
2        0      0      0
3        0      0      0
4        0      0      0
[11]:
fr.rule_descriptions
[11]:
       Precision  Recall  PercDataFlagged  OptMetric
Rule
Rule1        1.0     0.3            0.006   0.461538
Rule2        1.0     0.3            0.006   0.461538
Rule3        1.0     0.3            0.006   0.461538
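
As a quick sanity check (not part of the original notebook), the OptMetric values above follow directly from the F1 formula applied to each rule’s precision and recall:

# F1 = 2 * precision * recall / (precision + recall)
precision, recall = 1.0, 0.3
round(2 * precision * recall / (precision + recall), 6)  # 0.461538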

Calculate filtered rules and drop them from a dataset (in one step)

You can also use the .fit_transform() method to calculate the filtered rules and drop them from the training set in one step.

[12]:
X_rules_train_filtered = fr.fit_transform(
    X_rules=X_rules_train,
    y=y_train
)
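
If the class follows the usual fit/transform convention (an assumption worth verifying against the docstring), this is equivalent to calling .fit() followed by .transform() on the training set:

# Assumed equivalence: fitting and then transforming the same data should
# match the output of fit_transform above.
fr.fit(X_rules=X_rules_train, y=y_train)
assert fr.transform(X_rules=X_rules_train).equals(X_rules_train_filtered)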

Outputs

The .fit_transform() method returns a dataframe with the filtered rules dropped. It also filters the provided rule_descriptions dataframe, saves it as a class attribute of the same name, and creates the following attribute:

  • rules_to_keep: The list of rules which meet the filtering criteria.

[13]:
fr.rules_to_keep
[13]:
['Rule1', 'Rule2', 'Rule3']
[14]:
X_rules_train_filtered.head()
[14]:
Rule  Rule1  Rule2  Rule3
eid
0         0      0      0
1         0      0      0
2         0      0      0
3         0      0      0
4         0      0      0
[15]:
fr.rule_descriptions
[15]:
       Precision  Recall  PercDataFlagged  OptMetric
Rule
Rule1        1.0     0.3            0.006   0.461538
Rule2        1.0     0.3            0.006   0.461538
Rule3        1.0     0.3            0.006   0.461538