SimpleFilter Example¶
This notebook contains an example of how the SimpleFilter class can be used to filter out low-performing rules from a rule set.
Requirements¶
To run, you’ll need the following:
A rule set (specifically the binary columns of the rules as applied to a dataset).
The binary target column associated with the above dataset (or the standard rule_descriptions dataframe containing the rule performance metrics).
Import packages¶
[1]:
from iguanas.rule_selection import SimpleFilter
from iguanas.metrics.classification import FScore
import pandas as pd
Read in data¶
Let’s read in some dummy rules (stored as binary columns) and the target column.
[2]:
# Rule binary columns and target for the training set
X_rules_train = pd.read_csv(
    'dummy_data/X_rules_train.csv',
    index_col='eid'
)
y_train = pd.read_csv(
    'dummy_data/y_train.csv',
    index_col='eid'
).squeeze()
# Rule binary columns and target for the test set
X_rules_test = pd.read_csv(
    'dummy_data/X_rules_test.csv',
    index_col='eid'
)
y_test = pd.read_csv(
    'dummy_data/y_test.csv',
    index_col='eid'
).squeeze()
[3]:
X_rules_train.columns.tolist()
[3]:
['Rule1', 'Rule2', 'Rule3', 'Rule4', 'Rule5']
Filter rules based on performance metrics¶
To filter rules based on performance metrics (e.g. precision or recall), you can use the SimpleFilter class.
Set up class parameters¶
Now we can set our class parameters for the SimpleFilter class. You can filter rules based on precision, recall or a custom metric (e.g. F1 score). Here, we’ll be filtering out rules with an F1 score < 0.3 (i.e. keeping rules with an F1 score >= 0.3). To calculate the F1 score, we’ll use the FScore class from the metrics.classification module.
Please see the class docstring for more information on each parameter.
[4]:
f1 = FScore(beta=1)
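To make the metric concrete, here is a minimal pure-Python sketch of the standard F-beta formula. It is not the FScore implementation from iguanas; the `f_beta` helper and the toy data (500 rows, 10 positives, a rule that flags 3 of the positives and nothing else) are assumptions chosen for illustration.

```python
def f_beta(y_true, y_pred, beta=1.0):
    """Standard F-beta score for two binary sequences (illustrative helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        # No correct flags: treat the score as zero
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical data: 10 positives in 500 rows; the rule flags 3 positives only
y_true = [1] * 10 + [0] * 490
rule = [1] * 3 + [0] * 497

score = f_beta(y_true, rule, beta=1)
print(round(score, 6))  # 0.461538
```

With a precision of 1.0 and a recall of 0.3, the F1 score is 2 × 1.0 × 0.3 / (1.0 + 0.3) ≈ 0.4615, which clears a threshold of 0.3.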
Note that the dictionary key below is ‘OptMetric’, which is the standard column name for the custom metric (in this case, F1 score).
[5]:
filters = {
    'OptMetric': {
        'Operator': '>=',
        'Value': 0.30
    }
}
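One way to read a filter dictionary of this shape: each key names a metric column, and a rule is kept only if its value in that column satisfies the operator and threshold. The sketch below illustrates that idea with the standard `operator` module. It is an assumption about the general approach, not the SimpleFilter source; the `OPERATORS` mapping, the `rules_passing` helper, and the metric values are hypothetical.

```python
import operator

# Hypothetical mapping from operator strings to comparison functions
OPERATORS = {
    '>=': operator.ge,
    '>': operator.gt,
    '<=': operator.le,
    '<': operator.lt,
    '==': operator.eq,
}


def rules_passing(metrics, filters):
    """Return the rules whose metrics satisfy every filter (illustrative)."""
    kept = []
    for rule, values in metrics.items():
        if all(
            OPERATORS[spec['Operator']](values[column], spec['Value'])
            for column, spec in filters.items()
        ):
            kept.append(rule)
    return kept


# Made-up metric values for two hypothetical rules
metrics = {
    'RuleA': {'OptMetric': 0.46},
    'RuleB': {'OptMetric': 0.12},
}
filters = {'OptMetric': {'Operator': '>=', 'Value': 0.30}}

kept = rules_passing(metrics, filters)
print(kept)  # ['RuleA']
```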
[6]:
params = {
    'filters': filters,
    'opt_func': f1.fit
}
Instantiate class and run fit method¶
Once the parameters have been set, we can run the .fit() method to calculate which rules should be kept.
[7]:
fr = SimpleFilter(**params)
fr.fit(
    X_rules=X_rules_train,
    y=y_train
)
Outputs¶
The .fit() method does not return anything. However, it does create the following attribute:
rules_to_keep: The list of rules which meet the filtering criteria.
[8]:
fr.rules_to_keep
[8]:
['Rule1', 'Rule2', 'Rule3']
Drop filtered rules from another dataset¶
Use the .transform() method to drop the filtered rules from a given dataset.
[9]:
X_rules_test_filtered = fr.transform(X_rules=X_rules_test)
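Conceptually, dropping the filtered rules amounts to selecting the kept rule columns from the dataframe. The pandas sketch below shows that selection on a toy dataframe. It is an illustration of the idea rather than what .transform() does internally, and the rule names and values are made up.

```python
import pandas as pd

# Toy rule matrix with hypothetical rule names, indexed by 'eid'
X_rules = pd.DataFrame(
    {'RuleA': [1, 0, 0], 'RuleB': [0, 1, 0], 'RuleC': [0, 0, 1]},
    index=pd.Index([0, 1, 2], name='eid'),
)

# Stand-in for the rules_to_keep attribute calculated during fitting
rules_to_keep = ['RuleA', 'RuleC']

# Keep only the columns of the rules that passed the filter
X_filtered = X_rules[rules_to_keep]
print(X_filtered.columns.tolist())  # ['RuleA', 'RuleC']
```

Selecting with a list of column names raises a KeyError if a kept rule is missing from the dataframe, so the dataset being transformed needs to contain every rule in the list.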
Outputs¶
The .transform() method returns a dataframe with the filtered rules dropped. It also filters the rule_descriptions dataframe and saves it as a class attribute with the same name:
[10]:
X_rules_test_filtered.head()
[10]:
eid | Rule1 | Rule2 | Rule3 |
---|---|---|---|
0 | 0 | 0 | 0 |
1 | 0 | 0 | 0 |
2 | 0 | 0 | 0 |
3 | 0 | 0 | 0 |
4 | 0 | 0 | 0 |
[11]:
fr.rule_descriptions
[11]:
Rule | Precision | Recall | PercDataFlagged | OptMetric |
---|---|---|---|---|
Rule1 | 1.0 | 0.3 | 0.006 | 0.461538 |
Rule2 | 1.0 | 0.3 | 0.006 | 0.461538 |
Rule3 | 1.0 | 0.3 | 0.006 | 0.461538 |
Calculate filtered rules and drop them from a dataset (in one step)¶
You can also use the .fit_transform() method to calculate the filtered rules and drop them from the training set in a single call.
[12]:
X_rules_train_filtered = fr.fit_transform(
    X_rules=X_rules_train,
    y=y_train
)
Outputs¶
The .fit_transform() method returns a dataframe with the filtered rules dropped, while filtering the provided rule_descriptions dataframe and saving it as a class attribute with the same name. It also creates the following attribute:
rules_to_keep: The list of rules which meet the filtering criteria.
[13]:
fr.rules_to_keep
[13]:
['Rule1', 'Rule2', 'Rule3']
[14]:
X_rules_train_filtered.head()
[14]:
eid | Rule1 | Rule2 | Rule3 |
---|---|---|---|
0 | 0 | 0 | 0 |
1 | 0 | 0 | 0 |
2 | 0 | 0 | 0 |
3 | 0 | 0 | 0 |
4 | 0 | 0 | 0 |
[15]:
fr.rule_descriptions
[15]:
Rule | Precision | Recall | PercDataFlagged | OptMetric |
---|---|---|---|---|
Rule1 | 1.0 | 0.3 | 0.006 | 0.461538 |
Rule2 | 1.0 | 0.3 | 0.006 | 0.461538 |
Rule3 | 1.0 | 0.3 | 0.006 | 0.461538 |