Rule Applier Example

This notebook contains an example of how the Rule Applier can be used to apply Iguanas-readable rules to a dataset.

Requirements

To run, you’ll need the following:

  • A dataset containing the same features used in the rules.


Import packages

[16]:
from iguanas.rule_application import RuleApplier
from iguanas.metrics.classification import FScore

import pandas as pd

Read in data

Let’s read in some dummy data.

[17]:
X = pd.read_csv(
    'dummy_data/X_train.csv',
    index_col='eid'
)
y = pd.read_csv(
    'dummy_data/y_train.csv',
    index_col='eid'
).squeeze()

Apply rules

Set up class parameters

Now we can set the class parameters for the Rule Applier. Here we specify an additional metric to calculate for each rule, the F1 score, by passing the fit method of an FScore object (with beta=1) as opt_func. You can omit opt_func if you only need the standard results (Precision, Recall and PercDataFlagged).

Please see the class docstring for more information on each parameter.
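For readers who want to see what the F1 score measures, here is a minimal sketch of the standard F-beta formula in plain Python. The function name f_beta is hypothetical and used only for illustration; this is not the FScore implementation from Iguanas, just the textbook definition it is named after.

```python
def f_beta(precision, recall, beta=1.0):
    """Textbook F-beta score: a weighted harmonic mean of precision and recall.

    Hypothetical helper for illustration only (not part of Iguanas).
    """
    denominator = (beta ** 2) * precision + recall
    if denominator == 0:
        # Convention assumed here: return 0.0 when both inputs are zero.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# With beta=1 this reduces to 2 * P * R / (P + R).
print(round(f_beta(0.991837, 1.0), 6))  # 0.995902
```

With beta=1, precision and recall are weighted equally; values of beta above 1 weight recall more heavily, and values below 1 weight precision more heavily.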

[18]:
fs = FScore(beta=1)
[19]:
params = {
    'rule_strings': {
        'Rule1': "(X['account_number_num_fraud_transactions_per_account_number_1day']>=1)",
        'Rule2': "(X['account_number_num_fraud_transactions_per_account_number_1day']>=1)&(X['account_number_num_fraud_transactions_per_account_number_30day']>=1)",
        'Rule3': "(X['account_number_num_fraud_transactions_per_account_number_1day']>=1)&(X['order_total']>50.87)"
    },
    'opt_func': fs.fit
}

Instantiate class and run

Once the parameters have been set, we can instantiate the class and run the .transform() method to apply the rules to the dataset. You can omit the y parameter if your data is unlabelled. In that case, if you provide a function to opt_func, make sure it does not expect a target column (see the optimisation_functions module for more information):

[20]:
ara = RuleApplier(**params)
X_rules = ara.transform(
    X=X,
    y=y,
    sample_weight=None
)
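To make concrete what applying a rule string means, the sketch below evaluates two rule strings against a tiny made-up dataframe using plain pandas. It assumes that each rule string is an ordinary pandas expression over a dataframe named X, which is how the strings above read. The column names and data are hypothetical, and this is not necessarily how RuleApplier works internally.

```python
import pandas as pd

# Tiny made-up dataset; the column names are shortened, hypothetical stand-ins.
X = pd.DataFrame(
    {
        'num_fraud_1day': [0, 1, 2, 0],
        'order_total': [10.0, 75.5, 20.0, 99.0],
    },
    index=pd.Index(['a', 'b', 'c', 'd'], name='eid'),
)

rule_strings = {
    'Rule1': "(X['num_fraud_1day']>=1)",
    'Rule3': "(X['num_fraud_1day']>=1)&(X['order_total']>50.87)",
}

# Evaluate each string with X in scope, then cast the boolean mask to 0/1.
# Only use eval on rule strings from a trusted source.
X_rules = pd.DataFrame(
    {name: eval(logic, {'X': X}).astype(int) for name, logic in rule_strings.items()}
)
print(X_rules)
```

Each resulting column holds 1 where the rule's conditions are met and 0 otherwise, which is the same shape of output that .transform() returns.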

Outputs

The .transform() method returns a dataframe with one binary column per rule, showing which records in the dataset (here, the training data) each rule flags.

A useful attribute created by running the .transform() method (when the y parameter is given) is:

  • rule_descriptions: A dataframe showing the logic of the rules and their performance metrics as applied to the dataset.

[21]:
ara.rule_descriptions.head()
[21]:
       Precision    Recall  PercDataFlagged  OptMetric                                              Logic  nConditions
Rule
Rule1   0.991837  1.000000         0.027547   0.995902  (X['account_number_num_fraud_transactions_per_...            1
Rule2   0.991837  1.000000         0.027547   0.995902  (X['account_number_num_fraud_transactions_per_...            2
Rule3   0.995851  0.987654         0.027097   0.991736  (X['account_number_num_fraud_transactions_per_...            2
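As a rough guide to reading the standard columns, the sketch below computes them for a single rule from its binary flags and the true labels. The helper name rule_metrics is hypothetical. The code also assumes that PercDataFlagged is the fraction of all records that the rule flags, which is inferred from the column name rather than taken from the Iguanas documentation.

```python
def rule_metrics(flags, labels):
    """Compute per-rule metrics from binary flags and binary labels.

    Hypothetical helper for illustration only (not part of Iguanas).
    """
    true_positives = sum(1 for f, t in zip(flags, labels) if f == 1 and t == 1)
    n_flagged = sum(flags)
    n_positive = sum(labels)
    return {
        # Share of flagged records that are truly positive.
        'Precision': true_positives / n_flagged if n_flagged else 0.0,
        # Share of truly positive records that the rule flags.
        'Recall': true_positives / n_positive if n_positive else 0.0,
        # Assumed meaning: share of all records that the rule flags.
        'PercDataFlagged': n_flagged / len(flags) if flags else 0.0,
    }


flags = [1, 1, 0, 0, 1]   # rule output for five made-up records
labels = [1, 0, 0, 1, 1]  # true labels for the same records
print(rule_metrics(flags, labels))
```

In this toy example the rule flags three of the five records, two of which are truly positive, so Precision and Recall are both 2/3 and PercDataFlagged is 0.6.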
[22]:
X_rules.head()
[22]:
Rule                 Rule1  Rule2  Rule3
eid
867-8837095-9305559      0      0      0
974-5306287-3527394      0      0      0
584-0112844-9158928      0      0      0
956-4190732-7014837      0      0      0
349-7005645-8862067      0      0      0