Rule-Based System (RBS) Pipeline Example

This notebook contains an example of how to create an RBS Pipeline.

An RBS Pipeline allows a user to configure a logical flow for decisioning events. Each stage in the pipeline consists of a set of rules that are linked to a decision. The decision applied to each event is determined by the first stage whose rules trigger.

For example, in the case of approving and rejecting transactions in an e-commerce use case, you might have three approve rules and three reject rules. These rules could be used in an RBS Pipeline to approve and reject transactions like so:

  1. If any approve rules trigger, approve the transaction.

  2. If no approve rules trigger, but any reject rules trigger, reject the transaction.

  3. If no rules trigger, approve any remaining transactions.
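
The three-step decision logic above can be sketched as a plain Python function (a hypothetical illustration of the flow, not part of the Iguanas API):

```python
def decide(any_approve_triggered: bool, any_reject_triggered: bool) -> int:
    """Return 0 (approve) or 1 (reject) following the three steps above."""
    if any_approve_triggered:
        return 0  # Step 1: any approve rule firing wins
    if any_reject_triggered:
        return 1  # Step 2: reject only if no approve rule fired
    return 0      # Step 3: default decision is approve
```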

In this notebook, we’ll see how we can create this RBS Pipeline.

Requirements

To run, you’ll need the following:

  • A set of rules that you want to use in the RBS (in this example, we’ll generate these).

  • A labelled, processed dataset (nulls imputed, categorical features encoded).


Import packages

[1]:
from iguanas.rule_generation import RuleGeneratorDT
from iguanas.rbs import RBSPipeline
from iguanas.metrics.classification import FScore

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

Read in data

Let’s read in some labelled, processed dummy data:

[2]:
X_train = pd.read_csv(
    'dummy_data/X_train.csv',
    index_col='eid'
)
y_train = pd.read_csv(
    'dummy_data/y_train.csv',
    index_col='eid'
).squeeze()
X_test = pd.read_csv(
    'dummy_data/X_test.csv',
    index_col='eid'
)
y_test = pd.read_csv(
    'dummy_data/y_test.csv',
    index_col='eid'
).squeeze()

Generate rules

Let’s first generate some rules (both for approving and rejecting transactions) that we’ll use later in our RBS Pipeline.

Note that in this dataset, positive cases in the target column refer to fraudulent transactions, so we’ll need to flip y when generating approve rules.
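
Flipping a binary target is simply `1 - y`; a quick illustration with a toy Series (not the dummy data used in this notebook):

```python
import pandas as pd

y = pd.Series([0, 1, 1, 0], name='label')
y_flipped = 1 - y  # fraud (1) becomes 0, non-fraud (0) becomes 1
print(y_flipped.tolist())  # [1, 0, 0, 1]
```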

Reject rules

[3]:
fs = FScore(beta=1)
[4]:
params = {
    'n_total_conditions': 4,
    'opt_func': fs.fit,
    'tree_ensemble': RandomForestClassifier(n_estimators=5, random_state=0, bootstrap=False),
    'precision_threshold': 0,
    'num_cores': 1,
    'target_feat_corr_types': 'Infer',
    'verbose': 0,
    'rule_name_prefix': 'RejectRule'
}
[5]:
rg_reject = RuleGeneratorDT(**params)
[6]:
X_rules_reject = rg_reject.fit(
    X=X_train,
    y=y_train,
    sample_weight=None
)

Approve rules

[7]:
params = {
    'n_total_conditions': 4,
    'opt_func': fs.fit,
    'tree_ensemble': RandomForestClassifier(n_estimators=2, random_state=0, bootstrap=False),
    'precision_threshold': 0,
    'num_cores': 1,
    'target_feat_corr_types': 'Infer',
    'verbose': 0,
    'rule_name_prefix': 'ApproveRule'
}
[8]:
rg_approve = RuleGeneratorDT(**params)
[9]:
X_rules_approve = rg_approve.fit(
    X=X_train,
    y=(1-y_train), # We flip y here so non-fraudulent transactions become the target
    sample_weight=None
)

Now let’s combine the binary columns of the approve and reject rules into one dataframe:

[10]:
X_rules = pd.concat([X_rules_reject, X_rules_approve], axis=1)
[11]:
X_rules.head()
[11]:
Rule RejectRule_6 RejectRule_13 RejectRule_12 RejectRule_10 RejectRule_9 RejectRule_4 RejectRule_0 RejectRule_7 RejectRule_8 RejectRule_5 ... RejectRule_11 RejectRule_1 RejectRule_3 RejectRule_2 ApproveRule_0 ApproveRule_1 ApproveRule_2 ApproveRule_4 ApproveRule_5 ApproveRule_3
eid
867-8837095-9305559 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 1 1 1 1 1 1
974-5306287-3527394 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 1 1 1 1 1 1
584-0112844-9158928 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 1 1 1 1 1 1
956-4190732-7014837 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 1 1 1 1 1 1
349-7005645-8862067 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 1 1 1 1 1 1

5 rows × 21 columns

[12]:
X_rules_reject.shape[1], X_rules_approve.shape[1]
[12]:
(15, 6)

Setting up the RBS Pipeline

Now, let’s set up our RBS Pipeline using the rules we’ve generated. To reiterate our approach:

  1. If any approve rules trigger, approve the transaction.

  2. If no approve rules trigger, but any reject rules trigger, reject the transaction.

  3. If no rules trigger, approve any remaining transactions.

To set up the pipeline using the logic above, we first need to create the config parameter. This is just a list which outlines the stages of the pipeline. Each stage should be defined using a single-element dictionary, where the key corresponds to the decision at that stage (either 0 or 1), and the value is a list that dictates which rules should trigger for that decision to be made.

In our example, the config will be:

[13]:
config = [
    {0: X_rules_approve.columns.tolist()},
    {1: X_rules_reject.columns.tolist()},
]

Here, the first stage is configured via the dictionary in the first element of the list. This says to apply a decision of 0 (i.e. approve) to transactions where the approve rules have triggered. The second stage is configured via the dictionary in the second element of the list. This says to apply a decision of 1 (i.e. reject) to transactions where the reject rules have triggered (and no approve rules have triggered).

We also need to specify the final decision to be made if no rules are triggered - this is set via the final_decision parameter. In our case this should be 0, as we want to approve any remaining transactions:

[14]:
final_decision = 0

With these parameters configured, we can now create our RBS Pipeline by instantiating the RBSPipeline class:

[15]:
rbsp = RBSPipeline(
    config=config,
    final_decision=final_decision,
    opt_func=fs.fit
)

We can then apply the pipeline to the dataset using the .predict() method:

[16]:
y_pred = rbsp.predict(
    X_rules=X_rules,
    y=y_train
)

Outputs

The .predict() method returns the decision made by the pipeline for each event in the given dataset.

Useful attributes created by running the .predict() method are:

  • pipeline_opt_metric (float): The result of the opt_func function when the pipeline is applied.

[17]:
rbsp.pipeline_opt_metric
[17]:
0.14503816793893132

We can also use the .calc_performance() method to generate some performance metrics for the pipeline:

[18]:
rbsp.calc_performance(
    y_true=y_train,
    y_pred=y_pred
)
[19]:
rbsp.pipeline_perf
[19]:
Precision Recall PercDataFlagged
1 1.000000 0.078189 0.002136
0 0.974761 1.000000 0.997864
[20]:
rbsp.conf_matrix
[20]:
1 0
1 19.0 0.0
0 224.0 8651.0
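
As a sanity check, the pipeline’s optimisation metric (F1, since we used FScore(beta=1)) can be recomputed by hand from the confusion matrix above (19 true rejects, 0 false rejects, 224 missed fraudulent transactions):

```python
tp, fp, fn = 19, 0, 224  # values taken from rbsp.conf_matrix above

precision = tp / (tp + fp)  # 1.0
recall = tp / (tp + fn)     # ~0.078189
f1 = 2 * precision * recall / (precision + recall)
print(f1)  # ~0.14504, matching rbsp.pipeline_opt_metric above
```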