Rule-Based System (RBS) Pipeline Example¶
This notebook contains an example of how to create an RBS Pipeline.
An RBS Pipeline allows a user to configure a logical flow for decisioning events. Each stage in the pipeline consists of a set of rules which are linked to a decision. The decision that is applied to each event is dictated by the rule(s) that trigger first.
For example, in the case of approving and rejecting transactions for an e-commerce use case, you might have 3 approve rules and 3 reject rules. These rules could be used in an RBS Pipeline to approve and reject transactions like so:
If any approve rules trigger, approve the transaction.
If no approve rules trigger, but any reject rules trigger, reject the transaction.
If no rules trigger, approve any remaining transactions.
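This three-stage logic can be sketched as a plain Python function (a hypothetical illustration of the decision flow, not Iguanas code):

```python
def decide(any_approve_triggered: bool, any_reject_triggered: bool) -> int:
    """Return the decision for one event: 0 = approve, 1 = reject."""
    if any_approve_triggered:
        return 0  # stage 1: approve rules take precedence
    if any_reject_triggered:
        return 1  # stage 2: reject only if no approve rule fired
    return 0      # stage 3: default decision for untriggered events

print(decide(True, True))    # 0
print(decide(False, True))   # 1
print(decide(False, False))  # 0
```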
In this notebook, we’ll see how we can create this RBS Pipeline.
Requirements¶
To run, you’ll need the following:
A set of rules that you want to use in the RBS (in this example, we’ll generate these).
A labelled, processed dataset (nulls imputed, categorical features encoded).
Import packages¶
[1]:
from iguanas.rule_generation import RuleGeneratorDT
from iguanas.rbs import RBSPipeline
from iguanas.metrics.classification import FScore
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
Read in data¶
Let’s read in some labelled, processed dummy data:
[2]:
X_train = pd.read_csv(
'dummy_data/X_train.csv',
index_col='eid'
)
y_train = pd.read_csv(
'dummy_data/y_train.csv',
index_col='eid'
).squeeze()
X_test = pd.read_csv(
'dummy_data/X_test.csv',
index_col='eid'
)
y_test = pd.read_csv(
'dummy_data/y_test.csv',
index_col='eid'
).squeeze()
Generate rules¶
Let’s first generate some rules (both for approving and rejecting transactions) that we’ll use later in our RBS Pipeline.
Note that in this dataset, positive cases in the target column refer to fraudulent transactions, so we'll need to flip y when generating the approve rules.
Reject rules¶
[3]:
fs = FScore(beta=1)
[4]:
params = {
'n_total_conditions': 4,
'opt_func': fs.fit,
'tree_ensemble': RandomForestClassifier(n_estimators=5, random_state=0, bootstrap=False),
'precision_threshold': 0,
'num_cores': 1,
'target_feat_corr_types': 'Infer',
'verbose': 0,
'rule_name_prefix': 'RejectRule'
}
[5]:
rg_reject = RuleGeneratorDT(**params)
[6]:
X_rules_reject = rg_reject.fit(
X=X_train,
y=y_train,
sample_weight=None
)
Approve rules¶
[7]:
params = {
'n_total_conditions': 4,
'opt_func': fs.fit,
'tree_ensemble': RandomForestClassifier(n_estimators=2, random_state=0, bootstrap=False),
'precision_threshold': 0,
'num_cores': 1,
'target_feat_corr_types': 'Infer',
'verbose': 0,
'rule_name_prefix': 'ApproveRule'
}
[8]:
rg_approve = RuleGeneratorDT(**params)
[9]:
X_rules_approve = rg_approve.fit(
X=X_train,
y=(1-y_train), # We flip y here so non-fraudulent transactions become the target
sample_weight=None
)
Now let’s combine the binary columns of the approve and reject rules into one dataframe:
[10]:
X_rules = pd.concat([X_rules_reject, X_rules_approve], axis=1)
[11]:
X_rules.head()
[11]:
| eid | RejectRule_6 | RejectRule_13 | RejectRule_12 | RejectRule_10 | RejectRule_9 | RejectRule_4 | RejectRule_0 | RejectRule_7 | RejectRule_8 | RejectRule_5 | ... | RejectRule_11 | RejectRule_1 | RejectRule_3 | RejectRule_2 | ApproveRule_0 | ApproveRule_1 | ApproveRule_2 | ApproveRule_4 | ApproveRule_5 | ApproveRule_3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 867-8837095-9305559 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
| 974-5306287-3527394 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
| 584-0112844-9158928 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
| 956-4190732-7014837 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
| 349-7005645-8862067 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
5 rows × 21 columns
[12]:
X_rules_reject.shape[1], X_rules_approve.shape[1]
[12]:
(15, 6)
Setting up the RBS Pipeline¶
Now, let’s set up our RBS Pipeline using the rules we’ve generated. To reiterate our approach:
If any approve rules trigger, approve the transaction.
If no approve rules trigger, but any reject rules trigger, reject the transaction.
If no rules trigger, approve any remaining transactions.
To set up the pipeline using the logic above, we first need to create the config parameter. This is just a list which outlines the stages of the pipeline. Each stage should be defined using a single-element dictionary, where the key corresponds to the decision at that stage (either 0 or 1), and the value is a list that dictates which rules should trigger for that decision to be made.
In our example, the config will be:
[13]:
config = [
{0: X_rules_approve.columns.tolist()},
{1: X_rules_reject.columns.tolist()},
]
Here, the first stage is configured via the dictionary in the first element of the list. This says to apply a decision of 0 (i.e. approve) to transactions where the approve rules have triggered. The second stage is configured via the dictionary in the second element of the list. This says to apply a decision of 1 (i.e. reject) to transactions where the reject rules have triggered (and no approve rules have triggered).
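To make the stage ordering concrete, here is a minimal sketch of how such a config can be consumed, applied to toy rule columns (an illustration of the logic only; the real RBSPipeline implementation may differ):

```python
import pandas as pd

# Toy binary rule outputs for 4 events
X_rules = pd.DataFrame({
    'ApproveRule_0': [1, 0, 0, 0],
    'RejectRule_0':  [1, 1, 0, 0],
})

config = [
    {0: ['ApproveRule_0']},  # stage 1: approve
    {1: ['RejectRule_0']},   # stage 2: reject
]
final_decision = 0

# Start with no decision (-1), then fill stage by stage; earlier stages win
y_pred = pd.Series(-1, index=X_rules.index)
for stage in config:
    (decision, rules), = stage.items()
    triggered = X_rules[rules].max(axis=1) == 1
    y_pred[(y_pred == -1) & triggered] = decision
y_pred[y_pred == -1] = final_decision

print(y_pred.tolist())  # [0, 1, 0, 0]
```

Event 0 triggers both rule types but is approved, because the approve stage comes first; event 1 triggers only the reject rule; events 2 and 3 fall through to the final decision.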
We also need to specify the final decision to be made if no rules are triggered. This is set via the final_decision parameter; in our case this should be 0, as we want to approve any remaining transactions:
[14]:
final_decision = 0
With these parameters configured, we can now create our RBS Pipeline by instantiating the RBSPipeline class:
[15]:
rbsp = RBSPipeline(
config=config,
final_decision=final_decision,
opt_func=fs.fit
)
We can then apply the pipeline to the dataset using the .predict() method:
[16]:
y_pred = rbsp.predict(
X_rules=X_rules,
y=y_train
)
Outputs¶
The .predict() method returns the pipeline's decision for each event in the given dataset.
Useful attributes created by running the .predict() method are:
pipeline_opt_metric (float): The result of the opt_func function when the pipeline is applied.
[17]:
rbsp.pipeline_opt_metric
[17]:
0.14503816793893132
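Since opt_func was set to FScore(beta=1).fit, this number is the pipeline's F1 score on the training set. The underlying calculation can be sketched in plain Python on toy labels (a sketch of the standard Fbeta formula, not the FScore class itself):

```python
def f_score(y_true, y_pred, beta=1.0):
    """Fbeta score for binary labels, from precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

print(f_score([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1]))  # 0.666...: precision = recall = 2/3
```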
We can also use the .calc_performance() method to generate some performance metrics for the pipeline:
[18]:
rbsp.calc_performance(
y_true=y_train,
y_pred=y_pred
)
[19]:
rbsp.pipeline_perf
[19]:
| Decision | Precision | Recall | PercDataFlagged |
|---|---|---|---|
| 1 | 1.000000 | 0.078189 | 0.002136 |
| 0 | 0.974761 | 1.000000 | 0.997864 |
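These three columns can be derived from y_true and y_pred per decision class. A sketch on toy data (assuming PercDataFlagged is the share of events receiving each decision, which matches the values above):

```python
import pandas as pd

y_true = pd.Series([1, 0, 1, 0, 0])
y_pred = pd.Series([1, 0, 0, 0, 0])

rows = {}
for decision in (1, 0):
    flagged = y_pred == decision
    rows[decision] = {
        # Of events given this decision, how many had the matching label?
        'Precision': (y_true[flagged] == decision).mean(),
        # Of events with this label, how many received this decision?
        'Recall': flagged[y_true == decision].mean(),
        # Share of all events given this decision
        'PercDataFlagged': flagged.mean(),
    }
print(pd.DataFrame(rows).T)
```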
[20]:
rbsp.conf_matrix
[20]:
| | 1 | 0 |
|---|---|---|
| 1 | 19.0 | 0.0 |
| 0 | 224.0 | 8651.0 |
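Judging by the perfect precision for decision 1 above (19 events flagged, all fraudulent), the rows here appear to be predicted decisions and the columns actual labels; treat that orientation as an inference rather than documented behaviour. A matrix in that orientation can be built with pandas:

```python
import pandas as pd

y_true = pd.Series([1, 0, 1, 0, 0])
y_pred = pd.Series([1, 0, 0, 0, 0])

# Rows: predicted decision; columns: actual label
cm = pd.crosstab(y_pred, y_true, rownames=['predicted'], colnames=['actual'])
print(cm)
```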