Rule-Based System (RBS) Optimiser Example
This notebook contains an example of how the RBS Optimiser can be used to optimise which rules are used to generate decisions as part of an RBS Pipeline.
An RBS Pipeline lets a user configure a logical flow for decisioning events. Each stage in the pipeline consists of a set of rules linked to a decision. Stages are evaluated in order, and the decision applied to each event is the one linked to the first stage in which any rule triggers.
For example, when approving and rejecting transactions in an e-commerce use case, you might have 3 approve rules and 3 reject rules. These rules could be used in an RBS Pipeline to approve and reject transactions like so:
If any approve rules trigger, approve the transaction.
If no approve rules trigger, but any reject rules trigger, reject the transaction.
If no rules trigger, approve any remaining transactions.
In this notebook, we’ll see how we can create and optimise this RBS Pipeline.
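The precedence in the three steps above can be sketched with NumPy’s np.select, which returns the choice of the first condition that holds for each element. This only illustrates the decision logic on hypothetical toy flags; it is not how Iguanas implements RBSPipeline. As later in this notebook, 0 stands for approve and 1 for reject.

```python
import numpy as np

# Hypothetical flags for five transactions: does any approve rule fire,
# and does any reject rule fire?
any_approve = np.array([True, False, False, True, False])
any_reject = np.array([False, True, False, True, False])

# np.select takes the first matching condition, so approve beats reject.
# The default covers transactions where no rules trigger (approve).
decisions = np.select(
    condlist=[any_approve, any_reject],
    choicelist=[0, 1],
    default=0,
)
print(decisions)
```

Note the fourth transaction: both kinds of rule fire, but because the approve stage comes first it is approved.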
Requirements
To run, you’ll need the following:
A set of rules that you want to use in the RBS (in this example, we’ll generate these).
A labelled, processed dataset (nulls imputed, categorical features encoded).
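As a rough sketch of what “processed” means here, the snippet below imputes a missing numeric value with the column median and one-hot encodes a categorical column using plain pandas. The column names and the choice of median imputation are assumptions for illustration; any scheme that leaves no nulls and only numeric features should serve the same purpose.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: one missing numeric value and one categorical column
raw = pd.DataFrame({
    "amount": [10.0, np.nan, 250.0],
    "channel": ["web", "app", "web"],
})

processed = raw.copy()

# Impute nulls in numeric columns with each column's median
num_cols = processed.select_dtypes(include="number").columns
processed[num_cols] = processed[num_cols].fillna(processed[num_cols].median())

# One-hot encode the categorical column as 0/1 integers
processed = pd.get_dummies(processed, columns=["channel"], dtype=int)
print(processed)
```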
Import packages
[1]:
from iguanas.rule_generation import RuleGeneratorDT
from iguanas.rbs import RBSPipeline, RBSOptimiser
from iguanas.metrics.classification import FScore
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
Read in data
Let’s read in some labelled, processed dummy data:
[2]:
X_train = pd.read_csv(
'dummy_data/X_train.csv',
index_col='eid'
)
y_train = pd.read_csv(
'dummy_data/y_train.csv',
index_col='eid'
).squeeze()
X_test = pd.read_csv(
'dummy_data/X_test.csv',
index_col='eid'
)
y_test = pd.read_csv(
'dummy_data/y_test.csv',
index_col='eid'
).squeeze()
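The .squeeze() calls convert the single-column DataFrames returned by read_csv into Series. The snippet below shows this on a small in-memory CSV; the column name label and the eid values are hypothetical stand-ins for the contents of y_train.csv.

```python
import io

import pandas as pd

# Hypothetical label file: an eid index column plus a single target column
csv_text = "eid,label\na-1,0\na-2,1\na-3,0\n"

df = pd.read_csv(io.StringIO(csv_text), index_col="eid")
y = df.squeeze()  # one remaining column -> Series indexed by eid

print(type(df).__name__, type(y).__name__)
```

If a file contained only one row, .squeeze() would also collapse the row axis and return a scalar; .squeeze(axis="columns") avoids that edge case.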
Generate rules
Let’s first generate some rules (both for approving and rejecting transactions) that we’ll use later in our RBS Pipeline.
Note that in this dataset, positive cases in the target column refer to fraudulent transactions, so we’ll need to flip y when generating approve rules.
Reject rules
[3]:
fs = FScore(beta=1)
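FScore(beta=1) gives the F1 score, which is used as the optimisation function for rule generation and, later, for the pipeline. Assuming Iguanas’ FScore follows the standard F-beta definition, the sketch below writes that formula out from true positive, false positive and false negative counts and cross-checks it against scikit-learn’s fbeta_score on toy labels. The helper f_beta is hypothetical, not part of Iguanas.

```python
import numpy as np
from sklearn.metrics import fbeta_score


def f_beta(y_true, y_pred, beta=1.0):
    """Standard F-beta score for binary labels (1 = positive class).

    F_beta = (1 + b^2) * TP / ((1 + b^2) * TP + b^2 * FN + FP)
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


# Toy labels: TP = 2, FP = 1, FN = 1
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(f_beta(y_true, y_pred), fbeta_score(y_true, y_pred, beta=1))
```

With beta=1 the expression reduces to 2TP / (2TP + FP + FN), the harmonic mean of precision and recall.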
[4]:
params = {
'n_total_conditions': 4,
'opt_func': fs.fit,
'tree_ensemble': RandomForestClassifier(n_estimators=5, random_state=0, bootstrap=False),
'precision_threshold': 0,
'num_cores': 1,
'target_feat_corr_types': 'Infer',
'verbose': 0,
'rule_name_prefix': 'RejectRule'
}
[5]:
rg_reject = RuleGeneratorDT(**params)
[6]:
X_rules_reject = rg_reject.fit(
X=X_train,
y=y_train,
sample_weight=None
)
Approve rules
[7]:
params = {
'n_total_conditions': 4,
'opt_func': fs.fit,
'tree_ensemble': RandomForestClassifier(n_estimators=2, random_state=0, bootstrap=False),
'precision_threshold': 0,
'num_cores': 1,
'target_feat_corr_types': 'Infer',
'verbose': 0,
'rule_name_prefix': 'ApproveRule'
}
[8]:
rg_approve = RuleGeneratorDT(**params)
[9]:
X_rules_approve = rg_approve.fit(
X=X_train,
y=(1-y_train), # We flip y here so non-fraudulent transactions become the target
sample_weight=None
)
Now let’s combine the binary columns of the approve and reject rules into one dataframe:
[10]:
X_rules = pd.concat([X_rules_reject, X_rules_approve], axis=1)
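pd.concat(..., axis=1) aligns the two rule DataFrames on their index (eid here), so rows are matched by event ID rather than by position. The toy frames below, with hypothetical rule values and deliberately shuffled row order, show that alignment.

```python
import pandas as pd

# Hypothetical binary rule outputs; the second frame's rows are in a different order
reject = pd.DataFrame(
    {"RejectRule_0": [1, 0, 0]},
    index=pd.Index(["e1", "e2", "e3"], name="eid"),
)
approve = pd.DataFrame(
    {"ApproveRule_0": [0, 1, 1]},
    index=pd.Index(["e3", "e1", "e2"], name="eid"),
)

# Columns are placed side by side, with rows matched on the eid index
combined = pd.concat([reject, approve], axis=1)
print(combined.sort_index())
```

If an eid were present in only one of the frames, the missing rule values would be filled with NaN, which is one reason both rule sets are generated from the same X_train.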
[11]:
X_rules.head()
[11]:
eid | RejectRule_6 | RejectRule_13 | RejectRule_12 | RejectRule_10 | RejectRule_9 | RejectRule_4 | RejectRule_0 | RejectRule_7 | RejectRule_8 | RejectRule_5 | ... | RejectRule_11 | RejectRule_1 | RejectRule_3 | RejectRule_2 | ApproveRule_0 | ApproveRule_1 | ApproveRule_2 | ApproveRule_4 | ApproveRule_5 | ApproveRule_3 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
867-8837095-9305559 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
974-5306287-3527394 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
584-0112844-9158928 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
956-4190732-7014837 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
349-7005645-8862067 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
5 rows × 21 columns
[12]:
X_rules_reject.shape[1], X_rules_approve.shape[1]
[12]:
(15, 6)
Setting up the RBS Pipeline
Now, let’s set up our RBS Pipeline using the rules we’ve generated. To reiterate our approach:
If any approve rules trigger, approve the transaction.
If no approve rules trigger, but any reject rules trigger, reject the transaction.
If no rules trigger, approve any remaining transactions.
To set up the pipeline using the logic above, we first need to create the config parameter. This is a list that outlines the stages of the pipeline, in the order they are applied. Each stage is defined using a single-element dictionary, where the key is the decision made at that stage (either 0 or 1) and the value is the list of rules linked to that decision: if any of these rules trigger (and no earlier stage has already made a decision), that decision is applied.
In our example, the config will be:
[13]:
config = [
{0: X_rules_approve.columns.tolist()},
{1: X_rules_reject.columns.tolist()},
]
Here, the first stage is configured via the dictionary in the first element of the list. This says to apply a decision of 0 (i.e. approve) to transactions where the approve rules have triggered. The second stage is configured via the dictionary in the second element of the list. This says to apply a decision of 1 (i.e. reject) to transactions where the reject rules have triggered (and no approve rules have triggered).
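The stage semantics described above can be sketched as a small pandas function. apply_config is a hypothetical helper written for illustration (it is not Iguanas’ RBSPipeline.predict), and it assumes a stage’s decision is applied when any of its rules trigger, consistent with the “if any approve rules trigger” logic at the top of this notebook.

```python
import pandas as pd


def apply_config(X_rules, config, final_decision):
    """Apply stages in order; the first stage with any triggering rule decides.

    Hypothetical helper illustrating the config semantics, not Iguanas code.
    """
    decisions = pd.Series(final_decision, index=X_rules.index)
    decided = pd.Series(False, index=X_rules.index)
    for stage in config:
        # Each stage is a single-element dict: {decision: [rule names]}
        (decision, rules), = stage.items()
        if rules:
            fires = X_rules[rules].sum(axis=1) > 0
        else:
            fires = pd.Series(False, index=X_rules.index)
        # Only events not decided by an earlier stage can take this decision
        newly = fires & ~decided
        decisions[newly] = decision
        decided |= newly
    return decisions


# Hypothetical binary rule outputs for four events
X_rules_toy = pd.DataFrame({
    "ApproveRule_0": [1, 0, 0, 1],
    "RejectRule_0": [0, 1, 0, 1],
})
config_toy = [{0: ["ApproveRule_0"]}, {1: ["RejectRule_0"]}]
print(apply_config(X_rules_toy, config_toy, final_decision=0).tolist())
```

Swapping the order of the two stages would reject the last event instead, since the reject rule would then be checked first.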
We also need to specify the final decision to make if no rules trigger; this is set via the final_decision parameter. In our case it should be 0, as we want to approve any remaining transactions:
[14]:
final_decision = 0
With these parameters configured, we can now create our RBS Pipeline by instantiating the RBSPipeline class:
[15]:
rbsp = RBSPipeline(
config=config,
final_decision=final_decision,
opt_func=fs.fit
)
We can then apply the pipeline to the dataset using the .predict() method:
[16]:
y_pred = rbsp.predict(
X_rules=X_rules,
y=y_train
)
Outputs
The .predict() method applies the pipeline to the given dataset and returns its predictions.
Useful attributes created by running the .predict() method are:
pipeline_opt_metric (float): The result of the opt_func function when the pipeline is applied.
[17]:
rbsp.pipeline_opt_metric
[17]:
0.14503816793893132
We can also use the .calc_performance() method to generate some performance metrics for the pipeline:
[18]:
rbsp.calc_performance(
y_true=y_train,
y_pred=y_pred
)
[19]:
rbsp.pipeline_perf
[19]:
Class | Precision | Recall | PercDataFlagged |
---|---|---|---|
1 | 1.000000 | 0.078189 | 0.002136 |
0 | 0.974761 | 1.000000 | 0.997864 |
[20]:
rbsp.conf_matrix
[20]:
Predicted \ Actual | 1 | 0 |
---|---|---|
1 | 19.0 | 0.0 |
0 | 224.0 | 8651.0 |
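Reading the matrix with rows as predicted decisions and columns as actual labels (consistent with the numbers: the predicted-1 row sums to 19, and 19 / 8894 ≈ 0.002136 is the PercDataFlagged value for class 1), the precision, recall, PercDataFlagged and F1 figures above can be recomputed by hand. metrics_from_counts is a hypothetical helper, and the counts are copied from the tables above.

```python
def metrics_from_counts(tp, fp, fn, tn):
    """Recompute per-class metrics and F1 from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "precision_1": tp / (tp + fp),
        "recall_1": tp / (tp + fn),
        "precision_0": tn / (tn + fn),
        "recall_0": tn / (tn + fp),
        # Share of all events given decision 1
        "perc_flagged_1": (tp + fp) / total,
        # F1 for class 1: 2TP / (2TP + FP + FN)
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Original pipeline: predicted 1 -> 19 actual 1s and 0 actual 0s;
# predicted 0 -> 224 actual 1s and 8651 actual 0s
original = metrics_from_counts(tp=19, fp=0, fn=224, tn=8651)
for name, value in original.items():
    print(f"{name}: {value:.6f}")
```

The low F1 comes almost entirely from recall: only 19 of the 243 fraudulent transactions reach the reject stage. The optimised pipeline’s confusion matrix further down (TP=243, FP=2, FN=0, TN=8649) gives 486 / 488 ≈ 0.9959 with the same formula.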
Optimising the RBS Pipeline
Now that we have our basic RBS Pipeline set up, we can optimise it using the RBS Optimiser. Here, we simply pass the instantiated pipeline to the pipeline parameter of the RBSOptimiser class:
[21]:
rbso = RBSOptimiser(
pipeline=rbsp,
n_iter=60,
verbose=1
)
We then run the .fit_predict() method, which optimises the pipeline on the given dataset and then applies the optimised pipeline to that dataset:
[22]:
y_pred = rbso.fit_predict(
X_rules=X_rules,
y=y_train
)
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 60/60 [00:01<00:00, 45.30trial/s, best loss: -0.9959016393442623]
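The progress bar reports a series of trials and a best loss equal to the negated F1 score, so the optimiser appears to minimise -opt_func over candidate pipelines. The sketch below is not Iguanas’ search algorithm: it is a plain random search, on synthetic data, over which rules to keep in a single reject stage, meant only to illustrate selecting a subset of rules to maximise the optimisation function. All data and helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 200 events, ~10% positive (fraud), 6 binary reject rules
n_events, n_rules = 200, 6
y = (rng.random(n_events) < 0.1).astype(int)
X = np.zeros((n_events, n_rules), dtype=int)
for j in range(3):
    # Informative rules: fire on ~70% of positives and never on negatives
    X[:, j] = (y == 1) & (rng.random(n_events) < 0.7)
for j in range(3, n_rules):
    # Noisy rules: fire on ~30% of all events regardless of the label
    X[:, j] = rng.random(n_events) < 0.3


def f1(y_true, y_pred):
    """F1 score for the positive class (1 = reject)."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def predict(mask):
    """Single reject stage: 1 if any selected rule fires, else final decision 0."""
    return (X[:, mask].sum(axis=1) > 0).astype(int)


# Start from the unoptimised pipeline that uses every rule
best_mask = np.ones(n_rules, dtype=bool)
baseline = best_score = f1(y, predict(best_mask))

# Random search: sample rule subsets and keep the best-scoring one
for _ in range(60):
    mask = rng.random(n_rules) < 0.5
    score = f1(y, predict(mask))
    if score > best_score:
        best_mask, best_score = mask, score

print(f"baseline F1: {baseline:.3f}")
print(f"best F1:     {best_score:.3f}")
print(f"rules kept:  {np.flatnonzero(best_mask).tolist()}")
```

Dropping the noisy rules removes their false positives, which is the same kind of improvement the optimised config below shows when most rules are pruned.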
Outputs
The .fit_predict() method optimises the pipeline and returns the predictions of the optimised pipeline on the given dataset.
Useful attributes created by running the .fit_predict() method are:
config (List[dict]): The optimised pipeline configuration, where each element aligns to a stage in the pipeline. Each element is a dictionary, where the key is the decision made at that stage (either 0 or 1) and the value is the list of rules that, if any trigger, give that decision.
pipeline_opt_metric (float): The result of the opt_func function when the pipeline is applied.
[23]:
rbso.config
[23]:
[{0: ['ApproveRule_0']},
{1: ['RejectRule_13',
'RejectRule_12',
'RejectRule_10',
'RejectRule_9',
'RejectRule_7',
'RejectRule_8']}]
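To see how much the optimiser pruned, the rules kept in the optimised config can be compared with the full rule set. The sketch below copies the config from the output above and assumes the 15 reject rules are named RejectRule_0 to RejectRule_14 (RejectRule_14 is hidden behind the “...” in the preview of X_rules earlier) and the 6 approve rules ApproveRule_0 to ApproveRule_5.

```python
# Full rule set, assuming RejectRule_0..RejectRule_14 and ApproveRule_0..ApproveRule_5
all_rules = {f"RejectRule_{i}" for i in range(15)} | {f"ApproveRule_{i}" for i in range(6)}

# Optimised config copied from the output above
optimised_config = [
    {0: ['ApproveRule_0']},
    {1: ['RejectRule_13', 'RejectRule_12', 'RejectRule_10',
         'RejectRule_9', 'RejectRule_7', 'RejectRule_8']},
]

# Flatten the rules used across all stages, then diff against the full set
kept = {rule for stage in optimised_config for rules in stage.values() for rule in rules}
dropped = sorted(all_rules - kept)

print(f"Kept {len(kept)} of {len(all_rules)} rules")
print("Dropped:", dropped)
```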
[24]:
rbso.pipeline_opt_metric
[24]:
0.9959016393442623
We can also use the .calc_performance() method to generate some performance metrics for the pipeline:
[25]:
rbso.calc_performance(
y_true=y_train,
y_pred=y_pred
)
[26]:
rbso.pipeline_perf
[26]:
Class | Precision | Recall | PercDataFlagged |
---|---|---|---|
1 | 0.991837 | 1.000000 | 0.027547 |
0 | 1.000000 | 0.999769 | 0.972453 |
[27]:
rbso.conf_matrix
[27]:
Predicted \ Actual | 1 | 0 |
---|---|---|
1 | 243.0 | 2.0 |
0 | 0.0 | 8649.0 |
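Comparing these performance metrics with those of the original pipeline shows that the RBS Optimiser has substantially improved the pipeline: the F1 score rises from about 0.145 to about 0.996, driven mainly by recall for the reject decision (1) increasing from 0.078 to 1.0, at the cost of a small drop in its precision (from 1.0 to 0.992). Note that all of these metrics are calculated on the training data used for the optimisation: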
[28]:
print(f'Original RBS Pipeline F1 score: {rbsp.pipeline_opt_metric}')
print(f'Optimised RBS Pipeline F1 score: {rbso.pipeline_opt_metric}')
Original RBS Pipeline F1 score: 0.14503816793893132
Optimised RBS Pipeline F1 score: 0.9959016393442623
[29]:
print('Original pipeline performance:')
rbsp.pipeline_perf
Original pipeline performance:
[29]:
Class | Precision | Recall | PercDataFlagged |
---|---|---|---|
1 | 1.000000 | 0.078189 | 0.002136 |
0 | 0.974761 | 1.000000 | 0.997864 |
[30]:
print('Optimised pipeline performance:')
rbso.pipeline_perf
Optimised pipeline performance:
[30]:
Class | Precision | Recall | PercDataFlagged |
---|---|---|---|
1 | 0.991837 | 1.000000 | 0.027547 |
0 | 1.000000 | 0.999769 | 0.972453 |
Optimising the RBS Pipeline (without a config)
In the previous example, we instantiated a pipeline with a config before optimising.
However, if we don’t know what structure the config should have, or have no requirements for its structure, we can use the RBS Optimiser to generate a new config from scratch that optimises the overall performance of the RBS Pipeline.
To do this, we follow a similar process as before; the only difference is that we instantiate the RBS Pipeline with an empty list for the config parameter:
[31]:
rbsp = RBSPipeline(
config=[], # Empty config
final_decision=final_decision,
opt_func=fs.fit
)
We feed this pipeline into the RBS Optimiser as before, but this time we also provide the rule_types parameter: a dictionary that maps each decision (0 or 1) to the list of rules linked to it:
[32]:
rbso = RBSOptimiser(
pipeline=rbsp,
n_iter=15,
rule_types={
0: X_rules_approve.columns.tolist(),
1: X_rules_reject.columns.tolist(),
},
verbose=1
)
We then run the .fit_predict() method, which optimises the pipeline on the given dataset and then applies the optimised pipeline to that dataset:
[33]:
y_pred = rbso.fit_predict(
X_rules=X_rules,
y=y_train
)
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 16.29trial/s, best loss: -0.9959016393442623]
Outputs
The .fit_predict() method optimises the pipeline and returns the predictions of the optimised pipeline on the given dataset.
Useful attributes created by running the .fit_predict() method are:
config (List[dict]): The optimised pipeline configuration, where each element aligns to a stage in the pipeline. Each element is a dictionary, where the key is the decision made at that stage (either 0 or 1) and the value is the list of rules that, if any trigger, give that decision.
pipeline_opt_metric (float): The result of the opt_func function when the pipeline is applied.
[34]:
rbso.config
[34]:
[{0: ['ApproveRule_0', 'ApproveRule_1']},
{1: ['RejectRule_4', 'RejectRule_8', 'RejectRule_13']},
{0: ['ApproveRule_3']},
{1: ['RejectRule_14', 'RejectRule_7', 'RejectRule_1']},
{0: ['ApproveRule_2']},
{1: ['RejectRule_6']}]
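Because this config interleaves approve and reject stages, it can be worth sanity-checking that every stage only uses rules of its own decision type. check_config is a hypothetical helper; the rule_types dictionary mirrors the one passed to RBSOptimiser above, again assuming the rule names RejectRule_0 to RejectRule_14 and ApproveRule_0 to ApproveRule_5.

```python
def check_config(config, rule_types):
    """Raise ValueError if any stage uses a rule not linked to its decision."""
    for i, stage in enumerate(config):
        # Each stage is a single-element dict: {decision: [rule names]}
        (decision, rules), = stage.items()
        wrong = set(rules) - set(rule_types[decision])
        if wrong:
            raise ValueError(f"Stage {i} (decision {decision}) uses {sorted(wrong)}")
    return True


rule_types = {
    0: [f"ApproveRule_{i}" for i in range(6)],
    1: [f"RejectRule_{i}" for i in range(15)],
}

# Config copied from the output above
config = [
    {0: ['ApproveRule_0', 'ApproveRule_1']},
    {1: ['RejectRule_4', 'RejectRule_8', 'RejectRule_13']},
    {0: ['ApproveRule_3']},
    {1: ['RejectRule_14', 'RejectRule_7', 'RejectRule_1']},
    {0: ['ApproveRule_2']},
    {1: ['RejectRule_6']},
]
print(check_config(config, rule_types))
```

This config reaches the same training F1 score (0.9959) and confusion matrix as the first optimised config, even though its structure is quite different.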
[35]:
rbso.pipeline_opt_metric
[35]:
0.9959016393442623
We can also use the .calc_performance() method to generate some performance metrics for the pipeline:
[36]:
rbso.calc_performance(
y_true=y_train,
y_pred=y_pred
)
[37]:
rbso.pipeline_perf
[37]:
Class | Precision | Recall | PercDataFlagged |
---|---|---|---|
1 | 0.991837 | 1.000000 | 0.027547 |
0 | 1.000000 | 0.999769 | 0.972453 |
[38]:
rbso.conf_matrix
[38]:
Predicted \ Actual | 1 | 0 |
---|---|---|
1 | 243.0 | 2.0 |
0 | 0.0 | 8649.0 |