Rule Scorer Example¶
This notebook contains an example of how the Rule Scorer can be used to generate scores for a set of rules based on a labelled dataset.
Requirements¶
To run, you’ll need the following:
A rule set (specifically the binary columns of the rules as applied to a dataset).
The binary target column associated with the above dataset.
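As a rough illustration of the two inputs listed above, the sketch below builds a tiny rule set and target by hand. The record identifiers, rule names and values are all made up for illustration; only the shape of the data (binary rule columns plus a binary target sharing the same index) is taken from the requirements.

```python
import pandas as pd

# Hypothetical toy inputs: the identifiers and rule names are invented.
index = pd.Index(["a1", "a2", "a3", "a4"], name="eid")

# One binary column per rule: 1 where the rule triggers on a record, else 0.
X_rules = pd.DataFrame(
    {"rule_1": [1, 0, 1, 0], "rule_2": [0, 0, 1, 1]},
    index=index,
)

# Binary target for the same records: 1 marks the positive class.
y = pd.Series([1, 0, 1, 0], index=index, name="label")

print(X_rules.shape)                  # (4, 2)
print(X_rules.index.equals(y.index))  # True
```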
Import packages¶
[1]:
from iguanas.rule_scoring import RuleScorer, PerformanceScorer, ConstantScaler
from iguanas.metrics.classification import Precision
import pandas as pd
Read in data¶
Let’s read in some dummy rules (stored as binary columns) and the target column.
[2]:
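# Binary rule columns for the training set, indexed by record ID
X_rules_train = pd.read_csv(
    'dummy_data/X_rules_train.csv',
    index_col='eid'
)
# .squeeze() converts the single-column dataframe into a Series
y_train = pd.read_csv(
    'dummy_data/y_train.csv',
    index_col='eid'
).squeeze()
# Binary rule columns and target for the test set
X_rules_test = pd.read_csv(
    'dummy_data/X_rules_test.csv',
    index_col='eid'
)
y_test = pd.read_csv(
    'dummy_data/y_test.csv',
    index_col='eid'
).squeeze()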
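Before scoring, it can be worth checking that the loaded data matches the requirements. The helper below is a minimal sketch of such a check; `check_rule_inputs` is a hypothetical name and is not part of Iguanas. It is demonstrated on toy data so that it runs without the CSV files, but the same call could be made with `X_rules_train` and `y_train`.

```python
import pandas as pd


def check_rule_inputs(X_rules: pd.DataFrame, y: pd.Series) -> None:
    """Raise ValueError if the inputs are not binary rules plus a binary target."""
    if not X_rules.isin([0, 1]).all().all():
        raise ValueError("X_rules contains values other than 0 and 1")
    if not y.isin([0, 1]).all():
        raise ValueError("y contains values other than 0 and 1")
    if not X_rules.index.equals(y.index):
        raise ValueError("X_rules and y do not share the same index")


# Toy data standing in for the CSV files (values are invented).
idx = pd.Index(["r1", "r2", "r3"], name="eid")
X_ok = pd.DataFrame({"rule_1": [1, 0, 1]}, index=idx)
y_ok = pd.Series([1, 0, 0], index=idx)
check_rule_inputs(X_ok, y_ok)  # passes silently

# A non-binary value in a rule column is rejected.
X_bad = pd.DataFrame({"rule_1": [1, 2, 0]}, index=idx)
try:
    check_rule_inputs(X_bad, y_ok)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```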
Generate scores¶
Set up class parameters¶
Now we can set the class parameters for the Rule Scorer. Here we pass an instantiated scoring class, which generates the raw scores, and (optionally) an instantiated scaling class, which scales the scores to make them more readable. The scoring classes are located in the rule_scoring_methods module; the scaling classes are located in the rule_score_scalers module. See the class docstrings for more information on each type of scoring/scaling class.
In this example, we’ll use the PerformanceScorer class to score the rules (based on their precision) and the ConstantScaler class to scale the scores. Note that we’re using the Precision class from the metrics.classification module rather than scikit-learn’s precision_score function, as the former is ~100 times faster on larger datasets.
Please see the class docstring for more information on each parameter.
[3]:
precision_score = Precision()
[4]:
params = {
    'scoring_class': PerformanceScorer(performance_func=precision_score.fit),
    'scaling_class': ConstantScaler(limit=-100)
}
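To give a feel for what precision-based scoring followed by scaling to a limit might look like, here is a minimal sketch in plain pandas. It is not the Iguanas implementation: `precision_per_rule` and `scale_to_limit` are hypothetical helpers, the toy data is invented, and the scaling formula (the best raw score is mapped to the limit, the rest proportionally) is an assumption made for illustration. See the class docstrings for the actual behaviour.

```python
import pandas as pd

# Hypothetical toy data: three rules applied to six records.
X_rules = pd.DataFrame(
    {
        "rule_a": [1, 1, 0, 0, 1, 0],
        "rule_b": [1, 0, 0, 0, 0, 0],
        "rule_c": [1, 1, 1, 1, 0, 0],
    }
)
y = pd.Series([1, 1, 0, 0, 0, 0])


def precision_per_rule(X_rules: pd.DataFrame, y: pd.Series) -> pd.Series:
    """Share of the records flagged by each rule that are true positives."""
    true_positives = X_rules.mul(y, axis=0).sum()
    flagged = X_rules.sum()
    return true_positives / flagged


def scale_to_limit(scores: pd.Series, limit: int) -> pd.Series:
    """Assumed scaling: map the best raw score to `limit`, others proportionally."""
    return (scores * limit / scores.max()).round().astype(int)


raw_scores = precision_per_rule(X_rules, y)
scaled_scores = scale_to_limit(raw_scores, limit=-100)
print(scaled_scores.to_dict())  # {'rule_a': -67, 'rule_b': -100, 'rule_c': -50}
```

Under this assumed formula, a negative limit produces negative scores whose magnitude grows with the rule's precision.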
Instantiate class and run fit method¶
Once the parameters have been set, we can run the .fit() method to generate scores.
[5]:
rs = RuleScorer(**params)
rs.fit(
    X_rules=X_rules_train,
    y=y_train
)
Outputs¶
The .fit() method does not return anything. However, it does create the following attribute:
rule_scores: The generated scores for each rule.
[6]:
rs.rule_scores.head()
[6]:
fraud-account_number_num_fraud_transactions_per_account_number_1day>=1 -99
fraud-account_number_num_fraud_transactions_per_account_number_1day>=1&account_number_num_fraud_transactions_per_account_number_30day>=1 -99
fraud-account_number_num_fraud_transactions_per_account_number_1day>=1&order_total>50.87 -100
fraud-account_number_num_fraud_transactions_per_account_number_7day>=1&order_total>50.87 -100
fraud-account_number_num_fraud_transactions_per_account_number_1day>=1&order_total>60.87 -100
dtype: int64
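Since the output above has the form of a pandas Series, the usual Series operations can be used to rank or filter rules. The sketch below uses a toy Series as a stand-in for `rs.rule_scores`; the rule names and values are invented, and the threshold of -75 is arbitrary. It also assumes that, with a negative limit, more negative scores indicate better-performing rules.

```python
import pandas as pd

# Toy Series standing in for rs.rule_scores (names and values are invented).
rule_scores = pd.Series(
    {"rule_a": -99, "rule_b": -100, "rule_c": -40, "rule_d": -75}
)

# Sort ascending so the most negative scores come first.
ranked = rule_scores.sort_values()

# Keep only the rules whose score is at or beyond an arbitrary threshold.
strong_rules = ranked[ranked <= -75].index.tolist()
print(strong_rules)  # ['rule_b', 'rule_a', 'rule_d']
```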
Apply rules to a separate dataset¶
Use the .transform() method to apply the generated scores to another dataset.
[7]:
X_scores_test = rs.transform(X_rules=X_rules_test)
Outputs¶
The .transform() method returns a dataframe giving the scores of the rules as applied to the dataset.
[8]:
X_scores_test.head()
[8]:
eid | fraud-account_number_avg_order_total_per_account_number_1day<=152.2125&account_number_num_fraud_transactions_per_account_number_30day>=1&account_number_num_order_items_per_account_number_30day>=4&is_existing_user_1==True | fraud-account_number_avg_order_total_per_account_number_1day<=22.61&account_number_num_fraud_transactions_per_account_number_30day>=1&account_number_num_fraud_transactions_per_account_number_90day>=1&order_total<=50.87 | fraud-account_number_avg_order_total_per_account_number_1day<=23.61&account_number_avg_order_total_per_account_number_7day<=52.5425&account_number_num_fraud_transactions_per_account_number_30day>=1&account_number_num_fraud_transactions_per_account_number_lifetime>=1 | fraud-account_number_avg_order_total_per_account_number_1day<=309.71251&account_number_avg_order_total_per_account_number_90day<=319.125&account_number_num_fraud_transactions_per_account_number_1day>=1 | fraud-account_number_avg_order_total_per_account_number_1day<=617.69&account_number_num_distinct_transaction_per_account_number_1day<=2&account_number_num_fraud_transactions_per_account_number_1day>=1 | fraud-account_number_avg_order_total_per_account_number_1day<=617.69&account_number_num_distinct_transaction_per_account_number_1day>=3&account_number_num_fraud_transactions_per_account_number_1day>=1&account_number_sum_order_total_per_account_number_90day>980.63998 | fraud-account_number_avg_order_total_per_account_number_1day<=617.69&account_number_num_fraud_transactions_per_account_number_1day>=1 | fraud-account_number_avg_order_total_per_account_number_1day<=617.69&account_number_num_fraud_transactions_per_account_number_1day>=1&account_number_num_order_items_per_account_number_30day<=8&account_number_num_order_items_per_account_number_7day<=5 | fraud-account_number_avg_order_total_per_account_number_1day<=617.69&account_number_num_fraud_transactions_per_account_number_1day>=1&account_number_num_order_items_per_account_number_30day<=8&account_number_num_order_items_per_account_number_7day>=6 | fraud-account_number_avg_order_total_per_account_number_1day<=617.69&account_number_num_fraud_transactions_per_account_number_1day>=1&account_number_num_order_items_per_account_number_30day>=9&account_number_num_order_items_per_account_number_90day>=11 | ... | fraud-account_number_num_fraud_transactions_per_account_number_lifetime>=1&order_total>916.82501 | fraud-account_number_num_order_items_per_account_number_1day<=5&account_number_num_order_items_per_account_number_90day<=4&account_number_sum_order_total_per_account_number_30day>916.82501&order_total<=1063.53003 | fraud-account_number_num_order_items_per_account_number_1day>=5&account_number_sum_order_total_per_account_number_30day>916.82501 | fraud-account_number_sum_order_total_per_account_number_1day>1407.375&account_number_sum_order_total_per_account_number_7day>622.595&num_order_items>=3 | fraud-account_number_sum_order_total_per_account_number_1day>1924.40997&account_number_sum_order_total_per_account_number_7day>916.82501&is_billing_shipping_city_same_0==False | fraud-account_number_sum_order_total_per_account_number_1day>916.82501&is_billing_shipping_city_same_1==False | fraud-account_number_sum_order_total_per_account_number_1day>971.565&account_number_sum_order_total_per_account_number_7day>916.82501&is_billing_shipping_city_same_0==True | fraud-account_number_sum_order_total_per_account_number_30day>428.17&account_number_sum_order_total_per_account_number_90day<=916.82501&is_billing_shipping_city_same_0==True&num_order_items<=2 | fraud-account_number_sum_order_total_per_account_number_30day>916.82501&is_existing_user_1==False | fraud-account_number_sum_order_total_per_account_number_7day<=930.10001&account_number_sum_order_total_per_account_number_90day>512.45001&is_billing_shipping_city_same_1==True&is_existing_user_1==False |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
975-8351797-7122581 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
785-6259585-7858053 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
057-4039373-1790681 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
095-5263240-3834186 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
980-3802574-0009480 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
5 rows × 299 columns
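The dataframe above contains 0 wherever a rule does not trigger. One plausible reading, sketched below, is that each binary flag is multiplied by the score of its rule. This is an assumption about the behaviour rather than a statement of how Iguanas implements .transform(); the toy rules and scores are invented. The row-wise sum at the end is an optional extra step showing how the per-rule scores could be combined into a single score per record.

```python
import pandas as pd

idx = pd.Index(["r1", "r2", "r3"], name="eid")

# Toy binary rule columns and toy rule scores (all values invented).
X_rules = pd.DataFrame({"rule_a": [1, 0, 0], "rule_b": [1, 1, 0]}, index=idx)
rule_scores = pd.Series({"rule_a": -99, "rule_b": -100})

# Assumed behaviour: a flag of 1 becomes the rule's score, a flag of 0 stays 0.
# Multiplying a dataframe by a Series aligns the Series on the column labels.
X_scores = X_rules * rule_scores

# Optional: combine the per-rule scores into one score per record.
total_score = X_scores.sum(axis=1)
print(total_score.to_dict())  # {'r1': -199, 'r2': -100, 'r3': 0}
```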
Generate rule scores and apply them to the training set (in one step)¶
You can also use the .fit_transform() method to generate scores and apply them to the training set in a single call.
[9]:
X_scores_train = rs.fit_transform(
    X_rules=X_rules_train,
    y=y_train
)
Outputs¶
The .fit_transform() method returns a dataframe giving the scores of the rules as applied to the dataset. It also creates the following attribute:
rule_scores: The generated scores for each rule.
[10]:
rs.rule_scores.head()
[10]:
fraud-account_number_num_fraud_transactions_per_account_number_1day>=1 -99
fraud-account_number_num_fraud_transactions_per_account_number_1day>=1&account_number_num_fraud_transactions_per_account_number_30day>=1 -99
fraud-account_number_num_fraud_transactions_per_account_number_1day>=1&order_total>50.87 -100
fraud-account_number_num_fraud_transactions_per_account_number_7day>=1&order_total>50.87 -100
fraud-account_number_num_fraud_transactions_per_account_number_1day>=1&order_total>60.87 -100
dtype: int64
[11]:
X_scores_train.head()
[11]:
eid | fraud-account_number_num_fraud_transactions_per_account_number_1day>=1 | fraud-account_number_num_fraud_transactions_per_account_number_1day>=1&account_number_num_fraud_transactions_per_account_number_30day>=1 | fraud-account_number_num_fraud_transactions_per_account_number_1day>=1&order_total>50.87 | fraud-account_number_num_fraud_transactions_per_account_number_7day>=1&order_total>50.87 | fraud-account_number_num_fraud_transactions_per_account_number_1day>=1&order_total>60.87 | fraud-account_number_num_fraud_transactions_per_account_number_1day>=1&account_number_num_fraud_transactions_per_account_number_30day>=1&account_number_num_order_items_per_account_number_90day<=8 | fraud-account_number_num_fraud_transactions_per_account_number_7day>=1&account_number_num_fraud_transactions_per_account_number_90day>=1&account_number_num_order_items_per_account_number_30day<=6 | fraud-account_number_num_fraud_transactions_per_account_number_1day>=1&account_number_num_fraud_transactions_per_account_number_90day>=1&account_number_num_order_items_per_account_number_lifetime<=7 | fraud-account_number_num_fraud_transactions_per_account_number_7day>=1&account_number_num_order_items_per_account_number_lifetime<=7 | fraud-account_number_num_fraud_transactions_per_account_number_30day>=1&account_number_num_fraud_transactions_per_account_number_7day>=1&account_number_num_order_items_per_account_number_7day<=5 | ... | fraud-account_number_num_distinct_transaction_per_account_number_30day>=3&account_number_num_fraud_transactions_per_account_number_7day>=1&account_number_sum_order_total_per_account_number_30day>1020.59499&num_order_items>=3 | fraud-account_number_num_distinct_transaction_per_account_number_90day>=2&account_number_num_fraud_transactions_per_account_number_30day>=1&account_number_num_order_items_per_account_number_90day<=1&is_billing_shipping_city_same_1==False | fraud-account_number_avg_order_total_per_account_number_1day<=152.2125&account_number_num_fraud_transactions_per_account_number_30day>=1&account_number_num_order_items_per_account_number_30day>=4&is_existing_user_1==True | fraud-account_number_num_distinct_transaction_per_account_number_7day<=3&account_number_num_fraud_transactions_per_account_number_1day>=1&account_number_num_order_items_per_account_number_lifetime>=8 | fraud-account_number_avg_order_total_per_account_number_90day<=324.37083&account_number_num_fraud_transactions_per_account_number_30day>=1&account_number_num_order_items_per_account_number_1day<=2&account_number_sum_order_total_per_account_number_30day>916.82501 | fraud-account_number_num_fraud_transactions_per_account_number_1day<=1&account_number_num_fraud_transactions_per_account_number_1day>=1&account_number_num_order_items_per_account_number_30day>=6&is_billing_shipping_city_same_1==False | fraud-account_number_num_fraud_transactions_per_account_number_1day>=1&account_number_num_order_items_per_account_number_lifetime>=6&account_number_sum_order_total_per_account_number_90day<=1032.375&is_billing_shipping_city_same_1==False | fraud-account_number_avg_order_total_per_account_number_30day<=335.21333&account_number_num_order_items_per_account_number_1day>=3&account_number_sum_order_total_per_account_number_90day<=1123.36505&account_number_sum_order_total_per_account_number_90day>916.82501 | fraud-account_number_avg_order_total_per_account_number_7day<=319.125&account_number_num_distinct_transaction_per_account_number_1day>=3&account_number_num_fraud_transactions_per_account_number_30day>=1&account_number_sum_order_total_per_account_number_30day>622.595 | fraud-account_number_avg_order_total_per_account_number_7day>916.82501&account_number_num_distinct_transaction_per_account_number_90day>=2&account_number_sum_order_total_per_account_number_90day>2588.9801 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
867-8837095-9305559 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
974-5306287-3527394 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
584-0112844-9158928 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
956-4190732-7014837 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
349-7005645-8862067 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
5 rows × 299 columns
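The relationship between the three methods can be sketched with a small stand-in class. `ToyRuleScorer` is hypothetical and is not the Iguanas implementation: its scoring (precision expressed as a negative percentage) and its transform (flags multiplied by scores) are simplifying assumptions. The only point is that calling .fit_transform() on a dataset is expected to give the same result as calling .fit() and then .transform() on that same dataset.

```python
import pandas as pd


class ToyRuleScorer:
    """Hypothetical stand-in for RuleScorer; not the Iguanas implementation."""

    def fit(self, X_rules: pd.DataFrame, y: pd.Series) -> None:
        # Toy score: precision of each rule, expressed as a negative percentage.
        precision = X_rules.mul(y, axis=0).sum() / X_rules.sum()
        self.rule_scores = (-100 * precision).round().astype(int)

    def transform(self, X_rules: pd.DataFrame) -> pd.DataFrame:
        # Replace each triggered flag with the score of its rule.
        return X_rules * self.rule_scores

    def fit_transform(self, X_rules: pd.DataFrame, y: pd.Series) -> pd.DataFrame:
        self.fit(X_rules, y)
        return self.transform(X_rules)


# Toy data (values invented).
idx = pd.Index(["r1", "r2", "r3", "r4"], name="eid")
X = pd.DataFrame({"rule_a": [1, 1, 0, 0], "rule_b": [0, 1, 1, 1]}, index=idx)
y = pd.Series([1, 0, 1, 0], index=idx)

# One step.
one_step = ToyRuleScorer().fit_transform(X, y)

# Two steps on the same data.
scorer = ToyRuleScorer()
scorer.fit(X, y)
two_step = scorer.transform(X)

print(one_step.equals(two_step))  # True
```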