Rule Scorer Example

This notebook contains an example of how the Rule Scorer can be used to generate scores for a set of rules based on a labelled dataset.

Requirements

To run, you’ll need the following:

  • A rule set (specifically the binary columns of the rules as applied to a dataset).

  • The binary target column associated with the above dataset.
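To make the two inputs concrete, here is a minimal pure-Python sketch of what "binary columns of the rules as applied to a dataset" means. The records, field names and rule conditions below are hypothetical and exist only for illustration; the notebook itself loads ready-made binary columns from CSV files.

```python
# Hypothetical transactions keyed by record id; field names are illustrative.
toy_records = {
    "a1": {"order_total": 120.0, "num_order_items": 4},
    "a2": {"order_total": 30.0, "num_order_items": 1},
    "a3": {"order_total": 75.5, "num_order_items": 6},
}

# Hypothetical rules: rule name -> condition evaluated on a single record.
toy_rule_conditions = {
    "order_total>50.87": lambda rec: rec["order_total"] > 50.87,
    "num_order_items>=5": lambda rec: rec["num_order_items"] >= 5,
}

# One binary column per rule: 1 where the rule triggers, 0 otherwise.
toy_rule_columns = {
    name: {eid: int(cond(rec)) for eid, rec in toy_records.items()}
    for name, cond in toy_rule_conditions.items()
}

# Binary target column on the same record ids (1 = fraud, 0 = not fraud).
toy_target = {"a1": 1, "a2": 0, "a3": 0}
```

The rule columns and the target share the same record ids, which is what allows each rule to be scored against the target.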


Import packages

[1]:
from iguanas.rule_scoring import RuleScorer, PerformanceScorer, ConstantScaler
from iguanas.metrics.classification import Precision

import pandas as pd

Read in data

Let’s read in some dummy rules (stored as binary columns) and the target column.

[2]:
X_rules_train = pd.read_csv(
    'dummy_data/X_rules_train.csv',
    index_col='eid'
)
y_train = pd.read_csv(
    'dummy_data/y_train.csv',
    index_col='eid'
).squeeze()
X_rules_test = pd.read_csv(
    'dummy_data/X_rules_test.csv',
    index_col='eid'
)
y_test = pd.read_csv(
    'dummy_data/y_test.csv',
    index_col='eid'
).squeeze()

Generate scores

Set up class parameters

Now we can set the class parameters for the Rule Scorer. Here we pass an instantiated scoring class (which generates the raw scores) and, optionally, an instantiated scaling class (which scales the raw scores to make them more readable). The scoring classes are located in the rule_scoring_methods module; the scaling classes are located in the rule_score_scalers module. See the class docstrings for more information on each type of scoring/scaling class.

In this example, we’ll use the PerformanceScorer class to score the rules (based on their precision) and the ConstantScaler class to scale the scores. Note that we’re using the Precision class from the metrics.classification module rather than scikit-learn’s precision_score function, as the former is ~100 times faster on larger datasets.

Please see the class docstring for more information on each parameter.
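The two-stage idea (raw score, then scaling) can be sketched in plain Python. This is an illustration under stated assumptions, not the Iguanas implementation: the helper names are hypothetical, and the scaling formula (multiply by limit / largest raw score, then round) is an assumption. Check the ConstantScaler docstring for the formula it actually uses.

```python
def precision(rule_flags, target):
    """Fraction of the records flagged by a rule that are true positives."""
    flagged_targets = [t for f, t in zip(rule_flags, target) if f == 1]
    if not flagged_targets:
        return 0.0  # a rule that never triggers gets a precision of 0 here
    return sum(flagged_targets) / len(flagged_targets)


def scale_to_limit(raw_scores, limit):
    """Assumed scaling: the largest raw score is mapped onto `limit`."""
    top = max(raw_scores.values())
    if top == 0:
        return {name: 0 for name in raw_scores}
    return {name: round(score * limit / top) for name, score in raw_scores.items()}


toy_target = [1, 1, 0, 0, 1, 0]
toy_rules = {
    "rule_a": [1, 1, 0, 0, 1, 0],  # flags 3 records, all 3 are fraud
    "rule_b": [1, 1, 1, 0, 0, 0],  # flags 3 records, 2 are fraud
    "rule_c": [0, 0, 1, 1, 0, 0],  # flags 2 records, none are fraud
}

raw_scores = {name: precision(flags, toy_target) for name, flags in toy_rules.items()}
scaled_scores = scale_to_limit(raw_scores, limit=-100)
print(scaled_scores)  # {'rule_a': -100, 'rule_b': -67, 'rule_c': 0}
```

With a negative limit, the most precise rule ends up with the most negative score under this assumed scheme.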

[3]:
precision_score = Precision()
[4]:
params = {
    'scoring_class': PerformanceScorer(performance_func=precision_score.fit),
    'scaling_class': ConstantScaler(limit=-100)
}

Instantiate class and run fit method

Once the parameters have been set, we can run the .fit() method to generate scores.

[5]:
rs = RuleScorer(**params)
rs.fit(
    X_rules=X_rules_train,
    y=y_train
)

Outputs

The .fit() method does not return anything. However, it does create the following attribute:

  • rule_scores: The generated scores for each rule.

[6]:
rs.rule_scores.head()
[6]:
fraud-account_number_num_fraud_transactions_per_account_number_1day>=1                                                                      -99
fraud-account_number_num_fraud_transactions_per_account_number_1day>=1&account_number_num_fraud_transactions_per_account_number_30day>=1    -99
fraud-account_number_num_fraud_transactions_per_account_number_1day>=1&order_total>50.87                                                   -100
fraud-account_number_num_fraud_transactions_per_account_number_7day>=1&order_total>50.87                                                   -100
fraud-account_number_num_fraud_transactions_per_account_number_1day>=1&order_total>60.87                                                   -100
dtype: int64

Apply rules to a separate dataset

Use the .transform() method to apply the generated scores to another dataset.

[7]:
X_scores_test = rs.transform(X_rules=X_rules_test)

Outputs

The .transform() method returns a dataframe giving the scores of the rules as applied to the dataset.
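One way to picture the returned dataframe: each 1 in a binary rule column is replaced by that rule's score, and each 0 stays 0. The sketch below assumes exactly that behaviour, which matches the mostly-zero output shown in this notebook but is not taken from the library source. All names and values are hypothetical.

```python
# Hypothetical scores, as might be produced by a fitted scorer.
toy_rule_scores = {"rule_a": -100, "rule_b": -67}

# Binary rule columns for three records (same record order in every column).
toy_flags = {
    "rule_a": [1, 0, 0],
    "rule_b": [1, 1, 0],
}

# Assumed transform: a flagged record receives the rule's score, others get 0.
toy_scored = {
    name: [flag * toy_rule_scores[name] for flag in flags]
    for name, flags in toy_flags.items()
}
print(toy_scored)  # {'rule_a': [-100, 0, 0], 'rule_b': [-67, -67, 0]}
```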

[8]:
X_scores_test.head()
[8]:
fraud-account_number_avg_order_total_per_account_number_1day<=152.2125&account_number_num_fraud_transactions_per_account_number_30day>=1&account_number_num_order_items_per_account_number_30day>=4&is_existing_user_1==True fraud-account_number_avg_order_total_per_account_number_1day<=22.61&account_number_num_fraud_transactions_per_account_number_30day>=1&account_number_num_fraud_transactions_per_account_number_90day>=1&order_total<=50.87 fraud-account_number_avg_order_total_per_account_number_1day<=23.61&account_number_avg_order_total_per_account_number_7day<=52.5425&account_number_num_fraud_transactions_per_account_number_30day>=1&account_number_num_fraud_transactions_per_account_number_lifetime>=1 fraud-account_number_avg_order_total_per_account_number_1day<=309.71251&account_number_avg_order_total_per_account_number_90day<=319.125&account_number_num_fraud_transactions_per_account_number_1day>=1 fraud-account_number_avg_order_total_per_account_number_1day<=617.69&account_number_num_distinct_transaction_per_account_number_1day<=2&account_number_num_fraud_transactions_per_account_number_1day>=1 fraud-account_number_avg_order_total_per_account_number_1day<=617.69&account_number_num_distinct_transaction_per_account_number_1day>=3&account_number_num_fraud_transactions_per_account_number_1day>=1&account_number_sum_order_total_per_account_number_90day>980.63998 fraud-account_number_avg_order_total_per_account_number_1day<=617.69&account_number_num_fraud_transactions_per_account_number_1day>=1 fraud-account_number_avg_order_total_per_account_number_1day<=617.69&account_number_num_fraud_transactions_per_account_number_1day>=1&account_number_num_order_items_per_account_number_30day<=8&account_number_num_order_items_per_account_number_7day<=5 
fraud-account_number_avg_order_total_per_account_number_1day<=617.69&account_number_num_fraud_transactions_per_account_number_1day>=1&account_number_num_order_items_per_account_number_30day<=8&account_number_num_order_items_per_account_number_7day>=6 fraud-account_number_avg_order_total_per_account_number_1day<=617.69&account_number_num_fraud_transactions_per_account_number_1day>=1&account_number_num_order_items_per_account_number_30day>=9&account_number_num_order_items_per_account_number_90day>=11 ... fraud-account_number_num_fraud_transactions_per_account_number_lifetime>=1&order_total>916.82501 fraud-account_number_num_order_items_per_account_number_1day<=5&account_number_num_order_items_per_account_number_90day<=4&account_number_sum_order_total_per_account_number_30day>916.82501&order_total<=1063.53003 fraud-account_number_num_order_items_per_account_number_1day>=5&account_number_sum_order_total_per_account_number_30day>916.82501 fraud-account_number_sum_order_total_per_account_number_1day>1407.375&account_number_sum_order_total_per_account_number_7day>622.595&num_order_items>=3 fraud-account_number_sum_order_total_per_account_number_1day>1924.40997&account_number_sum_order_total_per_account_number_7day>916.82501&is_billing_shipping_city_same_0==False fraud-account_number_sum_order_total_per_account_number_1day>916.82501&is_billing_shipping_city_same_1==False fraud-account_number_sum_order_total_per_account_number_1day>971.565&account_number_sum_order_total_per_account_number_7day>916.82501&is_billing_shipping_city_same_0==True fraud-account_number_sum_order_total_per_account_number_30day>428.17&account_number_sum_order_total_per_account_number_90day<=916.82501&is_billing_shipping_city_same_0==True&num_order_items<=2 fraud-account_number_sum_order_total_per_account_number_30day>916.82501&is_existing_user_1==False 
fraud-account_number_sum_order_total_per_account_number_7day<=930.10001&account_number_sum_order_total_per_account_number_90day>512.45001&is_billing_shipping_city_same_1==True&is_existing_user_1==False
eid
975-8351797-7122581 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
785-6259585-7858053 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
057-4039373-1790681 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
095-5263240-3834186 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
980-3802574-0009480 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0

5 rows × 299 columns


Generate rule scores and apply them to the training set (in one step)

You can also use the .fit_transform() method to generate scores and apply them to the training set in a single call.

[9]:
X_scores_train = rs.fit_transform(
    X_rules=X_rules_train,
    y=y_train
)

Outputs

The .fit_transform() method returns a dataframe giving the scores of the rules as applied to the dataset. It also creates the following attribute:

  • rule_scores: The generated scores for each rule.
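The relationship between the three methods can be sketched with a small stand-in class. ToyRuleScorer and count_hits are hypothetical names written for this illustration and are not part of Iguanas; the sketch only assumes that fitting and transforming in one step is equivalent to calling the two steps in sequence on the same data.

```python
class ToyRuleScorer:
    """Hypothetical stand-in used to illustrate the method contract."""

    def __init__(self, score_func):
        self.score_func = score_func  # callable(flags, target) -> number
        self.rule_scores = None       # populated by fit()

    def fit(self, X_rules, y):
        # Stores one score per rule; returns nothing.
        self.rule_scores = {
            name: self.score_func(flags, y) for name, flags in X_rules.items()
        }

    def transform(self, X_rules):
        # Applies the stored scores to binary rule columns.
        return {
            name: [flag * self.rule_scores[name] for flag in flags]
            for name, flags in X_rules.items()
        }

    def fit_transform(self, X_rules, y):
        # Same as calling fit() and then transform() on the same data.
        self.fit(X_rules, y)
        return self.transform(X_rules)


def count_hits(flags, target):
    """Toy scoring function: number of positives the rule flags."""
    return sum(f * t for f, t in zip(flags, target))


toy_X = {"rule_a": [1, 1, 0], "rule_b": [0, 1, 1]}
toy_y = [1, 0, 1]

one_step = ToyRuleScorer(count_hits).fit_transform(toy_X, toy_y)

two_step_scorer = ToyRuleScorer(count_hits)
two_step_scorer.fit(toy_X, toy_y)
two_step = two_step_scorer.transform(toy_X)
```

Both routes give the same scored columns and leave the same rule_scores attribute on the instance.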

[10]:
rs.rule_scores.head()
[10]:
fraud-account_number_num_fraud_transactions_per_account_number_1day>=1                                                                      -99
fraud-account_number_num_fraud_transactions_per_account_number_1day>=1&account_number_num_fraud_transactions_per_account_number_30day>=1    -99
fraud-account_number_num_fraud_transactions_per_account_number_1day>=1&order_total>50.87                                                   -100
fraud-account_number_num_fraud_transactions_per_account_number_7day>=1&order_total>50.87                                                   -100
fraud-account_number_num_fraud_transactions_per_account_number_1day>=1&order_total>60.87                                                   -100
dtype: int64
[11]:
X_scores_train.head()
[11]:
fraud-account_number_num_fraud_transactions_per_account_number_1day>=1 fraud-account_number_num_fraud_transactions_per_account_number_1day>=1&account_number_num_fraud_transactions_per_account_number_30day>=1 fraud-account_number_num_fraud_transactions_per_account_number_1day>=1&order_total>50.87 fraud-account_number_num_fraud_transactions_per_account_number_7day>=1&order_total>50.87 fraud-account_number_num_fraud_transactions_per_account_number_1day>=1&order_total>60.87 fraud-account_number_num_fraud_transactions_per_account_number_1day>=1&account_number_num_fraud_transactions_per_account_number_30day>=1&account_number_num_order_items_per_account_number_90day<=8 fraud-account_number_num_fraud_transactions_per_account_number_7day>=1&account_number_num_fraud_transactions_per_account_number_90day>=1&account_number_num_order_items_per_account_number_30day<=6 fraud-account_number_num_fraud_transactions_per_account_number_1day>=1&account_number_num_fraud_transactions_per_account_number_90day>=1&account_number_num_order_items_per_account_number_lifetime<=7 fraud-account_number_num_fraud_transactions_per_account_number_7day>=1&account_number_num_order_items_per_account_number_lifetime<=7 fraud-account_number_num_fraud_transactions_per_account_number_30day>=1&account_number_num_fraud_transactions_per_account_number_7day>=1&account_number_num_order_items_per_account_number_7day<=5 ... 
fraud-account_number_num_distinct_transaction_per_account_number_30day>=3&account_number_num_fraud_transactions_per_account_number_7day>=1&account_number_sum_order_total_per_account_number_30day>1020.59499&num_order_items>=3 fraud-account_number_num_distinct_transaction_per_account_number_90day>=2&account_number_num_fraud_transactions_per_account_number_30day>=1&account_number_num_order_items_per_account_number_90day<=1&is_billing_shipping_city_same_1==False fraud-account_number_avg_order_total_per_account_number_1day<=152.2125&account_number_num_fraud_transactions_per_account_number_30day>=1&account_number_num_order_items_per_account_number_30day>=4&is_existing_user_1==True fraud-account_number_num_distinct_transaction_per_account_number_7day<=3&account_number_num_fraud_transactions_per_account_number_1day>=1&account_number_num_order_items_per_account_number_lifetime>=8 fraud-account_number_avg_order_total_per_account_number_90day<=324.37083&account_number_num_fraud_transactions_per_account_number_30day>=1&account_number_num_order_items_per_account_number_1day<=2&account_number_sum_order_total_per_account_number_30day>916.82501 fraud-account_number_num_fraud_transactions_per_account_number_1day<=1&account_number_num_fraud_transactions_per_account_number_1day>=1&account_number_num_order_items_per_account_number_30day>=6&is_billing_shipping_city_same_1==False fraud-account_number_num_fraud_transactions_per_account_number_1day>=1&account_number_num_order_items_per_account_number_lifetime>=6&account_number_sum_order_total_per_account_number_90day<=1032.375&is_billing_shipping_city_same_1==False fraud-account_number_avg_order_total_per_account_number_30day<=335.21333&account_number_num_order_items_per_account_number_1day>=3&account_number_sum_order_total_per_account_number_90day<=1123.36505&account_number_sum_order_total_per_account_number_90day>916.82501 
fraud-account_number_avg_order_total_per_account_number_7day<=319.125&account_number_num_distinct_transaction_per_account_number_1day>=3&account_number_num_fraud_transactions_per_account_number_30day>=1&account_number_sum_order_total_per_account_number_30day>622.595 fraud-account_number_avg_order_total_per_account_number_7day>916.82501&account_number_num_distinct_transaction_per_account_number_90day>=2&account_number_sum_order_total_per_account_number_90day>2588.9801
eid
867-8837095-9305559 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
974-5306287-3527394 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
584-0112844-9158928 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
956-4190732-7014837 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
349-7005645-8862067 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0

5 rows × 299 columns