AgglomerativeClusteringReducer Example
This notebook contains an example of how the AgglomerativeClusteringReducer class can be used to remove correlated features from a dataset. It can also be used to remove correlated rules from a rule set.
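The idea behind the reducer can be sketched without Iguanas: compute the similarity between every pair of columns, group columns that are similar enough, and keep one column per group. The snippet below is a minimal sketch, not Iguanas' implementation. It assumes single-linkage clustering, cosine similarity, a threshold of 0.9 and a "keep the first column of each cluster" policy, all chosen for illustration. The helper reduce_correlated_columns is hypothetical.

```python
import numpy as np
import pandas as pd


def cosine_similarity_matrix(X: pd.DataFrame) -> np.ndarray:
    """Pairwise cosine similarity between the columns of X."""
    values = X.to_numpy(dtype=float)
    norms = np.linalg.norm(values, axis=0)
    # Avoid division by zero: all-zero columns get similarity 0 with everything.
    norms[norms == 0] = 1.0
    unit = values / norms
    return unit.T @ unit


def reduce_correlated_columns(X: pd.DataFrame, threshold: float = 0.9) -> list:
    """Hypothetical helper: single-linkage agglomerative clustering of columns.

    Two columns end up in the same cluster if they are linked by a chain of
    column pairs whose cosine similarity is >= threshold. The first column
    (in original order) of each cluster is kept; this policy is an assumption.
    """
    sim = cosine_similarity_matrix(X)
    n = sim.shape[0]
    parent = list(range(n))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pair of columns whose similarity reaches the threshold.
    for i in range(n):
        for j in range(i + 1, n):
            if sim[i, j] >= threshold:
                parent[find(j)] = find(i)

    # Keep one representative column per cluster.
    kept, seen = [], set()
    for i, col in enumerate(X.columns):
        root = find(i)
        if root not in seen:
            seen.add(root)
            kept.append(col)
    return kept


X = pd.DataFrame({
    "rule_a": [1, 1, 0, 0, 1],
    "rule_b": [1, 1, 0, 0, 1],  # identical to rule_a
    "rule_c": [0, 0, 1, 1, 0],
})
print(reduce_correlated_columns(X))  # ['rule_a', 'rule_c']
```

Single linkage was picked here only because cutting it at a threshold reduces to finding connected components, which keeps the sketch short; other linkage criteria would group columns differently.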
Requirements
To run, you’ll need the following:
A dataset or rule set (in the case of a rule set, you need to provide the binary columns of the rules as applied to a dataset)
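For a rule set, each rule becomes one binary column: 1 where the rule's conditions hold for a record, 0 otherwise. A minimal pandas sketch of building such columns, assuming the rules are written as hypothetical lambda conditions (Iguanas has its own rule representations, which are not shown here):

```python
import pandas as pd

# Toy dataset indexed by an event ID, as in the notebook's data.
X = pd.DataFrame(
    {"order_total": [10.0, 250.0, 75.0, 500.0], "num_items": [1, 4, 2, 8]},
    index=pd.Index(["e1", "e2", "e3", "e4"], name="eid"),
)

# Hypothetical rules expressed as boolean conditions on the dataset.
rules = {
    "Rule1": lambda df: df["order_total"] > 100,
    "Rule2": lambda df: df["num_items"] >= 4,
}

# Apply each rule and store the result as a 0/1 integer column.
X_rules = pd.DataFrame({name: cond(X).astype(int) for name, cond in rules.items()})
print(X_rules)
```

Here Rule1 and Rule2 flag exactly the same records, so their binary columns are perfectly correlated and one of them would be a candidate for removal.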
Import packages
[1]:
from iguanas.correlation_reduction import AgglomerativeClusteringReducer
from iguanas.metrics.pairwise import CosineSimilarity
import pandas as pd
import numpy as np
Read in data
Let’s read in some dummy data.
[2]:
X_train = pd.read_csv(
    'dummy_data/X_train.csv',
    index_col='eid'
)
X_test = pd.read_csv(
    'dummy_data/X_test.csv',
    index_col='eid'
)
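Passing index_col='eid' moves the event ID out of the data columns and into the index, so the ID is not treated as one of the columns to compare. A self-contained illustration using an in-memory CSV in place of the dummy files (the values are made up):

```python
import io

import pandas as pd

# Made-up CSV content standing in for dummy_data/X_train.csv.
csv_text = "eid,order_total,is_existing_user_0\nA-1,10.5,1\nA-2,0.0,0\n"

X = pd.read_csv(io.StringIO(csv_text), index_col="eid")
print(X.index.name)     # eid
print(list(X.columns))  # ['order_total', 'is_existing_user_0']
```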
Transform the dataset (or another dataset)
Once agg (an AgglomerativeClusteringReducer instance) has been fitted, use its .transform() method to reduce the original dataset (or a separate dataset) by removing the correlated columns.
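The columns to keep are decided once, when the reducer is fitted, and are then applied unchanged to any dataset; this is why the training and test sets below end up with the same 9 columns. A minimal sketch of that fit/transform contract using a hypothetical DuplicateColumnReducer class, not Iguanas' code. To keep it short it swaps in exact-duplicate detection for similarity-based clustering.

```python
import pandas as pd


class DuplicateColumnReducer:
    """Hypothetical reducer: drops columns that exactly duplicate an earlier one.

    It stands in for a similarity-based reducer to show the fit/transform
    contract: columns are chosen in fit() and reused in transform().
    """

    def fit(self, X: pd.DataFrame) -> "DuplicateColumnReducer":
        # Transposing turns columns into rows, so drop_duplicates() keeps the
        # first of each group of identical columns.
        self.columns_to_keep_ = X.T.drop_duplicates().index.tolist()
        return self

    def transform(self, X: pd.DataFrame) -> pd.DataFrame:
        # Select the columns learned in fit(); every other column is dropped.
        return X[self.columns_to_keep_]


X_train = pd.DataFrame({"a": [1, 0, 1], "b": [1, 0, 1], "c": [0, 1, 1]})
X_test = pd.DataFrame({"a": [0, 1], "b": [1, 1], "c": [1, 0], "d": [5, 6]})

reducer = DuplicateColumnReducer().fit(X_train)
print(reducer.transform(X_train).columns.tolist())  # ['a', 'c']
print(reducer.transform(X_test).columns.tolist())   # ['a', 'c']
```

Column b is dropped from the test set even though it differs from a there: the decision is made on the data the reducer was fitted on, not on the data being transformed.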
[7]:
X_train_reduced = agg.transform(X_train)
[8]:
X_train.shape, X_train_reduced.shape
[8]:
((8894, 32), (8894, 9))
[9]:
X_test_reduced = agg.transform(X_test)
[10]:
X_test.shape, X_test_reduced.shape
[10]:
((4382, 34), (4382, 9))
Outputs
The .transform() method returns the original dataset with the correlated columns removed.
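Because the output keeps the original index and a subset of the original columns, comparing column lists is a quick way to see what was removed (for the notebook's training data, 32 - 9 = 23 columns). A small pandas sketch on hypothetical toy frames standing in for X_train and X_train_reduced:

```python
import pandas as pd

# Toy stand-ins for X_train and X_train_reduced.
X_train = pd.DataFrame(
    {"a": [1, 0], "b": [1, 0], "c": [0, 1]},
    index=pd.Index(["e1", "e2"], name="eid"),
)
X_train_reduced = X_train[["a", "c"]]

# Columns present in the original dataset but not in the reduced one.
removed = X_train.columns.difference(X_train_reduced.columns)
print(list(removed))                                # ['b']
print(X_train.shape[1] - X_train_reduced.shape[1])  # 1

# The reduced dataset keeps the original row index.
print(X_train_reduced.index.equals(X_train.index))  # True
```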
[11]:
X_train_reduced.head()
[11]:
eid | account_number_num_fraud_transactions_per_account_number_7day | account_number_num_order_items_per_account_number_lifetime | account_number_avg_order_total_per_account_number_30day | account_number_num_distinct_transaction_per_account_number_7day | is_existing_user_0 | status_Pending | is_billing_shipping_city_same_0 | num_order_items_IsNull | order_total_IsNull |
---|---|---|---|---|---|---|---|---|---|
867-8837095-9305559 | 0 | 0 | 0.0 | 1 | 0 | 0 | 0 | 0 | 1 |
974-5306287-3527394 | 0 | 0 | 0.0 | 1 | 0 | 0 | 0 | 0 | 1 |
584-0112844-9158928 | 0 | 0 | 0.0 | 1 | 0 | 0 | 0 | 0 | 1 |
956-4190732-7014837 | 0 | 0 | 0.0 | 1 | 0 | 0 | 0 | 0 | 1 |
349-7005645-8862067 | 0 | 0 | 0.0 | 1 | 0 | 0 | 0 | 0 | 1 |
[12]:
X_test_reduced.head()
[12]:
eid | account_number_num_fraud_transactions_per_account_number_7day | account_number_num_order_items_per_account_number_lifetime | account_number_avg_order_total_per_account_number_30day | account_number_num_distinct_transaction_per_account_number_7day | is_existing_user_0 | status_Pending | is_billing_shipping_city_same_0 | num_order_items_IsNull | order_total_IsNull |
---|---|---|---|---|---|---|---|---|---|
975-8351797-7122581 | 0 | 2 | 29.00 | 1 | 1 | 0 | 0 | 0 | 0 |
785-6259585-7858053 | 0 | 0 | 0.00 | 1 | 0 | 0 | 0 | 0 | 1 |
057-4039373-1790681 | 0 | 2 | 192.95 | 1 | 0 | 0 | 0 | 0 | 0 |
095-5263240-3834186 | 0 | 0 | 0.00 | 1 | 0 | 0 | 0 | 0 | 1 |
980-3802574-0009480 | 0 | 2 | 9.00 | 1 | 0 | 0 | 0 | 0 | 0 |