iguanas.rule_selection.GreedyFilter

class iguanas.rule_selection.GreedyFilter(opt_func: Callable, rule_descriptions=None, sorting_col='Precision', verbose=0)

Sorts rules by a given metric, calculates the combined performance of the top n rules, then filters to the rules which give the best combined performance.
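The procedure described above can be sketched in plain Python. This is an illustrative sketch, not the library's implementation: the helper names (`precision`, `f1_score`, `greedy_filter`) are hypothetical, and it assumes that the top n rules are combined with a logical OR and that ties are resolved in favour of the smallest n.

```python
from typing import Callable, Dict, List, Sequence

Metric = Callable[[Sequence[int], Sequence[int]], float]


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of flagged rows that are true positives (0.0 if nothing is flagged)."""
    flagged = sum(y_pred)
    if flagged == 0:
        return 0.0
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return hits / flagged


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 score computed from true positives, false positives and false negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def greedy_filter(rule_flags: Dict[str, List[int]], y: Sequence[int],
                  opt_func: Metric) -> List[str]:
    """Return the top n rules (sorted by precision) with the best combined score."""
    # Step 1: sort the rules by the sorting metric, best first.
    ranked = sorted(rule_flags, key=lambda name: precision(y, rule_flags[name]),
                    reverse=True)
    # Step 2: score the OR-combination of the top n rules for n = 1..len(ranked).
    combined = [0] * len(y)
    scores = []
    for name in ranked:
        combined = [max(c, f) for c, f in zip(combined, rule_flags[name])]
        scores.append(opt_func(y, combined))
    # Step 3: keep the top n rules where the combined score peaks
    # (assumption: the first maximum, i.e. the smallest n, wins ties).
    best_n = scores.index(max(scores)) + 1
    return ranked[:best_n]


y = [1, 1, 1, 0, 0, 0]
rules = {
    "rule_a": [1, 1, 0, 0, 0, 0],  # precision 1.0
    "rule_b": [0, 0, 1, 1, 0, 0],  # precision 0.5
    "rule_c": [0, 0, 0, 1, 1, 1],  # precision 0.0
}
kept = greedy_filter(rules, y, f1_score)
print(kept)  # ['rule_a', 'rule_b']
```

In this toy data, adding `rule_b` raises the combined F1 from 0.8 to about 0.857, while adding `rule_c` lowers it to about 0.667, so only the first two rules are kept.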

Parameters
opt_func : Callable

The method/function used to calculate the performance of the top n rules (e.g. Fbeta score).

rule_descriptions : PandasDataFrameType, optional

The standard performance metrics dataframe associated with the rules (if available). If not given, it will be calculated from X_rules. Defaults to None.

sorting_col : str, optional

Specifies the column within rule_descriptions to sort the rules by. Defaults to ‘Precision’.

verbose : int, optional

Controls the verbosity: the higher the value, the more messages. Values > 0 show the progress of the filtering process. Defaults to 0.
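As an illustration of what an `opt_func` callable might look like, the sketch below implements a weighted F-beta score in plain Python. The argument names (`y_true`, `y_preds`, `sample_weight`) and the use of `functools.partial` to fix `beta` are assumptions made for this example; check the library's metric classes for the exact signature it expects.

```python
from functools import partial
from typing import Optional, Sequence


def fbeta(y_true: Sequence[int], y_preds: Sequence[int],
          sample_weight: Optional[Sequence[float]] = None,
          beta: float = 1.0) -> float:
    """Weighted F-beta: (1 + b^2) * TP / ((1 + b^2) * TP + b^2 * FN + FP)."""
    # With no weights supplied, every row counts once.
    weights = sample_weight if sample_weight is not None else [1.0] * len(y_true)
    rows = list(zip(y_true, y_preds, weights))
    tp = sum(w for t, p, w in rows if t == 1 and p == 1)
    fp = sum(w for t, p, w in rows if t == 0 and p == 1)
    fn = sum(w for t, p, w in rows if t == 1 and p == 0)
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


# Fix beta up front so the callable only needs the data arguments.
f_half = partial(fbeta, beta=0.5)

y_true = [1, 1, 0, 0]
y_preds = [1, 0, 1, 0]
print(fbeta(y_true, y_preds))                              # 0.5
print(fbeta(y_true, y_preds, sample_weight=[1, 1, 0, 1]))  # 2/3: the false positive has weight 0
print(f_half(y_true, y_preds))                             # 0.5
```

A beta below 1 weights precision more heavily than recall, and a beta above 1 does the opposite, which changes where the combined performance of the top n rules peaks.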

Attributes
rules_to_keep : List[str]

List of rules which give the best combined performance.

fit(X_rules: iguanas.utils.typing.pandas.core.frame.DataFrame, y: iguanas.utils.typing.pandas.core.series.Series, sample_weight=None) -> None

Sorts rules by a given metric, calculates the combined performance of the top n rules, then identifies the rules which give the best combined performance.

Parameters
X_rules : PandasDataFrameType

The binary columns of the rules applied to a dataset.

y : PandasSeriesType

The binary target column.

sample_weight : PandasSeriesType, optional

Row-wise weights to apply. Defaults to None.

transform(X_rules: iguanas.utils.typing.pandas.core.frame.DataFrame) -> iguanas.utils.typing.pandas.core.frame.DataFrame

Reduces the rule set by keeping the rules which give the best combined performance.

Parameters
X_rules : PandasDataFrameType

The binary columns of the rules applied to a dataset.

Returns
PandasDataFrameType

The binary columns of the rules which give the best combined performance.
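Conceptually, the transform step is a column selection driven by the fitted `rules_to_keep` list, so it can be applied to any dataset whose rule columns share the same names. The sketch below uses plain dictionaries in place of DataFrames; the helper name `keep_rules` is hypothetical, and returning the columns in `rules_to_keep` order is an assumption.

```python
from typing import Dict, List


def keep_rules(x_rules: Dict[str, List[int]],
               rules_to_keep: List[str]) -> Dict[str, List[int]]:
    """Return only the rule columns named in rules_to_keep, in that order."""
    missing = [name for name in rules_to_keep if name not in x_rules]
    if missing:
        # A rule selected during fitting must exist in the data being reduced.
        raise KeyError(f"Rule columns not found: {missing}")
    return {name: x_rules[name] for name in rules_to_keep}


# Rule columns applied to a new (e.g. test) dataset.
x_rules_test = {
    "rule_a": [1, 0, 0],
    "rule_b": [0, 1, 0],
    "rule_c": [1, 1, 1],
}
reduced = keep_rules(x_rules_test, ["rule_b", "rule_a"])
print(list(reduced))  # ['rule_b', 'rule_a']
```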

fit_transform(X_rules: iguanas.utils.typing.pandas.core.frame.DataFrame, y: iguanas.utils.typing.pandas.core.series.Series, sample_weight=None) -> iguanas.utils.typing.pandas.core.frame.DataFrame

Sorts rules by a given metric, calculates the combined performance of the top n rules, then keeps only the rules which give the best combined performance.

Parameters
X_rules : PandasDataFrameType

The binary columns of the rules applied to a dataset.

y : PandasSeriesType

The binary target column.

sample_weight : PandasSeriesType, optional

Row-wise weights to apply. Defaults to None.

Returns
PandasDataFrameType

The binary columns of the rules which give the best combined performance.

plot_top_n_performance_on_train(figsize=(10, 5), title='`opt_func` performance of the top n rules on the training set') -> seaborn.relational.lineplot

Plot the combined performance of the top n rules (as calculated using the .fit() method).

Parameters
figsize : Tuple[int, int], optional

Defines the size of the plot (width, height). Defaults to (10, 5).

verbose : int, optional

Controls the verbosity: the higher the value, the more messages. Values > 0 show the progress of calculating the combined performance of the top n rules. Defaults to 0.

title : str, optional

The plot title. Defaults to ‘`opt_func` performance of the top n rules on the training set’.

Returns
sns.lineplot

Shows the combined performance of the top n rules.

plot_top_n_performance(X_rules: iguanas.utils.typing.pandas.core.frame.DataFrame, y: iguanas.utils.typing.pandas.core.series.Series, sample_weight=None, figsize=(10, 5), verbose=0, title='`opt_func` performance of the top n rules') -> seaborn.relational.lineplot

Plot the combined performance of the top n rules (as determined by the .fit() method), evaluated on the provided rule binary columns.

Parameters
X_rules : PandasDataFrameType

The binary columns of the rules applied to a dataset.

y : PandasSeriesType

The binary target column.

sample_weight : PandasSeriesType, optional

Row-wise weights to apply. Defaults to None.

figsize : Tuple[int, int], optional

Defines the size of the plot (width, height). Defaults to (10, 5).

verbose : int, optional

Controls the verbosity: the higher the value, the more messages. Values > 0 show the progress of calculating the combined performance of the top n rules. Defaults to 0.

title : str, optional

The plot title. Defaults to ‘`opt_func` performance of the top n rules’.

Returns
sns.lineplot

Shows the combined performance of the top n rules, calculated using the provided rule binary columns.
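The data behind such a plot can be thought of as one score per value of n, computed on whichever dataset is supplied while the rule ordering stays fixed. The sketch below computes those scores without plotting. The names `top_n_scores` and `recall` are hypothetical, the OR-combination of rules is an assumption, and recall is used only to keep the example short.

```python
from typing import Callable, Dict, List, Sequence


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of positive rows that are flagged (0.0 if there are no positives)."""
    positives = sum(y_true)
    if positives == 0:
        return 0.0
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return hits / positives


def top_n_scores(ranked_rules: List[str], rule_flags: Dict[str, List[int]],
                 y: Sequence[int],
                 opt_func: Callable[[Sequence[int], Sequence[int]], float]) -> List[float]:
    """Score the OR-combination of the first n ranked rules for n = 1..len(ranked_rules)."""
    combined = [0] * len(y)
    scores = []
    for name in ranked_rules:
        combined = [max(c, f) for c, f in zip(combined, rule_flags[name])]
        scores.append(opt_func(y, combined))
    return scores


# Ordering learned on the training set (assumed here), evaluated on holdout data.
ranked = ["rule_a", "rule_b", "rule_c"]
y_holdout = [1, 0, 1, 1]
holdout_flags = {
    "rule_a": [1, 0, 0, 0],
    "rule_b": [0, 1, 1, 0],
    "rule_c": [0, 0, 0, 0],
}
scores = top_n_scores(ranked, holdout_flags, y_holdout, recall)
for n, score in enumerate(scores, start=1):
    print(n, round(score, 3))
```

Comparing these holdout scores with the scores from the training set indicates whether the n chosen during fitting still sits near the peak on unseen data.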