iguanas.rule_selection.GreedyFilter
- class iguanas.rule_selection.GreedyFilter(opt_func: Callable, rule_descriptions=None, sorting_col='Precision', verbose=0)
Sorts rules by a given metric, calculates the combined performance of the top n rules, then filters to the rules which give the best combined performance.
- Parameters
- opt_func : Callable
The method/function used to calculate the performance of the top n rules (e.g. Fbeta score).
- rule_descriptions : PandasDataFrameType, optional
The standard performance metrics dataframe associated with the rules (if available). If not given, it will be calculated from X_rules. Defaults to None.
- sorting_col : str, optional
Specifies the column within rule_descriptions to sort the rules by. Defaults to 'Precision'.
- verbose : int, optional
Controls the verbosity: the higher the value, the more messages. Values > 0 show the progress of the filtering process. Defaults to 0.
- Attributes
- rules_to_keep : List[str]
List of rules which give the best combined performance.
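The sort-then-filter procedure described for this class can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the Iguanas implementation: `greedy_filter_sketch`, `precision` and `f1` are hypothetical helper names, the rules are assumed to be combined with a logical OR (a row is flagged if any of the top n rules flags it), `opt_func` is assumed to take `(y_true, y_pred)`, and ties are assumed to resolve to the smaller rule set.

```python
from typing import Callable, Dict, List, Sequence


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of flagged rows that are true positives (0.0 if nothing is flagged)."""
    flagged = sum(y_pred)
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return hits / flagged if flagged else 0.0


def f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 score computed from true positives, false positives and false negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def greedy_filter_sketch(
    rules: Dict[str, List[int]],
    y: Sequence[int],
    opt_func: Callable[[Sequence[int], Sequence[int]], float],
    sort_metric: Callable[[Sequence[int], Sequence[int]], float] = precision,
) -> List[str]:
    """Return the top n rule names whose combined flags maximise opt_func."""
    # Step 1: sort rule names by their individual metric, best first.
    sorted_rules = sorted(
        rules, key=lambda name: sort_metric(y, rules[name]), reverse=True
    )
    combined = [0] * len(y)
    best_score, best_n = float("-inf"), 0
    # Step 2: score the top 1, top 2, ... top n rules combined.
    for n, name in enumerate(sorted_rules, start=1):
        # Assumption: rules are combined with a logical OR.
        combined = [c | r for c, r in zip(combined, rules[name])]
        score = opt_func(y, combined)
        # Assumption: a strict comparison keeps the smaller set on ties.
        if score > best_score:
            best_score, best_n = score, n
    # Step 3: keep only the rules up to the best-scoring n.
    return sorted_rules[:best_n]


y = [1, 1, 1, 0, 0, 0]
rules = {
    "A": [1, 1, 0, 0, 0, 0],  # precision 1.0
    "B": [0, 0, 1, 1, 0, 0],  # precision 0.5
    "C": [1, 0, 0, 1, 1, 1],  # precision 0.25
}
kept = greedy_filter_sketch(rules, y, opt_func=f1)
print(kept)  # ['A', 'B']
```

In this toy data the combined F1 rises from 0.8 (top 1) to about 0.857 (top 2) and then drops to about 0.667 (top 3), so the first two rules are kept.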
- fit(X_rules: iguanas.utils.typing.pandas.core.frame.DataFrame, y: iguanas.utils.typing.pandas.core.series.Series, sample_weight=None) -> None
Sorts rules by a given metric, calculates the combined performance of the top n rules, then identifies the rules which give the best combined performance.
- Parameters
- X_rules : PandasDataFrameType
The binary columns of the rules applied to a dataset.
- y : PandasSeriesType
The binary target column.
- sample_weight : PandasSeriesType, optional
Row-wise weights to apply. Defaults to None.
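To illustrate how row-wise weights can enter a performance function such as the Fbeta score, here is a minimal sketch. The function name `weighted_fbeta` and its argument order `(y_true, y_pred, sample_weight)` are assumptions made for this example, not a signature confirmed by this page. Each row contributes its weight rather than a count of 1.

```python
from typing import Optional, Sequence


def weighted_fbeta(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    sample_weight: Optional[Sequence[float]] = None,
    beta: float = 1.0,
) -> float:
    """F-beta score in which each row counts with its weight (default weight 1)."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    tp = fp = fn = 0.0
    for t, p, w in zip(y_true, y_pred, sample_weight):
        if p == 1 and t == 1:
            tp += w  # weighted true positive
        elif p == 1 and t == 0:
            fp += w  # weighted false positive
        elif p == 0 and t == 1:
            fn += w  # weighted false negative
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


y_true = [1, 1, 0, 0]
y_pred = [1, 0, 1, 0]
unweighted = weighted_fbeta(y_true, y_pred)
# Up-weighting the correctly flagged positive row raises the score.
weighted = weighted_fbeta(y_true, y_pred, sample_weight=[3, 1, 1, 1])
print(unweighted, weighted)  # 0.5 0.75
```

A function of this shape could then be passed wherever a performance callable is expected, provided its signature matches what the caller supplies.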
- transform(X_rules: iguanas.utils.typing.pandas.core.frame.DataFrame) -> iguanas.utils.typing.pandas.core.frame.DataFrame
Reduces the rule set by keeping the rules which give the best combined performance.
- Parameters
- X_rules : PandasDataFrameType
The binary columns of the rules applied to a dataset.
- Returns
- PandasDataFrameType
The binary columns of the rules which give the best combined performance.
- fit_transform(X_rules: iguanas.utils.typing.pandas.core.frame.DataFrame, y: iguanas.utils.typing.pandas.core.series.Series, sample_weight=None) -> iguanas.utils.typing.pandas.core.frame.DataFrame
Sorts rules by a given metric, calculates the combined performance of the top n rules, then keeps only the rules which give the best combined performance.
- Parameters
- X_rules : PandasDataFrameType
The binary columns of the rules applied to a dataset.
- y : PandasSeriesType
The binary target column.
- sample_weight : PandasSeriesType, optional
Row-wise weights to apply. Defaults to None.
- Returns
- PandasDataFrameType
The binary columns of the rules which give the best combined performance.
- plot_top_n_performance_on_train(figsize=(10, 5), title='`opt_func` performance of the top n rules on the training set') -> seaborn.relational.lineplot
Plot the combined performance of the top n rules (as calculated using the .fit() method).
- Parameters
- figsize : Tuple[int, int], optional
Defines the size of the plot (width, height). Defaults to (10, 5).
- verbose : int, optional
Controls the verbosity: the higher the value, the more messages. Values > 0 show the progress of calculating the combined performance of the top n rules. Defaults to 0.
- title : str, optional
The plot title. Defaults to '`opt_func` performance of the top n rules on the training set'.
- Returns
- sns.lineplot
Shows the combined performance of the top n rules.
- plot_top_n_performance(X_rules: iguanas.utils.typing.pandas.core.frame.DataFrame, y: iguanas.utils.typing.pandas.core.series.Series, sample_weight=None, figsize=(10, 5), verbose=0, title='`opt_func` performance of the top n rules') -> seaborn.relational.lineplot
Plot the combined performance of the top n rules, calculated on the provided rule binary columns. The .fit() method must be called first, since it determines the top n rules.
- Parameters
- X_rules : PandasDataFrameType
The binary columns of the rules applied to a dataset.
- y : PandasSeriesType
The binary target column.
- sample_weight : PandasSeriesType, optional
Row-wise weights to apply. Defaults to None.
- figsize : Tuple[int, int], optional
Defines the size of the plot (width, height). Defaults to (10, 5).
- verbose : int, optional
Controls the verbosity: the higher the value, the more messages. Values > 0 show the progress of calculating the combined performance of the top n rules. Defaults to 0.
- title : str, optional
The plot title. Defaults to '`opt_func` performance of the top n rules'.
- Returns
- sns.lineplot
Shows the combined performance of the top n rules, calculated using the provided rule binary columns.
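The data behind such a plot can be sketched without any plotting library. The example below is a minimal sketch under stated assumptions, not the Iguanas implementation: `top_n_performance` and `precision` are hypothetical helpers, the rule order is assumed to have been fixed earlier (for example on a training set), and rules are assumed to be combined with a logical OR. It returns one `(n, score)` point per value of n, which is what a line plot would draw.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of flagged rows that are true positives (0.0 if nothing is flagged)."""
    flagged = sum(y_pred)
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return hits / flagged if flagged else 0.0


def top_n_performance(
    sorted_rules: List[str],
    rules: Dict[str, List[int]],
    y: Sequence[int],
    opt_func: Callable[[Sequence[int], Sequence[int]], float],
) -> List[Tuple[int, float]]:
    """Score the top 1, top 2, ... top n rules on the given data."""
    combined = [0] * len(y)
    curve = []
    for n, name in enumerate(sorted_rules, start=1):
        # Assumption: a row is flagged if any of the top n rules flags it.
        combined = [c | r for c, r in zip(combined, rules[name])]
        curve.append((n, opt_func(y, combined)))
    return curve


# Hypothetical rule order, assumed to come from an earlier fitting step.
sorted_rules = ["A", "B", "C"]
# Rule flags and target for a separate (e.g. held-out) dataset.
rules_test = {
    "A": [1, 0, 0, 0],
    "B": [0, 1, 1, 0],
    "C": [0, 0, 0, 1],
}
y_test = [1, 0, 1, 1]
curve = top_n_performance(sorted_rules, rules_test, y_test, precision)
for n, score in curve:
    print(n, round(score, 3))
```

Comparing this curve with the one obtained on the training data shows whether the n chosen during fitting still performs well on other data.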