iguanas.rule_selection.GridSearchCV

class iguanas.rule_selection.GridSearchCV(rule_class: Union[iguanas.rule_generation.rule_generator_dt.RuleGeneratorDT, iguanas.rule_generation.rule_generator_opt.RuleGeneratorOpt, iguanas.rule_optimisation.bayesian_optimiser.BayesianOptimiser, iguanas.rule_optimisation.direct_search_optimiser.DirectSearchOptimiser], param_grid: Dict[str, List], greedy_filter_opt_func: Callable, cv: int, refilter=True, num_cores=1, verbose=0)

Searches across the provided parameter space to find the parameter set that produces the best overall rule performance. Overall rule performance is calculated using the GreedyFilter class, which sorts the rules by precision and then calculates the combined performance of the top n rules. The maximum combined performance across the values of n is recorded for that parameter set.

This process is repeated for each stratified fold. The mean performance across the folds for each parameter set is recorded, and the parameter set that gives the highest mean performance is assumed to be the best. The rules are then retrained using these parameters and the complete dataset.
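The "top n" calculation described above can be illustrated with a minimal, standard-library sketch. It is not the GreedyFilter implementation: the helper names (`precision`, `f1_score`, `best_top_n`) are hypothetical, rules are represented as plain 0/1 prediction lists, and combining rules with a logical OR is an assumption.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of flagged records that are genuine positives."""
    flagged = sum(y_pred)
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return hits / flagged if flagged else 0.0


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Plain F1 score, standing in for greedy_filter_opt_func."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_top_n(
    rule_preds: Dict[str, List[int]],
    y_true: Sequence[int],
    opt_func: Callable[[Sequence[int], Sequence[int]], float],
) -> Tuple[int, float]:
    """Return (n, score) for the best-scoring set of top n rules."""
    # Rank rule names by precision, highest first.
    ranked = sorted(
        rule_preds,
        key=lambda name: precision(y_true, rule_preds[name]),
        reverse=True,
    )
    combined = [0] * len(y_true)
    best_n, best_score = 0, float("-inf")
    for n, name in enumerate(ranked, start=1):
        # Assumption: a record is flagged if any of the top n rules flags it.
        combined = [max(c, p) for c, p in zip(combined, rule_preds[name])]
        score = opt_func(y_true, combined)
        if score > best_score:
            best_n, best_score = n, score
    return best_n, best_score


# Toy data, made up for illustration only.
y_true = [1, 1, 1, 0, 0, 0]
rule_preds = {
    "RuleC": [0, 0, 0, 1, 1, 1],  # precision 0.0
    "RuleA": [1, 1, 0, 0, 0, 0],  # precision 1.0
    "RuleB": [0, 0, 1, 1, 0, 0],  # precision 0.5
}
best_n, best_score = best_top_n(rule_preds, y_true, f1_score)
print(best_n, round(best_score, 3))  # 2 0.857
```

In this toy case, adding the third (lowest-precision) rule lowers the combined score, so the maximum is reached at n = 2.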

Parameters
rule_class : Union[RuleGeneratorDT, RuleGeneratorOpt, BayesianOptimiser, DirectSearchOptimiser]

The rule generator or optimiser class that will be used to generate or optimise rules.

param_grid : Dict[str, List]

A dictionary mapping each parameter name of the provided rule_class (keys) to the list of values to try for that parameter (values).
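As a rough illustration of how such a grid turns into individual parameter sets, the sketch below expands a dictionary of lists into every combination. The helper `expand_param_grid` and the parameter names `param_a` and `param_b` are hypothetical, and it is an assumption that the search tests the full Cartesian product.

```python
from itertools import product
from typing import Dict, List


def expand_param_grid(param_grid: Dict[str, List]) -> List[dict]:
    """Return one dict per combination of parameter values."""
    names = list(param_grid)
    value_lists = (param_grid[name] for name in names)
    return [dict(zip(names, values)) for values in product(*value_lists)]


# Hypothetical parameter names, used only to show the shape of the grid.
param_grid = {"param_a": [1, 2], "param_b": ["x", "y", "z"]}
param_sets = expand_param_grid(param_grid)
print(len(param_sets))  # 6
print(param_sets[0])    # {'param_a': 1, 'param_b': 'x'}
```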

greedy_filter_opt_func : Callable

The method/function (e.g. Fbeta score) used to calculate the performance of the top n rules when the GreedyFilter class is applied to the rule set.
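For reference, an F-beta score can be written as a plain callable. The sketch below is standard-library only; the signature `(y_true, y_pred, sample_weight)` is an assumption about what such a function might accept, not something this page specifies, and `fbeta_score` is a hypothetical name.

```python
from functools import partial
from typing import Optional, Sequence


def fbeta_score(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    sample_weight: Optional[Sequence[float]] = None,
    beta: float = 1.0,
) -> float:
    """Weighted F-beta: (1 + b^2) * tp / ((1 + b^2) * tp + b^2 * fn + fp)."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    rows = list(zip(y_true, y_pred, sample_weight))
    tp = sum(w for t, p, w in rows if t == 1 and p == 1)
    fp = sum(w for t, p, w in rows if t == 0 and p == 1)
    fn = sum(w for t, p, w in rows if t == 1 and p == 0)
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


# Fix beta up front so the callable only needs the labels and predictions.
f2 = partial(fbeta_score, beta=2.0)

y_true = [1, 1, 1, 0]
y_pred = [1, 0, 0, 0]
print(fbeta_score(y_true, y_pred))  # 0.5
print(round(f2(y_true, y_pred), 4))  # 0.3846
```

With beta above 1, missed positives are penalised more heavily than false alarms, which is why the F2 value is lower than F1 for this low-recall prediction.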

cv : int

The number of stratified folds to create from the dataset.
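Stratification means each fold keeps roughly the same class balance as the full dataset. The sketch below shows one simple, deterministic way to build such folds; `stratified_folds` is a hypothetical helper, and the actual splitting (including any shuffling) may differ.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def stratified_folds(y: Sequence[int], cv: int) -> List[List[int]]:
    """Split row indices into cv folds, preserving the class balance."""
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)
    folds: List[List[int]] = [[] for _ in range(cv)]
    for indices in by_class.values():
        # Deal each class's rows out to the folds in turn.
        for pos, idx in enumerate(indices):
            folds[pos % cv].append(idx)
    return [sorted(fold) for fold in folds]


# Toy target with 3 positives in 12 rows.
y = [1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
folds = stratified_folds(y, cv=3)
print(folds)  # [[0, 1, 5, 9], [2, 4, 6, 10], [3, 7, 8, 11]]
```

Each fold here contains exactly one positive and three negatives, matching the 25% positive rate of the full target.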

refilter : bool, optional

When refitting the rules using the best parameters and the complete dataset, this parameter dictates whether the GreedyFilter class should be applied to the rules post-fitting. Defaults to True.

num_cores : int, optional

The number of cores to use when iterating through the different folds & parameter sets. Defaults to 1.

verbose : int, optional

Controls the verbosity - the higher, the more messages. >0 : shows the overall progress of each fold; >1 : gives information on, and the progress of, the current parameter set being tested. Note that setting verbose > 1 only works when num_cores = 1. Defaults to 0.

Attributes
rule_strings : Dict[str, str]

The rules which achieved the best combined performance, defined using the standard Iguanas string format (values) and their names (keys).

rule_descriptions : PandasDataFrameType

A dataframe showing the logic of the rules and their performance metrics on the given dataset.

param_results_per_fold : PandasDataFrameType

Shows the best combined rule performance observed for each parameter set and fold.

param_results_aggregated : PandasDataFrameType

Shows the mean and the standard deviation of the best combined rule performance, calculated across the folds, for each parameter set.
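The aggregation step can be pictured with the standard library alone. In the sketch below the per-fold scores are made-up toy values, the labels `param_set_0` and `param_set_1` are hypothetical, and using the sample standard deviation is an assumption.

```python
from statistics import mean, stdev

# Toy per-fold scores (3 folds) for two hypothetical parameter sets.
per_fold = {
    "param_set_0": [0.60, 0.70, 0.80],
    "param_set_1": [0.75, 0.75, 0.78],
}

# Mean and (sample) standard deviation across the folds.
aggregated = {
    name: {"mean": mean(scores), "std": stdev(scores)}
    for name, scores in per_fold.items()
}

# The parameter set with the highest mean score is taken as the best.
best_key = max(aggregated, key=lambda name: aggregated[name]["mean"])
print(best_key)  # param_set_1
```

The standard deviation is worth checking alongside the mean: a parameter set whose score varies widely between folds may be less reliable than its mean suggests.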

best_perf : float

The best combined rule performance achieved.

best_params : dict

The parameter set that achieved the best combined rule performance.

fit(X: iguanas.utils.typing.pandas.core.frame.DataFrame, y: iguanas.utils.typing.pandas.core.series.Series, sample_weight=None) -> None

Searches across the provided parameter space to find the parameter set that produces the best overall rule performance. Overall rule performance is calculated using the GreedyFilter class, which sorts the rules by precision and then calculates the combined performance of the top n rules. The maximum combined performance across the values of n is recorded for that parameter set.

This process is repeated for each stratified fold. The mean performance across the folds for each parameter set is recorded, and the parameter set that gives the highest mean performance is assumed to be the best. The rules are then retrained using these parameters and the complete dataset.

Parameters
X : PandasDataFrameType

The feature set.

y : PandasSeriesType

The binary target column.

sample_weight : PandasSeriesType, optional

Row-wise weights to apply. Defaults to None.

plot_top_n_performance_by_fold(figsize=(10, 5)) -> seaborn.relational.lineplot

Plot the combined performance of the top n rules (as calculated using the .fit() method) for each parameter set and fold that was used for fitting the rules.

Parameters
figsize : Tuple[int, int], optional

Defines the size of the plot (width, height). Defaults to (10, 5).

Returns
sns.lineplot

The combined performance of the top n rules for each parameter set and fold.

as_rule_dicts() -> Dict[str, dict]

Converts rules into the standard Iguanas dictionary format.

Returns
Dict[str, dict]

Rules in the standard Iguanas dictionary format.

as_rule_lambdas(as_numpy: bool, with_kwargs: bool) -> Dict[str, Callable[[dict], str]]

Converts rules into the standard Iguanas lambda expression format.

Parameters
as_numpy : bool

If True, the conditions in the string format will use NumPy rather than Pandas. These rules are generally evaluated more quickly on larger datasets stored as Pandas DataFrames.

with_kwargs : bool

If True, the string in the lambda expression is created such that the inputs are keyword arguments. If False, the inputs are positional arguments.

Returns
Dict[str, Callable[[dict], str]]

Rules in the standard Iguanas lambda expression format.

as_rule_strings(as_numpy: bool) -> Dict[str, str]

Converts rules into the standard Iguanas string format.

Parameters
as_numpy : bool

If True, the conditions in the string format will use NumPy rather than Pandas. These rules are generally evaluated more quickly on larger datasets stored as Pandas DataFrames.

Returns
Dict[str, str]

Rules in the standard Iguanas string format.

filter_rules(include=None, exclude=None) -> None

Filters the rules by their names.

Parameters
include : List[str], optional

The list of rule names to keep. Defaults to None.

exclude : List[str], optional

The list of rule names to drop. Defaults to None.

Raises
Exception

Raised if include and exclude contain any of the same rule names.
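The include/exclude semantics can be sketched on a plain dictionary of rules. This is an illustration only: the standalone function below returns a new dictionary, whereas the documented method returns None, and the rule strings are made-up examples.

```python
from typing import Dict, List, Optional


def filter_rules(
    rule_strings: Dict[str, str],
    include: Optional[List[str]] = None,
    exclude: Optional[List[str]] = None,
) -> Dict[str, str]:
    """Keep the rules named in include and drop those named in exclude."""
    if include is not None and exclude is not None:
        overlap = set(include) & set(exclude)
        if overlap:
            # ValueError is a subclass of Exception; the exact type is an assumption.
            raise ValueError(f"Rules in both include and exclude: {sorted(overlap)}")
    kept = dict(rule_strings)
    if include is not None:
        kept = {name: rule for name, rule in kept.items() if name in include}
    if exclude is not None:
        kept = {name: rule for name, rule in kept.items() if name not in exclude}
    return kept


# Made-up rules for illustration.
rules = {
    "Rule1": "(X['A']>1)",
    "Rule2": "(X['B']<=0.5)",
    "Rule3": "(X['C']==0)",
}
kept = filter_rules(rules, include=["Rule1", "Rule3"])
print(list(kept))  # ['Rule1', 'Rule3']
```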

get_rule_features() -> Dict[str, set]

Returns the set of unique features present in each rule.

Returns
Dict[str, set]

Set of unique features (values) in each rule (keys).
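To show the shape of the returned mapping, the sketch below extracts feature names from rule strings with a regular expression. It assumes rules reference features as `X['name']`, which this page does not spell out, and the standalone function and rule strings are illustrative only.

```python
import re
from typing import Dict, Set

# Assumption: features appear in rule strings as X['feature_name'].
FEATURE_PATTERN = re.compile(r"X\['([^']+)'\]")


def get_rule_features(rule_strings: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each rule name to the set of features its conditions use."""
    return {
        name: set(FEATURE_PATTERN.findall(rule))
        for name, rule in rule_strings.items()
    }


# Made-up rules for illustration.
rule_strings = {
    "Rule1": "(X['A']>1)&(X['B']<=0.5)",
    "Rule2": "(X['A']>3)|(X['A']<0)",
}
features = get_rule_features(rule_strings)
print(features["Rule2"])  # {'A'}
```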

transform(X: Union[iguanas.utils.typing.pandas.core.frame.DataFrame, iguanas.utils.typing.databricks.koalas.frame.DataFrame], y=None, sample_weight=None) -> Union[iguanas.utils.typing.pandas.core.frame.DataFrame, iguanas.utils.typing.databricks.koalas.frame.DataFrame]

Applies the set of rules to a dataset, X. If y is provided, the performance metrics for each rule will also be calculated.

Parameters
X : Union[PandasDataFrameType, KoalasDataFrameType]

The feature set on which the rules should be applied.

y : Union[PandasSeriesType, KoalasSeriesType], optional

The target column. Defaults to None.

sample_weight : Union[PandasSeriesType, KoalasSeriesType], optional

Record-wise weights to apply. Defaults to None.

Returns
Union[PandasDataFrameType, KoalasDataFrameType]

The binary columns of the rules.