iguanas.rule_selection.GridSearchCV
- class iguanas.rule_selection.GridSearchCV(rule_class: Union[iguanas.rule_generation.rule_generator_dt.RuleGeneratorDT, iguanas.rule_generation.rule_generator_opt.RuleGeneratorOpt, iguanas.rule_optimisation.bayesian_optimiser.BayesianOptimiser, iguanas.rule_optimisation.direct_search_optimiser.DirectSearchOptimiser], param_grid: Dict[str, List], greedy_filter_opt_func: Callable, cv: int, refilter=True, num_cores=1, verbose=0)
Searches across the provided parameter space to find the parameter set that produces the best overall rule performance. The overall rule performance is calculated using the GreedyFilter class, which sorts the rules by precision and then calculates the combined performance of the top n rules. The maximum combined performance is recorded for that parameter set.
This process is repeated for each stratified fold. The mean performance across the folds for each parameter set is recorded, and the parameter set that gives the highest mean performance is assumed to be the best. The rules are then retrained using these parameters and the complete dataset.
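The "top n" scoring described above can be illustrated with a small, library-free sketch. It is not the GreedyFilter implementation: it assumes that rules are combined with a logical OR, and it uses a hand-written F1 score as a stand-in for greedy_filter_opt_func. The helper names (precision, f1_score, best_top_n) are hypothetical.

```python
from typing import Dict, List, Tuple


def precision(pred: List[int], y: List[int]) -> float:
    """Fraction of flagged rows that are true positives."""
    flagged = sum(pred)
    hits = sum(1 for p, t in zip(pred, y) if p and t)
    return hits / flagged if flagged else 0.0


def f1_score(pred: List[int], y: List[int]) -> float:
    """Illustrative stand-in for greedy_filter_opt_func (an assumption)."""
    tp = sum(1 for p, t in zip(pred, y) if p and t)
    fp = sum(1 for p, t in zip(pred, y) if p and not t)
    fn = sum(1 for p, t in zip(pred, y) if not p and t)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_top_n(rules: Dict[str, List[int]], y: List[int]) -> Tuple[int, float]:
    """Return (n, score) for the best-scoring combination of the top n rules."""
    # Rank rule names by precision, highest first.
    ranked = sorted(rules, key=lambda name: precision(rules[name], y), reverse=True)
    combined = [0] * len(y)
    best_n, best_score = 0, 0.0
    for n, name in enumerate(ranked, start=1):
        # Assumed combination: a row is flagged if any of the top n rules flags it.
        combined = [c | r for c, r in zip(combined, rules[name])]
        score = f1_score(combined, y)
        if score > best_score:
            best_n, best_score = n, score
    return best_n, best_score


y = [1, 1, 1, 0, 0, 0]
rules = {
    "RuleA": [1, 1, 0, 0, 0, 0],  # precision 1.0
    "RuleB": [0, 0, 1, 1, 0, 0],  # precision 0.5
    "RuleC": [0, 0, 0, 1, 1, 1],  # precision 0.0
}
best_n, best_score = best_top_n(rules, y)
```

In this toy data the top two rules together score higher than the top rule alone or all three rules, so the maximum is reached at n = 2.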
- Parameters
- rule_class : Union[RuleGeneratorDT, RuleGeneratorOpt, BayesianOptimiser, DirectSearchOptimiser]
The rule generator or optimiser class that will be used to generate or optimise rules.
- param_grid : Dict[str, List]
A dictionary mapping each parameter of the provided rule_class (keys) to the list of values to try for it (values).
- greedy_filter_opt_func : Callable
The method/function (e.g. Fbeta score) used to calculate the performance of the top n rules when the GreedyFilter class is applied to the rule set.
- cv : int
The number of stratified folds to create from the dataset.
- refilter : bool, optional
When refitting the rules using the best parameters and the complete dataset, this parameter dictates whether the GreedyFilter class should be applied to the rules post-fitting. Defaults to True.
- num_cores : int, optional
The number of cores to use when iterating through the different folds and parameter sets. Defaults to 1.
- verbose : int, optional
Controls the verbosity: the higher the value, the more messages are shown. >0 shows the overall progress of each fold; >1 gives information on, and the progress of, the current parameter set being tested. Note that setting verbose > 1 only works when num_cores = 1. Defaults to 0.
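As an illustration of what a param_grid describes, the sketch below expands a grid into individual parameter sets with itertools.product. It assumes that every combination of the listed values is tried, as in a conventional grid search; the parameter names max_depth and min_samples and the helper expand_grid are hypothetical and are not taken from any Iguanas rule class.

```python
from itertools import product
from typing import Any, Dict, Iterator, List


def expand_grid(param_grid: Dict[str, List[Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one dict per combination of parameter values."""
    keys = list(param_grid)
    for values in product(*(param_grid[key] for key in keys)):
        yield dict(zip(keys, values))


# Hypothetical parameter names, used only for illustration.
param_grid = {
    "max_depth": [2, 4],
    "min_samples": [10, 50, 100],
}
param_sets = list(expand_grid(param_grid))
```

Two values for one parameter and three for the other give six parameter sets, each of which would be evaluated on every fold.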
- Attributes
- rule_strings : Dict[str, str]
The rules which achieved the best combined performance, defined using the standard Iguanas string format (values) and their names (keys).
- rule_descriptions : PandasDataFrameType
A dataframe showing the logic of the rules and their performance metrics on the given dataset.
- param_results_per_fold : PandasDataFrameType
Shows the best combined rule performance observed for each parameter set and fold.
- param_results_aggregated : PandasDataFrameType
Shows the mean and the standard deviation of the best combined rule performance, calculated across the folds, for each parameter set.
- best_perf : float
The best combined rule performance achieved.
- best_params : dict
The parameter set that achieved the best combined rule performance.
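The relationship between the per-fold results, the aggregated results and the best parameter set can be sketched in plain Python. The scores and labels below are made up for illustration, the sample standard deviation is an assumption, and treating the best performance as the highest mean across folds is an interpretation of the descriptions above rather than confirmed library behaviour.

```python
from statistics import mean, stdev

# Hypothetical per-fold results: best combined performance for each
# parameter set (keys) on each of three folds (values).
per_fold = {
    "max_depth=2": [0.60, 0.70, 0.65],
    "max_depth=4": [0.80, 0.60, 0.70],
}

# Aggregate across folds: mean and (sample) standard deviation.
aggregated = {
    label: {"mean": mean(scores), "std": stdev(scores)}
    for label, scores in per_fold.items()
}

# Pick the parameter set with the highest mean performance.
best_label = max(aggregated, key=lambda label: aggregated[label]["mean"])
best_mean = aggregated[best_label]["mean"]
```

Note that the winning parameter set here has the higher mean but also the larger spread across folds, which is the kind of trade-off the aggregated table makes visible.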
- fit(X: iguanas.utils.typing.pandas.core.frame.DataFrame, y: iguanas.utils.typing.pandas.core.series.Series, sample_weight=None) -> None
Searches across the provided parameter space to find the parameter set that produces the best overall rule performance. The overall rule performance is calculated using the GreedyFilter class, which sorts the rules by precision and then calculates the combined performance of the top n rules. The maximum combined performance is recorded for that parameter set.
This process is repeated for each stratified fold. The mean performance across the folds for each parameter set is recorded, and the parameter set that gives the highest mean performance is assumed to be the best. The rules are then retrained using these parameters and the complete dataset.
- Parameters
- X : PandasDataFrameType
The feature set.
- y : PandasSeriesType
The binary target column.
- sample_weight : PandasSeriesType, optional
Row-wise weights to apply. Defaults to None.
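To make the idea of stratified folds concrete, here is a minimal sketch that splits row indices so that each fold keeps roughly the same share of each class in y. It is a simplified illustration (no shuffling, round-robin assignment), not the splitting logic used by fit(); the function name stratified_folds is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def stratified_folds(y: List[int], cv: int) -> List[Tuple[List[int], List[int]]]:
    """Return (train_idx, val_idx) pairs with similar class balance per fold."""
    # Group row indices by class label.
    by_class: Dict[int, List[int]] = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)

    # Deal the indices of each class across the folds in turn.
    folds: List[List[int]] = [[] for _ in range(cv)]
    for indices in by_class.values():
        for pos, i in enumerate(indices):
            folds[pos % cv].append(i)

    # Each fold serves once as the validation set; the rest form the training set.
    splits = []
    for k in range(cv):
        val = sorted(folds[k])
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        splits.append((train, val))
    return splits


# Toy binary target: 3 positives among 12 rows.
y = [1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
splits = stratified_folds(y, cv=3)
```

With this toy target, each of the three validation folds contains four rows, exactly one of which is positive.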
- plot_top_n_performance_by_fold(figsize=(10, 5)) -> seaborn.relational.lineplot
Plot the combined performance of the top n rules (as calculated using the .fit() method) for each parameter set and fold that was used for fitting the rules.
- Parameters
- figsize : Tuple[int, int], optional
Defines the size of the plot (width, height). Defaults to (10, 5).
- Returns
- sns.lineplot
The combined performance of the top n rules for each parameter set and fold.
- as_rule_dicts() -> Dict[str, dict]
Converts rules into the standard Iguanas dictionary format.
- Returns
- Dict[str, dict]
Rules in the standard Iguanas dictionary format.
- as_rule_lambdas(as_numpy: bool, with_kwargs: bool) -> Dict[str, Callable[[dict], str]]
Converts rules into the standard Iguanas lambda expression format.
- Parameters
- as_numpy : bool
If True, the conditions in the string format will use NumPy rather than pandas. These rules are generally evaluated more quickly on larger datasets stored as pandas DataFrames.
- with_kwargs : bool
If True, the string in the lambda expression is created such that the inputs are keyword arguments. If False, the inputs are positional arguments.
- Returns
- Dict[str, Callable[[dict], str]]
Rules in the standard Iguanas lambda expression format.
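The difference between keyword and positional inputs can be pictured with the sketch below. It assumes, based only on the return type above, that each lambda fills threshold values into a rule string; the exact string layout, the rule name and the feature names amount and n_orders are hypothetical.

```python
# Keyword-argument style (as with with_kwargs=True): values are passed by name.
rule_lambdas_kwargs = {
    "Rule1": lambda **kwargs: "(X['amount']>{amount})&(X['n_orders']<={n_orders})".format(**kwargs),
}

# Positional style (as with with_kwargs=False): values are passed in order.
rule_lambdas_args = {
    "Rule1": lambda *args: "(X['amount']>{})&(X['n_orders']<={})".format(*args),
}

from_kwargs = rule_lambdas_kwargs["Rule1"](amount=100, n_orders=3)
from_args = rule_lambdas_args["Rule1"](100, 3)
```

Both calls produce the same rule string; the keyword form is less error-prone when a rule has many thresholds, while the positional form suits optimisers that work with plain arrays of values.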
- as_rule_strings(as_numpy: bool) -> Dict[str, str]
Converts rules into the standard Iguanas string format.
- Parameters
- as_numpy : bool
If True, the conditions in the string format will use NumPy rather than pandas. These rules are generally evaluated more quickly on larger datasets stored as pandas DataFrames.
- Returns
- Dict[str, str]
Rules in the standard Iguanas string format.
- filter_rules(include=None, exclude=None) -> None
Filters the rules by their names.
- Parameters
- include : List[str], optional
The list of rule names to keep. Defaults to None.
- exclude : List[str], optional
The list of rule names to drop. Defaults to None.
- Raises
- Exception
Raised if include and exclude contain any of the same rule names.
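The include/exclude behaviour can be sketched as a standalone function over a dictionary of rule strings. Unlike the method documented here, which returns None, this illustrative version returns a new dictionary; the error message and the rule strings are made up.

```python
from typing import Dict, List, Optional


def filter_rules(
    rule_strings: Dict[str, str],
    include: Optional[List[str]] = None,
    exclude: Optional[List[str]] = None,
) -> Dict[str, str]:
    """Keep the rules named in include and drop the rules named in exclude."""
    if include is not None and exclude is not None:
        overlap = set(include) & set(exclude)
        if overlap:
            # A name cannot be both kept and dropped.
            raise Exception(f"include and exclude share rule names: {sorted(overlap)}")
    kept = dict(rule_strings)
    if include is not None:
        kept = {name: rule for name, rule in kept.items() if name in include}
    if exclude is not None:
        kept = {name: rule for name, rule in kept.items() if name not in exclude}
    return kept


rules = {
    "Rule1": "(X['amount']>100)",
    "Rule2": "(X['n_orders']<=3)",
    "Rule3": "(X['amount']>500)",
}
only_first_two = filter_rules(rules, include=["Rule1", "Rule2"])
without_second = filter_rules(rules, exclude=["Rule2"])
```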
- get_rule_features() -> Dict[str, set]
Returns the set of unique features present in each rule.
- Returns
- Dict[str, set]
Set of unique features (values) in each rule (keys).
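As a rough picture of the returned structure, the sketch below pulls feature names out of rule strings with a regular expression. It assumes that features appear as X['feature_name'] in the string format, which is an assumption about the format rather than something stated on this page, and it is not how the method itself is implemented.

```python
import re
from typing import Dict, Set

# Matches X['name'] or X["name"] and captures the feature name (assumed format).
FEATURE_PATTERN = re.compile(r"X\[['\"]([^'\"]+)['\"]\]")


def get_rule_features(rule_strings: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each rule name to the set of unique features its string refers to."""
    return {
        name: set(FEATURE_PATTERN.findall(rule))
        for name, rule in rule_strings.items()
    }


# Hypothetical rules, for illustration only.
rule_strings = {
    "Rule1": "(X['amount']>100)&(X['country']=='GB')",
    "Rule2": "(X['amount']>500)|(X['amount']<1)",
}
features = get_rule_features(rule_strings)
```

A feature that is used in several conditions of the same rule, such as amount in Rule2, appears only once in that rule's set.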
- transform(X: Union[iguanas.utils.typing.pandas.core.frame.DataFrame, iguanas.utils.typing.databricks.koalas.frame.DataFrame], y=None, sample_weight=None) -> Union[iguanas.utils.typing.pandas.core.frame.DataFrame, iguanas.utils.typing.databricks.koalas.frame.DataFrame]
Applies the set of rules to a dataset, X. If y is provided, the performance metrics for each rule will also be calculated.
- Parameters
- X : Union[PandasDataFrameType, KoalasDataFrameType]
The feature set on which the rules should be applied.
- y : Union[PandasSeriesType, KoalasSeriesType], optional
The target column. Defaults to None.
- sample_weight : Union[PandasSeriesType, KoalasSeriesType], optional
Record-wise weights to apply. Defaults to None.
- Returns
- Union[PandasDataFrameType, KoalasDataFrameType]
The binary columns of the rules.
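The shape of the output, one binary column per rule, can be illustrated without any dataframe library. In this sketch the dataset is a list of row dictionaries and each rule is a plain Python predicate; both are stand-ins chosen for brevity, and the names apply_rules, amount and n_orders are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]


def apply_rules(rules: Dict[str, Callable[[Row], bool]], X: List[Row]) -> Dict[str, List[int]]:
    """Return one binary column per rule: 1 where the rule fires, else 0."""
    return {name: [int(rule(row)) for row in X] for name, rule in rules.items()}


X = [
    {"amount": 50, "n_orders": 1},
    {"amount": 150, "n_orders": 1},
    {"amount": 700, "n_orders": 9},
]
rules = {
    "Rule1": lambda row: row["amount"] > 100,
    "Rule2": lambda row: row["amount"] > 100 and row["n_orders"] <= 3,
}
X_rules = apply_rules(rules, X)
```

Each output column has one entry per input row, so the result lines up row by row with the feature set it was computed from.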