iguanas.rule_optimisation.BayesianOptimiser

class iguanas.rule_optimisation.BayesianOptimiser(rule_lambdas: Dict[str, Callable], lambda_kwargs: Dict[str, Dict[str, float]], opt_func: Callable, n_iter: int, algorithm=<function suggest>, num_cores=1, verbose=0, **kwargs)

Optimises a set of rules (given in the standard Iguanas lambda expression format) using Bayesian Optimisation.

Parameters
rule_lambdas : Dict[str, Callable[[Dict], str]]

Set of rules defined using the standard Iguanas lambda expression format (values) and their names (keys).

lambda_kwargs : Dict[str, Dict[str, float]]

For each rule (keys), a dictionary containing the features used in the rule (keys) and the current values (values).

opt_func : Callable

The optimisation function used to calculate the metric which the rules are optimised for (e.g. F1 score).

n_iter : int

The number of iterations that the optimiser should perform.

algorithm : Callable, optional

The algorithm leveraged by hyperopt’s fmin function, which optimises the rules. Defaults to tpe.suggest, which corresponds to Tree-of-Parzen-Estimator.

num_cores : int, optional

The number of cores to use when optimising the rule thresholds. Defaults to 1.

verbose : int, optional

Controls the verbosity: the higher the value, the more messages are shown. If > 0, shows the overall progress of the optimisation process; if > 1, also shows the progress of the optimisation of each rule. Note that verbose > 1 only works when num_cores = 1. Defaults to 0.

**kwargs : dict, optional

Any additional keyword arguments to pass to hyperopt’s fmin function.
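For illustration, a minimal sketch of how rule_lambdas and lambda_kwargs fit together. It assumes each lambda takes the current condition values as keyword arguments and returns the rule as a string that refers to the feature set as X['feature']; the rule name high_amount and the features amount and num_items are hypothetical.

```python
from typing import Callable, Dict

# Hypothetical rule: flags records with a large amount and few items.
rule_lambdas: Dict[str, Callable[..., str]] = {
    "high_amount": lambda **kwargs: (
        "(X['amount']>{amount})&(X['num_items']<{num_items})".format(**kwargs)
    ),
}

# Current value of each feature's condition, per rule.
lambda_kwargs: Dict[str, Dict[str, float]] = {
    "high_amount": {"amount": 500.0, "num_items": 3.0},
}

# Filling each lambda with its kwargs renders the rule as a string.
rule_strings = {
    name: rule_lambdas[name](**lambda_kwargs[name]) for name in rule_lambdas
}
print(rule_strings["high_amount"])  # (X['amount']>500.0)&(X['num_items']<3.0)
```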

Attributes
rule_strings : Dict[str, str]

The optimised rules stored in the standard Iguanas string format (values) and their names (keys).

rule_descriptions : PandasDataFrameType

A dataframe showing the logic of the rules and their performance metrics on the given dataset.

rule_names_missing_features : List[str]

Names of rules which use features that are not present in the dataset (and therefore can’t be optimised or applied).

rule_names_no_opt_conditions : List[str]

Names of rules which have no optimisable conditions (e.g. rules that only contain string-based conditions).

rule_names_zero_var_features : List[str]

Names of rules which exclusively contain zero-variance features (based on X) and therefore cannot be optimised.

opt_rule_performances : Dict[str, float]

The optimisation metric (values) calculated for each optimised rule (keys).

orig_rule_performances : Dict[str, float]

The optimisation metric (values) calculated for each original rule (keys).

non_optimisable_rules : Rules

A Rules object containing the rules which could not be optimised.
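A minimal sketch of how orig_rule_performances and opt_rule_performances relate. It swaps hyperopt's TPE for plain random search over a single threshold, so it is not Bayesian Optimisation; the toy data, the f1 helper standing in for opt_func (including its argument order) and the search range are all hypothetical. In this sketch the original threshold is scored first and kept as the baseline, so the optimised score can only match or beat it.

```python
import random
from typing import List, Optional


def f1(y_preds: List[int], y_true: List[int],
       sample_weight: Optional[List[float]] = None) -> float:
    """Hypothetical opt_func: (weighted) F1 score of binary predictions."""
    w = [1.0] * len(y_true) if sample_weight is None else sample_weight
    tp = sum(wi for p, t, wi in zip(y_preds, y_true, w) if p == 1 and t == 1)
    fp = sum(wi for p, t, wi in zip(y_preds, y_true, w) if p == 1 and t == 0)
    fn = sum(wi for p, t, wi in zip(y_preds, y_true, w) if p == 0 and t == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data (hypothetical): one feature and a binary target.
amounts = [100.0, 250.0, 400.0, 600.0, 800.0, 950.0]
y = [0, 0, 1, 1, 1, 1]


def apply_rule(threshold: float) -> List[int]:
    """Binary column for the rule amount > threshold."""
    return [int(a > threshold) for a in amounts]


orig_threshold = 700.0
orig_perf = f1(apply_rule(orig_threshold), y)

# Random-search stand-in for the optimiser: try n_iter candidate thresholds.
rng = random.Random(0)
best_threshold, best_perf = orig_threshold, orig_perf
n_iter = 50
for _ in range(n_iter):
    candidate = rng.uniform(min(amounts), max(amounts))
    perf = f1(apply_rule(candidate), y)
    if perf > best_perf:
        best_threshold, best_perf = candidate, perf

orig_rule_performances = {"amount_rule": orig_perf}
opt_rule_performances = {"amount_rule": best_perf}
print(orig_rule_performances, opt_rule_performances)
```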

fit(X: iguanas.utils.typing.pandas.core.frame.DataFrame, y=None, sample_weight=None) → iguanas.utils.typing.pandas.core.frame.DataFrame

Optimises a set of rules (given in the standard Iguanas lambda expression format) using Bayesian Optimisation.

Parameters
X : PandasDataFrameType

The feature set.

y : PandasSeriesType, optional

The binary target column. Not required if optimising rules on unlabelled data. Defaults to None.

sample_weight : PandasSeriesType, optional

Record-wise weights to apply. Defaults to None.

Returns
PandasDataFrameType

The binary columns of the optimised rules on the fitted dataset.
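To make the return value concrete, a minimal sketch of turning rule strings into binary columns with pandas. It assumes rule strings refer to the feature set as X['feature'] and evaluates each one with Python's eval, which is only appropriate for trusted rule strings; the rule names and data are hypothetical.

```python
import pandas as pd

X = pd.DataFrame({"amount": [100.0, 600.0, 900.0], "num_items": [1, 5, 2]})

# Hypothetical optimised rules in string form.
rule_strings = {
    "high_amount": "(X['amount']>500.0)",
    "high_amount_few_items": "(X['amount']>500.0)&(X['num_items']<3.0)",
}

# Evaluate each string with X in scope (trusted strings only) and
# store the boolean result as a 0/1 column named after the rule.
X_rules = pd.DataFrame(
    {name: eval(rule, {}, {"X": X}).astype(int)
     for name, rule in rule_strings.items()},
    index=X.index,
)
print(X_rules)
```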

as_rule_dicts() → Dict[str, dict]

Converts rules into the standard Iguanas dictionary format.

Returns
Dict[str, dict]

Rules in the standard Iguanas dictionary format.

as_rule_lambdas(as_numpy: bool, with_kwargs: bool) → Dict[str, Callable[[dict], str]]

Converts rules into the standard Iguanas lambda expression format.

Parameters
as_numpy : bool

If True, the conditions in the string format will use NumPy rather than pandas. Such rules are generally evaluated more quickly on larger datasets stored as pandas DataFrames.

with_kwargs : bool

If True, the string in the lambda expression is created such that the inputs are keyword arguments. If False, the inputs are positional arguments.

Returns
Dict[str, Callable[[dict], str]]

Rules in the standard Iguanas lambda expression format.
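A minimal sketch of the difference with_kwargs describes, assuming the lambdas render the rule string with str.format; the exact lambdas Iguanas generates may differ, and the rule itself is hypothetical.

```python
# with_kwargs=True: condition values are passed by keyword and
# matched to named placeholders.
rule_lambdas_kwargs = {
    "high_amount": lambda **kwargs: "(X['amount']>{amount})".format(**kwargs),
}

# with_kwargs=False: condition values are passed positionally and
# fill the placeholders in order.
rule_lambdas_args = {
    "high_amount": lambda *args: "(X['amount']>{})".format(*args),
}

print(rule_lambdas_kwargs["high_amount"](amount=500.0))  # (X['amount']>500.0)
print(rule_lambdas_args["high_amount"](500.0))           # (X['amount']>500.0)
```

Keyword arguments make the mapping between features and values explicit, while positional arguments rely on the order of the conditions in the rule.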

as_rule_strings(as_numpy: bool) → Dict[str, str]

Converts rules into the standard Iguanas string format.

Parameters
as_numpy : bool

If True, the conditions in the string format will use NumPy rather than pandas. Such rules are generally evaluated more quickly on larger datasets stored as pandas DataFrames.

Returns
Dict[str, str]

Rules in the standard Iguanas string format.
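A minimal sketch of the two string flavours that as_numpy chooses between. It assumes rule strings refer to the feature set as X['feature']; the NumPy-flavoured string below is a hypothetical example rather than the exact form Iguanas emits. It shows that comparing the underlying NumPy array yields the same flags as comparing the pandas Series, including for missing values.

```python
import numpy as np
import pandas as pd

X = pd.DataFrame({"amount": [100.0, np.nan, 900.0]})

# Same condition in two flavours (the NumPy string is hypothetical).
pandas_rule = "(X['amount']>500.0)"
numpy_rule = "(X['amount'].to_numpy()>500.0)"

# Evaluate each string with X in scope; only do this with trusted strings.
pandas_flags = eval(pandas_rule, {}, {"X": X})  # pandas Series of bools
numpy_flags = eval(numpy_rule, {}, {"X": X})    # NumPy array of bools

print(pandas_flags.tolist())  # [False, False, True]
print(numpy_flags.tolist())   # [False, False, True]
```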

filter_rules(include=None, exclude=None) → None

Filters the rules by their names.

Parameters
include : List[str], optional

The list of rule names to keep. Defaults to None.

exclude : List[str], optional

The list of rule names to drop. Defaults to None.

Raises
Exception

include and exclude cannot contain any of the same rule names.
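A minimal sketch of the include/exclude semantics, written as a hypothetical stand-in function over a plain dict of rule strings. Unlike the documented method, which returns None, it returns a new dict; the rule names are hypothetical.

```python
from typing import Dict, List, Optional


def filter_rules(rule_strings: Dict[str, str],
                 include: Optional[List[str]] = None,
                 exclude: Optional[List[str]] = None) -> Dict[str, str]:
    """Hypothetical stand-in for the include/exclude logic of filter_rules."""
    # Overlapping include/exclude lists are ambiguous, so reject them.
    if include is not None and exclude is not None:
        overlap = set(include) & set(exclude)
        if overlap:
            raise Exception(
                f"include and exclude share rule names: {sorted(overlap)}")
    names = list(rule_strings)
    if include is not None:
        names = [n for n in names if n in include]
    if exclude is not None:
        names = [n for n in names if n not in exclude]
    return {n: rule_strings[n] for n in names}


rules = {"r1": "(X['a']>1.0)", "r2": "(X['b']<2.0)", "r3": "(X['c']==0.0)"}
print(filter_rules(rules, include=["r1", "r2"]))
print(filter_rules(rules, exclude=["r3"]))
```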

get_rule_features() → Dict[str, set]

Returns the set of unique features present in each rule.

Returns
Dict[str, set]

Set of unique features (values) in each rule (keys).
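The features a rule uses can be read off its string form. A minimal regex-based sketch, assuming rule strings reference features as X['feature']; this is a stand-in, not necessarily how Iguanas extracts them, and the rule names are hypothetical.

```python
import re
from typing import Dict, Set

# Captures the feature name inside X['...'].
FEATURE_PATTERN = re.compile(r"X\['([^']+)'\]")


def get_rule_features(rule_strings: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each rule name to the set of unique features in its string."""
    return {name: set(FEATURE_PATTERN.findall(rule))
            for name, rule in rule_strings.items()}


rule_strings = {
    "high_amount_few_items": "(X['amount']>500.0)&(X['num_items']<3.0)",
    "amount_band": "(X['amount']>100.0)&(X['amount']<1000.0)",
}
print(get_rule_features(rule_strings))
```

Using a set means a feature that appears in several conditions of the same rule (as amount does in amount_band) is listed once.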

classmethod plot_performance_uplift(orig_rule_performances: Dict[str, float], opt_rule_performances: Dict[str, float], figsize=(20, 10)) → seaborn.relational.scatterplot

Generates a scatterplot showing the performance of each rule before and after optimisation.

Parameters
orig_rule_performances : Dict[str, float]

The performance metric of each rule prior to optimisation.

opt_rule_performances : Dict[str, float]

The performance metric of each rule after optimisation.

figsize : tuple, optional

The width and height of the scatterplot. Defaults to (20, 10).

Returns
sns.scatterplot

Compares the performance of each rule before and after optimisation.

classmethod plot_performance_uplift_distribution(orig_rule_performances: Dict[str, float], opt_rule_performances: Dict[str, float], figsize=(8, 10)) → seaborn.categorical.boxplot

Generates a boxplot showing the distribution of performance uplifts (original rules vs optimised rules).

Parameters
orig_rule_performances : Dict[str, float]

The performance metric of each rule prior to optimisation.

opt_rule_performances : Dict[str, float]

The performance metric of each rule after optimisation.

figsize : tuple, optional

The width and height of the boxplot. Defaults to (8, 10).

Returns
sns.boxplot

Shows the distribution of performance uplifts (original rules vs optimised rules).
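The boxplot summarises one value per rule. A minimal sketch of computing those values from the two dictionaries, assuming uplift is the simple difference between optimised and original performance (an assumption about the definition; the rule names and scores are hypothetical).

```python
from statistics import median

# Hypothetical performances keyed by rule name.
orig_rule_performances = {"r1": 0.40, "r2": 0.55, "r3": 0.70}
opt_rule_performances = {"r1": 0.52, "r2": 0.55, "r3": 0.81}

# Uplift per rule, assumed here to be opt - orig.
uplift = {
    name: opt_rule_performances[name] - orig_rule_performances[name]
    for name in orig_rule_performances
}
print(uplift)
print(median(uplift.values()))
```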

transform(X: Union[iguanas.utils.typing.pandas.core.frame.DataFrame, iguanas.utils.typing.databricks.koalas.frame.DataFrame], y=None, sample_weight=None) → Union[iguanas.utils.typing.pandas.core.frame.DataFrame, iguanas.utils.typing.databricks.koalas.frame.DataFrame]

Applies the set of rules to a dataset, X. If y is provided, the performance metrics for each rule will also be calculated.

Parameters
X : Union[PandasDataFrameType, KoalasDataFrameType]

The feature set on which the rules should be applied.

y : Union[PandasSeriesType, KoalasSeriesType], optional

The target column. Defaults to None.

sample_weight : Union[PandasSeriesType, KoalasSeriesType], optional

Record-wise weights to apply. Defaults to None.

Returns
Union[PandasDataFrameType, KoalasDataFrameType]

The binary columns of the rules.
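When y is provided, per-rule metrics can be derived from the binary rule columns. A minimal pandas sketch of that step only, using a hypothetical helper rule_metrics; the metric column names (Precision, Recall, PercDataFlagged) and the toy data are illustrative assumptions, not the library's exact output.

```python
from typing import Optional

import pandas as pd


def rule_metrics(X_rules: pd.DataFrame, y: pd.Series,
                 sample_weight: Optional[pd.Series] = None) -> pd.DataFrame:
    """Hypothetical helper: weighted precision/recall per binary rule column."""
    w = pd.Series(1.0, index=y.index) if sample_weight is None else sample_weight
    rows = {}
    for name in X_rules.columns:
        flagged = X_rules[name] == 1
        tp = (w * (flagged & (y == 1))).sum()
        fp = (w * (flagged & (y == 0))).sum()
        fn = (w * (~flagged & (y == 1))).sum()
        rows[name] = {
            "Precision": tp / (tp + fp) if tp + fp else 0.0,
            "Recall": tp / (tp + fn) if tp + fn else 0.0,
            # Unweighted share of records the rule flags.
            "PercDataFlagged": flagged.mean(),
        }
    return pd.DataFrame.from_dict(rows, orient="index")


X_rules = pd.DataFrame({"r1": [1, 1, 0, 0], "r2": [0, 1, 1, 1]})
y = pd.Series([1, 0, 1, 0])
print(rule_metrics(X_rules, y))
```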