iguanas.rule_optimisation.DirectSearchOptimiser

class iguanas.rule_optimisation.DirectSearchOptimiser(rule_lambdas: Dict[str, Callable], lambda_kwargs: Dict[str, Dict[str, float]], opt_func: Callable, x0=None, method=None, jac=None, hess=None, hessp=None, bounds=None, constraints=None, tol=None, callback=None, options=None, verbose=0)

Optimises a set of rules (given in the standard Iguanas lambda expression format) using Direct Search-type algorithms.

Parameters
rule_lambdas : Dict[str, Callable]

Set of rules defined using the standard Iguanas lambda expression format (values) and their names (keys).

lambda_kwargs : Dict[str, Dict[str, float]]

For each rule (keys), a dictionary containing the features used in the rule (keys) and the current values (values).

opt_func : Callable

The optimisation function used to calculate the metric which the rules are optimised for (e.g. F1 score).

x0 : dict, optional

Dictionary of the initial guess (values) for each rule (keys). If None, defaults to the current values used in each rule (taken from the lambda_kwargs parameter). See scipy.optimize.minimize() documentation for more information. Defaults to None.

method : str, optional

Type of solver. See scipy.optimize.minimize() documentation for more information. Defaults to None.

jac : dict, optional

Dictionary of the method for computing the gradient vector (values) for each rule (keys). See scipy.optimize.minimize() documentation for more information. Defaults to None.

hess : dict, optional

Dictionary of the method for computing the Hessian matrix (values) for each rule (keys). See scipy.optimize.minimize() documentation for more information. Defaults to None.

hessp : dict, optional

Dictionary of the Hessian of objective function times an arbitrary vector p (values) for each rule (keys). See scipy.optimize.minimize() documentation for more information. Defaults to None.

bounds : dict, optional

Dictionary of the bounds on variables (values) for each rule (keys). See scipy.optimize.minimize() documentation for more information. Defaults to None.

constraints : dict, optional

Dictionary of the constraints definition (values) for each rule (keys). See scipy.optimize.minimize() documentation for more information. Defaults to None.

tol : dict, optional

Dictionary of the tolerance for termination (values) for each rule (keys). See scipy.optimize.minimize() documentation for more information. Defaults to None.

callback : dict, optional

Dictionary of the callbacks (values) for each rule (keys). See scipy.optimize.minimize() documentation for more information. Defaults to None.

options : dict, optional

Dictionary of the solver options (values) for each rule (keys). See scipy.optimize.minimize() documentation for more information. Defaults to None.

verbose : int, optional

Controls the verbosity: the higher the value, the more messages are shown. Values greater than 0 show the overall progress of the optimisation process. Defaults to 0.
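
The shape of the three required inputs can be sketched as follows. This is a minimal illustration, assuming a rule lambda returns the rule's condition string once its thresholds are supplied as keyword arguments. The rule name `Rule1`, the features `A` and `B`, and the `f1_score` helper are hypothetical, and the exact call signature Iguanas expects of `opt_func` is not stated in this section.

```python
from typing import Callable, Dict, Sequence

# Hypothetical rule: flags records where A and B both exceed a threshold.
rule_lambdas: Dict[str, Callable] = {
    "Rule1": lambda **kwargs: "(X['A']>{A})&(X['B']>{B})".format(**kwargs),
}

# Current threshold for each feature in each rule.
lambda_kwargs: Dict[str, Dict[str, float]] = {
    "Rule1": {"A": 1.5, "B": 10.0},
}


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Plain-Python F1 score, standing in for the metric behind opt_func."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Filling in the thresholds yields the rule in string form.
rule_string = rule_lambdas["Rule1"](**lambda_kwargs["Rule1"])
```

Because `x0` falls back to the values in `lambda_kwargs` when it is None, the thresholds above would also serve as the starting point of the search.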

Attributes
rule_strings : Dict[str, str]

The optimised rules stored in the standard Iguanas string format (values) and their names (keys).

rule_descriptions : PandasDataFrameType

A dataframe showing the logic of the rules and their performance metrics on the given dataset.

rule_names_missing_features : List[str]

Names of rules which use features that are not present in the dataset (and therefore can’t be optimised or applied).

rule_names_no_opt_conditions : List[str]

Names of rules which have no optimisable conditions (e.g. rules that only contain string-based conditions).

rule_names_zero_var_features : List[str]

Names of rules which exclusively contain zero variance features (based on X), so cannot be optimised.

opt_rule_performances : Dict[str, float]

The optimisation metric (values) calculated for each optimised rule (keys).

orig_rule_performances : Dict[str, float]

The optimisation metric (values) calculated for each original rule (keys).

non_optimisable_rules : Rules

A Rules object containing the rules which could not be optimised.

fit(X: iguanas.utils.typing.pandas.core.frame.DataFrame, y=None, sample_weight=None) iguanas.utils.typing.pandas.core.frame.DataFrame

Optimises a set of rules (given in the standard Iguanas lambda expression format) using Direct Search-type algorithms.

Parameters
X : PandasDataFrameType

The feature set.

y : PandasSeriesType, optional

The binary target column. Not required if optimising rules on unlabelled data. Defaults to None.

sample_weight : PandasSeriesType, optional

Record-wise weights to apply. Defaults to None.

Returns
PandasDataFrameType

The binary columns of the optimised rules on the fitted dataset.

classmethod create_bounds(X: iguanas.utils.typing.pandas.core.frame.DataFrame, lambda_kwargs: Dict[str, float]) Dict[str, numpy.ndarray]

Creates the bounds parameter using the min and max of each feature in each rule.

Parameters
X : PandasDataFrameType

The feature set.

lambda_kwargs : Dict[str, Dict[str, float]]

For each rule (keys), a dictionary containing the features used in the rule (keys) and the current values (values).

Returns
Dict[str, np.ndarray]

The bounds for each feature (values) in each rule (keys).
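
The min/max calculation can be sketched in plain Python. This is an illustration rather than the library's implementation: the helper `sketch_bounds` is hypothetical, a dict of lists stands in for the DataFrame `X`, and the result uses lists of `(min, max)` tuples where the real method returns NumPy arrays.

```python
from typing import Dict, List, Sequence, Tuple


def sketch_bounds(
    X: Dict[str, Sequence[float]],
    lambda_kwargs: Dict[str, Dict[str, float]],
) -> Dict[str, List[Tuple[float, float]]]:
    """Return the (min, max) of every feature used by each rule."""
    return {
        rule_name: [(min(X[feature]), max(X[feature])) for feature in features]
        for rule_name, features in lambda_kwargs.items()
    }


# Hypothetical data: two features, two rules.
X = {"A": [0.0, 2.0, 5.0], "B": [10.0, 30.0, 20.0]}
lambda_kwargs = {"Rule1": {"A": 1.5, "B": 15.0}, "Rule2": {"B": 25.0}}

bounds = sketch_bounds(X, lambda_kwargs)
# bounds == {"Rule1": [(0.0, 5.0), (10.0, 30.0)], "Rule2": [(10.0, 30.0)]}
```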

classmethod create_x0(X: iguanas.utils.typing.pandas.core.frame.DataFrame, lambda_kwargs: Dict[str, dict]) Dict[str, numpy.ndarray]

Creates the x0 parameter using the mid-range value of each feature in each rule.

Parameters
X : PandasDataFrameType

The feature set.

lambda_kwargs : Dict[str, Dict[str, float]]

For each rule (keys), a dictionary containing the features used in the rule (keys) and the current values (values).

Returns
Dict[str, np.ndarray]

The x0 for each feature (values) in each rule (keys).
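
The mid-range of a feature is the midpoint between its minimum and maximum, `(min + max) / 2`. The sketch below shows that calculation under simplifying assumptions: `sketch_x0` is a hypothetical helper, a dict of lists stands in for the DataFrame `X`, and plain lists replace the NumPy arrays the real method returns.

```python
from typing import Dict, List, Sequence


def sketch_x0(
    X: Dict[str, Sequence[float]],
    lambda_kwargs: Dict[str, Dict[str, float]],
) -> Dict[str, List[float]]:
    """Return the mid-range value of every feature used by each rule."""
    return {
        rule_name: [(min(X[feature]) + max(X[feature])) / 2 for feature in features]
        for rule_name, features in lambda_kwargs.items()
    }


# Hypothetical data: two features, two rules.
X = {"A": [0.0, 2.0, 5.0], "B": [10.0, 30.0, 20.0]}
lambda_kwargs = {"Rule1": {"A": 1.5, "B": 15.0}, "Rule2": {"B": 25.0}}

x0 = sketch_x0(X, lambda_kwargs)
# x0 == {"Rule1": [2.5, 20.0], "Rule2": [20.0]}
```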

classmethod create_initial_simplexes(X: iguanas.utils.typing.pandas.core.frame.DataFrame, lambda_kwargs: Dict[str, dict], shape: str) Dict[str, numpy.ndarray]

Creates the initial_simplex parameter for each rule.

Parameters
X : PandasDataFrameType

The feature set.

lambda_kwargs : Dict[str, dict]

For each rule (keys), a dictionary containing the features used in the rule (keys) and the current values (values).

shape : str

Name of specified simplex structure. Can be ‘Origin-based’ (simplex begins at origin and extends to feature maximums), ‘Minimum-based’ (simplex begins at feature minimums and extends to feature maximums) or ‘Random-based’ (randomly assigned simplex between feature minimums and feature maximums).

Returns
Dict[str, np.ndarray]

The initial simplex (values) for each rule (keys).
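
A simplex over N features has N + 1 vertices, which matches the `(N + 1, N)` shape that the `initial_simplex` option of SciPy's Nelder-Mead solver expects. The exact vertex layout Iguanas uses is not given in this section, so the sketch below shows only one plausible reading of the 'Origin-based' and 'Minimum-based' shapes: start at the origin (or at the feature minimums) and move one feature at a time to its maximum. Both helper functions are hypothetical.

```python
from typing import List, Sequence


def origin_based_simplex(maximums: Sequence[float]) -> List[List[float]]:
    """Origin plus one vertex per feature, placed at that feature's maximum."""
    n_features = len(maximums)
    vertices = [[0.0] * n_features]
    for i, maximum in enumerate(maximums):
        vertex = [0.0] * n_features
        vertex[i] = maximum
        vertices.append(vertex)
    return vertices


def minimum_based_simplex(
    minimums: Sequence[float], maximums: Sequence[float]
) -> List[List[float]]:
    """Feature minimums plus one vertex per feature moved to its maximum."""
    vertices = [list(minimums)]
    for i, maximum in enumerate(maximums):
        vertex = list(minimums)
        vertex[i] = maximum
        vertices.append(vertex)
    return vertices


# Two features with minimums (1, 10) and maximums (5, 30).
origin_simplex = origin_based_simplex([5.0, 30.0])
minimum_simplex = minimum_based_simplex([1.0, 10.0], [5.0, 30.0])
# origin_simplex  == [[0.0, 0.0], [5.0, 0.0], [0.0, 30.0]]
# minimum_simplex == [[1.0, 10.0], [5.0, 10.0], [1.0, 30.0]]
```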

as_rule_dicts() Dict[str, dict]

Converts rules into the standard Iguanas dictionary format.

Returns
Dict[str, dict]

Rules in the standard Iguanas dictionary format.

as_rule_lambdas(as_numpy: bool, with_kwargs: bool) Dict[str, Callable[[dict], str]]

Converts rules into the standard Iguanas lambda expression format.

Parameters
as_numpy : bool

If True, the conditions in the string format will use NumPy rather than Pandas. These rules are generally evaluated more quickly on larger datasets stored as Pandas DataFrames.

with_kwargs : bool

If True, the string in the lambda expression is created such that the inputs are keyword arguments. If False, the inputs are positional arguments.

Returns
Dict[str, Callable[[dict], str]]

Rules in the standard Iguanas lambda expression format.
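
The difference between the two `with_kwargs` settings can be illustrated with a pair of hand-written lambdas. This is a sketch of the keyword and positional calling styles only, assuming the lambda returns the rule string; the feature `A` is hypothetical and the strings Iguanas actually generates may differ.

```python
# Keyword style (with_kwargs=True): thresholds are passed by feature name.
keyword_rule = lambda **kwargs: "(X['A']>{A})".format(**kwargs)

# Positional style (with_kwargs=False): thresholds are passed in order.
positional_rule = lambda *args: "(X['A']>{})".format(*args)

keyword_string = keyword_rule(A=1.5)
positional_string = positional_rule(1.5)
# Both calls produce the same rule string: "(X['A']>1.5)"
```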

as_rule_strings(as_numpy: bool) Dict[str, str]

Converts rules into the standard Iguanas string format.

Parameters
as_numpy : bool

If True, the conditions in the string format will use NumPy rather than Pandas. These rules are generally evaluated more quickly on larger datasets stored as Pandas DataFrames.

Returns
Dict[str, str]

Rules in the standard Iguanas string format.

filter_rules(include=None, exclude=None) None

Filters the rules by their names.

Parameters
include : List[str], optional

The list of rule names to keep. Defaults to None.

exclude : List[str], optional

The list of rule names to drop. Defaults to None.

Raises
Exception

Raised if include and exclude contain any of the same rule names.
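
The filtering logic can be sketched as below. This is an assumption-laden illustration: `sketch_filter_rules` is a hypothetical helper that returns a new dictionary, whereas the real method returns None and filters the rules held by the object, and the exception message is invented for the example.

```python
from typing import Dict, List, Optional


def sketch_filter_rules(
    rules: Dict[str, str],
    include: Optional[List[str]] = None,
    exclude: Optional[List[str]] = None,
) -> Dict[str, str]:
    """Keep the rules named in `include` and drop those named in `exclude`."""
    if include is not None and exclude is not None:
        overlap = set(include) & set(exclude)
        if overlap:
            raise Exception(
                f"`include` and `exclude` share rule names: {sorted(overlap)}"
            )
    kept = dict(rules)
    if include is not None:
        kept = {name: rule for name, rule in kept.items() if name in include}
    if exclude is not None:
        kept = {name: rule for name, rule in kept.items() if name not in exclude}
    return kept


# Hypothetical rules in string form.
rules = {"Rule1": "(X['A']>1)", "Rule2": "(X['B']>2)", "Rule3": "(X['C']>3)"}
only_first_two = sketch_filter_rules(rules, include=["Rule1", "Rule2"])
without_second = sketch_filter_rules(rules, exclude=["Rule2"])
```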

get_rule_features() Dict[str, set]

Returns the set of unique features present in each rule.

Returns
Dict[str, set]

Set of unique features (values) in each rule (keys).

classmethod plot_performance_uplift(orig_rule_performances: Dict[str, float], opt_rule_performances: Dict[str, float], figsize=(20, 10)) seaborn.relational.scatterplot

Generates a scatterplot showing the performance of each rule before and after optimisation.

Parameters
orig_rule_performances : Dict[str, float]

The performance metric of each rule prior to optimisation.

opt_rule_performances : Dict[str, float]

The performance metric of each rule after optimisation.

figsize : tuple, optional

The width and height of the scatterplot. Defaults to (20, 10).

Returns
sns.scatterplot

Compares the performance of each rule before and after optimisation.

classmethod plot_performance_uplift_distribution(orig_rule_performances: Dict[str, float], opt_rule_performances: Dict[str, float], figsize=(8, 10)) seaborn.categorical.boxplot

Generates a boxplot showing the distribution of performance uplifts (original rules vs optimised rules).

Parameters
orig_rule_performances : Dict[str, float]

The performance metric of each rule prior to optimisation.

opt_rule_performances : Dict[str, float]

The performance metric of each rule after optimisation.

figsize : tuple, optional

The width and height of the boxplot. Defaults to (8, 10).

Returns
sns.boxplot

Shows the distribution of performance uplifts (original rules vs optimised rules).
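
The quantity being plotted can be sketched as a per-rule difference. This assumes the uplift is the optimised metric minus the original metric; the section does not say whether Iguanas uses an absolute or a relative difference, and the helper `performance_uplifts` and the metric values are hypothetical.

```python
from typing import Dict


def performance_uplifts(
    orig_rule_performances: Dict[str, float],
    opt_rule_performances: Dict[str, float],
) -> Dict[str, float]:
    """Optimised metric minus original metric for rules present in both."""
    return {
        name: opt_rule_performances[name] - orig_rule_performances[name]
        for name in orig_rule_performances
        if name in opt_rule_performances
    }


# Hypothetical metric values before and after optimisation.
orig_rule_performances = {"Rule1": 0.25, "Rule2": 0.5}
opt_rule_performances = {"Rule1": 0.5, "Rule2": 0.5}

uplifts = performance_uplifts(orig_rule_performances, opt_rule_performances)
# uplifts == {"Rule1": 0.25, "Rule2": 0.0}
```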

transform(X: Union[iguanas.utils.typing.pandas.core.frame.DataFrame, iguanas.utils.typing.databricks.koalas.frame.DataFrame], y=None, sample_weight=None) Union[iguanas.utils.typing.pandas.core.frame.DataFrame, iguanas.utils.typing.databricks.koalas.frame.DataFrame]

Applies the set of rules to a dataset, X. If y is provided, the performance metrics for each rule will also be calculated.

Parameters
X : Union[PandasDataFrameType, KoalasDataFrameType]

The feature set on which the rules should be applied.

y : Union[PandasSeriesType, KoalasSeriesType], optional

The target column. Defaults to None.

sample_weight : Union[PandasSeriesType, KoalasSeriesType], optional

Record-wise weights to apply. Defaults to None.

Returns
Union[PandasDataFrameType, KoalasDataFrameType]

The binary columns of the rules.