iguanas.rule_optimisation.DirectSearchOptimiser
- class iguanas.rule_optimisation.DirectSearchOptimiser(rule_lambdas: Dict[str, Callable], lambda_kwargs: Dict[str, Dict[str, float]], opt_func: Callable, x0=None, method=None, jac=None, hess=None, hessp=None, bounds=None, constraints=None, tol=None, callback=None, options=None, verbose=0)
Optimises a set of rules (given in the standard Iguanas lambda expression format) using Direct Search-type algorithms.
- Parameters
- rule_lambdas : Dict[str, Callable]
Set of rules defined using the standard Iguanas lambda expression format (values) and their names (keys).
- lambda_kwargs : Dict[str, Dict[str, float]]
For each rule (keys), a dictionary containing the features used in the rule (keys) and the current values (values).
- opt_func : Callable
The optimisation function used to calculate the metric which the rules are optimised for (e.g. F1 score).
- x0 : dict, optional
Dictionary of the initial guess (values) for each rule (keys). If None, defaults to the current values used in each rule (taken from the lambda_kwargs parameter). See scipy.optimize.minimize() documentation for more information. Defaults to None.
- method : str, optional
Type of solver. See scipy.optimize.minimize() documentation for more information. Defaults to None.
- jac : dict, optional
Dictionary of the method for computing the gradient vector (values) for each rule (keys). See scipy.optimize.minimize() documentation for more information. Defaults to None.
- hess : dict, optional
Dictionary of the method for computing the Hessian matrix (values) for each rule (keys). See scipy.optimize.minimize() documentation for more information. Defaults to None.
- hessp : dict, optional
Dictionary of the Hessian of the objective function times an arbitrary vector p (values) for each rule (keys). See scipy.optimize.minimize() documentation for more information. Defaults to None.
- bounds : dict, optional
Dictionary of the bounds on variables (values) for each rule (keys). See scipy.optimize.minimize() documentation for more information. Defaults to None.
- constraints : dict, optional
Dictionary of the constraints definition (values) for each rule (keys). See scipy.optimize.minimize() documentation for more information. Defaults to None.
- tol : dict, optional
Dictionary of the tolerance for termination (values) for each rule (keys). See scipy.optimize.minimize() documentation for more information. Defaults to None.
- callback : dict, optional
Dictionary of the callbacks (values) for each rule (keys). See scipy.optimize.minimize() documentation for more information. Defaults to None.
- options : dict, optional
Dictionary of the solver options (values) for each rule (keys). See scipy.optimize.minimize() documentation for more information. Defaults to None.
- verbose : int, optional
Controls the verbosity: the higher the value, the more messages are shown. If > 0, the overall progress of the optimisation process is shown. Defaults to 0.
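To illustrate the expected input shapes, the sketch below builds a rule_lambdas / lambda_kwargs pair by hand. It assumes that each rule lambda takes the rule's values as keyword arguments and returns a rule string that references features as X['feature']. The rule names, features and values are hypothetical. Note that method is a single string, whereas options (like x0, bounds, tol and the other dict parameters) is keyed by rule name.

```python
from typing import Callable, Dict

# Hypothetical rules: each lambda fills its thresholds into a string
# condition written against a DataFrame named X.
rule_lambdas: Dict[str, Callable[..., str]] = {
    "HighAmount": lambda **kwargs: "(X['amount']>{amount})".format(**kwargs),
    "NewAccount": lambda **kwargs: (
        "(X['account_age_days']<{account_age_days})&(X['num_items']>={num_items})"
    ).format(**kwargs),
}

# Current value for each feature (inner keys) in each rule (outer keys).
lambda_kwargs: Dict[str, Dict[str, float]] = {
    "HighAmount": {"amount": 500.0},
    "NewAccount": {"account_age_days": 30.0, "num_items": 3.0},
}

# method is a single solver name, while options is keyed by rule name.
method = "Nelder-Mead"
options = {"HighAmount": {"maxiter": 200}, "NewAccount": {"maxiter": 400}}

for name, rule_lambda in rule_lambdas.items():
    print(name, rule_lambda(**lambda_kwargs[name]))
```

In this sketch, keeping the inner keys of lambda_kwargs identical to the placeholders in each rule string is what allows a rule to be rebuilt from a new set of values.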
- Attributes
- rule_strings : Dict[str, str]
The optimised rules stored in the standard Iguanas string format (values) and their names (keys).
- rule_descriptions : PandasDataFrameType
A dataframe showing the logic of the rules and their performance metrics on the given dataset.
- rule_names_missing_features : List[str]
Names of rules which use features that are not present in the dataset (and therefore can’t be optimised or applied).
- rule_names_no_opt_conditions : List[str]
Names of rules which have no optimisable conditions (e.g. rules that only contain string-based conditions).
- rule_names_zero_var_features : List[str]
Names of rules which exclusively contain zero-variance features (based on X) and so cannot be optimised.
- opt_rule_performances : Dict[str, float]
The optimisation metric (values) calculated for each optimised rule (keys).
- orig_rule_performances : Dict[str, float]
The optimisation metric (values) calculated for each original rule (keys).
- non_optimisable_rules : Rules
A Rules object containing the rules which could not be optimised.
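Several of the attributes above partition the input rules into groups. The sketch below shows one plausible way to derive two of these groups: rules with missing features and rules whose features all have zero variance. categorise_rules is a hypothetical helper, and X is modelled as a plain dict of columns rather than a pandas DataFrame. The string-condition check behind rule_names_no_opt_conditions is not covered, and the actual Iguanas checks may differ.

```python
from typing import Dict, List, Set, Tuple


def categorise_rules(
    rule_features: Dict[str, Set[str]], X: Dict[str, List[float]]
) -> Tuple[List[str], List[str], List[str]]:
    """Hypothetical helper: split rule names into
    (missing features, exclusively zero-variance features, optimisable)."""
    missing, zero_var, optimisable = [], [], []
    for name, features in rule_features.items():
        if not features.issubset(X):
            # At least one feature used by the rule is absent from the dataset.
            missing.append(name)
        elif all(len(set(X[feature])) == 1 for feature in features):
            # Every feature takes a single value, so moving a threshold changes nothing.
            zero_var.append(name)
        else:
            optimisable.append(name)
    return missing, zero_var, optimisable


X = {"amount": [10.0, 250.0, 900.0], "is_vip": [0.0, 0.0, 0.0]}
rule_features = {"R1": {"amount"}, "R2": {"is_vip"}, "R3": {"country"}}
print(categorise_rules(rule_features, X))
# (['R3'], ['R2'], ['R1'])
```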
- fit(X: iguanas.utils.typing.pandas.core.frame.DataFrame, y=None, sample_weight=None) → iguanas.utils.typing.pandas.core.frame.DataFrame
Optimises a set of rules (given in the standard Iguanas lambda expression format) using Direct Search-type algorithms.
- Parameters
- X : PandasDataFrameType
The feature set.
- y : PandasSeriesType, optional
The binary target column. Not required if optimising rules on unlabelled data. Defaults to None.
- sample_weight : PandasSeriesType, optional
Record-wise weights to apply. Defaults to None.
Record-wise weights to apply. Defaults to None.
- Returns
- PandasDataFrameType
The binary columns of the optimised rules on the fitted dataset.
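fit delegates the search to scipy.optimize.minimize. Because minimize minimises, a metric such as F1 has to be negated before it can be maximised. To illustrate the idea without SciPy, the sketch below swaps in a simple compass (pattern) search, which is itself a direct search method, and uses it to tune the threshold of a single hypothetical rule, X['amount'] > t. The data, starting point and step size are made up, and this is not the library's implementation.

```python
from typing import Callable, Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def objective(threshold: float, amount: Sequence[float], y: Sequence[int]) -> float:
    # Negated F1 of the rule X['amount'] > threshold, since the search minimises.
    y_pred = [int(a > threshold) for a in amount]
    return -f1_score(y, y_pred)


def compass_search(
    f: Callable[[float], float], x0: float, step: float, tol: float = 1e-3, max_iter: int = 1000
) -> float:
    """Minimise a 1-D function by probing x - step and x + step, halving step on failure."""
    x, fx = x0, f(x0)
    for _ in range(max_iter):
        if step < tol:
            break
        for candidate in (x - step, x + step):
            fc = f(candidate)
            if fc < fx:  # accept the first strictly better neighbour
                x, fx = candidate, fc
                break
        else:
            step /= 2  # no neighbour improved: refine the step size
    return x


amount = [20.0, 35.0, 60.0, 80.0, 120.0, 300.0, 450.0, 700.0]
y = [0, 0, 0, 0, 1, 1, 1, 1]
best = compass_search(lambda t: objective(t, amount, y), x0=40.0, step=50.0)
print(best, -objective(best, amount, y))
# 90.0 1.0
```

Like Nelder-Mead, compass search only compares objective values and never needs a gradient. This matters here because a rule's metric is a step function of its thresholds.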
- classmethod create_bounds(X: iguanas.utils.typing.pandas.core.frame.DataFrame, lambda_kwargs: Dict[str, float]) → Dict[str, numpy.ndarray]
Creates the bounds parameter using the min and max of each feature in each rule.
- Parameters
- X : PandasDataFrameType
The feature set.
- lambda_kwargs : Dict[str, Dict[str, float]]
For each rule (keys), a dictionary containing the features used in the rule (keys) and the current values (values).
- Returns
- Dict[str, np.ndarray]
The bounds for each feature (values) in each rule (keys).
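The sketch below implements the same idea with plain Python lists. It assumes that a rule's bounds are the (min, max) pair of every feature the rule uses, in the order the features appear in lambda_kwargs. The real method returns NumPy arrays. create_bounds_sketch and the data are hypothetical.

```python
from typing import Dict, List, Tuple


def create_bounds_sketch(
    X: Dict[str, List[float]], lambda_kwargs: Dict[str, Dict[str, float]]
) -> Dict[str, List[Tuple[float, float]]]:
    # One (min, max) pair per feature in each rule, in lambda_kwargs order.
    return {
        rule_name: [(min(X[feature]), max(X[feature])) for feature in kwargs]
        for rule_name, kwargs in lambda_kwargs.items()
    }


X = {"amount": [20.0, 60.0, 700.0], "num_items": [1.0, 4.0, 9.0]}
lambda_kwargs = {"R1": {"amount": 500.0}, "R2": {"amount": 100.0, "num_items": 3.0}}
print(create_bounds_sketch(X, lambda_kwargs))
# {'R1': [(20.0, 700.0)], 'R2': [(20.0, 700.0), (1.0, 9.0)]}
```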
- classmethod create_x0(X: iguanas.utils.typing.pandas.core.frame.DataFrame, lambda_kwargs: Dict[str, dict]) → Dict[str, numpy.ndarray]
Creates the x0 parameter using the mid-range value of each feature in each rule.
- Parameters
- X : PandasDataFrameType
The feature set.
- lambda_kwargs : Dict[str, Dict[str, float]]
For each rule (keys), a dictionary containing the features used in the rule (keys) and the current values (values).
- Returns
- Dict[str, np.ndarray]
The x0 for each feature (values) in each rule (keys).
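The mid-range of a feature is (min + max) / 2. The following minimal sketch applies this to each feature of each rule, under the same assumptions as the bounds sketch: plain lists instead of NumPy arrays, and a hypothetical helper and data. With this choice, the starting point depends only on the data and ignores the current values in lambda_kwargs.

```python
from typing import Dict, List


def create_x0_sketch(
    X: Dict[str, List[float]], lambda_kwargs: Dict[str, Dict[str, float]]
) -> Dict[str, List[float]]:
    # Mid-range of each feature: halfway between its minimum and maximum.
    return {
        rule_name: [(min(X[feature]) + max(X[feature])) / 2 for feature in kwargs]
        for rule_name, kwargs in lambda_kwargs.items()
    }


X = {"amount": [20.0, 60.0, 700.0], "num_items": [1.0, 4.0, 9.0]}
lambda_kwargs = {"R1": {"amount": 500.0}, "R2": {"amount": 100.0, "num_items": 3.0}}
print(create_x0_sketch(X, lambda_kwargs))
# {'R1': [360.0], 'R2': [360.0, 5.0]}
```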
- classmethod create_initial_simplexes(X: iguanas.utils.typing.pandas.core.frame.DataFrame, lambda_kwargs: Dict[str, dict], shape: str) → Dict[str, numpy.ndarray]
Creates the initial_simplex parameter for each rule.
- Parameters
- X : PandasDataFrameType
The feature set.
- lambda_kwargs : Dict[str, dict]
For each rule (keys), a dictionary containing the features used in the rule (keys) and the current values (values).
- shape : str
Name of specified simplex structure. Can be ‘Origin-based’ (simplex begins at origin and extends to feature maximums), ‘Minimum-based’ (simplex begins at feature minimums and extends to feature maximums) or ‘Random-based’ (randomly assigned simplex between feature minimums and feature maximums).
- Returns
- Dict[str, np.ndarray]
The initial simplex (values) for each rule (keys).
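The docstring only names the three shapes, so the sketch below is one interpretation rather than the library's code. It assumes that a simplex over N features has N + 1 vertices, which is the (N + 1, N) layout scipy's Nelder-Mead accepts through its initial_simplex option. For the two non-random shapes, it also assumes that each extra vertex moves one coordinate from the base point to that feature's maximum. initial_simplex_sketch is hypothetical.

```python
import random
from typing import List


def initial_simplex_sketch(
    mins: List[float], maxs: List[float], shape: str, seed: int = 0
) -> List[List[float]]:
    """Hypothetical: return N + 1 vertices for N features."""
    n = len(mins)
    if shape == "Origin-based":
        base = [0.0] * n
    elif shape == "Minimum-based":
        base = list(mins)
    elif shape == "Random-based":
        # Every vertex drawn uniformly between the feature minimums and maximums.
        rng = random.Random(seed)
        return [[rng.uniform(lo, hi) for lo, hi in zip(mins, maxs)] for _ in range(n + 1)]
    else:
        raise ValueError(f"Unknown shape: {shape}")
    vertices = [base]
    for i in range(n):
        vertex = list(base)
        vertex[i] = maxs[i]  # extend along feature i to its maximum
        vertices.append(vertex)
    return vertices


print(initial_simplex_sketch([20.0, 1.0], [700.0, 9.0], "Minimum-based"))
# [[20.0, 1.0], [700.0, 1.0], [20.0, 9.0]]
```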
- as_rule_dicts() → Dict[str, dict]
Converts rules into the standard Iguanas dictionary format.
- Returns
- Dict[str, dict]
Rules in the standard Iguanas dictionary format.
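This reference does not spell out the dictionary format. The sketch below therefore assumes an illustrative nested structure, with a "condition" key ("AND"/"OR") and a "rules" list whose entries are either conditions (field, operator, value) or nested groups. It converts such a dictionary into a pandas-style rule string. The exact key names and operator spellings used by Iguanas may differ, and dict_to_string is hypothetical.

```python
from typing import Any, Dict


def dict_to_string(rule: Dict[str, Any]) -> str:
    """Hypothetical converter from a nested condition dict to a rule string."""
    joiner = "&" if rule["condition"] == "AND" else "|"
    parts = []
    for item in rule["rules"]:
        if "condition" in item:  # nested group of conditions
            parts.append(dict_to_string(item))
        else:
            parts.append(f"(X['{item['field']}']{item['operator']}{item['value']!r})")
    return "(" + joiner.join(parts) + ")"


rule_dict = {
    "condition": "AND",
    "rules": [
        {"field": "account_age_days", "operator": "<", "value": 30.0},
        {"condition": "OR", "rules": [
            {"field": "num_items", "operator": ">=", "value": 3.0},
            {"field": "amount", "operator": ">", "value": 500.0},
        ]},
    ],
}
print(dict_to_string(rule_dict))
# ((X['account_age_days']<30.0)&((X['num_items']>=3.0)|(X['amount']>500.0)))
```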
- as_rule_lambdas(as_numpy: bool, with_kwargs: bool) → Dict[str, Callable[[dict], str]]
Converts rules into the standard Iguanas lambda expression format.
- Parameters
- as_numpy : bool
If True, the conditions in the string format will use NumPy rather than pandas. These rules are generally evaluated more quickly on larger datasets stored as pandas DataFrames.
- with_kwargs : bool
If True, the string in the lambda expression is created such that the inputs are keyword arguments. If False, the inputs are positional arguments.
- Returns
- Dict[str, Callable[[dict], str]]
Rules in the standard Iguanas lambda expression format.
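The sketch below shows how the two with_kwargs styles might differ for the same hypothetical rule. The template strings that Iguanas actually generates are not shown in this reference, so these lambdas are illustrative only.

```python
from typing import Callable, Dict

# with_kwargs=True style: placeholders are named after the features.
kwargs_lambdas: Dict[str, Callable[..., str]] = {
    "R1": lambda **kwargs: "(X['amount']>{amount})&(X['num_items']>={num_items})".format(**kwargs),
}
# with_kwargs=False style: placeholders are filled in positional order.
positional_lambdas: Dict[str, Callable[..., str]] = {
    "R1": lambda *args: "(X['amount']>{})&(X['num_items']>={})".format(*args),
}

print(kwargs_lambdas["R1"](amount=500.0, num_items=3))
print(positional_lambdas["R1"](500.0, 3))
# Both print: (X['amount']>500.0)&(X['num_items']>=3)
```

The keyword style pairs naturally with a lambda_kwargs dictionary. The positional style is convenient when an optimiser hands over a flat array of values, as scipy.optimize.minimize does with its x argument.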
- as_rule_strings(as_numpy: bool) → Dict[str, str]
Converts rules into the standard Iguanas string format.
- Parameters
- as_numpy : bool
If True, the conditions in the string format will use NumPy rather than pandas. These rules are generally evaluated more quickly on larger datasets stored as pandas DataFrames.
- Returns
- Dict[str, str]
Rules in the standard Iguanas string format.
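One common way to make a pandas condition operate on NumPy is to compare against the column's underlying array via Series.to_numpy(), which skips index alignment. The sketch below renders a single condition in both styles. The exact text that Iguanas emits when as_numpy=True is an assumption here, and condition_string is hypothetical.

```python
def condition_string(feature: str, operator: str, value: float, as_numpy: bool) -> str:
    """Hypothetical: render one condition in a pandas or NumPy flavour."""
    column = f"X['{feature}']"
    if as_numpy:
        # Compare on the underlying array instead of the pandas Series.
        column += ".to_numpy(na_value=np.nan)"
    return f"({column}{operator}{value!r})"


print(condition_string("amount", ">", 500.0, as_numpy=False))
# (X['amount']>500.0)
print(condition_string("amount", ">", 500.0, as_numpy=True))
# (X['amount'].to_numpy(na_value=np.nan)>500.0)
```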
- filter_rules(include=None, exclude=None) → None
Filters the rules by their names.
- Parameters
- include : List[str], optional
The list of rule names to keep. Defaults to None.
- exclude : List[str], optional
The list of rule names to drop. Defaults to None.
- Raises
- Exception
include and exclude cannot contain any of the same rule names.
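The include/exclude logic can be sketched over a plain dict of rule strings. Unlike filter_rules, which returns None and filters the object's rules, this hypothetical helper returns a new dictionary. It raises ValueError (a subclass of Exception) when the two lists overlap.

```python
from typing import Dict, List, Optional


def filter_rule_names(
    rule_strings: Dict[str, str],
    include: Optional[List[str]] = None,
    exclude: Optional[List[str]] = None,
) -> Dict[str, str]:
    """Hypothetical: keep rules named in include, then drop rules named in exclude."""
    if include is not None and exclude is not None and set(include) & set(exclude):
        raise ValueError("include and exclude cannot contain the same rule names")
    kept = {}
    for name, rule in rule_strings.items():
        if include is not None and name not in include:
            continue
        if exclude is not None and name in exclude:
            continue
        kept[name] = rule
    return kept


rules = {"R1": "(X['a']>1)", "R2": "(X['b']<2)", "R3": "(X['c']==0)"}
print(filter_rule_names(rules, include=["R1", "R2"]))
print(filter_rule_names(rules, exclude=["R2"]))
```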
- get_rule_features() → Dict[str, set]
Returns the set of unique features present in each rule.
- Returns
- Dict[str, set]
Set of unique features (values) in each rule (keys).
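If rules are stored as strings that reference features as X['feature'] (an assumption about the string format), the unique features per rule can be recovered with a regular expression. Collecting the matches into a set deduplicates features that appear in more than one condition. rule_features_sketch is hypothetical.

```python
import re
from typing import Dict, Set

# Matches the feature name inside X['...'].
FEATURE_PATTERN = re.compile(r"X\['([^']+)'\]")


def rule_features_sketch(rule_strings: Dict[str, str]) -> Dict[str, Set[str]]:
    return {name: set(FEATURE_PATTERN.findall(rule)) for name, rule in rule_strings.items()}


rule_strings = {
    "R1": "(X['amount']>500.0)&(X['amount']<900.0)",
    "R2": "(X['account_age_days']<30.0)|(X['num_items']>=3.0)",
}
print(rule_features_sketch(rule_strings))
```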
- classmethod plot_performance_uplift(orig_rule_performances: Dict[str, float], opt_rule_performances: Dict[str, float], figsize=(20, 10)) → seaborn.relational.scatterplot
Generates a scatterplot showing the performance of each rule before and after optimisation.
- Parameters
- orig_rule_performances : Dict[str, float]
The performance metric of each rule prior to optimisation.
- opt_rule_performances : Dict[str, float]
The performance metric of each rule after optimisation.
- figsize : tuple, optional
The width and height of the scatterplot. Defaults to (20, 10).
- Returns
- sns.scatterplot
Compares the performance of each rule before and after optimisation.
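Drawing the plot requires seaborn, so the sketch below stops at data preparation. It aligns the two dictionaries on rule name so that each rule contributes one (before, after) point. Keeping only rules present in both dictionaries is an assumption, and the scores are made up.

```python
from typing import Dict, List, Tuple


def paired_performances(
    orig: Dict[str, float], opt: Dict[str, float]
) -> List[Tuple[str, float, float]]:
    # One (rule name, before, after) record per rule found in both dictionaries.
    return [(name, orig[name], opt[name]) for name in sorted(orig.keys() & opt.keys())]


orig = {"R1": 0.42, "R2": 0.10, "R3": 0.55}
opt = {"R1": 0.61, "R2": 0.10, "R3": 0.58}
for name, before, after in paired_performances(orig, opt):
    print(f"{name}: {before:.2f} -> {after:.2f}")
```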
- classmethod plot_performance_uplift_distribution(orig_rule_performances: Dict[str, float], opt_rule_performances: Dict[str, float], figsize=(8, 10)) → seaborn.categorical.boxplot
Generates a boxplot showing the distribution of performance uplifts (original rules vs optimised rules).
- Parameters
- orig_rule_performances : Dict[str, float]
The performance metric of each rule prior to optimisation.
- opt_rule_performances : Dict[str, float]
The performance metric of each rule after optimisation.
- figsize : tuple, optional
The width and height of the boxplot. Defaults to (8, 10).
- Returns
- sns.boxplot
Shows the distribution of performance uplifts (original rules vs optimised rules).
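The boxplot summarises per-rule uplifts. The sketch below takes the uplift to be the optimised metric minus the original metric (an assumption) and computes the five-number summary that a boxplot draws, using only the standard library. The scores are made up.

```python
import statistics
from typing import Dict, List


def performance_uplifts(orig: Dict[str, float], opt: Dict[str, float]) -> List[float]:
    # Uplift per rule: optimised metric minus original metric.
    return [opt[name] - orig[name] for name in orig if name in opt]


orig = {"R1": 0.40, "R2": 0.10, "R3": 0.55, "R4": 0.30}
opt = {"R1": 0.60, "R2": 0.10, "R3": 0.60, "R4": 0.45}
uplifts = performance_uplifts(orig, opt)
q1, median, q3 = statistics.quantiles(uplifts, n=4)
print(f"min={min(uplifts):.2f} q1={q1:.2f} median={median:.2f} q3={q3:.2f} max={max(uplifts):.2f}")
```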
- transform(X: Union[iguanas.utils.typing.pandas.core.frame.DataFrame, iguanas.utils.typing.databricks.koalas.frame.DataFrame], y=None, sample_weight=None) → Union[iguanas.utils.typing.pandas.core.frame.DataFrame, iguanas.utils.typing.databricks.koalas.frame.DataFrame]
Applies the set of rules to a dataset, X. If y is provided, the performance metrics for each rule will also be calculated.
- Parameters
- X : Union[PandasDataFrameType, KoalasDataFrameType]
The feature set on which the rules should be applied.
- y : Union[PandasSeriesType, KoalasSeriesType], optional
The target column. Defaults to None.
- sample_weight : Union[PandasSeriesType, KoalasSeriesType], optional
Record-wise weights to apply. Defaults to None.
- Returns
- Union[PandasDataFrameType, KoalasDataFrameType]
The binary columns of the rules.
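Conceptually, transform evaluates every rule on every record and, when y is given, scores each rule. The standard-library sketch below mirrors this with some simplifications. Rules are written as row predicates instead of Iguanas rule strings. Precision and recall stand in for the performance metrics, and sample_weight is ignored. apply_rules is hypothetical.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Row = Dict[str, float]


def apply_rules(
    rows: Sequence[Row],
    rules: Dict[str, Callable[[Row], bool]],
    y: Optional[Sequence[int]] = None,
) -> Tuple[Dict[str, List[int]], Optional[Dict[str, Dict[str, float]]]]:
    # Binary column per rule: 1 where the rule fires on a record, else 0.
    binary_cols = {name: [int(rule(row)) for row in rows] for name, rule in rules.items()}
    if y is None:
        return binary_cols, None
    metrics = {}
    for name, col in binary_cols.items():
        tp = sum(1 for p, t in zip(col, y) if p == 1 and t == 1)
        flagged, positives = sum(col), sum(y)
        metrics[name] = {
            "precision": tp / flagged if flagged else 0.0,
            "recall": tp / positives if positives else 0.0,
        }
    return binary_cols, metrics


rows = [{"amount": 20.0}, {"amount": 650.0}, {"amount": 900.0}, {"amount": 120.0}]
y = [0, 1, 0, 1]
cols, metrics = apply_rules(rows, {"HighAmount": lambda r: r["amount"] > 500.0}, y)
print(cols)     # {'HighAmount': [0, 1, 1, 0]}
print(metrics)  # {'HighAmount': {'precision': 0.5, 'recall': 0.5}}
```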