iguanas.rbs.RBSOptimiser

class iguanas.rbs.RBSOptimiser(pipeline: iguanas.rbs.rbs_pipeline.RBSPipeline, n_iter: int, algorithm=<function suggest>, rule_types=None, verbose=0, **kwargs)

Optimises the rules within an RBS Pipeline using an optimisation function. If the pipeline's config parameter is an empty dictionary, the pipeline configuration is optimised from scratch; otherwise, the rules in the existing pipeline configuration are optimised.

Parameters
pipeline : RBSPipeline

The RBS Pipeline to optimise.

n_iter : int

The number of iterations that the optimiser should perform.

algorithm : Callable, optional

The algorithm passed to hyperopt's fmin function, which optimises the rules. Defaults to tpe.suggest, the Tree of Parzen Estimators (TPE) algorithm.

rule_types : Dict[int, List[str]], optional

A dictionary mapping each decision (key, either 0 or 1) to the list of rules (value) assigned to that decision. Must be given when the config parameter in the pipeline is an empty dictionary. Defaults to None.

verbose : int, optional

Controls the verbosity: the higher the value, the more messages are shown. A value greater than 0 shows the overall progress of the optimisation process. Defaults to 0.

Raises
ValueError

Raised if config is not provided in the pipeline and rule_types is not given.
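The condition above can be sketched as a small check. This is only an illustration, not the library's code: `check_rule_types` and the rule names are hypothetical, and the check assumes that "not provided" means an empty config.

```python
from typing import Dict, List, Optional


def check_rule_types(config: List[dict],
                     rule_types: Optional[Dict[int, List[str]]]) -> None:
    """Hypothetical helper mirroring the documented ValueError condition."""
    # Assumption: an empty config ({} or []) counts as "not provided", so the
    # rule-to-decision assignment has to come from rule_types instead.
    if not config and rule_types is None:
        raise ValueError(
            "If config not provided in pipeline, rule_types must be given."
        )


# rule_types maps each decision (0 or 1) to the rule names assigned to it.
# The rule names are made up for this example.
rule_types = {1: ["HighAmount", "NewDevice"], 0: ["TrustedCustomer"]}
check_rule_types(config=[], rule_types=rule_types)  # passes silently
```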

Attributes
config : List[dict]

The optimised pipeline configuration, where each element aligns to a stage in the pipeline. Each element is a dictionary, where the key is the decision made at that stage (either 0 or 1) and the value is a list of the rules that must trigger to give that decision.

pipeline_opt_metric : float

The result of the opt_func function when the pipeline is applied.

conf_matrix : PandasDataFrameType

The confusion matrix for the applied pipeline. Only generated after running calc_performance.

conf_matrix_weighted : PandasDataFrameType

The weighted confusion matrix for the applied pipeline. Only generated after running calc_performance and when sample_weight is provided.

pipeline_perf : PandasDataFrameType

The performance (precision, recall, percentage of data flagged) of each decision made by the pipeline. Only generated after running calc_performance.
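To make the shape of the config attribute concrete, here is a minimal sketch in plain Python of how a list of stage dictionaries could be applied to rule outputs. It is not the library's implementation: `apply_config`, the rule names, and the `final_decision` fallback are hypothetical, and it assumes that a stage fires when any one of its rules triggers.

```python
from typing import Dict, List

# Hypothetical optimised config: one dict per stage, {decision: [rule names]}.
config: List[Dict[int, List[str]]] = [
    {1: ["HighAmount"]},
    {0: ["TrustedCustomer"]},
    {1: ["NewDevice"]},
]

# Hypothetical rule outputs: one dict per record, 1 if the rule triggered.
X_rules = [
    {"HighAmount": 1, "TrustedCustomer": 1, "NewDevice": 0},
    {"HighAmount": 0, "TrustedCustomer": 1, "NewDevice": 1},
    {"HighAmount": 0, "TrustedCustomer": 0, "NewDevice": 1},
    {"HighAmount": 0, "TrustedCustomer": 0, "NewDevice": 0},
]


def apply_config(config, record, final_decision=0):
    """Return the decision of the first stage with a triggered rule."""
    for stage in config:
        for decision, rules in stage.items():
            # Assumption: a stage fires when any one of its rules triggers.
            if any(record[rule] == 1 for rule in rules):
                return decision
    # Assumption: records that no stage catches get a fallback decision.
    return final_decision


y_pred = [apply_config(config, record) for record in X_rules]
print(y_pred)  # [1, 0, 1, 0]
```

Stage order matters in this sketch: the first record triggers both HighAmount and TrustedCustomer, but the earlier stage wins and the record is assigned decision 1.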

fit(X_rules: iguanas.utils.typing.pandas.core.frame.DataFrame, y: iguanas.utils.typing.pandas.core.series.Series, sample_weight=None) → None

Optimises the pipeline for the given dataset.

Parameters
X_rules : PandasDataFrameType

Dataset of each applied rule.

y : PandasSeriesType

The target.

sample_weight : PandasSeriesType, optional

Record-wise weights to apply. Defaults to None.
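As a rough illustration of what optimising over n_iter iterations means, the sketch below runs a plain random search using only the standard library. Random search is swapped in here for hyperopt's fmin with tpe.suggest, which is what the optimiser actually uses. The search space (one include/exclude flag per rule), the F1 score used as a stand-in for opt_func, and all names and data are assumptions made for the example.

```python
import random
from typing import List, Tuple

# Hypothetical inputs: rule outputs per record and the target.
X_rules = [
    {"HighAmount": 1, "NewDevice": 0, "NightTime": 1},
    {"HighAmount": 1, "NewDevice": 1, "NightTime": 0},
    {"HighAmount": 0, "NewDevice": 1, "NightTime": 1},
    {"HighAmount": 0, "NewDevice": 0, "NightTime": 1},
    {"HighAmount": 0, "NewDevice": 0, "NightTime": 0},
]
y = [1, 1, 0, 0, 0]
rules = ["HighAmount", "NewDevice", "NightTime"]


def predict(selected: List[str]) -> List[int]:
    """Flag a record (decision 1) if any selected rule triggered."""
    return [int(any(rec[r] for r in selected)) for rec in X_rules]


def f1(y_true: List[int], y_pred: List[int]) -> float:
    """F1 score, used here as a stand-in for the pipeline's opt_func."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def random_search(n_iter: int, seed: int = 0) -> Tuple[List[str], float]:
    """Try n_iter random rule subsets and keep the best-scoring one."""
    rng = random.Random(seed)
    best_rules: List[str] = []
    best_score = -1.0
    for _ in range(n_iter):
        # Sample an include/exclude flag for every rule.
        candidate = [r for r in rules if rng.random() < 0.5]
        score = f1(y, predict(candidate))
        if score > best_score:
            best_rules, best_score = candidate, score
    return best_rules, best_score


best_rules, best_score = random_search(n_iter=200)
print(best_rules, best_score)
```

A guided algorithm such as TPE uses the scores of earlier trials to choose later candidates, so it typically needs fewer iterations than this unguided search on larger rule sets.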

fit_predict(X_rules: iguanas.utils.typing.pandas.core.frame.DataFrame, y: iguanas.utils.typing.pandas.core.series.Series, sample_weight=None) → iguanas.utils.typing.pandas.core.series.Series

Optimises the pipeline for the given dataset and applies the pipeline to the dataset.

Parameters
X_rules : PandasDataFrameType

Dataset of each applied rule.

y : PandasSeriesType

The target.

sample_weight : PandasSeriesType, optional

Record-wise weights to apply. Defaults to None.

Returns
PandasSeriesType

The prediction of the pipeline.

calc_performance(y_true: iguanas.utils.typing.pandas.core.series.Series, y_pred: iguanas.utils.typing.pandas.core.series.Series, sample_weight=None) → None

Calculates the confusion matrices (non-weighted and, if sample_weight is provided, weighted) and the overall performance of the pipeline.

Note that for the confusion matrices, the index shows the predicted class; the column shows the actual class.

Parameters
y_true : PandasSeriesType

The target.

y_pred : PandasSeriesType

The RBS pipeline prediction.

sample_weight : PandasSeriesType, optional

Record-wise weights to apply. Defaults to None.
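The quantities described for calc_performance can be illustrated with a standard-library sketch. It is not the library's implementation: the function names and data are hypothetical, the matrix is a plain dictionary keyed by (predicted, actual) to match the index/column orientation noted above, and the share of data flagged is returned as a fraction rather than a percentage.

```python
from typing import Dict, List, Optional, Tuple


def confusion_matrix(y_true: List[int], y_pred: List[int],
                     sample_weight: Optional[List[float]] = None
                     ) -> Dict[Tuple[int, int], float]:
    """Return {(predicted, actual): count}, weighted if weights are given."""
    weights = sample_weight if sample_weight is not None else [1.0] * len(y_true)
    matrix = {(p, a): 0.0 for p in (1, 0) for a in (1, 0)}
    for actual, pred, w in zip(y_true, y_pred, weights):
        matrix[(pred, actual)] += w
    return matrix


def decision_performance(matrix: Dict[Tuple[int, int], float],
                         decision: int) -> Dict[str, float]:
    """Precision, recall and share of data flagged for one decision."""
    other = 1 - decision
    tp = matrix[(decision, decision)]
    predicted = tp + matrix[(decision, other)]  # row total (predicted class)
    actual = tp + matrix[(other, decision)]     # column total (actual class)
    total = sum(matrix.values())
    return {
        "precision": tp / predicted if predicted else 0.0,
        "recall": tp / actual if actual else 0.0,
        "pct_flagged": predicted / total if total else 0.0,
    }


y_true = [1, 1, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0]
cm = confusion_matrix(y_true, y_pred)
print(cm[(1, 1)], cm[(1, 0)], cm[(0, 1)], cm[(0, 0)])  # 1.0 1.0 1.0 2.0
print(decision_performance(cm, decision=1))
```

Passing sample_weight produces the weighted counterpart of the same matrix, with each record contributing its weight instead of 1.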

predict(X_rules: iguanas.utils.typing.pandas.core.frame.DataFrame, y: iguanas.utils.typing.pandas.core.series.Series, sample_weight=None) → iguanas.utils.typing.pandas.core.series.Series

Applies the pipeline to the given dataset.

Parameters
X_rules : PandasDataFrameType

Dataset of each applied rule.

y : PandasSeriesType

The target.

sample_weight : PandasSeriesType, optional

Record-wise weights to apply. Defaults to None.

Returns
PandasSeriesType

The prediction of the pipeline.