iguanas.rbs.RBSOptimiser
- class iguanas.rbs.RBSOptimiser(pipeline: iguanas.rbs.rbs_pipeline.RBSPipeline, n_iter: int, algorithm=<function suggest>, rule_types=None, verbose=0, **kwargs)
Optimises the rules within an RBS Pipeline using an optimisation function. If the pipeline's config parameter is an empty dictionary, the pipeline configuration is optimised from scratch; otherwise, the rules included in the existing pipeline configuration are optimised.
- Parameters
- pipeline : RBSPipeline
The RBS Pipeline to optimise.
- n_iter : int
The number of iterations that the optimiser should perform.
- algorithm : Callable, optional
The search algorithm passed to hyperopt’s fmin function, which optimises the rules. Defaults to tpe.suggest, the Tree of Parzen Estimators (TPE) algorithm.
- rule_types : Dict[int, List[str]], optional
A dictionary mapping each decision (the keys, either 0 or 1) to the list of rules (the values) assigned to that decision. Must be given when the config parameter in the pipeline is an empty dictionary. Defaults to None.
- verbose : int, optional
Controls the verbosity: the higher the value, the more messages are shown. Values greater than 0 show the overall progress of the optimisation process. Defaults to 0.
- Raises
- ValueError
If config is not provided in the pipeline and rule_types is not given.
- Attributes
- config : List[dict]
The optimised pipeline configuration, where each element aligns to a stage in the pipeline. Each element is a dictionary, where the key is the decision made at that stage (either 0 or 1) and the value is a list of the rules that must trigger to give that decision.
- pipeline_opt_metric : float
The result of the opt_func function when the pipeline is applied.
- conf_matrix : PandasDataFrameType
The confusion matrix for the applied pipeline. Only generated after running calc_performance.
- conf_matrix_weighted : PandasDataFrameType
The weighted confusion matrix for the applied pipeline. Only generated after running calc_performance and when sample_weight is provided.
- pipeline_perf : PandasDataFrameType
The performance (precision, recall, percentage of data flagged) of each decision made by the pipeline. Only generated after running calc_performance.
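To make the structure of config concrete, the following is a minimal sketch of how a configuration of this form could be applied stage by stage. It is not the Iguanas implementation. The helper `apply_config`, the `final_decision` fallback, the rule names, and the assumption that a stage fires when any of its listed rules triggers on a record that no earlier stage has decided are all illustrative.

```python
from typing import Dict, List


def apply_config(
    config: List[Dict[int, List[str]]],
    rule_flags: Dict[str, List[int]],
    final_decision: int,
) -> List[int]:
    """Apply each stage in order; the first stage that fires decides a record."""
    n_records = len(next(iter(rule_flags.values())))
    preds = []
    for i in range(n_records):
        decision = final_decision  # hypothetical fallback when no stage fires
        for stage in config:
            # Each stage is a one-item dict: {decision: [rule names]}
            stage_decision, rules = next(iter(stage.items()))
            # Assumption: the stage fires if ANY of its rules triggered
            if any(rule_flags[rule][i] == 1 for rule in rules):
                decision = stage_decision
                break
        preds.append(decision)
    return preds


# Hypothetical rule outputs: 1 means the rule triggered on that record
rule_flags = {
    "FraudRule1": [1, 0, 0, 1],
    "FraudRule2": [0, 0, 1, 0],
    "AllowRule1": [1, 1, 0, 0],
}
# Stage 1 allows (decision 0); stage 2 flags (decision 1)
config = [{0: ["AllowRule1"]}, {1: ["FraudRule1", "FraudRule2"]}]
preds = apply_config(config, rule_flags, final_decision=0)
```

In this sketch the order of the list matters: the first record triggers both AllowRule1 and FraudRule1, and the earlier stage wins.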
- fit(X_rules: iguanas.utils.typing.pandas.core.frame.DataFrame, y: iguanas.utils.typing.pandas.core.series.Series, sample_weight=None) -> None
Optimises the pipeline for the given dataset.
- Parameters
- X_rules : PandasDataFrameType
Dataset of each applied rule.
- y : PandasSeriesType
The target.
- sample_weight : PandasSeriesType, optional
Record-wise weights to apply. Defaults to None.
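As a rough illustration of what optimising a pipeline over n_iter iterations involves, the sketch below searches over candidate configurations and keeps the one with the best score. It deliberately swaps in plain random search for hyperopt's fmin with tpe.suggest, so it shows the shape of the loop rather than the library's method. The functions `apply_pipeline`, `f1_score` and `random_search`, the `final_decision` fallback, the "any rule triggers" stage semantics, and the example data are all assumptions made for illustration.

```python
import random
from typing import Callable, Dict, List, Tuple

Config = List[Dict[int, List[str]]]


def apply_pipeline(config: Config, rule_flags: Dict[str, List[int]],
                   n_records: int, final_decision: int = 0) -> List[int]:
    """Assumed semantics: the first stage with any triggered rule decides."""
    preds = []
    for i in range(n_records):
        decision = final_decision
        for stage in config:
            stage_decision, rules = next(iter(stage.items()))
            if any(rule_flags[rule][i] for rule in rules):
                decision = stage_decision
                break
        preds.append(decision)
    return preds


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """F1 score for the positive class (1); stands in for opt_func."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def random_search(rule_types: Dict[int, List[str]],
                  rule_flags: Dict[str, List[int]],
                  y: List[int], n_iter: int,
                  opt_func: Callable[[List[int], List[int]], float],
                  seed: int = 0) -> Tuple[Config, float]:
    """Random search (NOT hyperopt's TPE) over rule subsets and stage order."""
    rng = random.Random(seed)
    best_config: Config = []
    best_score = float("-inf")
    for _ in range(n_iter):
        decisions = list(rule_types)
        rng.shuffle(decisions)  # try a random stage order
        config: Config = []
        for decision in decisions:
            # Keep each candidate rule with probability 0.5
            rules = [r for r in rule_types[decision] if rng.random() < 0.5]
            if rules:
                config.append({decision: rules})
        score = opt_func(y, apply_pipeline(config, rule_flags, len(y)))
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


# Hypothetical inputs
rule_types = {0: ["AllowRule1"], 1: ["FraudRule1", "FraudRule2"]}
rule_flags = {
    "FraudRule1": [1, 1, 0, 0, 0, 0],
    "FraudRule2": [0, 0, 0, 0, 0, 1],
    "AllowRule1": [0, 0, 1, 1, 0, 0],
}
y = [1, 1, 0, 0, 0, 1]
best_config, best_score = random_search(
    rule_types, rule_flags, y, n_iter=50, opt_func=f1_score
)
```

A model-based algorithm such as TPE uses the scores of earlier trials to choose later ones, which is why it typically needs fewer iterations than the uniform sampling shown here.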
- fit_predict(X_rules: iguanas.utils.typing.pandas.core.frame.DataFrame, y: iguanas.utils.typing.pandas.core.series.Series, sample_weight=None) -> iguanas.utils.typing.pandas.core.series.Series
Optimises the pipeline for the given dataset and applies the pipeline to the dataset.
- Parameters
- X_rules : PandasDataFrameType
Dataset of each applied rule.
- y : PandasSeriesType
The target.
- sample_weight : PandasSeriesType, optional
Record-wise weights to apply. Defaults to None.
- Returns
- PandasSeriesType
The prediction of the pipeline.
- calc_performance(y_true: iguanas.utils.typing.pandas.core.series.Series, y_pred: iguanas.utils.typing.pandas.core.series.Series, sample_weight=None) -> None
Calculates the confusion matrices (non-weighted and weighted, if provided) and overall performance of the pipeline.
Note that for the confusion matrices, the index shows the predicted class; the column shows the actual class.
- Parameters
- y_true : PandasSeriesType
The target.
- y_pred : PandasSeriesType
The RBS pipeline prediction.
- sample_weight : PandasSeriesType, optional
Record-wise weights to apply. Defaults to None.
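The orientation noted above (index is the predicted class, column is the actual class) is easy to get backwards, so here is a small sketch of a confusion matrix laid out that way, together with the per-decision precision, recall and percentage of data flagged. It uses nested dictionaries rather than a DataFrame. The function names, the output keys, and the treatment of sample_weight as a per-record multiplier are assumptions, not the Iguanas implementation.

```python
from typing import Dict, List, Optional

Matrix = Dict[int, Dict[int, float]]


def confusion_matrix(y_true: List[int], y_pred: List[int],
                     sample_weight: Optional[List[float]] = None) -> Matrix:
    """Return matrix[predicted][actual], i.e. rows=predicted, columns=actual."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    matrix: Matrix = {1: {1: 0.0, 0: 0.0}, 0: {1: 0.0, 0: 0.0}}
    for actual, predicted, weight in zip(y_true, y_pred, sample_weight):
        matrix[predicted][actual] += weight
    return matrix


def decision_performance(matrix: Matrix, decision: int) -> Dict[str, float]:
    """Precision, recall and share of data given `decision` (hypothetical keys)."""
    row_total = sum(matrix[decision].values())             # predicted as decision
    col_total = sum(matrix[p][decision] for p in matrix)   # actually decision
    grand_total = sum(sum(row.values()) for row in matrix.values())
    correct = matrix[decision][decision]
    return {
        "precision": correct / row_total if row_total else 0.0,
        "recall": correct / col_total if col_total else 0.0,
        "pct_flagged": row_total / grand_total if grand_total else 0.0,
    }


# Hypothetical target and pipeline prediction
y_true = [1, 1, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1, 1]
cm = confusion_matrix(y_true, y_pred)
cm_weighted = confusion_matrix(y_true, y_pred, sample_weight=[2, 1, 1, 1, 1, 1])
perf_1 = decision_performance(cm, decision=1)
```

Here `cm[1][0]` counts records predicted as 1 whose actual class is 0, which is the transpose of the layout some other libraries use.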
- predict(X_rules: iguanas.utils.typing.pandas.core.frame.DataFrame, y: iguanas.utils.typing.pandas.core.series.Series, sample_weight=None) -> iguanas.utils.typing.pandas.core.series.Series
Applies the pipeline to the given dataset.
- Parameters
- X_rules : PandasDataFrameType
Dataset of each applied rule.
- y : PandasSeriesType
The target.
- sample_weight : PandasSeriesType, optional
Record-wise weights to apply. Defaults to None.
- Returns
- PandasSeriesType
The prediction of the pipeline.