rfe

class RFEStep(metric_value: float, features: List[str])[source]

Bases: object

metric_value: float
features: List[str]
__init__(metric_value: float, features: List[str]) None
class RFEResult(steps: List[sensai.feature_selection.rfe.RFEStep], metric_name: str, minimise: bool)[source]

Bases: object

__init__(steps: List[sensai.feature_selection.rfe.RFEStep], metric_name: str, minimise: bool)
get_sorted_steps() List[sensai.feature_selection.rfe.RFEStep]
Returns

the elimination step results, sorted from best to worst

get_selected_features() List[str]
get_num_features_array() numpy.ndarray
Returns

array containing the number of features that were considered in each step

get_metric_values_array() numpy.ndarray
Returns

array containing the metric value that resulted in each step

plot_metric_values() matplotlib.figure.Figure

Plots the metric values vs. the number of features for each step of the elimination

Returns

the figure
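
To illustrate how the step list relates to the accessors above, here is a minimal stand-alone sketch. It does not use sensai: `Step`, `sorted_steps` and `selected_features` are hypothetical stand-ins. It assumes that "best" means the lowest metric value when `minimise` is true and the highest otherwise, and that the selected features are those of the best step.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """Stand-in for RFEStep: one elimination step and its metric value."""
    metric_value: float
    features: List[str]


def sorted_steps(steps: List[Step], minimise: bool) -> List[Step]:
    # Best first: ascending metric when minimising, descending otherwise.
    return sorted(steps, key=lambda s: s.metric_value, reverse=not minimise)


def selected_features(steps: List[Step], minimise: bool) -> List[str]:
    # Assumption: the selected set is the feature set of the best step.
    return sorted_steps(steps, minimise)[0].features


steps = [
    Step(0.30, ["a", "b", "c"]),
    Step(0.25, ["a", "b"]),
    Step(0.40, ["a"]),
]
best = selected_features(steps, minimise=True)
# Analogue of the per-step feature count array.
num_features = [len(s.features) for s in steps]
```

With an error-like metric (`minimise=True`), the second step wins in this toy data, so `best` holds the two-feature set.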

class RecursiveFeatureEliminationCV(cross_validator_params: sensai.evaluation.crossval.VectorModelCrossValidatorParams, min_features=1)[source]

Bases: object

Recursive feature elimination that uses cross-validation to select the best set of features. In each step, the model is first evaluated using cross-validation. The feature importance values are then aggregated across the models that were trained during cross-validation, and the least important feature is discarded. If the lowest feature importance is 0, all features with 0 importance are discarded. This process is repeated until only min_features (or fewer) features remain. The selected set of features is the one from the step where cross-validation yielded the best evaluation metric value.

Feature importance is computed at the level of model input features, i.e. after feature generation and transformation.

NOTE: This implementation differs markedly from sklearn’s RFECV, which performs an independent RFE for each fold. RFECV determines the number of features to use by finding the elimination step that yielded the best metric value when averaged across the folds. Because the eliminations are independent, the actual features used in that step could have been completely different from fold to fold. Using the selected number of features n, RFECV then performs another RFE, eliminating features until n features remain, and returns these features as the result.
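
The elimination procedure described above can be sketched in plain Python as follows. This is an illustrative sketch, not the library's code: `eliminate`, the `Evaluate` callback and `toy_evaluate` are hypothetical names, and the callback stands in for "cross-validate the model and aggregate feature importance across folds". The sketch also assumes that a feature set smaller than `min_features` is never evaluated.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical callback: takes a feature list and returns the metric value
# plus the importance of each feature, aggregated across the CV models.
Evaluate = Callable[[List[str]], Tuple[float, Dict[str, float]]]
StepList = List[Tuple[float, List[str]]]


def eliminate(features: List[str], evaluate: Evaluate,
              min_features: int = 1,
              minimise: bool = True) -> Tuple[List[str], StepList]:
    steps: StepList = []
    current = list(features)
    while True:
        metric, importance = evaluate(current)
        steps.append((metric, list(current)))
        if len(current) <= min_features:
            break
        lowest = min(importance[f] for f in current)
        if lowest == 0:
            # Discard every feature whose aggregated importance is zero.
            remaining = [f for f in current if importance[f] != 0]
        else:
            # Discard only the single least important feature.
            worst = min(current, key=lambda f: importance[f])
            remaining = [f for f in current if f != worst]
        # Assumption: never evaluate fewer than min_features features.
        if len(remaining) < max(min_features, 1):
            break
        current = remaining
    # The selected set is the one from the step with the best metric value.
    pick = min if minimise else max
    _, best_features = pick(steps, key=lambda s: s[0])
    return best_features, steps


def toy_evaluate(feats: List[str]) -> Tuple[float, Dict[str, float]]:
    # Fixed toy importances; the toy error is lowest for two features.
    importance = {"a": 0.6, "b": 0.3, "c": 0.0, "d": 0.0}
    error = {4: 0.30, 2: 0.20, 1: 0.50}[len(feats)]
    return error, {f: importance[f] for f in feats}


best, steps = eliminate(["a", "b", "c", "d"], toy_evaluate)
```

In the toy run, the two zero-importance features are dropped together in the first step, so only three feature sets are evaluated.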

__init__(cross_validator_params: sensai.evaluation.crossval.VectorModelCrossValidatorParams, min_features=1)
Parameters
  • cross_validator_params – the parameters for cross-validation

  • min_features – the smallest number of features that shall be evaluated during feature elimination

run(model: Union[sensai.vector_model.VectorModel, sensai.feature_importance.FeatureImportanceProvider], io_data: sensai.data.InputOutputData, metric_name: str, minimise: bool, remove_input_preprocessors=False) sensai.feature_selection.rfe.RFEResult

Runs the optimisation for the given model and data.

Parameters
  • model – the model

  • io_data – the data

  • metric_name – the metric to optimise

  • minimise – whether the metric shall be minimised; if False, maximise.

  • remove_input_preprocessors – whether to remove input preprocessors from the model and create input data only once during the entire experiment; this is usually reasonable only if none of the input preprocessors are trained on the input data or if, for any given data split/fold, the preprocessor learning outcome is likely to be largely similar.

Returns

a result object, which provides access to the selected features and data on all elimination steps
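
A call to `run` might be wrapped as in the following sketch. It relies only on the signatures documented on this page. The helper name `run_rfe_cv` is hypothetical, and the construction of the cross-validation parameters, the model and the data is deliberately left to the caller because it is not covered here.

```python
def run_rfe_cv(rfe, model, io_data, metric_name: str, minimise: bool):
    """Run the elimination and return the selected features and sorted steps.

    rfe is expected to be a RecursiveFeatureEliminationCV instance; the model
    and io_data objects are passed through unchanged.
    """
    result = rfe.run(model, io_data, metric_name=metric_name, minimise=minimise)
    # The result object exposes the selected features and all step results.
    return result.get_selected_features(), result.get_sorted_steps()
```

For an error-like metric, `minimise` would be True; for a score where higher is better, it would be False.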

class RecursiveFeatureElimination(metric_computation: sensai.evaluation.metric_computation.MetricComputation, min_features=1)[source]

Bases: object

__init__(metric_computation: sensai.evaluation.metric_computation.MetricComputation, min_features=1)
Parameters
  • metric_computation – the method to apply for metric computation in order to determine which feature set is best

  • min_features – the smallest number of features that shall be evaluated during feature elimination

run(model_factory: Callable[[], Union[sensai.vector_model.VectorRegressionModel, sensai.vector_model.VectorClassificationModel]], minimise: bool) sensai.feature_selection.rfe.RFEResult

Runs the optimisation for the given model and data.

Parameters
  • model_factory – factory for the model to be evaluated

  • minimise – whether the metric shall be minimised; if False, maximise.

Returns

a result object, which provides access to the selected features and data on all elimination steps