vector_model

This module defines base classes for models that use pandas.DataFrames for inputs and outputs, where each data frame row represents a single model input or output. Since every row contains a vector of data (one-dimensional array), we refer to them as vector-based models. Hence the name of the module and of the central base class VectorModel.

class VectorModelBase

Bases: abc.ABC, sensai.util.string.ToStringMixin

Base class for vector models, which defines the fundamental prediction interface. A vector model takes data frames as input, where each row represents a vector of information.

__init__()

abstract predict(x: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

abstract is_regression_model() → bool

abstract get_predicted_variable_names() → list

with_name(name: str) → sensai.vector_model.TVectorModelBase

Sets the model’s name.

Parameters: name – the name
Returns: self

set_name(name)

get_name()
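
The naming methods above follow a fluent pattern: with_name() sets the name and returns the instance so that calls can be chained. A minimal sketch of that pattern, using a hypothetical NamedModel class that is not part of sensai (the None default for an unset name is an assumption of the sketch):

```python
from typing import Optional


class NamedModel:
    """Hypothetical stand-in for a vector model (not a sensai class)."""

    def __init__(self) -> None:
        self._name: Optional[str] = None

    def set_name(self, name: str) -> None:
        self._name = name

    def with_name(self, name: str) -> "NamedModel":
        # Delegates to set_name and returns self so that calls can be chained
        self.set_name(name)
        return self

    def get_name(self) -> Optional[str]:
        return self._name


model = NamedModel().with_name("baseline")
print(model.get_name())
```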

class VectorModelFittableBase

Bases: sensai.vector_model.VectorModelBase, abc.ABC

Base class for vector models, which encompasses the fundamental prediction and fitting interfaces. A vector model takes data frames as input, where each row represents a vector of information.

abstract fit(x: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame)

abstract is_fitted() → bool
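
To illustrate how a concrete model might satisfy the abstract fit/is_fitted/predict interface, here is a dependency-free sketch. It assumes a list of dicts as a stand-in for a data frame (one dict per row); FittableBase and MeanModel are hypothetical names, not sensai classes.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional

Rows = List[Dict[str, float]]  # plain-Python stand-in for a data frame


class FittableBase(ABC):
    """Hypothetical analogue of the fitting and prediction interface."""

    @abstractmethod
    def fit(self, x: Rows, y: Rows) -> None: ...

    @abstractmethod
    def is_fitted(self) -> bool: ...

    @abstractmethod
    def predict(self, x: Rows) -> Rows: ...


class MeanModel(FittableBase):
    """Predicts the training mean of a single target column for every row."""

    def __init__(self, target: str) -> None:
        self._target = target
        self._mean: Optional[float] = None

    def fit(self, x: Rows, y: Rows) -> None:
        self._mean = sum(row[self._target] for row in y) / len(y)

    def is_fitted(self) -> bool:
        return self._mean is not None

    def predict(self, x: Rows) -> Rows:
        if not self.is_fitted():
            raise RuntimeError("fit() must be called before predict()")
        # One output row per input row
        return [{self._target: self._mean} for _ in x]


model = MeanModel("price")
model.fit([{"size": 1.0}, {"size": 2.0}], [{"price": 10.0}, {"price": 20.0}])
predictions = model.predict([{"size": 3.0}])
```

Because the base class declares its methods as abstract, it cannot be instantiated directly; only subclasses that implement all three methods can.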

class TrainingContext(original_input: pandas.core.frame.DataFrame, original_output: pandas.core.frame.DataFrame)

Bases: object

Contains context information for an ongoing training process

__init__(original_input: pandas.core.frame.DataFrame, original_output: pandas.core.frame.DataFrame)

class VectorModel(check_input_columns=True)

Bases: sensai.vector_model.VectorModelFittableBase, sensai.util.cache.PickleLoadSaveMixin, abc.ABC

Represents a model which uses data frames as inputs and outputs whose rows define individual data points. Every data frame row represents a vector of information (one-dimensional array), hence the name of the model. Note that the vectors in question are not necessarily vectors in the mathematical sense, as the information in each cell is not required to be numeric or uniform but can be arbitrarily complex.

TOSTRING_INCLUDE_PREPROCESSORS = True

__init__(check_input_columns=True)

Parameters: check_input_columns – whether to check if the input column list (that is fed to the underlying model, i.e. after feature generation) during inference coincides with the input column list that was observed during training. This should be disabled if feature generation is not performed by the model itself, e.g. in meta-models such as ensemble models.
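
The effect of check_input_columns can be sketched as follows. This is an illustration under assumptions, not the library's implementation: ColumnCheckingModel is a hypothetical class, rows are represented as dicts, and raising ValueError is a choice made for the sketch.

```python
from typing import Dict, List, Optional

Rows = List[Dict[str, float]]  # plain-Python stand-in for a data frame


class ColumnCheckingModel:
    """Hypothetical model that sums each row; illustrates the column check only."""

    def __init__(self, check_input_columns: bool = True) -> None:
        self._check_input_columns = check_input_columns
        self._training_columns: Optional[List[str]] = None

    def fit(self, x: Rows) -> "ColumnCheckingModel":
        # Remember the column list observed during training (x is assumed non-empty)
        self._training_columns = list(x[0].keys())
        return self

    def predict(self, x: Rows) -> List[float]:
        columns = list(x[0].keys())
        if self._check_input_columns and columns != self._training_columns:
            raise ValueError(f"expected columns {self._training_columns}, got {columns}")
        return [sum(row.values()) for row in x]


train = [{"a": 1.0, "b": 2.0}]
strict = ColumnCheckingModel().fit(train)
lenient = ColumnCheckingModel(check_input_columns=False).fit(train)
print(lenient.predict([{"a": 1.0, "c": 5.0}]))  # accepted because the check is disabled
```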

with_raw_input_transformers(*transformers: Union[sensai.data_transformation.dft.DataFrameTransformer, List[sensai.data_transformation.dft.DataFrameTransformer]]) → sensai.vector_model.TVectorModel

Makes the model use the given transformers (removing previously set raw input transformers, if any), which are to be applied to the raw input data frame (prior to feature generation).

Parameters: transformers – DataFrameTransformer instances to use (in sequence) for the transformation of inputs
Returns: self

with_feature_transformers(*transformers: Union[sensai.data_transformation.dft.DataFrameTransformer, List[sensai.data_transformation.dft.DataFrameTransformer]], add=False) → sensai.vector_model.TVectorModel

Makes the model use the given transformers, which are to be applied to the data frames generated by feature generators. (If the model does not use feature generators, the transformers will be applied to whatever is produced by the raw input transformers or, if there are none, to the original raw input data frame.)

Parameters

transformers – DataFrameTransformer instances to use (in sequence) for the transformation of features
add – whether to add the transformers to the existing transformers rather than replacing them

Returns

self
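
The difference between replacing and appending transformers (the add parameter) can be sketched like this. TransformerHolder and the two functions are hypothetical; plain callables stand in for DataFrameTransformer instances, and passing lists of transformers is not covered.

```python
from typing import Callable, Dict, List

Rows = List[Dict[str, float]]
Transformer = Callable[[Rows], Rows]  # stand-in for a DataFrameTransformer


class TransformerHolder:
    """Hypothetical holder of a feature transformer chain (not a sensai class)."""

    def __init__(self) -> None:
        self._feature_transformers: List[Transformer] = []

    def with_feature_transformers(self, *transformers: Transformer, add: bool = False) -> "TransformerHolder":
        if add:
            # Append to the existing chain
            self._feature_transformers.extend(transformers)
        else:
            # Replace the existing chain
            self._feature_transformers = list(transformers)
        return self

    def transform(self, rows: Rows) -> Rows:
        # Apply the transformers in sequence
        for transformer in self._feature_transformers:
            rows = transformer(rows)
        return rows


def double(rows: Rows) -> Rows:
    return [{k: v * 2 for k, v in row.items()} for row in rows]


def add_one(rows: Rows) -> Rows:
    return [{k: v + 1 for k, v in row.items()} for row in rows]


holder = TransformerHolder().with_feature_transformers(double)
holder.with_feature_transformers(add_one, add=True)
appended = holder.transform([{"v": 1.0}])  # double, then add_one
holder.with_feature_transformers(add_one)
replaced = holder.transform([{"v": 1.0}])  # add_one only
```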

with_input_transformers(*input_transformers: Union[sensai.data_transformation.dft.DataFrameTransformer, List[sensai.data_transformation.dft.DataFrameTransformer]]) → sensai.vector_model.TVectorModel

Makes the model use the given feature transformers (removing previously set transformers, if any), i.e. it transforms the data frame that is generated by the feature generators (if any).

Parameters: input_transformers – DataFrameTransformer instances to use (in sequence) for the transformation of inputs
Returns: self

with_feature_generator(feature_generator: Optional[sensai.featuregen.feature_generator.FeatureGenerator]) → sensai.vector_model.TVectorModel

Makes the model use the given feature generator in order to obtain the model inputs. If the model shall use more than one feature generator, pass a MultiFeatureGenerator which combines them or use the perhaps more convenient FeatureCollector in conjunction with with_feature_collector().

Note: Feature computation takes place before input transformation.

Parameters: feature_generator – the feature generator to use for input computation
Returns: self

with_feature_collector(feature_collector: sensai.featuregen.feature_generator_registry.FeatureCollector, shared: bool = False) → sensai.vector_model.TVectorModel

Makes the model use a multi-feature generator obtained from the given collector in order to compute the underlying model’s input from the data frame that is given. Overrides any feature generator previously passed to with_feature_generator() (if any).

Note: Feature generation takes place before feature transformation.

Parameters

feature_collector – the feature collector from which to obtain the multi-feature generator
shared – whether the given feature collector is shared between models (i.e. whether the same instance is passed to multiple models). Passing shared=False ensures that models using the same collector do not end up using the same multi-feature generator.

Returns

self

is_fitted()

Returns: True if the model has been fitted, False otherwise

compute_model_inputs(x: pandas.core.frame.DataFrame)

Applies feature generators and input transformers (if any) to an input data frame in order to generate the input for the underlying model

Parameters: x – the input data frame, to which input preprocessing is to be applied
Returns: the input data frame that serves as input for the underlying model
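
The order of preprocessing described in this class (raw input transformers, then feature generation, then feature transformers) can be sketched as a standalone function. The function below and its three example steps are hypothetical and use lists of dicts in place of data frames; they only illustrate the sequencing.

```python
from typing import Any, Callable, Dict, List, Optional

Rows = List[Dict[str, Any]]
Step = Callable[[Rows], Rows]


def compute_model_inputs(rows: Rows,
                         raw_input_transformers: List[Step],
                         feature_generator: Optional[Step],
                         feature_transformers: List[Step]) -> Rows:
    # 1. Raw input transformers act on the original input
    for transformer in raw_input_transformers:
        rows = transformer(rows)
    # 2. The feature generator (if any) produces the feature columns
    if feature_generator is not None:
        rows = feature_generator(rows)
    # 3. Feature transformers act on the generated features
    for transformer in feature_transformers:
        rows = transformer(rows)
    return rows


def strip_text(rows: Rows) -> Rows:
    return [{"text": row["text"].strip()} for row in rows]


def text_length(rows: Rows) -> Rows:
    return [{"length": len(row["text"])} for row in rows]


def scale_length(rows: Rows) -> Rows:
    return [{"length": row["length"] / 10} for row in rows]


model_inputs = compute_model_inputs([{"text": "  hello "}], [strip_text], text_length, [scale_length])
print(model_inputs)
```

If no feature generator is given, the feature transformers receive the output of the raw input transformers directly.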

compute_model_outputs(y: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

predict(x: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Applies the model to the given input data frame

Parameters: x – the input data frame
Returns: the model outputs in the form of a data frame whose index corresponds to the index of x

fit_input_output_data(io_data: sensai.data.InputOutputData, fit_preprocessors=True, fit_model=True)

Fits the model using the given data

Parameters

io_data – the input/output data
fit_preprocessors – whether the model’s preprocessors (feature generators and data frame transformers) shall be fitted
fit_model – whether the model itself shall be fitted

fit(x: pandas.core.frame.DataFrame, y: Optional[pandas.core.frame.DataFrame], fit_preprocessors=True, fit_model=True)

Fits the model using the given data

Parameters

x – a data frame containing input data
y – a data frame containing output data; may be None if the underlying model does not actually require fitting, e.g. in the case of rule-based models, but fitting is still necessary for preprocessors
fit_preprocessors – whether the model’s preprocessors (feature generators and data frame transformers) shall be fitted
fit_model – whether the model itself shall be fitted
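
A rough sketch of how the two flags can separate preprocessor fitting from model fitting. TwoStageModel is hypothetical, its "preprocessor" is just a stored offset, and raising ValueError for a missing y is an assumption of the sketch rather than documented behaviour.

```python
from typing import List, Optional


class TwoStageModel:
    """Hypothetical model with one preprocessor (an offset) and one learned parameter."""

    def __init__(self) -> None:
        self.offset: Optional[float] = None       # preprocessor state
        self.mean_target: Optional[float] = None  # underlying model state

    def fit(self, x: List[float], y: Optional[List[float]],
            fit_preprocessors: bool = True, fit_model: bool = True) -> None:
        if fit_preprocessors:
            self.offset = min(x)
        if fit_model:
            if y is None:
                # Assumption of this sketch: a trainable model needs outputs
                raise ValueError("y is required when fit_model=True")
            self.mean_target = sum(y) / len(y)


model = TwoStageModel()
model.fit([2.0, 4.0], None, fit_model=False)                  # preprocessors only
model.fit([10.0, 20.0], [1.0, 3.0], fit_preprocessors=False)  # model only, offset is kept
```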

is_being_fitted() → bool

Returns: True if the model is currently in the process of being fitted, False otherwise

get_predicted_variable_names()

Returns: the list of variable names that are ultimately output by this model (i.e. the columns of the data frame output by predict())

get_model_input_variable_names() → Optional[List[str]]

Returns: the list of variable names required by the underlying model as input (after feature generation and data frame transformation) or None if the model has not been fitted (or is a rule-based model which does not determine the variable names).

get_input_transformer(cls: Type[sensai.data_transformation.dft.DataFrameTransformer])

Gets the (first) feature transformer of the given type (if any) within this model’s feature transformer chain

Parameters: cls – the type of transformer to look for
Returns: the first matching feature transformer or None

get_feature_transformer(cls: Type[sensai.data_transformation.dft.DataFrameTransformer])

Gets the (first) feature transformer of the given type (if any) within this model’s feature transformer chain

Parameters: cls – the type of transformer to look for
Returns: the first matching feature transformer or None

get_raw_input_transformer(cls: Type[sensai.data_transformation.dft.DataFrameTransformer])

Gets the (first) raw input transformer of the given type (if any) within this model’s raw input transformer chain

Parameters: cls – the type of transformer to look for
Returns: the first matching raw input transformer or None
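
The lookup by type performed by these three getters can be pictured as a linear search over a chain. The sketch below is an assumption about the general idea (matching via isinstance); the classes and the function name first_transformer_of_type are hypothetical.

```python
from typing import List, Optional, Type, TypeVar


class Transformer:
    """Hypothetical base class standing in for DataFrameTransformer."""


class Scaler(Transformer):
    pass


class Encoder(Transformer):
    pass


T = TypeVar("T", bound=Transformer)


def first_transformer_of_type(chain: List[Transformer], cls: Type[T]) -> Optional[T]:
    # Return the first transformer that is an instance of cls, or None if there is none
    for transformer in chain:
        if isinstance(transformer, cls):
            return transformer
    return None


first_scaler, second_scaler = Scaler(), Scaler()
chain = [Encoder(), first_scaler, second_scaler]
found = first_transformer_of_type(chain, Scaler)
```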

get_input_transformer_chain() → sensai.data_transformation.dft.DataFrameTransformerChain

Returns: the model’s feature transformer chain (which may be empty and contain no actual transformers), i.e. the transformers that are applied after feature generation

get_raw_input_transformer_chain() → sensai.data_transformation.dft.DataFrameTransformerChain

Returns: the model’s raw input transformer chain (which may be empty and contain no actual transformers), i.e. the transformers that are applied before feature generation

get_feature_transformer_chain() → sensai.data_transformation.dft.DataFrameTransformerChain

Returns: the model’s feature transformer chain (which may be empty and contain no actual transformers), i.e. the transformers that are applied after feature generation

set_feature_generator(feature_generator: Optional[sensai.featuregen.feature_generator.FeatureGenerator])

get_feature_generator() → Optional[sensai.featuregen.feature_generator.FeatureGenerator]

Returns: the model’s feature generator (if any)

remove_input_preprocessors(): Removes all input preprocessors (i.e. raw input transformers, feature generators and feature transformers) from the model

class VectorRegressionModel(check_input_columns=True)

Bases: sensai.vector_model.VectorModel, abc.ABC

__init__(check_input_columns=True)

Parameters: check_input_columns – Whether to check if the input column list (after feature generation) during inference coincides with the input column list during fit. This should be disabled if feature generation is not performed by the model itself, e.g. in ensemble models.

is_regression_model() → bool

with_output_transformers(*output_transformers: Union[sensai.data_transformation.dft.DataFrameTransformer, List[sensai.data_transformation.dft.DataFrameTransformer]]) → sensai.vector_model.TVectorRegressionModel

Makes the model use the given output transformers. Call with empty input to remove existing output transformers. The transformers are ignored during the fit phase. Not supported for rule-based models.

Important: The output column names of the last output transformer should be the same as the first one’s input column names. If this fails to hold, an exception will be raised when predict() is called.

Note: Output transformers perform post-processing after the actual predictions have been made. Contrary to invertible target transformers, they are not invoked during the fit phase. Therefore, any losses computed there, including the losses on validation sets (e.g. for early stopping), will be computed on the non-post-processed data. A possible use case for such post-processing is if you know how to improve the predictions of your fittable model by some heuristics or by hand-crafted rules.

How not to use: Output transformers are not meant to transform the predictions into something with a different semantic meaning (e.g. normalized into non-normalized values); consider using a target transformer (see with_target_transformer()) for this purpose. Instead, they make it possible to improve predictions through post-processing when this is desired.

Parameters: output_transformers – DataFrameTransformers for the transformation of outputs (after the model has been applied)
Returns: self
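
As an illustration of the post-processing use case and of the column-name requirement, the following sketch clips negative predictions to zero and rejects a chain that changes the column names. All names are hypothetical, rows are dicts rather than data frame rows, and the exception type is an assumption.

```python
from typing import Callable, Dict, List

Rows = List[Dict[str, float]]
OutputTransformer = Callable[[Rows], Rows]


def apply_output_transformers(predictions: Rows, transformers: List[OutputTransformer]) -> Rows:
    if not predictions:
        return predictions
    expected_columns = list(predictions[0].keys())
    result = predictions
    for transformer in transformers:
        result = transformer(result)
    # The chain must hand back the same columns it received
    if list(result[0].keys()) != expected_columns:
        raise ValueError(f"expected columns {expected_columns}, got {list(result[0].keys())}")
    return result


def clip_negative(rows: Rows) -> Rows:
    # Hand-crafted rule: the predicted quantity can never be negative
    return [{k: max(v, 0.0) for k, v in row.items()} for row in rows]


def rename_columns(rows: Rows) -> Rows:
    return [{"renamed_" + k: v for k, v in row.items()} for row in rows]


post_processed = apply_output_transformers([{"demand": -1.5}, {"demand": 3.0}], [clip_negative])
```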

with_target_transformer(target_transformer: Optional[sensai.data_transformation.dft.InvertibleDataFrameTransformer]) → sensai.vector_model.TVectorRegressionModel

Makes the model use the given target transformer such that the underlying low-level model is trained on the transformed targets, but this high-level model still outputs the original (untransformed) values, i.e. the transformation is applied to targets during training and the inverse transformation is applied to the underlying model’s predictions during inference. Hence the requirement of the transformer being invertible.

This method is not supported for rule-based models, because they are not trained and therefore the transformation would serve no purpose.

NOTE: All feature generators and data frame transformers - should they make use of outputs - will be fit on the untransformed target. The target transformer only affects the fitting of the underlying model.

Parameters: target_transformer – a transformer which transforms the targets (training data outputs) prior to learning the model, such that the model learns to predict the transformed outputs
Returns: self
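
The train-on-transformed, predict-on-original behaviour can be sketched with a logarithmic target transformation. The classes below are hypothetical stand-ins (the method names apply and apply_inverse are assumptions of the sketch), and targets are plain lists of floats.

```python
import math
from typing import List


class LogTransformer:
    """Hypothetical invertible transformer: log on apply, exp on apply_inverse."""

    def apply(self, values: List[float]) -> List[float]:
        return [math.log(v) for v in values]

    def apply_inverse(self, values: List[float]) -> List[float]:
        return [math.exp(v) for v in values]


class MeanPredictor:
    """Hypothetical underlying model that always predicts the mean training target."""

    def fit(self, y: List[float]) -> None:
        self.mean = sum(y) / len(y)

    def predict(self, n_rows: int) -> List[float]:
        return [self.mean] * n_rows


class ModelWithTargetTransformer:
    def __init__(self, underlying: MeanPredictor, target_transformer: LogTransformer) -> None:
        self._underlying = underlying
        self._target_transformer = target_transformer

    def fit(self, y: List[float]) -> None:
        # The underlying model is trained on the transformed targets
        self._underlying.fit(self._target_transformer.apply(y))

    def predict(self, n_rows: int) -> List[float]:
        # The inverse transformation maps predictions back to the original scale
        return self._target_transformer.apply_inverse(self._underlying.predict(n_rows))


underlying = MeanPredictor()
model = ModelWithTargetTransformer(underlying, LogTransformer())
model.fit([1.0, 100.0])
prediction = model.predict(1)[0]
```

Here the underlying model learns the mean in log space (approximately log(10)), while the wrapping model reports a value on the original scale (approximately 10, the geometric mean of the targets).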

get_target_transformer()

get_output_transformer_chain()

predict(x: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Applies the model to the given input data frame

Parameters: x – the input data frame
Returns: the model outputs in the form of a data frame whose index corresponds to the index of x

is_fitted()

Returns: True if the model has been fitted, False otherwise

get_model_output_variable_names(): Gets the list of variable names predicted by the underlying model. For the case where at training time the ground truth is transformed by a target transformer which changes column names, the names of the variables prior to the transformation will be returned. Thus this method always returns the variable names that are actually predicted by the underlying model alone. For the variable names that are ultimately output by the entire VectorModel instance when calling predict, use getPredictedVariableNames.

class VectorClassificationModel(check_input_columns=True)

Bases: sensai.vector_model.VectorModel, abc.ABC

__init__(check_input_columns=True)

Parameters: check_input_columns – Whether to check if the input column list (after feature generation) during inference coincides with the input column list during fit. This should be disabled if feature generation is not performed by the model itself, e.g. in ensemble models.

is_regression_model() → bool

get_class_labels() → List[Any]

convert_class_probabilities_to_predictions(df: pandas.core.frame.DataFrame)

Converts from a data frame as returned by predict_class_probabilities() to a result as returned by predict().

Parameters: df – the output data frame from predict_class_probabilities()
Returns: an output data frame as it would be returned by predict

predict_class_probabilities(x: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Parameters: x – the input data
Returns: a data frame where the list of columns is the list of class labels and the values are probabilities, with the same index as the input data frame. Raises an exception if the classifier cannot predict probabilities.
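
One plausible way to turn class probabilities into predictions is to pick the most probable label per row. The sketch below assumes exactly that; it is a standalone function on dict-based rows, and the default column name "predictedLabel" is borrowed from the rule-based classification model's signature on this page as an assumption.

```python
from typing import Dict, List


def convert_class_probabilities_to_predictions(
        probability_rows: List[Dict[str, float]],
        predicted_variable_name: str = "predictedLabel") -> List[Dict[str, str]]:
    # For each row, pick the label with the highest probability
    # (on ties, max returns the first such label in the row's column order)
    return [{predicted_variable_name: max(row, key=row.get)} for row in probability_rows]


probabilities = [{"cat": 0.2, "dog": 0.8}, {"cat": 0.9, "dog": 0.1}]
predictions = convert_class_probabilities_to_predictions(probabilities)
print(predictions)
```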

class RuleBasedVectorRegressionModel(predicted_variable_names: list)

Bases: sensai.vector_model.VectorRegressionModel, abc.ABC

__init__(predicted_variable_names: list)

Parameters: predicted_variable_names – These are typically known at init time for rule-based models
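
A rule-based model computes its outputs from a fixed rule instead of learned parameters, which is why the predicted variable names can be supplied up front. A minimal sketch, assuming a hypothetical AreaRuleModel with dict-based rows (not a subclass of any sensai class):

```python
from typing import Dict, List, Optional

Rows = List[Dict[str, float]]


class AreaRuleModel:
    """Hypothetical rule-based regression model: area = width * height."""

    def __init__(self, predicted_variable_names: List[str]) -> None:
        # Known at construction time, since no training determines them
        self._predicted_variable_names = predicted_variable_names

    def get_predicted_variable_names(self) -> List[str]:
        return self._predicted_variable_names

    def fit(self, x: Rows, y: Optional[Rows] = None) -> None:
        # Nothing to learn; in a full model only preprocessors would be fitted here
        pass

    def predict(self, x: Rows) -> Rows:
        (name,) = self._predicted_variable_names  # this sketch expects exactly one name
        return [{name: row["width"] * row["height"]} for row in x]


model = AreaRuleModel(["area"])
predictions = model.predict([{"width": 2.0, "height": 3.0}])
```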

class RuleBasedVectorClassificationModel(labels: list, predicted_variable_name='predictedLabel')

Bases: sensai.vector_model.VectorClassificationModel, abc.ABC

__init__(labels: list, predicted_variable_name='predictedLabel')

Parameters

labels –
predicted_variable_name –