torch_base

class MCDropoutCapableNNModule(*args: Any, **kwargs: Any)[source]

Bases: torch.nn.Module, abc.ABC

Base class for NN modules that are to support MC-Dropout. Support can be added by applying the _dropout function in the module’s forward method. Then, to perform inference that samples results, call infer_mc_dropout rather than just using __call__.

__init__() → None

infer_mc_dropout(x: Union[torch.Tensor, Sequence[torch.Tensor]], num_samples, p=None) → Tuple[torch.Tensor, torch.Tensor]

Applies inference using MC-Dropout, drawing the given number of samples.

Parameters

x – the model input (a tensor or tuple/list of tensors)
num_samples – the number of samples to draw with MC-Dropout
p – the dropout probability to apply, overriding the probability specified by the model’s forward method; if None, use model’s default

Returns

a pair (y, sd) where y is the mean output tensor and sd is a tensor of the same dimension containing standard deviations
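The sampling idea behind the returned pair can be illustrated without torch. The following is a minimal sketch under stated assumptions, not sensAI's implementation: make_noisy_forward is a hypothetical stand-in for a forward pass with dropout active, and the sketch uses the population standard deviation, which this documentation does not specify.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple


def mc_dropout_summary(
    forward: Callable[[Sequence[float]], List[float]],
    x: Sequence[float],
    num_samples: int,
) -> Tuple[List[float], List[float]]:
    """Runs `forward` num_samples times and returns per-output (mean, sd)."""
    samples = [forward(x) for _ in range(num_samples)]
    # zip(*samples) groups the i-th output of every sample together
    columns = list(zip(*samples))
    y = [statistics.fmean(col) for col in columns]
    sd = [statistics.pstdev(col) for col in columns]
    return y, sd


def make_noisy_forward(p: float, seed: int) -> Callable[[Sequence[float]], List[float]]:
    """Hypothetical stand-in for a network whose inputs are dropped with probability p."""
    rng = random.Random(seed)

    def forward(x: Sequence[float]) -> List[float]:
        # inverted dropout: zero each input with probability p, rescale the rest
        kept = [0.0 if rng.random() < p else v / (1.0 - p) for v in x]
        return [sum(kept), max(kept)]

    return forward


y, sd = mc_dropout_summary(make_noisy_forward(p=0.5, seed=0), [1.0, 2.0, 3.0], num_samples=100)
```

Both y and sd have one entry per model output; a deterministic forward function yields standard deviations of zero.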

class TorchModel(cuda=True)[source]

Bases: abc.ABC, sensai.util.string.ToStringMixin

sensAI abstraction for torch models, which supports one-line training, allows for convenient model application, has basic mechanisms for data scaling, and soundly handles persistence (via pickle). An instance wraps a torch.nn.Module, which is constructed on demand during training via the factory method create_torch_module.

__init__(cuda=True) → None

set_torch_module(module: torch.nn.Module) → None

set_normalisation_check_threshold(threshold: Optional[float])

get_module_bytes() → bytes

set_module_bytes(model_bytes: bytes) → None

get_torch_module() → torch.nn.Module

abstract create_torch_module() → torch.nn.Module

apply(x: Union[torch.Tensor, numpy.ndarray, sensai.torch.torch_data.TorchDataSet, Sequence[torch.Tensor]], as_numpy: bool = True, create_batch: bool = False, mc_dropout_samples: Optional[int] = None, mc_dropout_probability: Optional[float] = None, scale_output: bool = False, scale_input: bool = False) → Union[torch.Tensor, numpy.ndarray, Tuple]

Applies the model to the given input tensor and returns the result

Parameters

x – the input tensor (either a batch or, if create_batch=True, a single data point), a data set or a tuple/list of tensors (if the model accepts more than one input). A data set is processed in a single pass, so it must be small enough to be processed at once.
as_numpy – flag indicating whether to convert the result to a numpy.array (if False, return tensor)
create_batch – whether to add an additional tensor dimension for a batch containing just one data point
mc_dropout_samples – if not None, apply MC-Dropout-based inference with the respective number of samples; if None, apply regular inference
mc_dropout_probability – the probability with which to apply dropouts in MC-Dropout-based inference; if None, use model’s default
scale_output – whether to scale the output that is produced by the underlying model (using this instance’s output scaler, if any)
scale_input – whether to scale the input (using this instance’s input scaler, if any) before applying the underlying model

Returns

an output tensor or, if MC-Dropout is applied, a pair (y, sd) where y is the mean output tensor and sd is a tensor of the same dimension containing standard deviations

apply_scaled(x: Union[torch.Tensor, numpy.ndarray, sensai.torch.torch_data.TorchDataSet, Sequence[torch.Tensor]], as_numpy: bool = True, create_batch: bool = False, mc_dropout_samples: Optional[int] = None, mc_dropout_probability: Optional[float] = None) → Union[torch.Tensor, numpy.ndarray]

Applies the model to the given input tensor and returns the scaled result (i.e. in the original scale)

Parameters

x – the input tensor(s) or data set
as_numpy – flag indicating whether to convert the result to a numpy.array (if False, return tensor)
create_batch – whether to add an additional tensor dimension for a batch containing just one data point
mc_dropout_samples – if not None, apply MC-Dropout-based inference with the respective number of samples; if None, apply regular inference
mc_dropout_probability – the probability with which to apply dropouts in MC-Dropout-based inference; if None, use model’s default

Returns

a scaled output tensor or, if MC-Dropout is applied, a pair (y, sd) of scaled tensors, where y is the mean output tensor and sd is a tensor of the same dimension containing standard deviations
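What "in the original scale" means can be sketched as follows, assuming an affine scaler fitted on the training targets. AffineScaler is a hypothetical class for illustration only; the scaler types sensAI actually uses are not described here, and the treatment of the standard deviations in the MC-Dropout case is not shown.

```python
from typing import List, Sequence


class AffineScaler:
    """Hypothetical scaler: normalised = (value - mean) / std."""

    def __init__(self, mean: float, std: float) -> None:
        self.mean = mean
        self.std = std

    def transform(self, values: Sequence[float]) -> List[float]:
        # maps values from the original scale to the normalised scale
        return [(v - self.mean) / self.std for v in values]

    def inverse_transform(self, values: Sequence[float]) -> List[float]:
        # maps normalised values back to the original scale
        return [v * self.std + self.mean for v in values]


scaler = AffineScaler(mean=100.0, std=20.0)
# what a model trained on normalised targets would emit
raw_model_output = [-1.0, 0.0, 0.5]
original_scale = scaler.inverse_transform(raw_model_output)
```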

scaled_output(output: torch.Tensor) → torch.Tensor

fit(data: sensai.torch.torch_data.TorchDataSetProvider, nn_optimiser_params: sensai.torch.torch_opt.NNOptimiserParams, strategy: Optional[sensai.torch.torch_base.TorchModelFittingStrategy] = None) → None

Fits this model using the given data and strategy

Parameters

data – a provider for the data with which to fit the model
strategy – the fitting strategy; if None, use TorchModelFittingStrategyDefault. Pass your own strategy to perform custom fitting processes, e.g. processes which involve multi-stage learning
nn_optimiser_params – the parameters with which to create an optimiser which can be applied in the fitting strategy
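The role of a fitting strategy can be sketched with stand-in types. Everything below is hypothetical (StubOptimiser, FittingStrategy, TwoStageStrategy and the string-valued model and data are placeholders, not sensAI classes); it only illustrates how a custom strategy might apply the optimiser more than once, whereas a default strategy applies it once.

```python
from abc import ABC, abstractmethod
from typing import List


class StubOptimiser:
    """Hypothetical stand-in for an optimiser; records each fitting call."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def fit(self, model: str, data: str) -> None:
        self.calls.append(f"{model} <- {data}")


class FittingStrategy(ABC):
    @abstractmethod
    def fit(self, model: str, data: str, optimiser: StubOptimiser) -> None:
        """Fits the model on the data, using the optimiser as it sees fit."""


class DefaultStrategy(FittingStrategy):
    def fit(self, model: str, data: str, optimiser: StubOptimiser) -> None:
        # single stage: apply the optimiser to the model and data once
        optimiser.fit(model, data)


class TwoStageStrategy(FittingStrategy):
    def __init__(self, pretraining_data: str) -> None:
        self.pretraining_data = pretraining_data

    def fit(self, model: str, data: str, optimiser: StubOptimiser) -> None:
        # stage 1: pre-train on auxiliary data; stage 2: fit on the actual data
        optimiser.fit(model, self.pretraining_data)
        optimiser.fit(model, data)


optimiser = StubOptimiser()
TwoStageStrategy("synthetic data").fit("net", "real data", optimiser)
```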

property best_epoch: Optional[int]

property total_epochs: Optional[int]

class TorchModelFittingStrategy[source]

Bases: abc.ABC

Defines the interface for fitting strategies that can be used in TorchModel.fit

abstract fit(model: sensai.torch.torch_base.TorchModel, data: sensai.torch.torch_data.TorchDataSetProvider, nn_optimiser: sensai.torch.torch_opt.NNOptimiser) → Optional[sensai.torch.torch_opt.TrainingInfo]

class TorchModelFittingStrategyDefault[source]

Bases: sensai.torch.torch_base.TorchModelFittingStrategy

Represents the default fitting strategy, which simply applies the given optimiser to the model and data

fit(model: sensai.torch.torch_base.TorchModel, data: sensai.torch.torch_data.TorchDataSetProvider, nn_optimiser: sensai.torch.torch_opt.NNOptimiser) → Optional[sensai.torch.torch_opt.TrainingInfo]

class TorchModelFromModuleFactory(module_factory: Callable[[...], torch.nn.Module], *args, cuda: bool = True, **kwargs)[source]

Bases: sensai.torch.torch_base.TorchModel

__init__(module_factory: Callable[[...], torch.nn.Module], *args, cuda: bool = True, **kwargs) → None

create_torch_module() → torch.nn.Module

class TorchModelFromModule(module: torch.nn.Module, cuda: bool = True)[source]

Bases: sensai.torch.torch_base.TorchModel

__init__(module: torch.nn.Module, cuda: bool = True)

create_torch_module() → torch.nn.Module

class TorchModelFactoryFromModule(module: torch.nn.Module, cuda: bool = True)[source]

Bases: object

Represents a factory for the creation of a TorchModel based on a torch module

__init__(module: torch.nn.Module, cuda: bool = True)

class VectorTorchModel(cuda: bool = True)[source]

Bases: sensai.torch.torch_base.TorchModel, abc.ABC

Base class for TorchModels that can be used within VectorModels, where the input and output dimensions are determined by the data

__init__(cuda: bool = True) → None

create_torch_module() → torch.nn.Module

abstract create_torch_module_for_dims(input_dim: int, output_dim: int) → torch.nn.Module

Parameters

input_dim – the number of input dimensions as reported by the data set provider (number of columns in input data frame for default providers)
output_dim – the number of output dimensions as reported by the data set provider (for default providers, this will be the number of columns in the output data frame or, for classification, the number of classes)

Returns

the torch module
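The relationship between the two creation methods can be sketched as a template method. This is an illustration with hypothetical classes (DimAwareModel, LinearShapeModel) that use nested lists in place of torch modules; how sensAI actually passes the dimensions from the data set provider to the model is not shown in this documentation.

```python
from abc import ABC, abstractmethod
from typing import List, Optional


class DimAwareModel(ABC):
    """Hypothetical base class: dimensions are taken from the training data."""

    def __init__(self) -> None:
        self.input_dim: Optional[int] = None
        self.output_dim: Optional[int] = None
        self.module: Optional[List[List[float]]] = None

    def fit(self, inputs: List[List[float]], outputs: List[List[float]]) -> None:
        # the dimensions are read off the data, not configured by the user
        self.input_dim = len(inputs[0])
        self.output_dim = len(outputs[0])
        self.module = self.create_module()

    def create_module(self) -> List[List[float]]:
        # parameterless factory method delegates to the dimension-aware one
        return self.create_module_for_dims(self.input_dim, self.output_dim)

    @abstractmethod
    def create_module_for_dims(self, input_dim: int, output_dim: int) -> List[List[float]]:
        """Creates the module for the given dimensions."""


class LinearShapeModel(DimAwareModel):
    def create_module_for_dims(self, input_dim: int, output_dim: int) -> List[List[float]]:
        # a zero-initialised weight matrix of shape (output_dim, input_dim)
        return [[0.0] * input_dim for _ in range(output_dim)]


model = LinearShapeModel()
model.fit(inputs=[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], outputs=[[0.0, 1.0], [1.0, 0.0]])
```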

class TorchAutoregressiveResultHandler[source]

Bases: abc.ABC

Supports the saving of prediction results such that subsequent predictions can build on earlier predictions, thus supporting autoregressive models.

abstract clear_results()

abstract save_results(input_df: pandas.core.frame.DataFrame, results: numpy.ndarray) → None

Saves the regression results such that they can be used as input for subsequent prediction steps. The input will typically be processed by a feature generator or vectoriser, so the result should be stored in a place from which the respective feature generator or vectoriser can retrieve it.

Parameters

input_df – the input data frame for which results were obtained (number of rows corresponds to length of results)
results – the results array, which is typically a 2D array where results[i] is an array containing the results for the i-th input row
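A handler along these lines might keep results in memory, keyed by the input rows' index labels. The sketch below is an assumption-laden illustration: InMemoryResultHandler and get_result are hypothetical, and save_results takes a plain sequence of index labels instead of a pandas DataFrame to stay dependency-free.

```python
from typing import Dict, Hashable, List, Optional, Sequence


class InMemoryResultHandler:
    """Hypothetical handler keeping results keyed by the input row's index label."""

    def __init__(self) -> None:
        self._results: Dict[Hashable, List[float]] = {}

    def clear_results(self) -> None:
        self._results.clear()

    def save_results(self, input_index: Sequence[Hashable], results: Sequence[Sequence[float]]) -> None:
        # one result row per input row, as stated for input_df and results
        if len(input_index) != len(results):
            raise ValueError("number of input rows must match number of result rows")
        for key, row in zip(input_index, results):
            self._results[key] = list(row)

    def get_result(self, key: Hashable) -> Optional[List[float]]:
        # a feature generator or vectoriser would call this to build later inputs
        return self._results.get(key)


handler = InMemoryResultHandler()
handler.save_results(["t0", "t1"], [[0.5], [0.7]])
```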

class TorchVectorRegressionModel(torch_model_factory: Callable[[], sensai.torch.torch_base.TorchModel], normalisation_mode: sensai.normalisation.NormalisationMode = NormalisationMode.NONE, nn_optimiser_params: Optional[Union[dict, sensai.torch.torch_opt.NNOptimiserParams]] = None)[source]

Bases: sensai.vector_model.VectorRegressionModel

Base class for the implementation of VectorRegressionModels based on TorchModels. An instance of this class will have an instance of TorchModel as the underlying model.

__init__(torch_model_factory: Callable[[], sensai.torch.torch_base.TorchModel], normalisation_mode: sensai.normalisation.NormalisationMode = NormalisationMode.NONE, nn_optimiser_params: Optional[Union[dict, sensai.torch.torch_opt.NNOptimiserParams]] = None) → None

Parameters

torch_model_factory – the factory function with which to create the contained TorchModel instance that the instance is to encapsulate. For the instance to be picklable, this cannot be a lambda or locally defined function.
normalisation_mode – the normalisation mode to apply to input data frames
nn_optimiser_params – the parameters to apply in NNOptimiser during training
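The picklability constraint on torch_model_factory follows from how pickle stores callables (by importable name). A minimal sketch of the difference, in which dict merely stands in for an importable TorchModel subclass and is_picklable is a helper introduced here for illustration:

```python
import functools
import pickle


def is_picklable(obj) -> bool:
    """Returns True if pickle can serialise the given object."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# a lambda has no importable name, so pickle cannot store a reference to it
lambda_factory = lambda: dict(hidden_dim=16)

# functools.partial over an importable callable is picklable;
# `dict` stands in for an importable model class here
partial_factory = functools.partial(dict, hidden_dim=16)

restored_factory = pickle.loads(pickle.dumps(partial_factory))
```

A module-level function defined in an importable module is another option that satisfies the constraint.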

classmethod from_module(module: torch.nn.Module, cuda=True, normalisation_mode: sensai.normalisation.NormalisationMode = NormalisationMode.NONE, nn_optimiser_params: Optional[sensai.torch.torch_opt.NNOptimiserParams] = None) → sensai.torch.torch_base.TorchVectorRegressionModel

with_input_tensoriser(tensoriser: sensai.torch.torch_data.Tensoriser) → sensai.torch.torch_base.TTorchVectorRegressionModel

Parameters: tensoriser – tensoriser to use in order to convert input data frames to (one or more) tensors. The default tensoriser directly converts the data frame’s values (which are assumed to contain only scalars that can be coerced to floats) to a float tensor. A custom tensoriser is required if the conversion is non-trivial or if the data frame is to be converted to more than one input tensor.
Returns: self

with_output_tensoriser(tensoriser: sensai.torch.torch_data.RuleBasedTensoriser) → sensai.torch.torch_base.TTorchVectorRegressionModel

Parameters

tensoriser –

tensoriser to use in order to convert the output data frame to a tensor. The default output tensoriser directly converts the data frame’s values to a float tensor.

NOTE: It is required to be a rule-based tensoriser, because mechanisms that require fitting on the data and thus perform a data-dependent conversion are likely to cause problems: they would need to be reversed at inference time (since the model will be trained on the converted values). If you require a transformation, use a target transformer, which will be applied before the tensoriser.

Returns

self

with_output_tensor_to_array_converter(output_tensor_to_array_converter: sensai.torch.torch_base.OutputTensorToArrayConverter) → sensai.torch.torch_base.TTorchVectorRegressionModel

Configures the use of a custom converter from tensors to numpy arrays, which is applied during inference. A custom converter can be required, for example, to handle variable-length outputs (where the output tensor will typically contain unwanted padding). Note that since the converter is for inference only, it may be required to use a custom loss evaluator during training if the use of a custom converter is necessary.

Parameters: output_tensor_to_array_converter – the converter
Returns: self

with_torch_data_set_provider_factory(torch_data_set_provider_factory: sensai.torch.torch_base.TorchDataSetProviderFactory) → sensai.torch.torch_base.TTorchVectorRegressionModel

Parameters: torch_data_set_provider_factory – the torch data set provider factory, which is used to instantiate the provider which will provide the training and validation data sets from the input data frame that is passed in for learning. By default, TorchDataSetProviderFactoryRegressionDefault is used.
Returns: self

with_data_frame_splitter(data_frame_splitter: sensai.data.DataFrameSplitter) → sensai.torch.torch_base.TTorchVectorRegressionModel

Parameters: data_frame_splitter – the data frame splitter which is used to split the input/output data frames that are passed for learning into a data frame that is used for training and a data frame that is used for validation. The input data frame is the data frame that is passed as input to the splitter, and the returned indices are used to split both the input and output data frames in the same way.
Returns: self
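The point that one set of indices splits both data frames can be sketched with plain lists. split_indices and take are hypothetical helpers; sensAI's DataFrameSplitter interface is not reproduced here.

```python
import random
from typing import List, Sequence, Tuple


def split_indices(num_rows: int, fraction_train: float, seed: int) -> Tuple[List[int], List[int]]:
    """Shuffles row indices and cuts them into (training, validation) indices."""
    indices = list(range(num_rows))
    random.Random(seed).shuffle(indices)
    cut = int(round(num_rows * fraction_train))
    return indices[:cut], indices[cut:]


def take(rows: Sequence[List[float]], indices: Sequence[int]) -> List[List[float]]:
    return [rows[i] for i in indices]


inputs = [[float(i), float(i) ** 2] for i in range(10)]
outputs = [[2.0 * i] for i in range(10)]

# the indices are computed once and applied to inputs and outputs alike,
# so each input row stays paired with its output row
train_idx, val_idx = split_indices(len(inputs), fraction_train=0.8, seed=42)
train_in, train_out = take(inputs, train_idx), take(outputs, train_idx)
val_in, val_out = take(inputs, val_idx), take(outputs, val_idx)
```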

with_normalisation_check_threshold(threshold: Optional[float]) → sensai.torch.torch_base.TTorchVectorRegressionModel

Defines a threshold with which to check inputs that are passed to the underlying neural network. Whenever an (absolute) input value exceeds the threshold, a warning is triggered.

Parameters: threshold – the threshold
Returns: self
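The check described above amounts to comparing the largest absolute input value with the threshold. A minimal sketch, assuming that a threshold of None disables the check and that the warning is raised via Python's warnings module (sensAI may report it differently, e.g. through logging); check_normalisation is a hypothetical function:

```python
import warnings
from typing import Iterable, Optional


def check_normalisation(values: Iterable[float], threshold: Optional[float]) -> None:
    """Warns if any absolute input value exceeds the threshold."""
    if threshold is None:
        # assumption: None means that no check is performed
        return
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs > threshold:
        warnings.warn(
            f"Input value {max_abs} exceeds threshold {threshold}; is the data normalised?",
            RuntimeWarning,
        )


with warnings.catch_warnings(record=True) as caught_warnings:
    warnings.simplefilter("always")
    check_normalisation([0.1, -0.5], threshold=5.0)   # within bounds, no warning
    check_normalisation([0.1, -7.5], threshold=5.0)   # |-7.5| > 5.0, one warning
    num_warnings = len(caught_warnings)
```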

with_autoregressive_result_handler(result_handler: sensai.torch.torch_base.TorchAutoregressiveResultHandler, inference_batch_size=1) → sensai.torch.torch_base.TTorchVectorRegressionModel

Adds a result handler which can be used to store prediction results such that subsequent predictions can use them, supporting autoregressive models. The autoregressive predictions are assumed to be handled in a single call to method predict(), and the results will be stored for the duration of the call. For autoregressive predictions that build on earlier predictions, we must typically restrict the batch size such that predictions from the earlier batch can be saved and correctly reused as input for the subsequent predictions. The model’s input preprocessors (such as feature generators or vectorisers) must ensure that the results being stored by the result handler are appropriately used as input.

Parameters

result_handler – the result handler
inference_batch_size – the batch size to use for predictions

Returns

self
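Why the batch size matters for autoregressive prediction can be seen in a toy loop. The sketch below is hypothetical (autoregressive_predict and double are not sensAI functions): results become available to later inputs only after their batch has been saved, so rows within the same batch cannot build on one another.

```python
from typing import Callable, List


def autoregressive_predict(
    step: Callable[[float], float],
    initial: float,
    num_steps: int,
    batch_size: int,
) -> List[float]:
    """Predicts num_steps values, where each input is the most recently saved result."""
    saved: List[float] = []  # plays the role of the result handler's storage
    for start in range(0, num_steps, batch_size):
        # every row in this batch only sees results saved by earlier batches
        last = saved[-1] if saved else initial
        batch = [step(last) for _ in range(start, min(start + batch_size, num_steps))]
        saved.extend(batch)  # results are saved once the batch is complete
    return saved


def double(value: float) -> float:
    return 2.0 * value


one_by_one = autoregressive_predict(double, 1.0, num_steps=4, batch_size=1)
# -> [2.0, 4.0, 8.0, 16.0]
in_pairs = autoregressive_predict(double, 1.0, num_steps=4, batch_size=2)
# -> [2.0, 2.0, 4.0, 4.0]
```

With a batch size of 1, every prediction builds on its predecessor; with a batch size of 2, the second row of each batch reuses stale input.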

class TorchVectorClassificationModel(output_mode: sensai.torch.torch_enums.ClassificationOutputMode, torch_model_factory: Callable[[], sensai.torch.torch_base.TorchModel], normalisation_mode: sensai.normalisation.NormalisationMode = NormalisationMode.NONE, nn_optimiser_params: Optional[sensai.torch.torch_opt.NNOptimiserParams] = None)[source]

Bases: sensai.vector_model.VectorClassificationModel

Base class for the implementation of VectorClassificationModels based on TorchModels. An instance of this class will have an instance of TorchModel as the underlying model.

__init__(output_mode: sensai.torch.torch_enums.ClassificationOutputMode, torch_model_factory: Callable[[], sensai.torch.torch_base.TorchModel], normalisation_mode: sensai.normalisation.NormalisationMode = NormalisationMode.NONE, nn_optimiser_params: Optional[sensai.torch.torch_opt.NNOptimiserParams] = None) → None

Parameters

output_mode – specifies the nature of the output of the underlying neural network model
torch_model_factory – the factory function with which to create the contained TorchModel instance that the instance is to encapsulate. For the instance to be picklable, this cannot be a lambda or locally defined function.
normalisation_mode – the normalisation mode to apply to input data frames
nn_optimiser_params – the parameters to apply in NNOptimiser during training

classmethod from_module(module: torch.nn.Module, output_mode: sensai.torch.torch_enums.ClassificationOutputMode, cuda=True, normalisation_mode: sensai.normalisation.NormalisationMode = NormalisationMode.NONE, nn_optimiser_params: Optional[sensai.torch.torch_opt.NNOptimiserParams] = None) → sensai.torch.torch_base.TorchVectorClassificationModel

with_input_tensoriser(tensoriser: sensai.torch.torch_data.Tensoriser) → sensai.torch.torch_base.TTorchVectorClassificationModel

Parameters: tensoriser – tensoriser to use in order to convert input data frames to (one or more) tensors. The default tensoriser directly converts the data frame’s values (which are assumed to contain only scalars that can be coerced to floats) to a float tensor. A custom tensoriser is required if the conversion is non-trivial or if the data frame is to be converted to more than one input tensor.
Returns: self

with_output_tensoriser(tensoriser: sensai.torch.torch_data.RuleBasedTensoriser) → sensai.torch.torch_base.TTorchVectorClassificationModel

Parameters: tensoriser – tensoriser to use in order to convert the output data frame to a tensor. NOTE: It is required to be a rule-based tensoriser, because mechanisms that require fitting on the data and thus perform a data-dependent conversion are likely to cause problems: they would need to be reversed at inference time (since the model will be trained on the converted values). If you require a transformation, use a target transformer, which will be applied before the tensoriser.

with_torch_data_set_provider_factory(torch_data_set_provider_factory: sensai.torch.torch_base.TorchDataSetProviderFactory) → sensai.torch.torch_base.TTorchVectorClassificationModel

Parameters: torch_data_set_provider_factory – the torch data set provider factory, which is used to instantiate the provider which will provide the training and validation data sets from the input data frame that is passed in for learning. By default, TorchDataSetProviderFactoryClassificationDefault is used.
Returns: self

with_data_frame_splitter(data_frame_splitter: sensai.data.DataFrameSplitter) → sensai.torch.torch_base.TTorchVectorClassificationModel

Parameters: data_frame_splitter – the data frame splitter which is used to split the input/output data frames that are passed for learning into a data frame that is used for training and a data frame that is used for validation. The input data frame is the data frame that is passed as input to the splitter, and the returned indices are used to split both the input and output data frames in the same way.
Returns: self

with_normalisation_check_threshold(threshold: Optional[float]) → sensai.torch.torch_base.TTorchVectorClassificationModel

Defines a threshold with which to check inputs that are passed to the underlying neural network. Whenever an (absolute) input value exceeds the threshold, a warning is triggered.

Parameters: threshold – the threshold
Returns: self

class TorchDataSetProviderFactory[source]

Bases: abc.ABC

abstract create_data_set_provider(inputs: pandas.core.frame.DataFrame, outputs: pandas.core.frame.DataFrame, model: Union[sensai.torch.torch_base.TorchVectorRegressionModel, sensai.torch.torch_base.TorchVectorClassificationModel], training_context: sensai.vector_model.TrainingContext, input_tensoriser: Optional[sensai.torch.torch_data.Tensoriser], output_tensoriser: Optional[sensai.torch.torch_data.Tensoriser], data_frame_splitter: Optional[sensai.data.DataFrameSplitter]) → sensai.torch.torch_data.TorchDataSetProvider

class TorchDataSetProviderFactoryClassificationDefault(tensorise_dynamically=False)[source]

Bases: sensai.torch.torch_base.TorchDataSetProviderFactory

__init__(tensorise_dynamically=False)

Parameters: tensorise_dynamically – whether tensorisation shall take place on the fly whenever the provided data sets are iterated; if False, tensorisation takes place once in a precomputation stage (tensors must jointly fit into memory)
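The trade-off behind tensorise_dynamically can be sketched as eager versus on-the-fly conversion. RowDataSet and to_floats are hypothetical and operate on plain lists; the sketch only illustrates the memory/compute trade-off, not sensAI's data set classes.

```python
from typing import Callable, Iterator, List, Optional, Sequence


class RowDataSet:
    """Hypothetical data set that converts rows either once up front or per iteration."""

    def __init__(
        self,
        rows: Sequence[Sequence[int]],
        convert: Callable[[Sequence[int]], List[float]],
        convert_dynamically: bool = False,
    ) -> None:
        self._rows = rows
        self._convert = convert
        # eager mode: convert everything once (all converted rows are held in memory)
        self._cache: Optional[List[List[float]]] = (
            None if convert_dynamically else [convert(r) for r in rows]
        )

    def __iter__(self) -> Iterator[List[float]]:
        if self._cache is not None:
            return iter(self._cache)
        # dynamic mode: convert on the fly, every time the data set is iterated
        return (self._convert(r) for r in self._rows)


conversions: List[int] = []  # counts how often the conversion function runs


def to_floats(row: Sequence[int]) -> List[float]:
    conversions.append(1)
    return [float(v) for v in row]


rows = [[1, 2], [3, 4], [5, 6]]
eager = RowDataSet(rows, to_floats, convert_dynamically=False)
dynamic = RowDataSet(rows, to_floats, convert_dynamically=True)
```

Eager conversion costs memory but converts each row once; dynamic conversion repeats the work on every pass (e.g. every epoch) but holds no converted copy.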

create_data_set_provider(inputs: pandas.core.frame.DataFrame, outputs: pandas.core.frame.DataFrame, model: sensai.torch.torch_base.TorchVectorClassificationModel, training_context: sensai.vector_model.TrainingContext, input_tensoriser: Optional[sensai.torch.torch_data.Tensoriser], output_tensoriser: Optional[sensai.torch.torch_data.Tensoriser], data_frame_splitter: Optional[sensai.data.DataFrameSplitter]) → sensai.torch.torch_data.TorchDataSetProvider

class TorchDataSetProviderFactoryRegressionDefault(tensorise_dynamically=False)[source]

Bases: sensai.torch.torch_base.TorchDataSetProviderFactory

__init__(tensorise_dynamically=False)

Parameters: tensorise_dynamically – whether tensorisation shall take place on the fly whenever the provided data sets are iterated; if False, tensorisation takes place once in a precomputation stage (tensors must jointly fit into memory)

create_data_set_provider(inputs: pandas.core.frame.DataFrame, outputs: pandas.core.frame.DataFrame, model: sensai.torch.torch_base.TorchVectorRegressionModel, training_context: sensai.vector_model.TrainingContext, input_tensoriser: Optional[sensai.torch.torch_data.Tensoriser], output_tensoriser: Optional[sensai.torch.torch_data.Tensoriser], data_frame_splitter: Optional[sensai.data.DataFrameSplitter]) → sensai.torch.torch_data.TorchDataSetProvider

class OutputTensorToArrayConverter[source]

Bases: abc.ABC

abstract convert(model_output: torch.Tensor, model_input: Union[torch.Tensor, Sequence[torch.Tensor]]) → numpy.ndarray

Parameters

model_output – the output tensor generated by the model
model_input – the input tensor(s) for which the model produced the output (which may provide relevant meta-data)

Returns

a numpy array of shape (N, D) where N=model_output.shape[0] is the number of data points and D is the number of variables predicted by the model