torch_base

class MCDropoutCapableNNModule(*args: Any, **kwargs: Any)[source]

Bases: torch.nn.Module, abc.ABC

Base class for NN modules that are to support MC-Dropout. Support can be added by applying the _dropout function in the module’s forward method. Then, to run inference that samples results, call infer_mc_dropout rather than just using __call__.

__init__() None
infer_mc_dropout(x: Union[torch.Tensor, Sequence[torch.Tensor]], num_samples, p=None) Tuple[torch.Tensor, torch.Tensor]

Applies inference using MC-Dropout, drawing the given number of samples.

Parameters
  • x – the model input (a tensor or tuple/list of tensors)

  • num_samples – the number of samples to draw with MC-Dropout

  • p – the dropout probability to apply, overriding the probability specified by the model’s forward method; if None, use model’s default

Returns

a pair (y, sd) where y is the mean output tensor and sd is a tensor of the same dimension containing standard deviations
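The sampling idea behind infer_mc_dropout can be sketched in plain Python. This is an illustration only, not sensAI's implementation: dropout, toy_forward and mc_dropout_infer are hypothetical names, plain lists stand in for tensors, and the use of the sample standard deviation is an assumption.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple


def dropout(values: Sequence[float], p: float, rng: random.Random) -> List[float]:
    """Inverted dropout: zero each value with probability p, rescale survivors by 1/(1-p)."""
    if p <= 0.0:
        return list(values)
    if p >= 1.0:
        return [0.0] * len(values)
    keep = 1.0 - p
    return [v / keep if rng.random() >= p else 0.0 for v in values]


def toy_forward(x: Sequence[float], p: float, rng: random.Random) -> List[float]:
    """Hypothetical model: dropout on the inputs, followed by a fixed linear map."""
    weights = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.5]]  # 2 outputs, 3 inputs
    h = dropout(x, p, rng)
    return [sum(w * v for w, v in zip(row, h)) for row in weights]


def mc_dropout_infer(
    forward: Callable[[Sequence[float], float, random.Random], List[float]],
    x: Sequence[float],
    num_samples: int,
    p: float,
    seed: int = 0,
) -> Tuple[List[float], List[float]]:
    """Runs the forward pass num_samples times with dropout active and
    returns the per-output mean and (sample) standard deviation."""
    if num_samples < 2:
        raise ValueError("need at least two samples to estimate a standard deviation")
    rng = random.Random(seed)
    samples = [forward(x, p, rng) for _ in range(num_samples)]
    per_output = list(zip(*samples))  # regroup: one tuple of samples per output
    y = [statistics.fmean(vals) for vals in per_output]
    sd = [statistics.stdev(vals) for vals in per_output]
    return y, sd


y, sd = mc_dropout_infer(toy_forward, [1.0, 2.0, 3.0], num_samples=200, p=0.2)
print(y, sd)
```

With p=0 every sample is identical, so sd collapses to zero and y equals the deterministic output.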

class TorchModel(cuda=True)[source]

Bases: abc.ABC, sensai.util.string.ToStringMixin

sensAI abstraction for torch models, which supports one-line training, allows for convenient model application, has basic mechanisms for data scaling, and soundly handles persistence (via pickle). An instance wraps a torch.nn.Module, which is constructed on demand during training via the factory method create_torch_module.

__init__(cuda=True) None
set_torch_module(module: torch.nn.Module) None
set_normalisation_check_threshold(threshold: Optional[float])
get_module_bytes() bytes
set_module_bytes(model_bytes: bytes) None
get_torch_module() torch.nn.Module
abstract create_torch_module() torch.nn.Module
apply(x: Union[torch.Tensor, numpy.ndarray, sensai.torch.torch_data.TorchDataSet, Sequence[torch.Tensor]], as_numpy: bool = True, create_batch: bool = False, mc_dropout_samples: Optional[int] = None, mc_dropout_probability: Optional[float] = None, scale_output: bool = False, scale_input: bool = False) Union[torch.Tensor, numpy.ndarray, Tuple]

Applies the model to the given input tensor and returns the result

Parameters
  • x – the input tensor (either a batch or, if create_batch=True, a single data point), a data set or a tuple/list of tensors (if the model accepts more than one input). A data set is processed in a single pass, so it must be small enough to be processed at once.

  • as_numpy – flag indicating whether to convert the result to a numpy.array (if False, return tensor)

  • create_batch – whether to add an additional tensor dimension for a batch containing just one data point

  • mc_dropout_samples – if not None, apply MC-Dropout-based inference with the respective number of samples; if None, apply regular inference

  • mc_dropout_probability – the probability with which to apply dropouts in MC-Dropout-based inference; if None, use model’s default

  • scale_output – whether to scale the output that is produced by the underlying model (using this instance’s output scaler, if any)

  • scale_input – whether to scale the input (using this instance’s input scaler, if any) before applying the underlying model

Returns

an output tensor or, if MC-Dropout is applied, a pair (y, sd) where y is the mean output tensor and sd is a tensor of the same dimension containing standard deviations

apply_scaled(x: Union[torch.Tensor, numpy.ndarray, sensai.torch.torch_data.TorchDataSet, Sequence[torch.Tensor]], as_numpy: bool = True, create_batch: bool = False, mc_dropout_samples: Optional[int] = None, mc_dropout_probability: Optional[float] = None) Union[torch.Tensor, numpy.ndarray]

Applies the model to the given input tensor and returns the scaled result (i.e. in the original scale)

Parameters
  • x – the input tensor(s) or data set

  • as_numpy – flag indicating whether to convert the result to a numpy.array (if False, return tensor)

  • create_batch – whether to add an additional tensor dimension for a batch containing just one data point

  • mc_dropout_samples – if not None, apply MC-Dropout-based inference with the respective number of samples; if None, apply regular inference

  • mc_dropout_probability – the probability with which to apply dropouts in MC-Dropout-based inference; if None, use model’s default

Returns

a scaled output tensor or, if MC-Dropout is applied, a pair (y, sd) of scaled tensors, where y is the mean output tensor and sd is a tensor of the same dimension containing standard deviations
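How the scale_input and scale_output flags relate to apply_scaled can be sketched as follows. This is a minimal sketch under stated assumptions: AffineScaler and the free functions apply and apply_scaled are hypothetical stand-ins, lists replace tensors, and it is assumed here that apply_scaled enables both input and output scaling.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class AffineScaler:
    """Hypothetical scaler: normalise maps original -> model scale, denormalise inverts it."""
    mean: float
    std: float

    def normalise(self, values: Sequence[float]) -> List[float]:
        return [(v - self.mean) / self.std for v in values]

    def denormalise(self, values: Sequence[float]) -> List[float]:
        return [v * self.std + self.mean for v in values]


def apply(
    model: Callable[[Sequence[float]], List[float]],
    x: Sequence[float],
    input_scaler: Optional[AffineScaler] = None,
    output_scaler: Optional[AffineScaler] = None,
    scale_input: bool = False,
    scale_output: bool = False,
) -> List[float]:
    """Applies the model, optionally scaling the input before and the output after."""
    if scale_input and input_scaler is not None:
        x = input_scaler.normalise(x)
    y = model(x)
    if scale_output and output_scaler is not None:
        y = output_scaler.denormalise(y)  # back to the original scale
    return y


def apply_scaled(model, x, input_scaler=None, output_scaler=None) -> List[float]:
    """Assumption: equivalent to apply(...) with both scaling flags enabled."""
    return apply(model, x, input_scaler, output_scaler, scale_input=True, scale_output=True)


def double(xs: Sequence[float]) -> List[float]:
    """Hypothetical model operating in normalised space."""
    return [2.0 * v for v in xs]


print(apply_scaled(double, [15.0, 10.0], AffineScaler(10.0, 5.0), AffineScaler(100.0, 20.0)))
```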

scaled_output(output: torch.Tensor) torch.Tensor
fit(data: sensai.torch.torch_data.TorchDataSetProvider, nn_optimiser_params: sensai.torch.torch_opt.NNOptimiserParams, strategy: Optional[sensai.torch.torch_base.TorchModelFittingStrategy] = None) None

Fits this model using the given data, optimiser parameters and strategy

Parameters
  • data – a provider for the data with which to fit the model

  • strategy – the fitting strategy; if None, use TorchModelFittingStrategyDefault. Pass your own strategy to perform custom fitting processes, e.g. processes that involve multi-stage learning

  • nn_optimiser_params – the parameters with which to create an optimiser which can be applied in the fitting strategy
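The relationship between fit and a fitting strategy follows the strategy pattern, which can be sketched without sensAI as shown here. All names (Optimiser, FittingStrategy, DefaultStrategy, TwoStageStrategy, fit_model) are hypothetical stand-ins; the real classes operate on a TorchModel, a TorchDataSetProvider and an NNOptimiser.

```python
from abc import ABC, abstractmethod
from typing import List, Optional, Sequence


class Optimiser:
    """Hypothetical optimiser stand-in that only records which stages were run."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def fit(self, model, data: Sequence, stage: str = "all") -> dict:
        self.log.append(f"{stage}:{len(data)}")
        return {"stage": stage, "num_samples": len(data)}


class FittingStrategy(ABC):
    @abstractmethod
    def fit(self, model, data: Sequence, optimiser: Optimiser) -> Optional[dict]:
        """Fits the model and optionally returns training information."""


class DefaultStrategy(FittingStrategy):
    """Simply applies the optimiser to the model and data."""

    def fit(self, model, data, optimiser):
        return optimiser.fit(model, data)


class TwoStageStrategy(FittingStrategy):
    """Custom multi-stage process: pretrain on half the data, then fine-tune on all of it."""

    def fit(self, model, data, optimiser):
        half = len(data) // 2
        optimiser.fit(model, data[:half], stage="pretrain")
        return optimiser.fit(model, data, stage="finetune")


def fit_model(model, data, optimiser, strategy: Optional[FittingStrategy] = None):
    """Falls back to the default strategy if none is given."""
    strategy = strategy if strategy is not None else DefaultStrategy()
    return strategy.fit(model, data, optimiser)


optimiser = Optimiser()
fit_model(None, [1, 2, 3, 4], optimiser, TwoStageStrategy())
print(optimiser.log)
```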

property best_epoch: Optional[int]
property total_epochs: Optional[int]
class TorchModelFittingStrategy[source]

Bases: abc.ABC

Defines the interface for fitting strategies that can be used in TorchModel.fit

abstract fit(model: sensai.torch.torch_base.TorchModel, data: sensai.torch.torch_data.TorchDataSetProvider, nn_optimiser: sensai.torch.torch_opt.NNOptimiser) Optional[sensai.torch.torch_opt.TrainingInfo]
class TorchModelFittingStrategyDefault[source]

Bases: sensai.torch.torch_base.TorchModelFittingStrategy

Represents the default fitting strategy, which simply applies the given optimiser to the model and data

fit(model: sensai.torch.torch_base.TorchModel, data: sensai.torch.torch_data.TorchDataSetProvider, nn_optimiser: sensai.torch.torch_opt.NNOptimiser) Optional[sensai.torch.torch_opt.TrainingInfo]
class TorchModelFromModuleFactory(module_factory: Callable[[...], torch.nn.Module], *args, cuda: bool = True, **kwargs)[source]

Bases: sensai.torch.torch_base.TorchModel

__init__(module_factory: Callable[[...], torch.nn.Module], *args, cuda: bool = True, **kwargs) None
create_torch_module() torch.nn.Module
class TorchModelFromModule(module: torch.nn.Module, cuda: bool = True)[source]

Bases: sensai.torch.torch_base.TorchModel

__init__(module: torch.nn.Module, cuda: bool = True)
create_torch_module() torch.nn.Module
class TorchModelFactoryFromModule(module: torch.nn.Module, cuda: bool = True)[source]

Bases: object

Represents a factory for the creation of a TorchModel based on a torch module

__init__(module: torch.nn.Module, cuda: bool = True)
class VectorTorchModel(cuda: bool = True)[source]

Bases: sensai.torch.torch_base.TorchModel, abc.ABC

Base class for TorchModels that can be used within VectorModels, where the input and output dimensions are determined by the data

__init__(cuda: bool = True) None
create_torch_module() torch.nn.Module
abstract create_torch_module_for_dims(input_dim: int, output_dim: int) torch.nn.Module
Parameters
  • input_dim – the number of input dimensions as reported by the data set provider (number of columns in input data frame for default providers)

  • output_dim – the number of output dimensions as reported by the data set provider (for default providers, this will be the number of columns in the output data frame or, for classification, the number of classes)

Returns

the torch module
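The division of labour between create_torch_module and create_torch_module_for_dims resembles a template method: the dimensions are read from the data, then passed to the abstract hook. A minimal sketch, assuming hypothetical names (DimsAwareModel, ZeroLinearModel) and using nested lists in place of a torch module:

```python
from abc import ABC, abstractmethod
from typing import List, Optional, Sequence


class DimsAwareModel(ABC):
    """Hypothetical base class: dimensions are inferred from the data at fit time."""

    def __init__(self) -> None:
        self.input_dim: Optional[int] = None
        self.output_dim: Optional[int] = None
        self.module = None

    def fit(self, inputs: Sequence[Sequence[float]], outputs: Sequence[Sequence[float]]) -> None:
        # the data determines the dimensions (number of columns)
        self.input_dim = len(inputs[0])
        self.output_dim = len(outputs[0])
        self.module = self.create_module()

    def create_module(self):
        # concrete method delegating to the dimension-aware hook
        return self.create_module_for_dims(self.input_dim, self.output_dim)

    @abstractmethod
    def create_module_for_dims(self, input_dim: int, output_dim: int):
        """Creates the module for the given dimensions."""


class ZeroLinearModel(DimsAwareModel):
    def create_module_for_dims(self, input_dim: int, output_dim: int) -> List[List[float]]:
        # weight matrix of shape (output_dim, input_dim), initialised to zero
        return [[0.0] * input_dim for _ in range(output_dim)]


model = ZeroLinearModel()
model.fit([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], [[0.0, 1.0], [1.0, 0.0]])
print(len(model.module), len(model.module[0]))
```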

class TorchAutoregressiveResultHandler[source]

Bases: abc.ABC

Supports the saving of prediction results such that subsequent predictions can build on earlier predictions, thus supporting autoregressive models.

abstract clear_results()
abstract save_results(input_df: pandas.core.frame.DataFrame, results: numpy.ndarray) None

Saves the regression results such that they can be used as input for subsequent prediction steps. The input will typically be processed by a feature generator or vectoriser, so the result should be stored in a place from which the respective feature generator or vectoriser can retrieve it.

Parameters
  • input_df – the input data frame for which results were obtained (number of rows corresponds to length of results)

  • results – the results array, which is typically a 2D array where results[i] is an array containing the results for the i-th input row
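The interplay between a result handler and a feature generator can be sketched as follows. This is a simplified illustration, not the sensAI interface: InMemoryResultHandler, lagged_feature and predict_autoregressively are hypothetical, integer row ids stand in for the input data frame, nested lists stand in for the results array, and the "model" is a fixed formula.

```python
from typing import Dict, List, Optional, Sequence


class InMemoryResultHandler:
    """Hypothetical handler that stores results in a dict keyed by row id."""

    def __init__(self) -> None:
        self._results: Dict[int, List[float]] = {}

    def clear_results(self) -> None:
        self._results.clear()

    def save_results(self, row_ids: Sequence[int], results: Sequence[Sequence[float]]) -> None:
        # results[i] holds the results for the i-th input row
        assert len(row_ids) == len(results)
        for row_id, result in zip(row_ids, results):
            self._results[row_id] = list(result)

    def get_result(self, row_id: int) -> Optional[List[float]]:
        return self._results.get(row_id)


def lagged_feature(handler: InMemoryResultHandler, row_id: int, initial: float) -> float:
    """Feature generator stand-in: uses the previous row's prediction as input."""
    previous = handler.get_result(row_id - 1)
    return previous[0] if previous is not None else initial


def predict_autoregressively(
    handler: InMemoryResultHandler, row_ids: Sequence[int], initial: float
) -> List[float]:
    handler.clear_results()
    predictions = []
    for row_id in row_ids:  # batch size 1, so each step can see the previous result
        x = lagged_feature(handler, row_id, initial)
        y = 0.5 * x + 1.0  # hypothetical model
        handler.save_results([row_id], [[y]])
        predictions.append(y)
    return predictions


print(predict_autoregressively(InMemoryResultHandler(), [0, 1, 2], initial=0.0))
```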

class TorchVectorRegressionModel(torch_model_factory: Callable[[], sensai.torch.torch_base.TorchModel], normalisation_mode: sensai.normalisation.NormalisationMode = NormalisationMode.NONE, nn_optimiser_params: Optional[Union[dict, sensai.torch.torch_opt.NNOptimiserParams]] = None)[source]

Bases: sensai.vector_model.VectorRegressionModel

Base class for the implementation of VectorRegressionModels based on TorchModels. An instance of this class will have an instance of TorchModel as the underlying model.

__init__(torch_model_factory: Callable[[], sensai.torch.torch_base.TorchModel], normalisation_mode: sensai.normalisation.NormalisationMode = NormalisationMode.NONE, nn_optimiser_params: Optional[Union[dict, sensai.torch.torch_opt.NNOptimiserParams]] = None) None
Parameters
  • torch_model_factory – the factory function with which to create the contained TorchModel instance that the instance is to encapsulate. For the instance to be picklable, this cannot be a lambda or locally defined function.

  • normalisation_mode – the normalisation mode to apply to input data frames

  • nn_optimiser_params – the parameters to apply in NNOptimiser during training
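The picklability constraint on torch_model_factory is a general property of Python's pickle module, which serialises functions by reference to an importable name. The sketch below demonstrates it with stand-ins: is_picklable and the factories are hypothetical, and dict takes the place of an importable model class.

```python
import functools
import pickle


def is_picklable(obj) -> bool:
    """Returns True if pickle can serialise the object."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


def make_local_factory():
    # a locally defined function has no importable qualified name
    def local_factory():
        return {"hidden_dim": 16}

    return local_factory


# a lambda cannot be looked up by name either
lambda_factory = lambda: {"hidden_dim": 16}

# a partial over an importable callable pickles fine;
# dict stands in for a module-level model class here
partial_factory = functools.partial(dict, hidden_dim=16)

print("lambda:", is_picklable(lambda_factory))
print("local function:", is_picklable(make_local_factory()))
print("partial over importable callable:", is_picklable(partial_factory))
```

A functools.partial over a module-level class or function is therefore a convenient way to pass constructor arguments through a factory while keeping the enclosing model picklable.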

classmethod from_module(module: torch.nn.Module, cuda=True, normalisation_mode: sensai.normalisation.NormalisationMode = NormalisationMode.NONE, nn_optimiser_params: Optional[sensai.torch.torch_opt.NNOptimiserParams] = None) sensai.torch.torch_base.TorchVectorRegressionModel
with_input_tensoriser(tensoriser: sensai.torch.torch_data.Tensoriser) sensai.torch.torch_base.TTorchVectorRegressionModel
Parameters

tensoriser – tensoriser to use in order to convert input data frames to (one or more) tensors. The default tensoriser directly converts the data frame’s values (which are assumed to contain only scalars that can be coerced to floats) to a float tensor. A custom tensoriser is required if a non-trivial conversion is necessary or if the data frame is to be converted to more than one input tensor.

Returns

self

with_output_tensoriser(tensoriser: sensai.torch.torch_data.RuleBasedTensoriser) sensai.torch.torch_base.TTorchVectorRegressionModel
Parameters

tensoriser – tensoriser to use in order to convert the output data frame to a tensor. The default output tensoriser directly converts the data frame’s values to a float tensor.

NOTE: It is required to be a rule-based tensoriser, because mechanisms that require fitting on the data, and thus perform a data-dependent conversion, are likely to cause problems: they would need to be reversed at inference time (since the model will be trained on the converted values). If you require a transformation, use a target transformer, which will be applied before the tensoriser.

Returns

self

with_output_tensor_to_array_converter(output_tensor_to_array_converter: sensai.torch.torch_base.OutputTensorToArrayConverter) sensai.torch.torch_base.TTorchVectorRegressionModel

Configures the use of a custom converter from tensors to numpy arrays, which is applied during inference. A custom converter can be required, for example, to handle variable-length outputs (where the output tensor will typically contain unwanted padding). Note that since the converter is for inference only, it may be required to use a custom loss evaluator during training if the use of a custom converter is necessary.

Parameters

output_tensor_to_array_converter – the converter

Returns

self

with_torch_data_set_provider_factory(torch_data_set_provider_factory: sensai.torch.torch_base.TorchDataSetProviderFactory) sensai.torch.torch_base.TTorchVectorRegressionModel
Parameters

torch_data_set_provider_factory – the torch data set provider factory, which is used to instantiate the provider which will provide the training and validation data sets from the input data frame that is passed in for learning. By default, TorchDataSetProviderFactoryRegressionDefault is used.

Returns

self

with_data_frame_splitter(data_frame_splitter: sensai.data.DataFrameSplitter) sensai.torch.torch_base.TTorchVectorRegressionModel
Parameters

data_frame_splitter – the data frame splitter which is used to split the input/output data frames that are passed for learning into a data frame that is used for training and a data frame that is used for validation. The input data frame is the data frame that is passed as input to the splitter, and the returned indices are used to split both the input and output data frames in the same way.

Returns

self
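The splitting behaviour described for data_frame_splitter (indices are computed once and applied to inputs and outputs alike) can be sketched as follows. The functions split_indices, take and split_inputs_and_outputs are hypothetical, lists of rows stand in for data frames, and the random shuffle is just one possible splitting rule.

```python
import random
from typing import List, Sequence, Tuple


def split_indices(num_rows: int, fraction_first: float, seed: int = 42) -> Tuple[List[int], List[int]]:
    """Shuffles the row indices and splits them into two disjoint index lists."""
    indices = list(range(num_rows))
    random.Random(seed).shuffle(indices)
    n_first = int(round(num_rows * fraction_first))
    return indices[:n_first], indices[n_first:]


def take(rows: Sequence, indices: Sequence[int]) -> List:
    return [rows[i] for i in indices]


def split_inputs_and_outputs(inputs: Sequence, outputs: Sequence, fraction_train: float = 0.8, seed: int = 42):
    """Applies the same index split to inputs and outputs so that rows stay aligned."""
    assert len(inputs) == len(outputs)
    train_idx, val_idx = split_indices(len(inputs), fraction_train, seed)
    train = (take(inputs, train_idx), take(outputs, train_idx))
    validation = (take(inputs, val_idx), take(outputs, val_idx))
    return train, validation


inputs = [[i] for i in range(10)]
outputs = [[i * 10] for i in range(10)]
(train_x, train_y), (val_x, val_y) = split_inputs_and_outputs(inputs, outputs)
print(len(train_x), len(val_x))
```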

with_normalisation_check_threshold(threshold: Optional[float]) sensai.torch.torch_base.TTorchVectorRegressionModel

Defines a threshold with which to check inputs that are passed to the underlying neural network. Whenever an (absolute) input value exceeds the threshold, a warning is triggered.

Parameters

threshold – the threshold

Returns

self
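The kind of check that such a threshold enables can be sketched as follows. check_normalisation is a hypothetical helper, nested lists stand in for an input tensor, and treating a threshold of None as "no check" is an assumption based on the Optional type.

```python
import warnings
from typing import Optional, Sequence


def check_normalisation(batch: Sequence[Sequence[float]], threshold: Optional[float]) -> bool:
    """Warns if any absolute input value exceeds the threshold.

    Returns True if a warning was issued. A threshold of None disables the check
    (an assumption made for this sketch).
    """
    if threshold is None:
        return False
    max_abs = max((abs(v) for row in batch for v in row), default=0.0)
    if max_abs > threshold:
        warnings.warn(
            f"Input value {max_abs} exceeds threshold {threshold}; is the data normalised?",
            RuntimeWarning,
        )
        return True
    return False


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_normalisation([[0.1, -7.5], [0.3, 0.2]], threshold=5.0)
    print(len(caught), "warning(s) issued")
```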

with_autoregressive_result_handler(result_handler: sensai.torch.torch_base.TorchAutoregressiveResultHandler, inference_batch_size=1) sensai.torch.torch_base.TTorchVectorRegressionModel

Adds a result handler which can be used to store prediction results such that subsequent predictions can use the prediction result, supporting autoregressive models. The autoregressive predictions are assumed to be handled in a single call to method predict(), and the results will be stored for the duration of the call. For autoregressive predictions that build on earlier predictions, we must typically restrict the batch size such that predictions from the earlier batch can be saved and correctly reused as input for the subsequent predictions. The model’s input preprocessors (such as feature generators or vectorisers) must ensure that the results being stored by the result handler are appropriately used as input.

Parameters
  • result_handler – the result handler

  • inference_batch_size – the batch size to use for predictions

Returns

self

class TorchVectorClassificationModel(output_mode: sensai.torch.torch_enums.ClassificationOutputMode, torch_model_factory: Callable[[], sensai.torch.torch_base.TorchModel], normalisation_mode: sensai.normalisation.NormalisationMode = NormalisationMode.NONE, nn_optimiser_params: Optional[sensai.torch.torch_opt.NNOptimiserParams] = None)[source]

Bases: sensai.vector_model.VectorClassificationModel

Base class for the implementation of VectorClassificationModels based on TorchModels. An instance of this class will have an instance of TorchModel as the underlying model.

__init__(output_mode: sensai.torch.torch_enums.ClassificationOutputMode, torch_model_factory: Callable[[], sensai.torch.torch_base.TorchModel], normalisation_mode: sensai.normalisation.NormalisationMode = NormalisationMode.NONE, nn_optimiser_params: Optional[sensai.torch.torch_opt.NNOptimiserParams] = None) None
Parameters
  • output_mode – specifies the nature of the output of the underlying neural network model

  • torch_model_factory – the factory function with which to create the contained TorchModel instance that the instance is to encapsulate. For the instance to be picklable, this cannot be a lambda or locally defined function.

  • normalisation_mode – the normalisation mode to apply to input data frames

  • nn_optimiser_params – the parameters to apply in NNOptimiser during training

classmethod from_module(module: torch.nn.Module, output_mode: sensai.torch.torch_enums.ClassificationOutputMode, cuda=True, normalisation_mode: sensai.normalisation.NormalisationMode = NormalisationMode.NONE, nn_optimiser_params: Optional[sensai.torch.torch_opt.NNOptimiserParams] = None) sensai.torch.torch_base.TorchVectorClassificationModel
with_input_tensoriser(tensoriser: sensai.torch.torch_data.Tensoriser) sensai.torch.torch_base.TTorchVectorClassificationModel
Parameters

tensoriser – tensoriser to use in order to convert input data frames to (one or more) tensors. The default tensoriser directly converts the data frame’s values (which are assumed to contain only scalars that can be coerced to floats) to a float tensor. A custom tensoriser is required if a non-trivial conversion is necessary or if the data frame is to be converted to more than one input tensor.

Returns

self

with_output_tensoriser(tensoriser: sensai.torch.torch_data.RuleBasedTensoriser) sensai.torch.torch_base.TTorchVectorClassificationModel
Parameters

tensoriser – tensoriser to use in order to convert the output data frame to a tensor. NOTE: It is required to be a rule-based tensoriser, because mechanisms that require fitting on the data, and thus perform a data-dependent conversion, are likely to cause problems: they would need to be reversed at inference time (since the model will be trained on the converted values). If you require a transformation, use a target transformer, which will be applied before the tensoriser.

with_torch_data_set_provider_factory(torch_data_set_provider_factory: sensai.torch.torch_base.TorchDataSetProviderFactory) sensai.torch.torch_base.TTorchVectorClassificationModel
Parameters

torch_data_set_provider_factory – the torch data set provider factory, which is used to instantiate the provider which will provide the training and validation data sets from the input data frame that is passed in for learning. By default, TorchDataSetProviderFactoryClassificationDefault is used.

Returns

self

with_data_frame_splitter(data_frame_splitter: sensai.data.DataFrameSplitter) sensai.torch.torch_base.TTorchVectorClassificationModel
Parameters

data_frame_splitter – the data frame splitter which is used to split the input/output data frames that are passed for learning into a data frame that is used for training and a data frame that is used for validation. The input data frame is the data frame that is passed as input to the splitter, and the returned indices are used to split both the input and output data frames in the same way.

Returns

self

with_normalisation_check_threshold(threshold: Optional[float]) sensai.torch.torch_base.TTorchVectorClassificationModel

Defines a threshold with which to check inputs that are passed to the underlying neural network. Whenever an (absolute) input value exceeds the threshold, a warning is triggered.

Parameters

threshold – the threshold

Returns

self

class TorchDataSetProviderFactory[source]

Bases: abc.ABC

abstract create_data_set_provider(inputs: pandas.core.frame.DataFrame, outputs: pandas.core.frame.DataFrame, model: Union[sensai.torch.torch_base.TorchVectorRegressionModel, sensai.torch.torch_base.TorchVectorClassificationModel], training_context: sensai.vector_model.TrainingContext, input_tensoriser: Optional[sensai.torch.torch_data.Tensoriser], output_tensoriser: Optional[sensai.torch.torch_data.Tensoriser], data_frame_splitter: Optional[sensai.data.DataFrameSplitter]) sensai.torch.torch_data.TorchDataSetProvider
class TorchDataSetProviderFactoryClassificationDefault(tensorise_dynamically=False)[source]

Bases: sensai.torch.torch_base.TorchDataSetProviderFactory

__init__(tensorise_dynamically=False)
Parameters

tensorise_dynamically – whether tensorisation shall take place on the fly whenever the provided data sets are iterated; if False, tensorisation takes place once in a precomputation stage (tensors must jointly fit into memory)
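The trade-off behind tensorise_dynamically can be sketched with two small data set classes: one converts all rows once up front, the other converts on every iteration. All names (PrecomputedDataSet, DynamicDataSet, create_data_set, to_floats) are hypothetical, and float lists stand in for tensors.

```python
from typing import Callable, Iterator, List, Sequence

Row = Sequence[float]
Tensorise = Callable[[Row], List[float]]


class PrecomputedDataSet:
    """Tensorises all rows once; everything must fit into memory jointly."""

    def __init__(self, rows: Sequence[Row], tensorise: Tensorise) -> None:
        self._tensors = [tensorise(row) for row in rows]

    def __iter__(self) -> Iterator[List[float]]:
        return iter(self._tensors)


class DynamicDataSet:
    """Tensorises rows on the fly each time the data set is iterated."""

    def __init__(self, rows: Sequence[Row], tensorise: Tensorise) -> None:
        self._rows = rows
        self._tensorise = tensorise

    def __iter__(self) -> Iterator[List[float]]:
        for row in self._rows:
            yield self._tensorise(row)


def create_data_set(rows: Sequence[Row], tensorise: Tensorise, tensorise_dynamically: bool = False):
    cls = DynamicDataSet if tensorise_dynamically else PrecomputedDataSet
    return cls(rows, tensorise)


def to_floats(row: Row) -> List[float]:
    return [float(v) for v in row]


for batch in create_data_set([[1, 2], [3, 4]], to_floats, tensorise_dynamically=True):
    print(batch)
```

Dynamic tensorisation repeats the conversion work in every epoch but keeps only one converted row in memory at a time; precomputation pays the conversion cost once.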

create_data_set_provider(inputs: pandas.core.frame.DataFrame, outputs: pandas.core.frame.DataFrame, model: sensai.torch.torch_base.TorchVectorClassificationModel, training_context: sensai.vector_model.TrainingContext, input_tensoriser: Optional[sensai.torch.torch_data.Tensoriser], output_tensoriser: Optional[sensai.torch.torch_data.Tensoriser], data_frame_splitter: Optional[sensai.data.DataFrameSplitter]) sensai.torch.torch_data.TorchDataSetProvider
class TorchDataSetProviderFactoryRegressionDefault(tensorise_dynamically=False)[source]

Bases: sensai.torch.torch_base.TorchDataSetProviderFactory

__init__(tensorise_dynamically=False)
Parameters

tensorise_dynamically – whether tensorisation shall take place on the fly whenever the provided data sets are iterated; if False, tensorisation takes place once in a precomputation stage (tensors must jointly fit into memory)

create_data_set_provider(inputs: pandas.core.frame.DataFrame, outputs: pandas.core.frame.DataFrame, model: sensai.torch.torch_base.TorchVectorRegressionModel, training_context: sensai.vector_model.TrainingContext, input_tensoriser: Optional[sensai.torch.torch_data.Tensoriser], output_tensoriser: Optional[sensai.torch.torch_data.Tensoriser], data_frame_splitter: Optional[sensai.data.DataFrameSplitter]) sensai.torch.torch_data.TorchDataSetProvider
class OutputTensorToArrayConverter[source]

Bases: abc.ABC

abstract convert(model_output: torch.Tensor, model_input: Union[torch.Tensor, Sequence[torch.Tensor]]) numpy.ndarray
Parameters
  • model_output – the output tensor generated by the model

  • model_input – the input tensor(s) for which the model produced the output (which may provide relevant meta-data)

Returns

a numpy array of shape (N, D) where N=model_output.shape[0] is the number of data points and D is the number of variables predicted by the model