torch_data

to_tensor(d: Union[torch.Tensor, numpy.ndarray, list], cuda=False)[source]
class TensorScaler[source]

Bases: abc.ABC

abstract cuda()

Makes this scaler’s components use CUDA

abstract normalise(tensor: torch.Tensor) → torch.Tensor

Applies scaling/normalisation to the given tensor

Parameters

tensor – the tensor to scale/normalise

Returns

the scaled/normalised tensor

abstract denormalise(tensor: torch.Tensor) → torch.Tensor

Applies the inverse of method normalise to the given tensor

Parameters

tensor – the tensor to denormalise

Returns

the denormalised tensor
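
The three abstract methods above constitute the full scaler contract. As an illustration, a minimal custom scaler (hypothetical, not part of the library) that divides by a fixed maximum absolute value might look as follows:

    import torch
    from sensai.torch.torch_data import TensorScaler

    class TensorScalerMaxAbs(TensorScaler):
        """Hypothetical scaler that divides by a fixed maximum absolute value."""

        def __init__(self, max_abs: torch.Tensor):
            self.max_abs = max_abs

        def cuda(self):
            # move the scaler's components to the CUDA device
            self.max_abs = self.max_abs.cuda()

        def normalise(self, tensor: torch.Tensor) -> torch.Tensor:
            return tensor / self.max_abs

        def denormalise(self, tensor: torch.Tensor) -> torch.Tensor:
            return tensor * self.max_abs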

class TensorScalerCentreAndScale(centre: Optional[torch.Tensor] = None, scale: Optional[torch.Tensor] = None)[source]

Bases: sensai.torch.torch_data.TensorScaler

__init__(centre: Optional[torch.Tensor] = None, scale: Optional[torch.Tensor] = None)
cuda()

Makes this scaler’s components use CUDA

normalise(tensor: torch.Tensor) → torch.Tensor

Applies scaling/normalisation to the given tensor

Parameters

tensor – the tensor to scale/normalise

Returns

the scaled/normalised tensor

denormalise(tensor: torch.Tensor) → torch.Tensor

Applies the inverse of method normalise to the given tensor

Parameters

tensor – the tensor to denormalise

Returns

the denormalised tensor
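
For example, a centre-and-scale scaler built from pre-computed statistics (made-up values below) supports a lossless round trip via normalise and denormalise:

    import torch
    from sensai.torch.torch_data import TensorScalerCentreAndScale

    centre = torch.tensor([1.0, 2.0])  # e.g. per-column means
    scale = torch.tensor([2.0, 0.5])   # e.g. per-column scaling factors

    scaler = TensorScalerCentreAndScale(centre=centre, scale=scale)
    t = torch.tensor([[1.5, 4.0], [0.5, 0.0]])
    t_norm = scaler.normalise(t)
    t_back = scaler.denormalise(t_norm)
    assert torch.allclose(t, t_back)  # denormalise inverts normalise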

class TensorScalerFromVectorDataScaler(vector_data_scaler: sensai.normalisation.VectorDataScaler, cuda: bool)[source]

Bases: sensai.torch.torch_data.TensorScalerCentreAndScale

__init__(vector_data_scaler: sensai.normalisation.VectorDataScaler, cuda: bool)
class TensorScalerIdentity[source]

Bases: sensai.torch.torch_data.TensorScaler

cuda()

Makes this scaler’s components use CUDA

normalise(tensor: torch.Tensor) → torch.Tensor

Applies scaling/normalisation to the given tensor

Parameters

tensor – the tensor to scale/normalise

Returns

the scaled/normalised tensor

denormalise(tensor: torch.Tensor) → torch.Tensor

Applies the inverse of method normalise to the given tensor

Parameters

tensor – the tensor to denormalise

Returns

the denormalised tensor

class TensorScalerFromDFTSkLearnTransformer(dft: sensai.data_transformation.dft.DFTSkLearnTransformer)[source]

Bases: sensai.torch.torch_data.TensorScalerCentreAndScale

__init__(dft: sensai.data_transformation.dft.DFTSkLearnTransformer)
class Tensoriser[source]

Bases: abc.ABC

Represents a method for transforming a data frame into one or more tensors to be processed by a neural network model

tensorise(df: pandas.core.frame.DataFrame) → Union[torch.Tensor, List[torch.Tensor]]
abstract fit(df: pandas.core.frame.DataFrame, model=None)
Parameters
  • df – the data frame with which to fit this tensoriser

  • model – the model in the context of which the fitting takes place (if any). The fitting process may set parameters within the model that can only be determined from the (pre-tensorised) data.

class RuleBasedTensoriser[source]

Bases: sensai.torch.torch_data.Tensoriser, abc.ABC

Base class for tensorisers which transform data frames into tensors based on a predefined set of rules and do not require fitting

fit(df: pandas.core.frame.DataFrame, model=None)
Parameters
  • df – the data frame with which to fit this tensoriser

  • model – the model in the context of which the fitting takes place (if any). The fitting process may set parameters within the model that can only be determined from the (pre-tensorised) data.

class TensoriserDataFrameFloatValuesMatrix[source]

Bases: sensai.torch.torch_data.RuleBasedTensoriser
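
A minimal usage sketch, assuming the default constructor suffices; the data frame's float values become a single matrix tensor:

    import pandas as pd
    from sensai.torch.torch_data import TensoriserDataFrameFloatValuesMatrix

    df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [0.5, 0.25, 0.125]})

    tensoriser = TensoriserDataFrameFloatValuesMatrix()
    tensoriser.fit(df)            # rule-based: no actual fitting takes place
    x = tensoriser.tensorise(df)  # expected: a torch.Tensor of shape (3, 2)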

class TensoriserClassLabelIndices[source]

Bases: sensai.torch.torch_data.RuleBasedTensoriser

class DataUtil[source]

Bases: abc.ABC

Interface for DataUtil classes, which are used to process data for neural networks

abstract split_into_tensors(fractional_size_of_first_set) → Tuple[Tuple[torch.Tensor, torch.Tensor], Tuple[torch.Tensor, torch.Tensor]]

Splits the data set

Parameters

fractional_size_of_first_set – the desired fractional size of the first set

Returns

a tuple (A, B) where A and B are tuples (in, out) with input and output data

abstract get_output_tensor_scaler() → sensai.torch.torch_data.TensorScaler

Gets the scaler with which to scale model outputs

Returns

the scaler

abstract get_input_tensor_scaler() → sensai.torch.torch_data.TensorScaler

Gets the scaler with which to scale model inputs

Returns

the scaler

abstract model_output_dim() → int
Returns

the dimensionality that is to be output by the model to be trained

abstract input_dim()
Returns

the dimensionality of the input or None if it is variable
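
As an illustration of the interface, a minimal hypothetical implementation over a pre-built pair of tensors could use identity scalers and a simple contiguous split:

    import torch
    from sensai.torch.torch_data import DataUtil, TensorScaler, TensorScalerIdentity

    class TensorPairDataUtil(DataUtil):
        """Hypothetical DataUtil over a fixed (input, output) tensor pair."""

        def __init__(self, x: torch.Tensor, y: torch.Tensor):
            self.x, self.y = x, y

        def split_into_tensors(self, fractional_size_of_first_set):
            n = int(len(self.x) * fractional_size_of_first_set)
            return (self.x[:n], self.y[:n]), (self.x[n:], self.y[n:])

        def get_output_tensor_scaler(self) -> TensorScaler:
            return TensorScalerIdentity()

        def get_input_tensor_scaler(self) -> TensorScaler:
            return TensorScalerIdentity()

        def model_output_dim(self) -> int:
            return self.y.shape[1]

        def input_dim(self):
            return self.x.shape[1]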

class VectorDataUtil(inputs: pandas.core.frame.DataFrame, outputs: pandas.core.frame.DataFrame, cuda: bool, normalisation_mode=NormalisationMode.NONE, differing_output_normalisation_mode=None, input_tensoriser: Optional[sensai.torch.torch_data.Tensoriser] = None, output_tensoriser: Optional[sensai.torch.torch_data.Tensoriser] = None, data_frame_splitter: Optional[sensai.data.DataFrameSplitter] = None)[source]

Bases: sensai.torch.torch_data.DataUtil

__init__(inputs: pandas.core.frame.DataFrame, outputs: pandas.core.frame.DataFrame, cuda: bool, normalisation_mode=NormalisationMode.NONE, differing_output_normalisation_mode=None, input_tensoriser: Optional[sensai.torch.torch_data.Tensoriser] = None, output_tensoriser: Optional[sensai.torch.torch_data.Tensoriser] = None, data_frame_splitter: Optional[sensai.data.DataFrameSplitter] = None)
Parameters
  • inputs – the data frame of inputs

  • outputs – the data frame of outputs

  • cuda – whether to apply CUDA

  • normalisation_mode – the normalisation mode to use for inputs and (unless differing_output_normalisation_mode is specified) outputs

  • differing_output_normalisation_mode – the normalisation mode to apply to outputs, overriding normalisation_mode; if None, normalisation_mode is used

get_output_tensor_scaler()

Gets the scaler with which to scale model outputs

Returns

the scaler

get_input_tensor_scaler()

Gets the scaler with which to scale model inputs

Returns

the scaler

split_into_tensors(fractional_size_of_first_set)

Splits the data set

Parameters

fractional_size_of_first_set – the desired fractional size of the first set

Returns

a tuple (A, B) where A and B are tuples (in, out) with input and output data

split_into_data_sets(fractional_size_of_first_set, cuda: bool, tensorise_dynamically=False) → Tuple[sensai.torch.torch_data.TorchDataSet, sensai.torch.torch_data.TorchDataSet]
input_dim()
Returns

the dimensionality of the input or None if it is variable

output_dim()
Returns

the dimensionality of the outputs (ground truth values)

model_output_dim()
Returns

the dimensionality that is to be output by the model to be trained
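
A usage sketch with made-up regression data, showing tensorisation and the train/validation split:

    import pandas as pd
    from sensai.torch.torch_data import VectorDataUtil

    inputs = pd.DataFrame({"x1": [0.1, 0.4, 0.7, 1.0], "x2": [1.0, 0.8, 0.6, 0.4]})
    outputs = pd.DataFrame({"y": [0.0, 0.5, 1.0, 1.5]})

    data_util = VectorDataUtil(inputs, outputs, cuda=False)
    (train_in, train_out), (val_in, val_out) = data_util.split_into_tensors(0.75)
    print(data_util.input_dim(), data_util.model_output_dim())  # expected: 2 1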

class ClassificationVectorDataUtil(inputs: pandas.core.frame.DataFrame, outputs: pandas.core.frame.DataFrame, cuda, num_classes, normalisation_mode=NormalisationMode.NONE, input_tensoriser: Optional[sensai.torch.torch_data.Tensoriser] = None, output_tensoriser: Optional[sensai.torch.torch_data.Tensoriser] = None, data_frame_splitter: Optional[sensai.data.DataFrameSplitter] = None)[source]

Bases: sensai.torch.torch_data.VectorDataUtil

__init__(inputs: pandas.core.frame.DataFrame, outputs: pandas.core.frame.DataFrame, cuda, num_classes, normalisation_mode=NormalisationMode.NONE, input_tensoriser: Optional[sensai.torch.torch_data.Tensoriser] = None, output_tensoriser: Optional[sensai.torch.torch_data.Tensoriser] = None, data_frame_splitter: Optional[sensai.data.DataFrameSplitter] = None)
Parameters
  • inputs – the data frame of inputs

  • outputs – the data frame of outputs (class labels)

  • cuda – whether to apply CUDA

  • num_classes – the number of classes

  • normalisation_mode – the normalisation mode to use for inputs

model_output_dim()
Returns

the dimensionality that is to be output by the model to be trained
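
A sketch for a binary classification task, assuming the output data frame holds integer class labels:

    import pandas as pd
    from sensai.torch.torch_data import ClassificationVectorDataUtil

    inputs = pd.DataFrame({"x1": [0.1, 0.9, 0.2, 0.8], "x2": [1.0, 0.0, 0.9, 0.1]})
    labels = pd.DataFrame({"cls": [0, 1, 0, 1]})

    data_util = ClassificationVectorDataUtil(inputs, labels, cuda=False, num_classes=2)
    print(data_util.model_output_dim())  # presumably 2, i.e. one output per class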

class TorchDataSet[source]

Bases: object

abstract iter_batches(batch_size: int, shuffle: bool = False, input_only=False) → Iterator[Union[Tuple[torch.Tensor, torch.Tensor], Tuple[Sequence[torch.Tensor], torch.Tensor], torch.Tensor, Sequence[torch.Tensor]]]

Provides an iterator over batches from the data set.

Parameters
  • batch_size – the maximum size of each batch

  • shuffle – whether to shuffle the data set

  • input_only – whether to provide only inputs (rather than inputs and corresponding outputs). If True, only inputs are provided, which can be either a single tensor or a tuple of tensors. If False, a pair (i, o) of inputs and corresponding outputs is provided (o is always a tensor). Some data sets may only be able to provide inputs, in which case requesting input_only=False should lead to an exception.

abstract size() → Optional[int]

Returns the total size of the data set (number of data points) if it is known.

Returns

the number of data points, or None if the size is not known.

class TorchDataSetProvider(input_tensor_scaler: Optional[sensai.torch.torch_data.TensorScaler] = None, output_tensor_scaler: Optional[sensai.torch.torch_data.TensorScaler] = None, input_dim: Optional[int] = None, model_output_dim: Optional[int] = None)[source]

Bases: object

__init__(input_tensor_scaler: Optional[sensai.torch.torch_data.TensorScaler] = None, output_tensor_scaler: Optional[sensai.torch.torch_data.TensorScaler] = None, input_dim: Optional[int] = None, model_output_dim: Optional[int] = None)
abstract provide_split(fractional_size_of_first_set: float) → Tuple[sensai.torch.torch_data.TorchDataSet, sensai.torch.torch_data.TorchDataSet]

Provides two data sets, which could, for example, serve as training and validation sets.

Parameters

fractional_size_of_first_set – the fractional size of the first data set

Returns

a tuple of data sets (A, B) where A has (approximately) the given fractional size and B encompasses the remainder of the data

get_output_tensor_scaler() → sensai.torch.torch_data.TensorScaler
get_input_tensor_scaler() → sensai.torch.torch_data.TensorScaler
get_model_output_dim() → int
Returns

the number of output dimensions that the model must produce to match this data set.

get_input_dim() → Optional[int]
Returns

the number of input dimensions of this data set; for models that accept variable input sizes (such as RNNs), this may be None.

class TensorTuple(tensors: Union[torch.Tensor, Sequence[torch.Tensor]])[source]

Bases: object

Represents a tuple of tensors (or a single tensor) and can be used to manipulate the contained tensors simultaneously

__init__(tensors: Union[torch.Tensor, Sequence[torch.Tensor]])
cuda() → sensai.torch.torch_data.TensorTuple
tuple() → Sequence[torch.Tensor]
item() → Union[torch.Tensor, Sequence[torch.Tensor]]
concat(other: sensai.torch.torch_data.TensorTuple) → sensai.torch.torch_data.TensorTuple
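
TensorTuple makes it possible to treat a single tensor and a tuple of tensors uniformly; a small sketch:

    import torch
    from sensai.torch.torch_data import TensorTuple

    single = TensorTuple(torch.zeros(2, 3))                   # wraps a single tensor
    pair = TensorTuple((torch.ones(4, 3), torch.ones(4, 5)))  # wraps a tuple of tensors

    print(single.tuple())  # always yields a sequence of tensors
    print(single.item())   # the single wrapped tensor (or the sequence, if there are several)

    # concat presumably joins corresponding tensors of two structurally identical tuples
    more = TensorTuple((torch.zeros(1, 3), torch.zeros(1, 5)))
    combined = pair.concat(more)
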
class TorchDataSetFromTensors(x: Union[torch.Tensor, Sequence[torch.Tensor]], y: Optional[torch.Tensor], cuda: bool)[source]

Bases: sensai.torch.torch_data.TorchDataSet

__init__(x: Union[torch.Tensor, Sequence[torch.Tensor]], y: Optional[torch.Tensor], cuda: bool)
Parameters
  • x – the input tensor(s); if more than one, they must be of the same length (and a slice of each shall be provided to the model as an input in each batch)

  • y – the output tensor

  • cuda – whether any generated tensors shall be moved to the selected CUDA device

iter_batches(batch_size: int, shuffle: bool = False, input_only=False) → Iterator[Union[Tuple[torch.Tensor, torch.Tensor], Tuple[Sequence[torch.Tensor], torch.Tensor], torch.Tensor, Sequence[torch.Tensor]]]

Provides an iterator over batches from the data set.

Parameters
  • batch_size – the maximum size of each batch

  • shuffle – whether to shuffle the data set

  • input_only – whether to provide only inputs (rather than inputs and corresponding outputs). If True, only inputs are provided, which can be either a single tensor or a tuple of tensors. If False, a pair (i, o) of inputs and corresponding outputs is provided (o is always a tensor). Some data sets may only be able to provide inputs, in which case requesting input_only=False should lead to an exception.

size()

Returns the total size of the data set (number of data points) if it is known.

Returns

the number of data points, or None if the size is not known.
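
A sketch of building a data set directly from tensors and iterating over mini-batches:

    import torch
    from sensai.torch.torch_data import TorchDataSetFromTensors

    x = torch.rand(10, 3)  # 10 data points with 3 input features
    y = torch.rand(10, 1)  # corresponding outputs

    data_set = TorchDataSetFromTensors(x, y, cuda=False)
    print(data_set.size())  # 10

    for x_batch, y_batch in data_set.iter_batches(batch_size=4, shuffle=True):
        # each batch holds at most 4 data points
        pass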

class TorchDataSetFromDataFramesPreTensorised(input_df: pandas.core.frame.DataFrame, output_df: Optional[pandas.core.frame.DataFrame], cuda: bool, input_tensoriser: Optional[sensai.torch.torch_data.Tensoriser] = None, output_tensoriser: Optional[sensai.torch.torch_data.Tensoriser] = None)[source]

Bases: sensai.torch.torch_data.TorchDataSetFromTensors

__init__(input_df: pandas.core.frame.DataFrame, output_df: Optional[pandas.core.frame.DataFrame], cuda: bool, input_tensoriser: Optional[sensai.torch.torch_data.Tensoriser] = None, output_tensoriser: Optional[sensai.torch.torch_data.Tensoriser] = None)
Parameters
  • input_df – the data frame of inputs

  • output_df – the data frame of outputs (if any)

  • cuda – whether any generated tensors shall be moved to the selected CUDA device

  • input_tensoriser – the tensoriser with which to convert the input data frame to tensors (if any)

  • output_tensoriser – the tensoriser with which to convert the output data frame to tensors (if any)

class TorchDataSetFromDataFramesDynamicallyTensorised(input_df: pandas.core.frame.DataFrame, output_df: Optional[pandas.core.frame.DataFrame], cuda: bool, input_tensoriser: Optional[sensai.torch.torch_data.Tensoriser] = None, output_tensoriser: Optional[sensai.torch.torch_data.Tensoriser] = None)[source]

Bases: sensai.torch.torch_data.TorchDataSet

__init__(input_df: pandas.core.frame.DataFrame, output_df: Optional[pandas.core.frame.DataFrame], cuda: bool, input_tensoriser: Optional[sensai.torch.torch_data.Tensoriser] = None, output_tensoriser: Optional[sensai.torch.torch_data.Tensoriser] = None)
size() → Optional[int]

Returns the total size of the data set (number of data points) if it is known.

Returns

the number of data points, or None if the size is not known.

iter_batches(batch_size: int, shuffle: bool = False, input_only=False)

Provides an iterator over batches from the data set.

Parameters
  • batch_size – the maximum size of each batch

  • shuffle – whether to shuffle the data set

  • input_only – whether to provide only inputs (rather than inputs and corresponding outputs). If True, only inputs are provided, which can be either a single tensor or a tuple of tensors. If False, a pair (i, o) of inputs and corresponding outputs is provided (o is always a tensor). Some data sets may only be able to provide inputs, in which case requesting input_only=False should lead to an exception.

class TorchDataSetFromDataFrames(input_df: pandas.core.frame.DataFrame, output_df: Optional[pandas.core.frame.DataFrame], cuda: bool, input_tensoriser: Optional[sensai.torch.torch_data.Tensoriser] = None, output_tensoriser: Optional[sensai.torch.torch_data.Tensoriser] = None, tensorise_dynamically=False)[source]

Bases: sensai.torch.torch_data.TorchDataSet

__init__(input_df: pandas.core.frame.DataFrame, output_df: Optional[pandas.core.frame.DataFrame], cuda: bool, input_tensoriser: Optional[sensai.torch.torch_data.Tensoriser] = None, output_tensoriser: Optional[sensai.torch.torch_data.Tensoriser] = None, tensorise_dynamically=False)
iter_batches(batch_size: int, shuffle: bool = False, input_only=False)

Provides an iterator over batches from the data set.

Parameters
  • batch_size – the maximum size of each batch

  • shuffle – whether to shuffle the data set

  • input_only – whether to provide only inputs (rather than inputs and corresponding outputs). If True, only inputs are provided, which can be either a single tensor or a tuple of tensors. If False, a pair (i, o) of inputs and corresponding outputs is provided (o is always a tensor). Some data sets may only be able to provide inputs, in which case requesting input_only=False should lead to an exception.

size() → Optional[int]

Returns the total size of the data set (number of data points) if it is known.

Returns

the number of data points, or None if the size is not known.
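
A sketch of constructing a data set from data frames, assuming that omitting the tensorisers falls back to a sensible default (presumably a float-values matrix); with tensorise_dynamically=True, tensorisation would instead happen per batch:

    import pandas as pd
    from sensai.torch.torch_data import TorchDataSetFromDataFrames

    input_df = pd.DataFrame({"x1": [0.1, 0.4, 0.7], "x2": [1.0, 0.8, 0.6]})
    output_df = pd.DataFrame({"y": [0.0, 0.5, 1.0]})

    data_set = TorchDataSetFromDataFrames(input_df, output_df, cuda=False,
                                          tensorise_dynamically=False)
    for x_batch, y_batch in data_set.iter_batches(batch_size=2):
        pass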

class TorchDataSetProviderFromDataUtil(data_util: sensai.torch.torch_data.DataUtil, cuda: bool)[source]

Bases: sensai.torch.torch_data.TorchDataSetProvider

__init__(data_util: sensai.torch.torch_data.DataUtil, cuda: bool)
provide_split(fractional_size_of_first_set: float) → Tuple[sensai.torch.torch_data.TorchDataSet, sensai.torch.torch_data.TorchDataSet]

Provides two data sets, which could, for example, serve as training and validation sets.

Parameters

fractional_size_of_first_set – the fractional size of the first data set

Returns

a tuple of data sets (A, B) where A has (approximately) the given fractional size and B encompasses the remainder of the data

class TorchDataSetProviderFromVectorDataUtil(data_util: sensai.torch.torch_data.VectorDataUtil, cuda: bool, tensorise_dynamically=False)[source]

Bases: sensai.torch.torch_data.TorchDataSetProvider

__init__(data_util: sensai.torch.torch_data.VectorDataUtil, cuda: bool, tensorise_dynamically=False)
provide_split(fractional_size_of_first_set: float) → Tuple[sensai.torch.torch_data.TorchDataSet, sensai.torch.torch_data.TorchDataSet]

Provides two data sets, which could, for example, serve as training and validation sets.

Parameters

fractional_size_of_first_set – the fractional size of the first data set

Returns

a tuple of data sets (A, B) where A has (approximately) the given fractional size and B encompasses the remainder of the data
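
A sketch combining the pieces above: wrap a VectorDataUtil in a provider and split it into training and validation data sets:

    import pandas as pd
    from sensai.torch.torch_data import VectorDataUtil, TorchDataSetProviderFromVectorDataUtil

    inputs = pd.DataFrame({"x1": [0.1, 0.4, 0.7, 1.0], "x2": [1.0, 0.8, 0.6, 0.4]})
    outputs = pd.DataFrame({"y": [0.0, 0.5, 1.0, 1.5]})
    data_util = VectorDataUtil(inputs, outputs, cuda=False)

    provider = TorchDataSetProviderFromVectorDataUtil(data_util, cuda=False)
    train_set, val_set = provider.provide_split(0.75)
    for x_batch, y_batch in train_set.iter_batches(batch_size=2, shuffle=True):
        pass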

class TensorTransformer[source]

Bases: abc.ABC

abstract transform(t: torch.Tensor) → torch.Tensor
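
As an illustration, a hypothetical transformer that clamps tensor values to a fixed range:

    import torch
    from sensai.torch.torch_data import TensorTransformer

    class TensorTransformerClamp(TensorTransformer):
        """Hypothetical transformer clamping values to [min_value, max_value]."""

        def __init__(self, min_value: float, max_value: float):
            self.min_value = min_value
            self.max_value = max_value

        def transform(self, t: torch.Tensor) -> torch.Tensor:
            return torch.clamp(t, self.min_value, self.max_value)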