torch_data
- class TensorScaler
Bases: abc.ABC
- abstract cuda()
Makes this scaler’s components use CUDA
- abstract normalise(tensor: torch.Tensor) → torch.Tensor
Applies scaling/normalisation to the given tensor
- Parameters
tensor – the tensor to scale/normalise
- Returns
the scaled/normalised tensor
- abstract denormalise(tensor: torch.Tensor) → torch.Tensor
Applies the inverse of normalise to the given tensor
- Parameters
tensor – the tensor to denormalise
- Returns
the denormalised tensor
- class TensorScalerCentreAndScale(centre: Optional[torch.Tensor] = None, scale: Optional[torch.Tensor] = None)
Bases: sensai.torch.torch_data.TensorScaler
- __init__(centre: Optional[torch.Tensor] = None, scale: Optional[torch.Tensor] = None)
- cuda()
Makes this scaler’s components use CUDA
- normalise(tensor: torch.Tensor) → torch.Tensor
Applies scaling/normalisation to the given tensor
- Parameters
tensor – the tensor to scale/normalise
- Returns
the scaled/normalised tensor
- denormalise(tensor: torch.Tensor) → torch.Tensor
Applies the inverse of normalise to the given tensor
- Parameters
tensor – the tensor to denormalise
- Returns
the denormalised tensor
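The centre-and-scale idea can be illustrated with a pure-Python sketch that works on rows of floats instead of torch tensors. The sketch makes three assumptions:

- `normalise` subtracts `centre` and then divides by `scale`.
- `denormalise` applies the inverse operations in reverse order.
- A component left as None is skipped.

This reference does not say whether the library applies `scale` as a divisor or as a multiplier. `CentreAndScaleSketch` is therefore a hypothetical illustration, not the sensai implementation.

```python
from typing import List, Optional, Sequence

Row = List[float]


class CentreAndScaleSketch:
    """Hypothetical stand-in for a centre-and-scale scaler operating on lists of rows."""

    def __init__(self, centre: Optional[Sequence[float]] = None,
                 scale: Optional[Sequence[float]] = None):
        self.centre = centre  # per-column offset, or None to skip centring
        self.scale = scale    # per-column divisor (an assumption), or None to skip scaling

    def normalise(self, rows: Sequence[Row]) -> List[Row]:
        result = [list(r) for r in rows]  # copy so the input is left untouched
        for r in result:
            for j in range(len(r)):
                if self.centre is not None:
                    r[j] -= self.centre[j]
                if self.scale is not None:
                    r[j] /= self.scale[j]
        return result

    def denormalise(self, rows: Sequence[Row]) -> List[Row]:
        result = [list(r) for r in rows]
        for r in result:
            for j in range(len(r)):
                # undo the operations of normalise in reverse order
                if self.scale is not None:
                    r[j] *= self.scale[j]
                if self.centre is not None:
                    r[j] += self.centre[j]
        return result


scaler = CentreAndScaleSketch(centre=[10.0, 0.0], scale=[2.0, 4.0])
normalised = scaler.normalise([[12.0, 8.0], [8.0, -4.0]])
print(normalised)  # [[1.0, 2.0], [-1.0, -1.0]]
print(scaler.denormalise(normalised))  # [[12.0, 8.0], [8.0, -4.0]]
```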
- class TensorScalerFromVectorDataScaler(vector_data_scaler: sensai.normalisation.VectorDataScaler, cuda: bool)
Bases: sensai.torch.torch_data.TensorScalerCentreAndScale
- __init__(vector_data_scaler: sensai.normalisation.VectorDataScaler, cuda: bool)
- class TensorScalerIdentity
Bases: sensai.torch.torch_data.TensorScaler
- cuda()
Makes this scaler’s components use CUDA
- normalise(tensor: torch.Tensor) → torch.Tensor
Applies scaling/normalisation to the given tensor
- Parameters
tensor – the tensor to scale/normalise
- Returns
the scaled/normalised tensor
- denormalise(tensor: torch.Tensor) → torch.Tensor
Applies the inverse of normalise to the given tensor
- Parameters
tensor – the tensor to denormalise
- Returns
the denormalised tensor
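The abstract TensorScaler contract pairs `normalise` with its inverse, `denormalise`; the identity scaler is its trivial implementation. Both can be sketched with Python's `abc` module. `ScalerSketch` and `IdentityScalerSketch` are hypothetical stand-ins that use plain lists instead of torch tensors. `cuda()` is omitted because lists have no device. The property worth checking for any scaler is that `denormalise` undoes `normalise`.

```python
from abc import ABC, abstractmethod
from typing import List


class ScalerSketch(ABC):
    """Hypothetical mirror of the TensorScaler interface, using lists of floats."""

    @abstractmethod
    def normalise(self, values: List[float]) -> List[float]:
        ...

    @abstractmethod
    def denormalise(self, values: List[float]) -> List[float]:
        ...


class IdentityScalerSketch(ScalerSketch):
    """Returns its input unchanged in both directions."""

    def normalise(self, values: List[float]) -> List[float]:
        return values

    def denormalise(self, values: List[float]) -> List[float]:
        return values


def round_trips(scaler: ScalerSketch, values: List[float]) -> bool:
    # denormalise should invert normalise for any well-behaved scaler
    return scaler.denormalise(scaler.normalise(values)) == values


identity = IdentityScalerSketch()
print(round_trips(identity, [1.5, -2.0]))  # True
```

Because both methods are declared abstract, attempting to instantiate `ScalerSketch` itself raises `TypeError`.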
- class TensorScalerFromDFTSkLearnTransformer(dft: sensai.data_transformation.dft.DFTSkLearnTransformer)
Bases: sensai.torch.torch_data.TensorScalerCentreAndScale
- __init__(dft: sensai.data_transformation.dft.DFTSkLearnTransformer)
- class Tensoriser
Bases: abc.ABC
Represents a method for transforming a data frame into one or more tensors to be processed by a neural network model
- tensorise(df: pandas.core.frame.DataFrame) → Union[torch.Tensor, List[torch.Tensor]]
- abstract fit(df: pandas.core.frame.DataFrame, model=None)
- Parameters
df – the data frame with which to fit this tensoriser
model – the model in the context of which the fitting takes place (if any). The fitting process may set parameters within the model that can only be determined from the (pre-tensorised) data.
- class RuleBasedTensoriser
Bases: sensai.torch.torch_data.Tensoriser, abc.ABC
Base class for tensorisers which transform data frames into tensors based on a predefined set of rules and do not require fitting
- fit(df: pandas.core.frame.DataFrame, model=None)
- Parameters
df – the data frame with which to fit this tensoriser
model – the model in the context of which the fitting takes place (if any). The fitting process may set parameters within the model that can only be determined from the (pre-tensorised) data.
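The difference between a Tensoriser, which must be fitted, and a RuleBasedTensoriser, whose `fit` does nothing, can be sketched in pure Python. In this sketch:

- A list of dicts stands in for a pandas DataFrame, and nested lists stand in for tensors.
- `SortedColumnsTensoriser` learns its column order during `fit`.
- `FixedColumnsTensoriser` has its column order fixed up front, so its `fit` is a no-op.

Both classes are hypothetical examples, not sensai classes.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional

Record = Dict[str, float]


class TensoriserSketch(ABC):
    @abstractmethod
    def fit(self, records: List[Record], model=None) -> None:
        ...

    @abstractmethod
    def tensorise(self, records: List[Record]) -> List[List[float]]:
        ...


class SortedColumnsTensoriser(TensoriserSketch):
    """Requires fitting: the column order is learnt from the data."""

    def __init__(self) -> None:
        self.columns: Optional[List[str]] = None

    def fit(self, records: List[Record], model=None) -> None:
        self.columns = sorted(records[0].keys())

    def tensorise(self, records: List[Record]) -> List[List[float]]:
        if self.columns is None:
            raise RuntimeError("tensorise called before fit")
        return [[r[c] for c in self.columns] for r in records]


class FixedColumnsTensoriser(TensoriserSketch):
    """Rule-based: the column order is given up front, so fit is a no-op."""

    def __init__(self, columns: List[str]) -> None:
        self.columns = columns

    def fit(self, records: List[Record], model=None) -> None:
        pass

    def tensorise(self, records: List[Record]) -> List[List[float]]:
        return [[r[c] for c in self.columns] for r in records]


data = [{"b": 2.0, "a": 1.0}, {"b": 4.0, "a": 3.0}]
fitted = SortedColumnsTensoriser()
fitted.fit(data)
print(fitted.tensorise(data))  # [[1.0, 2.0], [3.0, 4.0]]
print(FixedColumnsTensoriser(["b", "a"]).tensorise(data))  # [[2.0, 1.0], [4.0, 3.0]]
```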
- class DataUtil
Bases: abc.ABC
Interface for DataUtil classes, which are used to process data for neural networks
- abstract split_into_tensors(fractional_size_of_first_set) → Tuple[Tuple[torch.Tensor, torch.Tensor], Tuple[torch.Tensor, torch.Tensor]]
Splits the data set into two parts
- Parameters
fractional_size_of_first_set – the desired fractional size of the first set
- Returns
a tuple (A, B) where A and B are tuples (in, out) with input and output data
- abstract get_output_tensor_scaler() → sensai.torch.torch_data.TensorScaler
Gets the scaler with which to scale model outputs
- Returns
the scaler
- abstract get_input_tensor_scaler() → sensai.torch.torch_data.TensorScaler
Gets the scaler with which to scale model inputs
- Returns
the scaler
- abstract model_output_dim() → int
- Returns
the dimensionality that is to be output by the model to be trained
- abstract input_dim()
- Returns
the dimensionality of the input, or None if it is variable
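The (A, B) result of `split_into_tensors`, in which each part is an (in, out) pair, can be illustrated with a sketch over plain lists. The sketch assumes a contiguous split in which the first `round(n * fraction)` rows form A. This reference does not say whether the library shuffles first or rounds differently. `split_pairs` is a hypothetical helper.

```python
from typing import Any, List, Sequence, Tuple

Pair = Tuple[List[Any], List[Any]]


def split_pairs(inputs: Sequence[Any], outputs: Sequence[Any],
                fractional_size_of_first_set: float) -> Tuple[Pair, Pair]:
    """Returns ((in_A, out_A), (in_B, out_B)) with A taking the leading fraction of rows."""
    if len(inputs) != len(outputs):
        raise ValueError("inputs and outputs must have the same length")
    if not 0.0 <= fractional_size_of_first_set <= 1.0:
        raise ValueError("fractional size must lie in [0, 1]")
    n_first = round(len(inputs) * fractional_size_of_first_set)
    a = (list(inputs[:n_first]), list(outputs[:n_first]))
    b = (list(inputs[n_first:]), list(outputs[n_first:]))
    return a, b


(a_in, a_out), (b_in, b_out) = split_pairs(list(range(10)), [x * x for x in range(10)], 0.8)
print(len(a_in), len(b_in))  # 8 2
print(b_out)  # [64, 81]
```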
- class VectorDataUtil(inputs: pandas.core.frame.DataFrame, outputs: pandas.core.frame.DataFrame, cuda: bool, normalisation_mode=NormalisationMode.NONE, differing_output_normalisation_mode=None, input_tensoriser: Optional[sensai.torch.torch_data.Tensoriser] = None, output_tensoriser: Optional[sensai.torch.torch_data.Tensoriser] = None, data_frame_splitter: Optional[sensai.data.DataFrameSplitter] = None)
Bases: sensai.torch.torch_data.DataUtil
- __init__(inputs: pandas.core.frame.DataFrame, outputs: pandas.core.frame.DataFrame, cuda: bool, normalisation_mode=NormalisationMode.NONE, differing_output_normalisation_mode=None, input_tensoriser: Optional[sensai.torch.torch_data.Tensoriser] = None, output_tensoriser: Optional[sensai.torch.torch_data.Tensoriser] = None, data_frame_splitter: Optional[sensai.data.DataFrameSplitter] = None)
- Parameters
inputs – the data frame of inputs
outputs – the data frame of outputs
cuda – whether to apply CUDA
normalisation_mode – the normalisation mode to use for inputs and (unless differing_output_normalisation_mode is specified) outputs
differing_output_normalisation_mode – the normalisation mode to apply to outputs, overriding normalisation_mode; if None, use normalisation_mode
- get_output_tensor_scaler()
Gets the scaler with which to scale model outputs
- Returns
the scaler
- get_input_tensor_scaler()
Gets the scaler with which to scale model inputs
- Returns
the scaler
- split_into_tensors(fractional_size_of_first_set)
Splits the data set into two parts
- Parameters
fractional_size_of_first_set – the desired fractional size of the first set
- Returns
a tuple (A, B) where A and B are tuples (in, out) with input and output data
- split_into_data_sets(fractional_size_of_first_set, cuda: bool, tensorise_dynamically=False) → Tuple[sensai.torch.torch_data.TorchDataSet, sensai.torch.torch_data.TorchDataSet]
- input_dim()
- Returns
the dimensionality of the input or None if it is variable
- output_dim()
- Returns
the dimensionality of the outputs (ground truth values)
- model_output_dim()
- Returns
the dimensionality that is to be output by the model to be trained
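The interaction between `normalisation_mode` and `differing_output_normalisation_mode` in the parameter list above is a simple fallback. The sketch below uses `NormMode`, a hypothetical stand-in enum, rather than sensai's NormalisationMode. It also shows an assumption this reference does not state: for plain vector data, the input and output dimensions are taken to be the column counts of the input and output frames.

```python
from enum import Enum
from typing import List, Optional, Tuple


class NormMode(Enum):
    """Hypothetical stand-in for sensai's NormalisationMode."""
    NONE = "none"
    STANDARDISED = "standardised"


def resolve_modes(normalisation_mode: NormMode,
                  differing_output_normalisation_mode: Optional[NormMode] = None
                  ) -> Tuple[NormMode, NormMode]:
    """Returns (input mode, output mode); the output mode falls back to the input mode."""
    output_mode = (normalisation_mode if differing_output_normalisation_mode is None
                   else differing_output_normalisation_mode)
    return normalisation_mode, output_mode


def vector_dims(input_columns: List[str], output_columns: List[str]) -> Tuple[int, int]:
    # assumption: one dimension per column of the respective data frame
    return len(input_columns), len(output_columns)


print(resolve_modes(NormMode.STANDARDISED))
print(resolve_modes(NormMode.STANDARDISED, NormMode.NONE))
print(vector_dims(["x1", "x2", "x3"], ["y"]))  # (3, 1)
```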
- class ClassificationVectorDataUtil(inputs: pandas.core.frame.DataFrame, outputs: pandas.core.frame.DataFrame, cuda, num_classes, normalisation_mode=NormalisationMode.NONE, input_tensoriser: Optional[sensai.torch.torch_data.Tensoriser] = None, output_tensoriser: Optional[sensai.torch.torch_data.Tensoriser] = None, data_frame_splitter: Optional[sensai.data.DataFrameSplitter] = None)
Bases: sensai.torch.torch_data.VectorDataUtil
- __init__(inputs: pandas.core.frame.DataFrame, outputs: pandas.core.frame.DataFrame, cuda, num_classes, normalisation_mode=NormalisationMode.NONE, input_tensoriser: Optional[sensai.torch.torch_data.Tensoriser] = None, output_tensoriser: Optional[sensai.torch.torch_data.Tensoriser] = None, data_frame_splitter: Optional[sensai.data.DataFrameSplitter] = None)
- Parameters
inputs – the data frame of inputs
outputs – the data frame of outputs
cuda – whether to apply CUDA
num_classes – the number of classes
normalisation_mode – the normalisation mode to use
- model_output_dim()
- Returns
the dimensionality that is to be output by the model to be trained
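For classification, the dimensionality output by the model need not match the ground-truth output, which can be a single column of class indices. The sketch assumes that the model emits one score per class, so that `model_output_dim` equals `num_classes`, and that predictions are recovered by taking the highest-scoring index. Both helpers are hypothetical and use lists in place of tensors.

```python
from typing import List, Sequence


def classification_model_output_dim(num_classes: int) -> int:
    # assumption: the model produces one score per class
    return num_classes


def predicted_class_indices(scores: Sequence[Sequence[float]]) -> List[int]:
    """Maps each row of per-class scores to the index of its highest score."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


num_classes = 3
scores = [[0.1, 2.0, -1.0], [3.0, 0.5, 0.2]]  # one row of class scores per data point
# sanity check: each row has one score per class
assert all(len(row) == classification_model_output_dim(num_classes) for row in scores)
ground_truth = [1, 0]  # a single output column holding class indices
print(predicted_class_indices(scores) == ground_truth)  # True
```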
- class TorchDataSet
Bases: object
- abstract iter_batches(batch_size: int, shuffle: bool = False, input_only=False) → Iterator[Union[Tuple[torch.Tensor, torch.Tensor], Tuple[Sequence[torch.Tensor], torch.Tensor], torch.Tensor, Sequence[torch.Tensor]]]
Provides an iterator over batches from the data set.
- Parameters
batch_size – the maximum size of each batch
shuffle – whether to shuffle the data set
input_only – whether to provide only inputs (rather than inputs and corresponding outputs). If True, provide only inputs, where inputs can be either a tensor or a tuple of tensors. If False, provide a pair (i, o) with inputs and corresponding outputs (o is always a tensor). Some data sets may only be able to provide inputs, in which case input_only=False should lead to an exception.
- abstract size() → Optional[int]
Returns the total size of the data set (number of data points) if it is known.
- Returns
the number of data points, or None if the size is not known
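The batching contract described above can be mirrored with a pure-Python generator over lists. The contract covers three points:

- each batch holds at most `batch_size` items;
- shuffling is optional;
- a batch is either bare inputs or an (inputs, outputs) pair.

`iter_batches_sketch` and its `seed` parameter are hypothetical. Raising `ValueError` when outputs are unavailable is one possible choice for the exception the contract calls for.

```python
import random
from typing import Iterator, List, Optional, Sequence, Tuple, Union

Batch = Union[List[float], Tuple[List[float], List[float]]]


def iter_batches_sketch(inputs: Sequence[float], outputs: Optional[Sequence[float]],
                        batch_size: int, shuffle: bool = False, input_only: bool = False,
                        seed: Optional[int] = None) -> Iterator[Batch]:
    """Yields batches of at most batch_size items, as inputs or (inputs, outputs) pairs."""
    if outputs is None and not input_only:
        # raised when iteration starts, since this is a generator
        raise ValueError("this data set can only provide inputs; use input_only=True")
    indices = list(range(len(inputs)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        batch_idx = indices[start:start + batch_size]
        batch_in = [inputs[i] for i in batch_idx]
        if input_only:
            yield batch_in
        else:
            yield batch_in, [outputs[i] for i in batch_idx]


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [10.0, 20.0, 30.0, 40.0, 50.0]
print([len(b_in) for b_in, _ in iter_batches_sketch(xs, ys, batch_size=2)])  # [2, 2, 1]
print(list(iter_batches_sketch(xs, None, batch_size=3, input_only=True)))  # [[1.0, 2.0, 3.0], [4.0, 5.0]]
```

Shuffling permutes indices rather than the two lists separately, so every input stays paired with its own output.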
- class TorchDataSetProvider(input_tensor_scaler: Optional[sensai.torch.torch_data.TensorScaler] = None, output_tensor_scaler: Optional[sensai.torch.torch_data.TensorScaler] = None, input_dim: Optional[int] = None, model_output_dim: Optional[int] = None)
Bases: object
- __init__(input_tensor_scaler: Optional[sensai.torch.torch_data.TensorScaler] = None, output_tensor_scaler: Optional[sensai.torch.torch_data.TensorScaler] = None, input_dim: Optional[int] = None, model_output_dim: Optional[int] = None)
- abstract provide_split(fractional_size_of_first_set: float) Tuple[sensai.torch.torch_data.TorchDataSet, sensai.torch.torch_data.TorchDataSet]
Provides two data sets, which could, for example, serve as training and validation sets.
- Parameters
fractional_size_of_first_set – the fractional size of the first data set
- Returns
a tuple of data sets (A, B) where A has (approximately) the given fractional size and B encompasses the remainder of the data
- get_output_tensor_scaler() → sensai.torch.torch_data.TensorScaler
- get_input_tensor_scaler() → sensai.torch.torch_data.TensorScaler
- get_model_output_dim() → int
- Returns
the number of output dimensions the model must generate to match this data set
- get_input_dim() → Optional[int]
- Returns
the number of input dimensions the model must accept to match this data set. For models that accept variable input sizes (such as RNNs), this may be None.
- class TensorTuple(tensors: Union[torch.Tensor, Sequence[torch.Tensor]])
Bases: object
Represents a tuple of tensors (or a single tensor) and can be used to manipulate the contained tensors simultaneously
- __init__(tensors: Union[torch.Tensor, Sequence[torch.Tensor]])
- tuple() → Sequence[torch.Tensor]
- item() → Union[torch.Tensor, Sequence[torch.Tensor]]
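The purpose of TensorTuple, treating a single tensor and a sequence of tensors uniformly, can be sketched with a small wrapper over lists. `tuple()` and `item()` mirror the documented method names. The sketch makes two assumptions:

- `item()` returns the bare element when only one was wrapped, and a tuple otherwise.
- All contained tensors must share a length.

The class name `TensorTupleSketch` and its `map` method are hypothetical additions for illustration.

```python
from typing import Callable, List, Sequence, Tuple, Union

Vector = List[float]


class TensorTupleSketch:
    """Hypothetical wrapper treating one vector or several equal-length vectors uniformly."""

    def __init__(self, tensors: Union[Vector, Sequence[Vector]]):
        # a flat list of numbers counts as a single "tensor"
        if all(isinstance(x, (int, float)) for x in tensors):
            tensors = [tensors]
        self.tensors: List[Vector] = [list(t) for t in tensors]
        if len({len(t) for t in self.tensors}) > 1:
            raise ValueError("all contained tensors must have the same length")

    def tuple(self) -> Tuple[Vector, ...]:
        return tuple(self.tensors)

    def item(self) -> Union[Vector, Tuple[Vector, ...]]:
        # a single wrapped tensor is returned as-is, several as a tuple
        return self.tensors[0] if len(self.tensors) == 1 else self.tuple()

    def map(self, fn: Callable[[Vector], Vector]) -> "TensorTupleSketch":
        # applies the same operation to every contained tensor
        return TensorTupleSketch([fn(t) for t in self.tensors])


pair = TensorTupleSketch([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
first_two = pair.map(lambda t: t[:2])
print(first_two.item())  # ([1.0, 2.0], [4.0, 5.0])
single = TensorTupleSketch([7.0, 8.0])
print(single.item())  # [7.0, 8.0]
```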
- class TorchDataSetFromTensors(x: Union[torch.Tensor, Sequence[torch.Tensor]], y: Optional[torch.Tensor], cuda: bool)
Bases: sensai.torch.torch_data.TorchDataSet
- __init__(x: Union[torch.Tensor, Sequence[torch.Tensor]], y: Optional[torch.Tensor], cuda: bool)
- Parameters
x – the input tensor(s); if more than one, they must be of the same length (and a slice of each shall be provided to the model as an input in each batch)
y – the output tensor
cuda – whether any generated tensors shall be moved to the selected CUDA device
- iter_batches(batch_size: int, shuffle: bool = False, input_only=False) → Iterator[Union[Tuple[torch.Tensor, torch.Tensor], Tuple[Sequence[torch.Tensor], torch.Tensor], torch.Tensor, Sequence[torch.Tensor]]]
Provides an iterator over batches from the data set.
- Parameters
batch_size – the maximum size of each batch
shuffle – whether to shuffle the data set
input_only – whether to provide only inputs (rather than inputs and corresponding outputs). If True, provide only inputs, where inputs can be either a tensor or a tuple of tensors. If False, provide a pair (i, o) with inputs and corresponding outputs (o is always a tensor). Some data sets may only be able to provide inputs, in which case input_only=False should lead to an exception.
- size()
Returns the total size of the data set (number of data points) if it is known.
- Returns
the number of data points, or None if the size is not known
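When `x` is a sequence of tensors, each batch contains the matching slice of every input tensor, which is why they must all share a length. The following pure-Python sketch uses lists in place of tensors and leaves out CUDA device placement. `iter_multi_input_batches` is a hypothetical helper, not part of sensai.

```python
from typing import Iterator, List, Sequence, Tuple


def iter_multi_input_batches(xs: Sequence[Sequence[float]], y: Sequence[float],
                             batch_size: int) -> Iterator[Tuple[Tuple[List[float], ...], List[float]]]:
    """Yields (inputs, output) batches where inputs holds one slice per input tensor."""
    lengths = {len(x) for x in xs} | {len(y)}
    if len(lengths) != 1:
        # raised when iteration starts, since this is a generator
        raise ValueError("all input tensors and the output tensor must have the same length")
    n = lengths.pop()
    for start in range(0, n, batch_size):
        end = start + batch_size
        yield tuple(list(x[start:end]) for x in xs), list(y[start:end])


numeric = [0.5, 1.5, 2.5]
category = [0.0, 1.0, 0.0]
target = [1.0, 2.0, 3.0]
batches = list(iter_multi_input_batches([numeric, category], target, batch_size=2))
print(batches[0])  # (([0.5, 1.5], [0.0, 1.0]), [1.0, 2.0])
print(batches[1])  # (([2.5], [0.0]), [3.0])
```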
- class TorchDataSetFromDataFramesPreTensorised(input_df: pandas.core.frame.DataFrame, output_df: Optional[pandas.core.frame.DataFrame], cuda: bool, input_tensoriser: Optional[sensai.torch.torch_data.Tensoriser] = None, output_tensoriser: Optional[sensai.torch.torch_data.Tensoriser] = None)
Bases: sensai.torch.torch_data.TorchDataSetFromTensors
- __init__(input_df: pandas.core.frame.DataFrame, output_df: Optional[pandas.core.frame.DataFrame], cuda: bool, input_tensoriser: Optional[sensai.torch.torch_data.Tensoriser] = None, output_tensoriser: Optional[sensai.torch.torch_data.Tensoriser] = None)
- Parameters
input_df – the data frame of inputs
output_df – the data frame of outputs, or None if only inputs are available
cuda – whether any generated tensors shall be moved to the selected CUDA device
input_tensoriser – the tensoriser with which to convert input_df into tensor(s)
output_tensoriser – the tensoriser with which to convert output_df into a tensor
- class TorchDataSetFromDataFramesDynamicallyTensorised(input_df: pandas.core.frame.DataFrame, output_df: Optional[pandas.core.frame.DataFrame], cuda: bool, input_tensoriser: Optional[sensai.torch.torch_data.Tensoriser] = None, output_tensoriser: Optional[sensai.torch.torch_data.Tensoriser] = None)
Bases: sensai.torch.torch_data.TorchDataSet
- __init__(input_df: pandas.core.frame.DataFrame, output_df: Optional[pandas.core.frame.DataFrame], cuda: bool, input_tensoriser: Optional[sensai.torch.torch_data.Tensoriser] = None, output_tensoriser: Optional[sensai.torch.torch_data.Tensoriser] = None)
- size() → Optional[int]
Returns the total size of the data set (number of data points) if it is known.
- Returns
the number of data points, or None if the size is not known
- iter_batches(batch_size: int, shuffle: bool = False, input_only=False)
Provides an iterator over batches from the data set.
- Parameters
batch_size – the maximum size of each batch
shuffle – whether to shuffle the data set
input_only – whether to provide only inputs (rather than inputs and corresponding outputs). If True, provide only inputs, where inputs can be either a tensor or a tuple of tensors. If False, provide a pair (i, o) with inputs and corresponding outputs (o is always a tensor). Some data sets may only be able to provide inputs, in which case input_only=False should lead to an exception.
- class TorchDataSetFromDataFrames(input_df: pandas.core.frame.DataFrame, output_df: Optional[pandas.core.frame.DataFrame], cuda: bool, input_tensoriser: Optional[sensai.torch.torch_data.Tensoriser] = None, output_tensoriser: Optional[sensai.torch.torch_data.Tensoriser] = None, tensorise_dynamically=False)
Bases: sensai.torch.torch_data.TorchDataSet
- __init__(input_df: pandas.core.frame.DataFrame, output_df: Optional[pandas.core.frame.DataFrame], cuda: bool, input_tensoriser: Optional[sensai.torch.torch_data.Tensoriser] = None, output_tensoriser: Optional[sensai.torch.torch_data.Tensoriser] = None, tensorise_dynamically=False)
- iter_batches(batch_size: int, shuffle: bool = False, input_only=False)
Provides an iterator over batches from the data set.
- Parameters
batch_size – the maximum size of each batch
shuffle – whether to shuffle the data set
input_only – whether to provide only inputs (rather than inputs and corresponding outputs). If True, provide only inputs, where inputs can be either a tensor or a tuple of tensors. If False, provide a pair (i, o) with inputs and corresponding outputs (o is always a tensor). Some data sets may only be able to provide inputs, in which case input_only=False should lead to an exception.
- size() → Optional[int]
Returns the total size of the data set (number of data points) if it is known.
- Returns
the number of data points, or None if the size is not known
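Judging from the class names and the `tensorise_dynamically` flag, TorchDataSetFromDataFrames appears to choose between two strategies, though this reading is an assumption:

- Pre-tensorised: convert the whole data frame up front.
- Dynamically tensorised: convert each batch only when it is requested.

The sketch below illustrates the trade-off with `CountingTensoriser`, a hypothetical tensoriser that counts converted rows. Both strategies yield the same batches. The dynamic one does no work until iteration and never holds the fully converted data at once, at the cost of redoing the conversion on every pass. All names here are hypothetical, and lists of dicts stand in for data frames.

```python
from typing import Callable, Dict, Iterator, List

Record = Dict[str, float]
Rows = List[List[float]]


class CountingTensoriser:
    """Hypothetical tensoriser that records how many rows it has converted."""

    def __init__(self, columns: List[str]) -> None:
        self.columns = columns
        self.rows_converted = 0

    def __call__(self, records: List[Record]) -> Rows:
        self.rows_converted += len(records)
        return [[r[c] for c in self.columns] for r in records]


def pre_tensorised_batches(records: List[Record], tensorise: Callable[[List[Record]], Rows],
                           batch_size: int) -> List[Rows]:
    converted = tensorise(records)  # eager: the whole data set is converted immediately
    return [converted[s:s + batch_size] for s in range(0, len(converted), batch_size)]


def dynamic_batches(records: List[Record], tensorise: Callable[[List[Record]], Rows],
                    batch_size: int) -> Iterator[Rows]:
    for s in range(0, len(records), batch_size):
        yield tensorise(records[s:s + batch_size])  # lazy: each batch is converted on demand


data = [{"a": float(i), "b": float(2 * i)} for i in range(5)]
eager_t = CountingTensoriser(["a", "b"])
lazy_t = CountingTensoriser(["a", "b"])
eager = pre_tensorised_batches(data, eager_t, batch_size=2)
lazy = dynamic_batches(data, lazy_t, batch_size=2)
print(eager_t.rows_converted, lazy_t.rows_converted)  # 5 0
lazy_batches = list(lazy)
print(lazy_batches == eager, lazy_t.rows_converted)  # True 5
```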
- class TorchDataSetProviderFromDataUtil(data_util: sensai.torch.torch_data.DataUtil, cuda: bool)
Bases: sensai.torch.torch_data.TorchDataSetProvider
- __init__(data_util: sensai.torch.torch_data.DataUtil, cuda: bool)
- provide_split(fractional_size_of_first_set: float) → Tuple[sensai.torch.torch_data.TorchDataSet, sensai.torch.torch_data.TorchDataSet]
Provides two data sets, which could, for example, serve as training and validation sets.
- Parameters
fractional_size_of_first_set – the fractional size of the first data set
- Returns
a tuple of data sets (A, B) where A has (approximately) the given fractional size and B encompasses the remainder of the data
- class TorchDataSetProviderFromVectorDataUtil(data_util: sensai.torch.torch_data.VectorDataUtil, cuda: bool, tensorise_dynamically=False)
Bases: sensai.torch.torch_data.TorchDataSetProvider
- __init__(data_util: sensai.torch.torch_data.VectorDataUtil, cuda: bool, tensorise_dynamically=False)
- provide_split(fractional_size_of_first_set: float) → Tuple[sensai.torch.torch_data.TorchDataSet, sensai.torch.torch_data.TorchDataSet]
Provides two data sets, which could, for example, serve as training and validation sets.
- Parameters
fractional_size_of_first_set – the fractional size of the first data set
- Returns
a tuple of data sets (A, B) where A has (approximately) the given fractional size and B encompasses the remainder of the data