torchtext

class TorchtextDataSetFromDataFrame(*args: Any, **kwargs: Any)

Bases: torchtext.data.Dataset

A specialisation of torchtext.data.Dataset, where the data is taken from a pandas.DataFrame

__init__(df: pandas.core.frame.DataFrame, fields: Dict[str, torchtext.data.Field])
Parameters
  • df – the data frame from which to obtain the data

  • fields – a mapping from column names in the given data frame to torchtext fields, i.e. the keys are the columns to read and the values are the fields to use for generated Example instances
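The role of the fields mapping can be illustrated with a minimal standard-library sketch. It does not use torchtext or pandas: rows are plain dictionaries standing in for data frame rows, and each "field" is a hypothetical stand-in (a plain preprocessing function) for a torchtext Field. The names examples_from_rows and FieldLike are assumptions made for illustration only, not part of this module.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for a torchtext field: just a preprocessing function.
FieldLike = Callable[[Any], Any]


def examples_from_rows(
    rows: List[Dict[str, Any]], fields: Dict[str, FieldLike]
) -> List[Dict[str, Any]]:
    """Builds one example per row, reading only the columns named in `fields`."""
    examples = []
    for row in rows:
        # The keys of `fields` select the columns; the values process them.
        examples.append({column: field(row[column]) for column, field in fields.items()})
    return examples


rows = [
    {"text": "a good film", "label": "pos", "id": 1},
    {"text": "dull and slow", "label": "neg", "id": 2},
]
# Only "text" and "label" are read; the "id" column is ignored.
fields = {"text": str.split, "label": lambda value: value}
examples = examples_from_rows(rows, fields)
```

Columns that do not appear as keys in the mapping (here "id") never reach the generated examples, which mirrors the description of the keys as "the columns to read".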

class TorchDataSetFromTorchtextDataSet(dataSet: torchtext.data.Dataset, inputField: str, outputField: Optional[str], cuda: bool)

Bases: sensai.torch.torch_data.TorchDataSet

__init__(dataSet: torchtext.data.Dataset, inputField: str, outputField: Optional[str], cuda: bool)
iter_batches(batch_size: int, shuffle: bool = False, input_only=False) → Generator[Union[Tuple[torch.Tensor, torch.Tensor], torch.Tensor], None, None]

Provides an iterator over batches from the data set.

Parameters
  • batch_size – the maximum size of each batch

  • shuffle – whether to shuffle the data set

  • input_only – whether to provide only inputs (rather than inputs and corresponding outputs). If true, provide only inputs, where inputs can either be a tensor or a tuple of tensors. If false, provide a pair (i, o) with inputs and corresponding outputs (o is always a tensor). Some data sets may only be able to provide inputs, in which case input_only=False should lead to an exception.
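The batching contract described by these parameters can be sketched in plain Python. This is an illustration of the documented behaviour only, not the library's implementation: it uses lists instead of tensors, and the function name iter_batches_sketch and its seed parameter are assumptions introduced here.

```python
import random
from typing import Any, Iterator, List, Optional, Sequence, Tuple, Union

Batch = Union[List[Any], Tuple[List[Any], List[Any]]]


def iter_batches_sketch(
    inputs: Sequence[Any],
    outputs: Optional[Sequence[Any]],
    batch_size: int,
    shuffle: bool = False,
    input_only: bool = False,
    seed: int = 0,  # hypothetical parameter, used only to make shuffling repeatable
) -> Iterator[Batch]:
    # A data set without outputs can only serve input-only batches.
    if not input_only and outputs is None:
        raise ValueError("no outputs available; use input_only=True")
    indices = list(range(len(inputs)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    # batch_size is a maximum: the final batch may be smaller.
    for start in range(0, len(indices), batch_size):
        batch = indices[start:start + batch_size]
        batch_inputs = [inputs[i] for i in batch]
        if input_only:
            yield batch_inputs
        else:
            yield batch_inputs, [outputs[i] for i in batch]


pairs = list(iter_batches_sketch([1, 2, 3, 4, 5], [10, 20, 30, 40, 50], batch_size=2))
inputs_only = list(iter_batches_sketch([1, 2, 3, 4, 5], None, batch_size=2, input_only=True))
```

With five data points and batch_size=2, the sketch yields batches of sizes 2, 2 and 1, which is why batch_size is documented as the maximum size of each batch.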

size() → Optional[int]

Returns the total size of the data set (number of data points) if it is known.

Returns

the number of data points, or None if the size is not known.

class TorchDataSetProviderFromTorchtextDataSet(dataSet: torchtext.data.Dataset, inputField: str, outputField: str, cuda: bool, model_output_dim, input_dim=None)

Bases: sensai.torch.torch_data.TorchDataSetProvider

__init__(dataSet: torchtext.data.Dataset, inputField: str, outputField: str, cuda: bool, model_output_dim, input_dim=None)
provide_split(fractional_size_of_first_set: float) → Tuple[sensai.torch.torch_data.TorchDataSet, sensai.torch.torch_data.TorchDataSet]

Provides two data sets, which could, for example, serve as training and validation sets.

Parameters

fractional_size_of_first_set – the fractional size of the first data set

Returns

a tuple of data sets (A, B) where A has (approximately) the given fractional size and B encompasses the remainder of the data
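The meaning of the fractional size can be sketched with a plain list. This is a minimal illustration under stated assumptions, not the library's implementation: the function name split_sketch is hypothetical, the split is taken in order without shuffling, and rounding is one possible reading of "approximately".

```python
from typing import Any, List, Sequence, Tuple


def split_sketch(
    items: Sequence[Any], fractional_size_of_first_set: float
) -> Tuple[List[Any], List[Any]]:
    """Splits `items` into (A, B), where A holds roughly the given fraction."""
    if not 0.0 <= fractional_size_of_first_set <= 1.0:
        raise ValueError("fractional_size_of_first_set must lie in [0, 1]")
    # Assumption: round to the nearest whole number of data points.
    n_first = int(round(len(items) * fractional_size_of_first_set))
    # B is the remainder, so every data point lands in exactly one set.
    return list(items[:n_first]), list(items[n_first:])


data = list(range(10))
first, second = split_sketch(data, 0.8)
```

Here a fraction of 0.8 over ten data points gives a first set of eight items and a second set of two, for example a training set and a validation set.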