torchtext

class TorchtextDataSetFromDataFrame(*args: Any, **kwargs: Any)

Bases: torchtext.data.Dataset

A specialisation of torchtext.data.Dataset in which the data is taken from a pandas.DataFrame.

__init__(df: pandas.core.frame.DataFrame, fields: Dict[str, torchtext.data.Field])

Parameters

df – the data frame from which to obtain the data
fields – a mapping from column names in the given data frame to torchtext fields, i.e. the keys are the columns to read and the values are the fields to use for generated Example instances
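The role of the fields mapping can be illustrated with a dependency-free sketch. It does not use torchtext or pandas: rows are plain dictionaries standing in for data frame rows, and the hypothetical callables tokenize and to_label stand in for torchtext fields. The function rows_to_examples is likewise an illustrative name, not part of this module.

```python
from typing import Any, Callable, Dict, List


# Hypothetical stand-ins for torchtext fields: each one preprocesses a raw value.
def tokenize(text: str) -> List[str]:
    return text.lower().split()


def to_label(value: Any) -> int:
    return int(value)


def rows_to_examples(rows: List[Dict[str, Any]],
                     fields: Dict[str, Callable[[Any], Any]]) -> List[Dict[str, Any]]:
    """Builds one example per row, reading only the columns named in `fields`."""
    return [
        {column: field(row[column]) for column, field in fields.items()}
        for row in rows
    ]


# Plain dictionaries standing in for the rows of a data frame (assumption).
rows = [
    {"text": "Good movie", "label": "1", "id": 17},
    {"text": "Bad plot", "label": "0", "id": 18},
]

# Only the "text" and "label" columns are read; "id" is ignored.
examples = rows_to_examples(rows, {"text": tokenize, "label": to_label})
```

Columns that do not appear as keys in the mapping, such as "id" here, do not end up in the generated examples.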

class TorchDataSetFromTorchtextDataSet(dataSet: torchtext.data.Dataset, inputField: str, outputField: Optional[str], cuda: bool)

Bases: sensai.torch.torch_data.TorchDataSet

__init__(dataSet: torchtext.data.Dataset, inputField: str, outputField: Optional[str], cuda: bool)

iter_batches(batch_size: int, shuffle: bool = False, input_only=False) → Generator[Union[Tuple[torch.Tensor, torch.Tensor], torch.Tensor], None, None]

Provides an iterator over batches from the data set.

Parameters

batch_size – the maximum size of each batch
shuffle – whether to shuffle the data set
input_only – whether to provide only inputs rather than inputs and corresponding outputs. If true, each batch consists of inputs only, where the inputs are either a tensor or a tuple of tensors. If false, each batch is a pair (i, o) of inputs and corresponding outputs, where o is always a tensor. Some data sets can only provide inputs, in which case input_only=False should raise an exception.
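The batching contract described by these parameters can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: lists stand in for tensors, and the name iter_batches_sketch and the seed parameter are hypothetical.

```python
import random
from typing import Iterator, List, Optional, Sequence, Tuple, Union

Batch = Union[List[float], Tuple[List[float], List[float]]]


def iter_batches_sketch(inputs: Sequence[float],
                        outputs: Optional[Sequence[float]],
                        batch_size: int,
                        shuffle: bool = False,
                        input_only: bool = False,
                        seed: int = 0) -> Iterator[Batch]:
    """Yields batches of at most `batch_size` items (lists stand in for tensors)."""
    if not input_only and outputs is None:
        # A data set without outputs cannot serve (input, output) pairs.
        raise ValueError("No outputs available; use input_only=True")
    indices = list(range(len(inputs)))
    if shuffle:
        # `seed` is a hypothetical parameter that makes the sketch reproducible.
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        batch = indices[start:start + batch_size]
        batch_inputs = [inputs[i] for i in batch]
        if input_only:
            yield batch_inputs
        else:
            yield batch_inputs, [outputs[i] for i in batch]


# The final batch is smaller because batch_size is only an upper bound.
batches = list(iter_batches_sketch([1, 2, 3, 4, 5], [10, 20, 30, 40, 50], batch_size=2))
inputs_only = list(iter_batches_sketch([1, 2, 3], None, batch_size=2, input_only=True))
```

Because the function is a generator, the exception for missing outputs surfaces when the first batch is requested rather than when the function is called.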

size() → Optional[int]

Returns the total size of the data set (number of data points) if it is known.

Returns: the number of data points, or None if the size is not known.
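Since the size may be unknown, callers need to handle None explicitly. A minimal sketch, assuming a hypothetical helper num_batches that derives the number of batches from a size value as returned by size():

```python
import math
from typing import Optional


def num_batches(size: Optional[int], batch_size: int) -> Optional[int]:
    """Returns the number of batches, or None if the data set size is unknown."""
    if size is None:
        return None
    # Round up because the last batch may be only partially filled.
    return math.ceil(size / batch_size)
```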

class TorchDataSetProviderFromTorchtextDataSet(dataSet: torchtext.data.Dataset, inputField: str, outputField: str, cuda: bool, model_output_dim, input_dim=None)

Bases: sensai.torch.torch_data.TorchDataSetProvider

__init__(dataSet: torchtext.data.Dataset, inputField: str, outputField: str, cuda: bool, model_output_dim, input_dim=None)

provide_split(fractional_size_of_first_set: float) → Tuple[sensai.torch.torch_data.TorchDataSet, sensai.torch.torch_data.TorchDataSet]

Provides two data sets, which could, for example, serve as training and validation sets.

Parameters: fractional_size_of_first_set – the fractional size of the first data set
Returns: a tuple of data sets (A, B) where A has (approximately) the given fractional size and B encompasses the remainder of the data
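A fractional split of this kind can be sketched as follows. The sketch rests on assumptions: the function name split_sketch, the seed parameter, and the choice to shuffle before splitting are illustrative, and the documentation above does not state whether the actual implementation shuffles.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_sketch(items: Sequence[T],
                 fractional_size_of_first_set: float,
                 seed: int = 42) -> Tuple[List[T], List[T]]:
    """Splits `items` into (A, B), where A holds roughly the given fraction."""
    if not 0.0 <= fractional_size_of_first_set <= 1.0:
        raise ValueError("fractional_size_of_first_set must be in [0, 1]")
    indices = list(range(len(items)))
    # Shuffling before the split is an assumption of this sketch.
    random.Random(seed).shuffle(indices)
    # Rounding is why A has only approximately the requested fractional size.
    n_first = round(len(items) * fractional_size_of_first_set)
    first = [items[i] for i in indices[:n_first]]
    second = [items[i] for i in indices[n_first:]]
    return first, second


train, validation = split_sketch(list(range(10)), 0.8)
```

Every item lands in exactly one of the two sets, so B is always the remainder of the data after A has been taken.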