lstnet_modules

class LSTNetwork(*args: Any, **kwargs: Any)[source]

Bases: sensai.torch.torch_base.MCDropoutCapableNNModule

Network for (auto-regressive) time-series prediction with long- and short-term dependencies as proposed by G. Lai et al. It applies two parallel paths to a time series of size (numInputTimeSlices, inputDimPerTimeSlice):

Complex path with the following stages:

Convolutions on the time series input data (CNNs): A CNN with kernel size numCnnTimeSlices produces an output series of length numInputTimeSlices-numCnnTimeSlices+1. If the number of parallel convolutions is numConvolutions, the total output size of this stage is thus numConvolutions*(numInputTimeSlices-numCnnTimeSlices+1).

Two RNN components which process the CNN output in parallel:

RNN (GRU): The output of this stage is the hidden state of the GRU after seeing the entire input data from the previous stage, i.e. it has size hidRNN.

Skip-RNN (GRU), which processes time series elements that are ‘skip’ time slices apart. It does this by grouping the input such that ‘skip’ GRUs are applied in parallel, which all use the same parameters. If the hidden state dimension of each GRU is hidSkip, then the output size of this stage is skip*hidSkip.

Dense layer

Direct regression dense layer (so-called “highway” path) which uses the features of the last hwWindow time slices to directly make a prediction
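The stage sizes described above can be checked with a little arithmetic. The following is a minimal sketch in plain Python, not the library's implementation: the function names are hypothetical, and the choice to drop leading elements when the series length is not a multiple of skip is an assumption made only for illustration.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def cnn_stage_output_size(num_input_time_slices: int,
                          num_cnn_time_slices: int,
                          num_convolutions: int) -> int:
    """Total number of values produced by the convolution stage."""
    if num_cnn_time_slices > num_input_time_slices:
        raise ValueError("kernel size exceeds the number of input time slices")
    # A kernel spanning num_cnn_time_slices slices fits at this many positions.
    series_length = num_input_time_slices - num_cnn_time_slices + 1
    return num_convolutions * series_length


def group_for_skip_rnn(series: Sequence[T], skip: int) -> List[List[T]]:
    """Split a series into `skip` sub-series whose elements are `skip` apart.

    Assumption (illustration only): leading elements are dropped so that the
    remaining length is a multiple of `skip`.
    """
    if skip <= 0:
        raise ValueError("skip must be positive")
    usable = (len(series) // skip) * skip
    tail = list(series[len(series) - usable:])
    # Sub-series `offset` holds tail[offset], tail[offset + skip], ...
    return [tail[offset::skip] for offset in range(skip)]


def skip_stage_output_size(skip: int, hid_skip: int) -> int:
    """Each of the `skip` parallel GRUs contributes a hidden state of size hid_skip."""
    return skip * hid_skip


print(cnn_stage_output_size(24, 6, 100))   # 100 * (24 - 6 + 1) = 1900
print(group_for_skip_rnn(list(range(7)), 3))
print(skip_stage_output_size(3, 5))
```

Each sub-series returned by group_for_skip_rnn would be fed to a GRU with shared parameters, which is why the stage output grows with skip while the parameter count does not.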

The model ultimately combines the outputs of these two paths via a combination function. Many parts of the model are optional and can be completely disabled. The model can produce one or more (potentially multi-dimensional) outputs, where each output typically corresponds to a time slice for which a prediction is made.

The model expects as input a tensor of size (batchSize, numInputTimeSlices, inputDimPerTimeSlice). As output, the model will produce a tensor of size (batchSize, numOutputTimeSlices, outputDimPerTimeSlice) if mode==REGRESSION and a tensor of size (batchSize, outputDimPerTimeSlice=numClasses, numOutputTimeSlices) if mode==CLASSIFICATION. The latter shape matches what is required by the multi-dimensional case of loss functions such as CrossEntropyLoss and is therefore suitable for classification use cases. For mode==ENCODER, the model will produce a tensor of size (batchSize, hidRNN + skip * hidSkip).
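To make the difference between the REGRESSION and CLASSIFICATION layouts concrete, here is a small sketch that only computes the expected shape tuples. The helper expected_output_shape is hypothetical and not part of the library; the mode strings are the enum values listed for Mode.

```python
from typing import Tuple


def expected_output_shape(mode: str,
                          batch_size: int,
                          num_output_time_slices: int,
                          output_dim_per_time_slice: int) -> Tuple[int, int, int]:
    """Return the output tensor shape described above for the given mode."""
    if mode == "regression":
        # One vector of target values per predicted time slice.
        return (batch_size, num_output_time_slices, output_dim_per_time_slice)
    if mode == "classification":
        # The class dimension comes second, as expected by the
        # multi-dimensional case of CrossEntropyLoss.
        return (batch_size, output_dim_per_time_slice, num_output_time_slices)
    raise ValueError(f"unsupported mode in this sketch: {mode!r}")


print(expected_output_shape("regression", 32, 4, 3))       # (32, 4, 3)
print(expected_output_shape("classification", 32, 4, 3))   # (32, 3, 4)
```

The two layouts hold the same number of values; only the order of the last two dimensions differs.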

class Mode(value)

Bases: enum.Enum

An enumeration.

REGRESSION = 'regression'

CLASSIFICATION = 'classification'

ENCODER = 'encoder'

__init__(num_input_time_slices: int, input_dim_per_time_slice: int, num_output_time_slices: int = 1, output_dim_per_time_slice: int = 1, num_convolutions: int = 100, num_cnn_time_slices: int = 6, hid_rnn: int = 100, skip: int = 0, hid_skip: int = 5, hw_window: int = 0, hw_combine: str = 'plus', dropout=0.2, output_activation: Union[str, sensai.torch.torch_enums.ActivationFunction, Callable] = 'sigmoid', mode: sensai.torch.torch_models.lstnet.lstnet_modules.LSTNetwork.Mode = Mode.REGRESSION)

Parameters

num_input_time_slices – the number of input time slices
input_dim_per_time_slice – the dimension of the input data per time slice
num_output_time_slices – the number of time slices predicted by the model
output_dim_per_time_slice – the number of dimensions per output time slice. While this is the number of target variables per time slice for regression problems, this must be the number of classes for classification problems.
num_cnn_time_slices – the number of time slices considered by each convolution (i.e. it is one of the dimensions of the matrix used for convolutions, the other dimension being inputDimPerTimeSlice), a.k.a. “Ck”
num_convolutions – the number of separate convolutions to apply, i.e. the number of independent convolution matrices, a.k.a. “hidC”; if it is 0, then the entire complex processing path is not applied.
hid_rnn – the number of hidden output dimensions for the RNN stage
skip – the number of time slices to skip for the skip-RNN. If it is 0, then the skip-RNN is not used.
hid_skip – the number of output dimensions of each of the skip parallel RNNs
hw_window – the number of time slices from the end of the input time series to consider as input for the highway component. If it is 0, the highway component is not used.
hw_combine – {“plus”, “product”, “bilinear”} the function with which the highway component’s output is combined with the complex path’s output
dropout – the dropout probability to use during training (dropouts are applied after every major step in the evaluation path)
output_activation – the output activation function
mode – the prediction mode. For CLASSIFICATION, the output tensor dimension ordering is adapted to suit loss functions such as CrossEntropyLoss. For ENCODER, the model outputs the latent representation prior to the dense layer in the complex path of the network (see class docstring).

static compute_encoder_dim(hid_rnn: int, skip: int, hid_skip: int) → int

get_encoder_dim()

Returns: the vector dimension that is output for the case where mode=ENCODER
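Based on the class description, the encoder output has size hidRNN + skip * hidSkip. The sketch below reproduces that arithmetic in a standalone function; encoder_dim_sketch is a hypothetical name, and it is an assumption that compute_encoder_dim evaluates the same expression.

```python
def encoder_dim_sketch(hid_rnn: int, skip: int, hid_skip: int) -> int:
    """Size of the latent vector: GRU hidden state plus all skip-GRU hidden states."""
    # With skip == 0 the skip-RNN is disabled and contributes nothing.
    return hid_rnn + skip * hid_skip


print(encoder_dim_sketch(100, 0, 5))    # 100
print(encoder_dim_sketch(100, 24, 5))   # 100 + 24 * 5 = 220
```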

forward(x)