Causal-Retro-Causal Neural Network

Module

Prosper_nn provides implementations for specialized time series forecasting neural networks and related utility functions.

Copyright (C) 2022 Nico Beck, Julia Schemm, Henning Frechen, Jacob Fidorra, Denni Schmidt, Sai Kiran Srivatsav Gollapalli

This file is part of Prosper_nn.

Prosper_nn is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.

class prosper_nn.models.crcnn.crcnn.CRCNN(n_state_neurons: int, n_features_Y: int, past_horizon: int, forecast_horizon: int, n_branches: int = 3, batchsize: int | None = None, sparsity: float = 0.0, activation: Type[torch.autograd.Function] = torch.tanh, init_state_causal: torch.Tensor | None = None, learn_init_state_causal: bool = True, init_state_retro_causal: torch.Tensor | None = None, learn_init_state_retro_causal: bool = True, teacher_forcing: float = 1, decrease_teacher_forcing: float = 0, mirroring: bool = False, no_ptf_mirror: bool = True)[source]

Bases: Module

The CRCNN class creates a Causal Retro-Causal Neural Network.

It consists of a number of branches, each one an HCNN model (using the HCNNCell), which are alternately causal (going forward in time) and retro-causal (going backward in time). The forecast between the last retro-causal and the last causal branch is used as the actual forecast; the others are used for training only. All causal branches share the same initial state (in the past) and the same state matrix A for making one step forward into the future. All retro-causal branches share the same initial state (in the future) and the same state matrix A' for making one step backward into the past.

When the errors are trained down to zero, the model converges to a CRCNN containing only one causal and one retro-causal branch. In contrast to the HCNN, the expectation at each time step is the sum of the outputs of one retro-causal and one causal branch. In this way, both causal and retro-causal dynamics in the data can be captured, as sketched below.
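As a minimal conceptual sketch of this combination (illustrative only, not the library implementation), the tensors below are hypothetical stand-ins for the branch outputs along the known past:

import torch

past_horizon, batchsize, n_features_Y = 10, 5, 2

# Stand-ins for the outputs of one causal and one retro-causal branch.
causal_out = torch.randn(past_horizon, batchsize, n_features_Y)
retro_causal_out = torch.randn(past_horizon, batchsize, n_features_Y)

# The expectation at each time step is the sum of both directions.
expectation = causal_out + retro_causal_out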

Parameters:
  • n_state_neurons (int) – The dimension of the state in the CRCNN Cell. It must be a positive integer.

  • n_features_Y (int) – The size of the data in each time step. It must be a positive integer and n_features_Y <= n_state_neurons.

  • past_horizon (int) – The past horizon gives the number of time steps into the past for which an observation is available.

  • forecast_horizon (int) – The forecast horizon gives the number of time steps into the future for which no observation is available. It equals the number of forecast steps the model returns.

  • n_branches (int) – The total number of branches of the CRCNN. n_branches must be at least 3 for teacher forcing to work.

  • batchsize (int) – The number of samples in each batch.

  • sparsity (float) – The share of weights that are set to zero in the matrices A of the causal and the retro-causal cell. These weights are not trainable and therefore remain zero. For large matrices (dimension > 50), sparsity can be necessary to guarantee numerical stability, and it increases the long-term memory of the model.

  • activation (nn.functional, optional) – The activation function that is applied on the output of the hidden layers. The same function is used on all hidden layers. No function is applied if no function is given.

  • init_state_causal (torch.Tensor) – The initial state shared by all the causal branches of the CRCNN model. It is optional and chosen randomly if not specified. If given, it should have shape = (1, n_state_neurons) (see the construction sketch after this parameter list).

  • learn_init_state_causal (boolean) – Whether to learn the initial hidden state of the causal branches.

  • init_state_retro_causal (torch.Tensor) – The initial state shared by all the retro-causal branches of the CRCNN model. It is optional and chosen randomly if not specified. If given, it should have shape = (1, n_state_neurons).

  • learn_init_state_retro_causal (boolean) – Whether to learn the initial hidden state of the retro-causal branches.

  • teacher_forcing (float) – The probability that teacher forcing is applied to a single state neuron. The random draw is repeated in each time step, which enforces stochastic learning if the value is smaller than 1.

  • decrease_teacher_forcing (float) – The amount by which teacher_forcing is decreased each epoch.

  • mirroring (bool) – If set to True, the mirror trick is applied: a future_bias is added that learns the forecast and is used as a fake future Y, so that teacher forcing can be applied even in the future. Even if the mirror trick is used for training, mirroring should be set to False when forecasting in order to get the real forecast and not the fake forecasting errors (see Returns).

  • no_ptf_mirror (bool) – If mirroring is True and teacher_forcing < 1, the user can choose whether random teacher forcing is applied to the mirroring nodes. It therefore controls the partial teacher forcing on the future time steps.

Return type:

None
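The following is a minimal construction sketch for the optional arguments above; the concrete values are illustrative assumptions, not recommended settings:

import torch
from prosper_nn.models.crcnn import CRCNN

n_state_neurons = 10

# Optional shared initial states, shape = (1, n_state_neurons) as required above.
init_state_causal = torch.zeros(1, n_state_neurons)
init_state_retro_causal = torch.zeros(1, n_state_neurons)

crcnn = CRCNN(
    n_state_neurons=n_state_neurons,
    n_features_Y=2,
    past_horizon=10,
    forecast_horizon=5,
    n_branches=3,                      # at least 3 for teacher forcing
    sparsity=0.2,                      # share of frozen zero weights in A and A'
    init_state_causal=init_state_causal,
    learn_init_state_causal=False,     # keep the given causal initial state fixed
    init_state_retro_causal=init_state_retro_causal,
    teacher_forcing=0.8,               # per-neuron probability of teacher forcing
    decrease_teacher_forcing=0.1,      # subtracted from teacher_forcing each epoch
)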

adjust_teacher_forcing()[source]

Decrease teacher_forcing each epoch by decrease_teacher_forcing until it reaches zero.

Return type:

None
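A brief usage sketch, assuming crcnn was constructed with decrease_teacher_forcing > 0 as in the construction sketch above; the method is typically called once per epoch:

n_epochs = 10
for epoch in range(n_epochs):
    # ... run the training batches for this epoch ...
    crcnn.adjust_teacher_forcing()  # teacher_forcing shrinks by decrease_teacher_forcing until it reaches zero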

forward(Y: Tensor) → Tensor[source]
Parameters:

Y (torch.Tensor) – Y should be 3-dimensional with the shape = (past_horizon, batchsize, n_features_Y). This time series of observations is used for training the model in order to predict future observations.

Returns:

Contains past_error, i.e. the forecasting errors along the past_horizon where Y is known, and forecast, i.e. the forecast along the forecast_horizon, for each pair of causal and retro-causal branches. If mirroring = True, the forecast instead contains the fake forecasting errors produced by using the future_bias as a fake future Y; in this case the forecast should be used for training with a target of 0. shape = (n_branches - 1, past_horizon + forecast_horizon, batchsize, n_features_Y)

Return type:

torch.Tensor
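A minimal sketch of the mirroring case described above. It assumes that batchsize must be passed at construction so the future_bias can be created; this is an assumption for illustration, not confirmed by this page.

import torch
from prosper_nn.models.crcnn import CRCNN

past_horizon, forecast_horizon = 10, 5
n_features_Y, batchsize, n_branches = 2, 5, 3

crcnn_mirror = CRCNN(
    n_state_neurons=3,
    n_features_Y=n_features_Y,
    past_horizon=past_horizon,
    forecast_horizon=forecast_horizon,
    n_branches=n_branches,
    batchsize=batchsize,  # assumed to be required here to create the future_bias
    mirroring=True,
)

# With mirroring, the whole output (past errors and fake future errors)
# is trained towards a zero target.
targets = torch.zeros(n_branches - 1, past_horizon + forecast_horizon, batchsize, n_features_Y)
Y_batch = torch.randn(past_horizon, batchsize, n_features_Y)  # illustrative data
loss = torch.nn.functional.mse_loss(crcnn_mirror(Y_batch), targets)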

Note

The model uses the Historical Consistent Neural Network Cell in both temporal directions.

Example

import torch

from prosper_nn.models.crcnn import CRCNN
import prosper_nn.utils.generate_time_series_data as gtsd
import prosper_nn.utils.create_input_ecnn_hcnn as ci

# Define network and data parameters
past_horizon = 10
forecast_horizon = 5
n_features_Y = 2
n_data = 20
n_state_neurons = 3
n_branches = 3
batchsize = 5

# Initialise Causal-Retro-Causal Neural Network
crcnn = CRCNN(n_state_neurons, n_features_Y, past_horizon, forecast_horizon, n_branches)

# Generate data with "unknown" variables U
Y, U = gtsd.sample_data(n_data, n_features_Y=n_features_Y - 1, n_features_U=1)
Y = torch.cat((Y, U), 1)
Y_batches = ci.create_input(Y, past_horizon, batchsize)

# The past errors are trained towards zero for every pair of branches
targets = torch.zeros((n_branches - 1, past_horizon, batchsize, n_features_Y))

# Train model
optimizer = torch.optim.Adam(crcnn.parameters())
loss_function = torch.nn.MSELoss()

for epoch in range(10):
    for batch_index in range(0, Y_batches.shape[0]):
        Y_batch = Y_batches[batch_index]
        model_output = crcnn(Y_batch)
        # Split the output along the time dimension into past errors and forecasts
        past_errors, forecasts = torch.split(model_output, past_horizon, dim=1)

        crcnn.zero_grad()
        loss = loss_function(past_errors, targets)
        loss.backward()
        optimizer.step()
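After training, the actual forecast is the one between the last retro-causal and the last causal branch (see the class description above). Assuming the pairs are ordered along the first dimension with this final pair last, it can be read off as follows:

# Forecast of the final pair of branches,
# shape = (forecast_horizon, batchsize, n_features_Y)
forecast = forecasts[-1]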

Reference

Zimmermann HG., Tietz C., Grothmann R. (2012) Forecasting with Recurrent Neural Networks: 12 Tricks. In: Montavon G., Orr G.B., Müller KR. (eds) Neural Networks: Tricks of the Trade. Lecture Notes in Computer Science, vol 7700. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35289-8_37