Using Neural Networks for L96 Parameterization: Online Testing#

Outline:#

In this notebook, we build on the concepts from the previous notebook by using the trained neural networks as a parameterization in the single time scale Lorenz 96 model and testing their impact in a free-running simulation. Testing the impact of a parameterization within a running simulation is sometimes also referred to as online testing.

%matplotlib inline
import time

import matplotlib.pyplot as plt
import numpy as np
import torch
from torch import nn, optim
from torch.utils.data import TensorDataset, DataLoader

from L96_model import L96, RK4, L96_eq1_xdot
# Ensuring reproducibility
np.random.seed(14)
torch.manual_seed(14);

Load in the pre-trained parameterization networks#

We first load in the networks that were trained in the previous notebook.

linear_weights = torch.load("./networks/linear.pth")
local_FCNN_weights = torch.load("./networks/local_FCNN.pth")
nonlocal_FCNN_weights = torch.load("./networks/non_local_FCNN.pth")
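
Depending on the PyTorch version and on whether the networks were trained on a GPU, it may be necessary to map the weights to the CPU when loading; a minimal sketch of this optional step:

# Optional: map the saved weights to the CPU explicitly. On recent PyTorch
# versions, weights_only=True can also be passed to restrict unpickling to
# plain tensors.
linear_weights = torch.load("./networks/linear.pth", map_location="cpu")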

We need to define the architecture that each model has before the weights can be assigned.

# The model architectures
# ---------------------------


class LinearRegression(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = nn.Linear(1, 1)  # A single input and a single output

    def forward(self, x):
        # This method is automatically executed when
        # we call an object of this class
        x = self.linear1(x)
        return x


class FCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = nn.Linear(1, 16)  # A single input (each X_k is treated independently)
        self.linear2 = nn.Linear(16, 16)
        self.linear3 = nn.Linear(16, 1)  # A single output

        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.linear1(x))
        x = self.relu(self.linear2(x))
        x = self.linear3(x)
        return x


class NonLocal_FCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = nn.Linear(8, 16)  # 8 inputs
        self.linear2 = nn.Linear(16, 16)
        self.linear3 = nn.Linear(16, 8)  # 8 outputs

        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.linear1(x))
        x = self.relu(self.linear2(x))
        x = self.linear3(x)
        return x
# Initialize network instances and assign pre-trained weights.
linear_network = LinearRegression()
linear_network.load_state_dict(linear_weights)

local_fcnn_network = FCNN()
local_fcnn_network.load_state_dict(local_FCNN_weights)

nonlocal_fcnn_network = NonLocal_FCNN()
nonlocal_fcnn_network.load_state_dict(nonlocal_FCNN_weights)
<All keys matched successfully>
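
Since the networks are only used for inference in this notebook, we can optionally switch them to evaluation mode (this only changes the behaviour of layers such as dropout or batch norm, which these small networks do not use, so it is a no-op here but good practice):

# Put the networks in evaluation mode; no gradients are needed for online testing.
linear_network.eval()
local_fcnn_network.eval()
nonlocal_fcnn_network.eval();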

Adding Parameterizations to GCM#

To recap, [Lorenz, 1995] describes a two time scale dynamical system governed by the following two equations:

\[\begin{gather*} \frac{d}{dt} X_k = - X_{k-1} \left( X_{k-2} - X_{k+1} \right) - X_k + F - \left( \frac{hc}{b} \right) \sum_{j=0}^{J-1} Y_{j,k} \end{gather*}\]
\[\begin{gather*} \frac{d}{dt} Y_{j,k} = - cbY_{j+1,k} \left( Y_{j+2,k} - Y_{j-1,k} \right) - c Y_{j,k} + \frac{hc}{b} X_k \end{gather*}\]
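
The GCM used below resolves only the slow variables, so the coupling term (the last term in the first equation) is not available and is replaced by the parameterization learned offline in the previous notebook. Schematically, the parameterized single time scale model is

\[\begin{gather*} \frac{d}{dt} X_k = - X_{k-1} \left( X_{k-2} - X_{k+1} \right) - X_k + F + P \left( X_k \right) \end{gather*}\]

where P acts on each X_k individually for the local networks, and on the full vector of slow variables for the non-local network.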

Note

All the GCM classes used in this notebook were introduced in the earlier notebooks Key aspects of GCMs parameterizations and Tuning GCM Parameterizations. To see their definitions, expand the hidden code cells in the respective GCM sections below.

# Simulation settings
T_test = 10  # Length of the test simulation (time units)
forcing = 18  # Forcing term F
dt = 0.01  # Time step

k = 8  # Number of slow variables X_k
j = 32  # Number of fast variables Y_{j,k} per slow variable

W = L96(k, j, F=forcing)

# Full L96 model (two time scale model)
X_full, _, _ = W.run(dt, T_test)
X_full = X_full.astype(np.float32)

init_conditions = X_full[0, :]
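
As a quick, optional sanity check, the reference trajectory should contain one row per stored time step and one column per slow variable:

# X_full holds the slow variables of the full two time scale run and serves as
# the "truth" for the comparisons below.
# Expected shape: (int(T_test / dt) + 1, k), i.e. (1001, 8) here.
print(X_full.shape, X_full.dtype)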

GCM Without Neural Network Parameterization#

class GCM_without_parameterization:
    """GCM without parameterization

    Args:
        F: Forcing term
        time_stepping: Time stepping method
    """

    def __init__(self, F, time_stepping=RK4):
        self.F = F
        self.time_stepping = time_stepping

    def rhs(self, X, _):
        """Compute right hand side of the the GCM equations"""
        return L96_eq1_xdot(X, self.F)

    def __call__(self, X0, dt, nt, param=[0]):
        """Run GCM

        Args:
            X0: Initial conditions of X
            dt: Time increment
            nt: Number of forward steps to take
            param: Parameters of closure

        Returns:
            Model output for all variables of X at each timestep
            along with the corresponding time units
        """
        time, hist, X = (
            dt * np.arange(nt + 1),
            np.zeros((nt + 1, len(X0))) * np.nan,
            X0.copy(),
        )
        hist[0] = X

        for n in range(nt):
            X = self.time_stepping(self.rhs, dt, X, param)
            hist[n + 1], time[n + 1] = X, dt * (n + 1)
        return hist, time
gcm_no_param = GCM_without_parameterization(forcing)
X_no_param, t = gcm_no_param(init_conditions, dt, int(T_test / dt))

GCM With Neural Network Parameterization#

class GCM_network:
    """GCM with neural network parameterization

    Args:
        F: Forcing term
        network: Neural network
        time_stepping: Time stepping method
    """

    def __init__(self, F, network, time_stepping=RK4):
        self.F = F
        self.network = network
        self.time_stepping = time_stepping

    def rhs(self, X, _):
        """Compute the right hand side of the GCM equations"""
        if self.network.linear1.in_features == 1:
            # Local networks act pointwise: each X_k is a separate sample
            X_torch = torch.from_numpy(X)
            X_torch = torch.unsqueeze(X_torch, 1)
        else:
            # Non-local networks take the full state vector as a single sample
            X_torch = torch.from_numpy(np.expand_dims(X, 0))

        # Add the NN-predicted sub-grid tendency to the resolved tendency
        return L96_eq1_xdot(X, self.F) + np.squeeze(self.network(X_torch).data.numpy())

    def __call__(self, X0, dt, nt, param=[0]):
        """Run GCM

        Args:
            X0: Initial conditions of X
            dt: Time increment
            nt: Number of forward steps to take
            param: Parameters of closure

        Returns:
            Model output for all variables of X at each timestep
            along with the corresponding time units
        """
        time, hist, X = (
            dt * np.arange(nt + 1),
            np.zeros((nt + 1, len(X0))) * np.nan,
            X0.copy(),
        )
        hist[0] = X

        for n in range(nt):
            X = self.time_stepping(self.rhs, dt, X, param)
            hist[n + 1], time[n + 1] = X, dt * (n + 1)
        return hist, time
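
The rhs method above reshapes the state before calling the network because the local networks act pointwise on each X_k, while the non-local network maps the full state vector at once. A small illustrative check of the shapes involved:

# Local networks: 8 samples of size 1 -> input (8, 1), output (8, 1).
# Non-local network: 1 sample of size 8 -> input (1, 8), output (1, 8).
X_example = torch.from_numpy(init_conditions)
print(local_fcnn_network(torch.unsqueeze(X_example, 1)).shape)
print(nonlocal_fcnn_network(torch.unsqueeze(X_example, 0)).shape)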
# Evaluate with linear network
gcm_linear_net = GCM_network(forcing, linear_network)
Xnn_linear, t = gcm_linear_net(init_conditions, dt, int(T_test / dt), linear_network)

# Evaluate with local FCNN
gcm_local_net = GCM_network(forcing, local_fcnn_network)
Xnn_local, t = gcm_local_net(init_conditions, dt, int(T_test / dt), local_fcnn_network)

# Evaluate with nonlocal FCNN
gcm_nonlocal_net = GCM_network(forcing, nonlocal_fcnn_network)
Xnn_nonlocal, t = gcm_nonlocal_net(
    init_conditions, dt, int(T_test / dt), nonlocal_fcnn_network
)

Comparing Results#

We compare the predictions of the GCMs with different parameterizations against the full two time scale L96 model and against the GCM without parameterization.

time_i = 200
plt.figure(dpi=150)
plt.plot(t[:time_i], X_full[:time_i, 4], label="Full L96")
plt.plot(t[:time_i], X_no_param[:time_i, 4], "--", label="No parameterization")

plt.plot(t[:time_i], Xnn_linear[:time_i, 4], label="linear parameterization")

plt.plot(t[:time_i], Xnn_local[:time_i, 4], label="local NN")
plt.plot(t[:time_i], Xnn_nonlocal[:time_i, 4], label="nonlocal NN")
plt.legend(loc="upper left", fontsize=7);
[Figure: time series of the slow variable X (k=4) over the first two time units for the full L96 model, the GCM without parameterization, and the GCMs with linear, local, and non-local neural network parameterizations.]

Checking over many different initial conditions#

err_linear, err_local, err_nonlocal = [], [], []
T_test = 1

for i in range(90):
    init_conditions_i = X_full[i * 10, :]

    # Evaluate with linear network
    gcm_linear_net = GCM_network(forcing, linear_network)
    Xnn_linear, t = gcm_linear_net(
        init_conditions_i, dt, int(T_test / dt), linear_network
    )

    # Evaluate with local FCNN
    gcm_local_net = GCM_network(forcing, local_fcnn_network)
    Xnn_local, t = gcm_local_net(
        init_conditions_i, dt, int(T_test / dt), local_fcnn_network
    )

    # Evaluate with nonlocal FCNN
    gcm_nonlocal_net = GCM_network(forcing, nonlocal_fcnn_network)
    Xnn_nonlocal, t = gcm_nonlocal_net(
        init_conditions_i, dt, int(T_test / dt), nonlocal_fcnn_network
    )

    err_linear.append(
        np.sum(np.abs(X_full[i * 10 : i * 10 + T_test * 100 + 1] - Xnn_linear))
    )

    err_local.append(
        np.sum(np.abs(X_full[i * 10 : i * 10 + T_test * 100 + 1] - Xnn_local))
    )

    err_nonlocal.append(
        np.sum(np.abs(X_full[i * 10 : i * 10 + T_test * 100 + 1] - Xnn_nonlocal))
    )


print(f"Sum of errors for linear: {sum(err_linear):.2f}")
print(f"Sum of errors for local neural network: {sum(err_local):.2f}")
print(f"Sum of errors for non-local neural network: {sum(err_nonlocal):.2f}")
Sum of errors for linear: 53703.23
Sum of errors for local neural network: 31679.62
Sum of errors for non-local neural network: 40778.95
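
These are errors summed over all 90 initial conditions; the average error per initial condition (a small additional diagnostic) can make the numbers easier to compare:

# Mean summed error per initial condition for each parameterization.
for name, errs in [
    ("linear", err_linear),
    ("local NN", err_local),
    ("nonlocal NN", err_nonlocal),
]:
    print(f"{name}: mean error per initial condition = {np.mean(errs):.2f}")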

In this evaluation we see that the neural networks perform much better than the linear model. However, incorporating non-locality did not improve performance over the local model. A better non-local model might be found with a more extensive hyperparameter sweep; a sketch of what such a sweep could look like is shown below.
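
As a rough illustration only, such a sweep could loop over architectures of different widths and retrain each one. Here NonLocal_FCNN_Wide is a hypothetical variant of the non-local architecture with a configurable hidden width, and train_network stands in for the training loop of the previous notebook (it is not defined in this notebook), so the sweep loop is left commented out:

# Sketch of a width sweep for the non-local architecture (not executed here).
class NonLocal_FCNN_Wide(nn.Module):
    """Hypothetical NonLocal_FCNN variant with a configurable hidden width."""

    def __init__(self, hidden=16):
        super().__init__()
        self.linear1 = nn.Linear(8, hidden)
        self.linear2 = nn.Linear(hidden, hidden)
        self.linear3 = nn.Linear(hidden, 8)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.linear1(x))
        x = self.relu(self.linear2(x))
        return self.linear3(x)


# for hidden in [16, 32, 64, 128]:
#     model = NonLocal_FCNN_Wide(hidden)
#     train_network(model, ...)  # hypothetical: the training loop from the previous notebook
#     # ...then repeat the online evaluation above and compare the summed errors.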

Summary#

In this notebook we showed how neural networks can be added as a parameterization to the single time scale L96 model (the GCM analogue of the two time scale L96 model). When evaluated over many initial conditions, the neural network parameterizations performed better than the linear parameterization.

In these notebooks we have followed the strategy of:

  • running a realistic (more time scales resolved) simulation.

  • evaluating the impact of the fast scales (which could not be resolved in a simulation with fewer time scales) on the slow scales in the realistic simulation.

  • learning a functional relationship between the slow scales and the impact of the fast scales on the slow scales.

  • incorporating this relationship (the ML-based parameterization) into the model of the slow time scales, and evaluating its success (if any).

This is a strategy of offline training and online testing.

A different strategy would be to not explicitly learn the impact of, or the patterns in, the sub-grid scales, but rather to optimize a model such that, when it is added to the slow time scale model, the time evolution of the slow time scale model stays closer to that of the two time scale model (which is our ultimate goal). This strategy is sometimes called online training, and it is discussed in the next notebook. A short primer was presented in the notebook on GCM tuning, where we tuned parameters to match the time evolution of the more realistic model. A conceptual sketch of an online-training loss is given below.
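
The sketch below assumes a differentiable, PyTorch-based implementation of the single time scale tendency, called l96_tendency here, which is not defined in this notebook. The key idea is that the loss is computed on the trajectory produced by the parameterized model rather than on the instantaneous sub-grid tendency:

# Conceptual sketch of an online-training loss (simple Euler stepping for
# illustration; l96_tendency is a hypothetical differentiable tendency).
def online_loss(network, X_true_traj, dt, nt):
    X = X_true_traj[0].clone()
    loss = 0.0
    for n in range(nt):
        # Step the parameterized slow model forward in a differentiable way
        X = X + dt * (l96_tendency(X) + network(X))
        # Penalize the distance to the full (two time scale) trajectory
        loss = loss + torch.mean((X - X_true_traj[n + 1]) ** 2)
    # Gradients flow through the whole trajectory back to the network weights,
    # so the network is optimized for its online behaviour.
    return loss / nt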