Understanding Fully Convolutional Neural Networks#

Now that we have generated coarsened, low-resolution datasets, we can feed them as training data into our parameterized machine learning (ML) models and begin running predictions against unseen datasets. In this tutorial series, we will focus on one category of ML models, fully convolutional neural networks (FCNNs), though other classes of models can be employed (and are explored within the paper), including hybrid linear and symbolic regression using genetic programming. Before we begin running these models and making predictions, in this notebook we will take the time to better understand FCNNs, including how they are initialized, how the data is preprocessed, how training sessions are run, and how predictions are generated.

The local code that we use for running parameterized ML models resides within this repository. Our main files of interest are neural_networks.py and utils.py, located in src/pyqg_parameterization_benchmarks.

Initializing and Instantiating FCNNs#

Starting at a high level, neural_networks.py defines the FCNNParameterization class. We use this class to generate parameterized FCNN models that we can train and use to make predictions.

The function that sets up this initial creation is the class method train_on(), which takes in the dataset the models will initially be trained on, the path to save the models to, and the inputs and targets to train on, specified as strings.

# neural_networks.py:244
class FCNNParameterization(Parameterization):
    
# neural_networks.py:281:287
@classmethod
def train_on(cls, dataset, directory,
        inputs=['q','u','v'], 
        targets=['q_subgrid_forcing'], # See https://m2lines.github.io/MLwithQG/notebooks/Applying_Filtering_and_Coarse-Graining_to_Generate_Low_Resolution_Datasets.html#subgrid-forcings for valid target values of subgrid forcing and flux
        num_epochs=50,
        zero_mean=True,
        padding='circular', **kw): # Accepts values 'same', 'circular', or None

We can also pass in arguments for additional parameters, including the number of epochs, whether the final output layer should be constrained to have zero spatial mean when predicting the subgrid forcing target, and the padding technique. This method then creates two FullyCNN objects, one for each layer of the quasigeostrophic model on which we ran simulations.

# neural_networks.py:289:299
layers = range(len(dataset.lev))

models = [
    FullyCNN(
        [(feat, zi) for feat in inputs for zi in layers],
        [(feat, z) for feat in targets],
        zero_mean=zero_mean,
        padding=padding
    ) for z in layers
]
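To make the setup concrete, a training call might look like the following sketch, assuming a coarsened training dataset has already been opened as an xarray.Dataset. The file path of the training data is illustrative (not part of the repository), and the save directory simply mirrors the one used later in this notebook.

# Hypothetical usage sketch: the dataset path is illustrative.
import xarray as xr
from pyqg_parameterization_benchmarks.neural_networks import FCNNParameterization

training_set = xr.open_dataset('/home/jovyan/datasets/train_lowres.nc')  # assumed path

# Trains one FullyCNN per vertical layer and saves them under the directory.
param = FCNNParameterization.train_on(
    training_set,
    '/home/jovyan/models/fcnn_qu_to_Sq2',  # directory where trained models are saved
    inputs=['q', 'u', 'v'],
    targets=['q_subgrid_forcing'],
    num_epochs=50,
)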

By default, each FCNN model has 8 convolutional layers (128 and 64 filters in the first two layers, respectively, and 32 thereafter), ReLU activations, batch normalization after each intermediate layer, and circular padding due to the periodicity of the domain.

# neural_networks.py:14:23
class FullyCNN(nn.Sequential):
    """Pytorch class defining our CNN architecture, plus some helpers for
    dealing with constraints and scaling."""
    def __init__(self, inputs, targets, padding='circular', zero_mean=True):
        if padding is None:
            padding_5 = 0
            padding_3 = 0
        elif padding in ['same', 'circular']:
            padding_5 = 2
            padding_3 = 1

# neural_networks.py:35:42
block1 = self._make_subblock(nn.Conv2d(n_in, 128, 5, padding=padding_5, **kw))
block2 = self._make_subblock(nn.Conv2d(128, 64, 5, padding=padding_5, **kw))
block3 = self._make_subblock(nn.Conv2d(64, 32, 3, padding=padding_3, **kw))
block4 = self._make_subblock(nn.Conv2d(32, 32, 3, padding=padding_3, **kw))
block5 = self._make_subblock(nn.Conv2d(32, 32, 3, padding=padding_3, **kw))
block6 = self._make_subblock(nn.Conv2d(32, 32, 3, padding=padding_3, **kw))
block7 = self._make_subblock(nn.Conv2d(32, 32, 3, padding=padding_3, **kw))
conv8 = nn.Conv2d(32, n_out, 3, padding=padding_3)

# neural_networks.py:54:55
def _make_subblock(self, conv):
    return [conv, nn.ReLU(), nn.BatchNorm2d(conv.out_channels)]
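As a quick sanity check on this architecture, here is a minimal sketch that builds a standalone FullyCNN with six input channels (q, u, and v at two layers) and one output channel, then passes a random tensor through it. Because every convolution is padded, the spatial resolution should be preserved. The grid size and batch size below are arbitrary choices for illustration.

# Illustrative shape check (grid and batch sizes are arbitrary).
import torch
from pyqg_parameterization_benchmarks.neural_networks import FullyCNN

net = FullyCNN(
    [(feat, z) for feat in ['q', 'u', 'v'] for z in range(2)],  # 6 input channels
    [('q_subgrid_forcing', 0)],                                 # 1 output channel
)

x = torch.randn(4, 6, 64, 64)  # batch of 4 samples on a 64x64 grid
with torch.no_grad():
    y = net(x)
print(y.shape)  # expected: torch.Size([4, 1, 64, 64])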

Data Preprocessing and Preparation#

Feature Extraction#

After initializing the models, train_on() trains them on the training dataset that was passed in. However, the raw training data must first be preprocessed and prepared before it can be fed into the models. This is done by first extracting the relevant input and target feature values from the training dataset.

# neural_networks.py:308:309
X = model.extract_inputs(dataset)
Y = model.extract_targets(dataset)

# neural_networks.py:57:66
def extract_vars(self, m, features, dtype=np.float32):
    ex = FeatureExtractor(m)

    arr = np.stack([
        np.take(ex(feat), z, axis=-3) for feat, z in features
    ], axis=-3)

    arr = arr.reshape((-1, len(features), ex.nx, ex.nx))
    arr = arr.astype(dtype)
    return arr

# utils.py:126:128
class FeatureExtractor:
    """Helper class for taking spatial derivatives and translating string
    expressions into data. Works with either pyqg.Model or xarray.Dataset."""

The functions extract_inputs() and extract_targets() above are wrapper functions around the method extract_vars(), which creates a FeatureExtractor object from the dataset. Since we specified the inputs and targets of the parameterized FCNN as strings, this helper class, which works with either pyqg.Model or xarray.Dataset, takes spatial derivatives and translates string expressions into data. The FeatureExtractor is used to extract the appropriate features from the dataset and reshape them from an xarray.Dataset format into a numpy.ndarray of shape (n_samples, n_features, nx, nx) that can then be passed into the FCNN. The main function that carries this out is extract_feature().

# utils.py:208:209
def extract_feature(self, feature):
    """Evaluate a string feature, e.g. laplacian(advected(curl(u,v)))."""

Standardization#

Having extracted the relevant input and target features from the training dataset, the next preprocessing step applied to these features is standardization: each feature channel \(x\) is rescaled as \((x - \mu)/\sigma\), where \(\mu\) and \(\sigma\) are the channel's mean and standard deviation, so that the data approximately follows a standard normal distribution.

# neural_networks.py:310
model.fit(X, Y, num_epochs=num_epochs, **kw)

# neural_networks.py:131:135
def fit(self, inputs, targets, rescale=False, **kw):
    if rescale or not hasattr(self, 'input_scale') or self.input_scale is None:
        self.input_scale = ChannelwiseScaler(inputs)
    if rescale or not hasattr(self, 'output_scale') or self.output_scale is None:
        self.output_scale = ChannelwiseScaler(targets, zero_mean=self.is_zero_mean)

The function fit() takes, as parameters, the extracted feature values for the inputs and targets, along with additional parameters including the number of epochs to train for and whether to rescale based on the input and target values that are passed in. Each FullyCNN model has an input scaler and an output scaler in the form of ChannelwiseScaler objects. The ChannelwiseScaler class, which inherits from its parent class BasicScaler, calculates the mean (or a zero spatial mean if required) and the standard deviation along each feature channel of the inputs and targets.

# neural_networks.py:199:209
class ChannelwiseScaler(BasicScaler):
    def __init__(self, x, zero_mean=False):
        assert len(x.shape) == 4
        if zero_mean:
            mu = 0
        else:
            mu = np.array([x[:,i].mean()
                for i in range(x.shape[1])])[np.newaxis,:,np.newaxis,np.newaxis]
        sd = np.array([x[:,i].std()
            for i in range(x.shape[1])])[np.newaxis,:,np.newaxis,np.newaxis]
        super().__init__(mu, sd)

# neural_networks.py:188
class BasicScaler(object):

These scaler objects also perform the standardization step on the data. This is done by calling transform() on the input and output scaler objects and passing in the input and target values, respectively.

# neural_networks.py:136:139
train(self,
      self.input_scale.transform(inputs),
      self.output_scale.transform(targets),
      **kw)

# neural_networks.py:193:194
def transform(self, x):
    return (x - self.mu) / self.sd
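To see what the scalers do, here is a small illustrative check, reusing the X array returned by extract_inputs() above (shape (n_samples, n_channels, nx, nx)): after transform(), each channel should have approximately zero mean and unit standard deviation.

# Illustrative check of channel-wise standardization (X from extract_inputs above).
from pyqg_parameterization_benchmarks.neural_networks import ChannelwiseScaler

scaler = ChannelwiseScaler(X)
X_scaled = scaler.transform(X)
print(X_scaled[:, 0].mean(), X_scaled[:, 0].std())  # approximately 0.0 and 1.0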

Lastly, now that the training data has been preprocessed and prepared, a call is made to the function train() to kick off the training session.

Training FCNNs#

# neural_networks.py:222
def train(net, inputs, targets, num_epochs=50, batch_size=64, learning_rate=0.001, device=None):

The function train() above runs a training session for an FCNN. It takes in the FullyCNN instance as well as the preprocessed inputs and targets from the training dataset. Additional parameters include the number of epochs to train over, the batch size, the learning rate, and the device, which specifies whether computation runs on the GPU or CPU. These parameters can be adjusted to yield different training configurations. By default, the FCNNs are trained for 50 epochs over minibatches of 64 samples.
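Because fit() forwards its extra keyword arguments to train(), these settings can also be overridden per model. A hedged sketch, reusing model, X, and Y from the feature-extraction step above (inside train_on these live within a loop over the per-layer models):

# Illustrative: refit an individual FullyCNN with non-default training settings.
model.fit(X, Y, num_epochs=100, batch_size=32, learning_rate=5e-4)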

Examining the code of train(), a check is first made to see whether GPU resources are available; otherwise, the device falls back to the CPU.

# neural_networks.py:223:225
if device is None:
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
net.to(device)

The learning algorithm employed during training is the Adam optimizer, an adaptive-learning-rate method that generally improves the speed and stability of neural network training. The scheduling technique used is MultiStepLR, which decays the learning rate by a fixed factor once the epoch count reaches specified milestones. Training is evaluated on a mean squared error (MSE) loss.

# neural_networks.py:226:228
optimizer = optim.Adam(net.parameters(), lr=learning_rate)
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[int(num_epochs/2), int(num_epochs*3/4), int(num_epochs*7/8)], gamma=0.1)
criterion = nn.MSELoss()
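With the default num_epochs=50 and learning_rate=0.001, these milestones fall at epochs 25, 37, and 43, so the learning rate decays from 1e-3 to 1e-4, then 1e-5, and finally 1e-6 over the course of training.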

Each pass of the loop below performs one training epoch, in which batches of training data are loaded by calling minibatch().

# neural_networks.py:211
def minibatch(*arrays, batch_size=64, as_tensor=True, shuffle=True):

# neural_networks.py:229:232
for epoch in range(num_epochs):
    epoch_loss = 0.0
    epoch_steps = 0
    for x, y in minibatch(inputs, targets, batch_size=batch_size):

The optimizer’s gradients are then zeroed.

# neural_networks.py:233
optimizer.zero_grad()

Predictions from the FCNN model are made for this input batch.

# neural_networks.py:234
yhat = net.forward(x.to(device))

The loss for this batch of predictions is then computed against the true targets from the training dataset, and the gradients are backpropagated through the network's weights.

# neural_networks.py:235:237
ytrue = y.to(device)
loss = criterion(yhat, ytrue)
loss.backward()

The optimizer then adjusts the model’s learning weights based on the gradients observed for the batch.

# neural_networks.py:238
optimizer.step()

Finally, the per-batch losses are accumulated and, at the end of each epoch, reported as an average per-batch loss; the learning rate scheduler then steps.

# neural_networks.py:239:242     
    epoch_loss += loss.item()
    epoch_steps += 1
print(f"Loss after Epoch {epoch+1}: {epoch_loss/epoch_steps}")
scheduler.step()

Saving and Loading FCNNs#

Upon completing the initial training session, the models are saved to the directory path that was originally specified and an instance of the trained parameterization is returned.

# neural_networks.py:311:312
model.save(os.path.join(directory, f"models/{z}"))
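Since each FullyCNN is saved under models/{z}, the two-layer quasigeostrophic setup produces the subdirectories models/0 and models/1 inside the chosen directory, one per vertical layer.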

To load a saved FCNN parameterization later, for further training or to make more predictions, we instantiate an FCNNParameterization object, passing in the directory path where the parameterization was saved; the constructor then reads the models saved there by calling the class method load() from FullyCNN.

param = FCNNParameterization('/home/jovyan/models/fcnn_qu_to_Sq2')

# neural_networks.py:245:250
def __init__(self, directory, models=None, **kw):
    self.directory = directory
    self.models = models if models is not None else [
        FullyCNN.load(f, **kw)
        for f in sorted(glob.glob(os.path.join(directory, "models/*")))
    ]

# neural_networks.py:163:164
@classmethod
def load(cls, path, set_eval=True, **kwargs):
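Note the set_eval=True default, which we take to mean that the loaded networks are switched into evaluation mode (so that, for example, batch-normalization layers use their stored running statistics during prediction); this is appropriate for inference, and presumably set_eval=False would be passed if further training were intended.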

Running Predictions#

Having now created and trained parameterized FCNN models, we can begin using them to make predictions against held-out datasets of filtered and coarse-grained high-resolution simulations. Within these tutorials we will limit our scope to offline testing, though performance can also be assessed through online testing and metrics. The function test_offline() below takes in a coarsened, low-resolution dataset and predicts the subgrid forcing targets using the parameterization. The parameterization’s predictions are then evaluated on a number of offline metrics, including the coefficient of determination (\(R^2\)) and the Pearson correlation (\(\rho\)).

# utils.py:82:84
def test_offline(self, dataset):
    """Evaluate the parameterization on an offline dataset,
    computing a variety of metrics."""

The function then returns an xarray.Dataset object describing the predictions made by the parameterization, along with a number of computed metrics.
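Putting the pieces together, an offline evaluation might look like the following sketch, where param is the trained or loaded parameterization from above and the test dataset path is purely illustrative:

# Illustrative offline evaluation (the test dataset path is assumed).
import xarray as xr

test_set = xr.open_dataset('/home/jovyan/datasets/test_lowres.nc')  # assumed path
offline_metrics = param.test_offline(test_set)
print(offline_metrics)  # xarray.Dataset of predictions and skill metrics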