Symbolic Regression vs. Neural Networks on Multiscale L96#

To recap from the previous notebooks, Lorenz (1996) describes a “two time-scale” model given by the following two equations:

(20)#\[\begin{align} \frac{d}{dt} X_k &= - X_{k-1} \left( X_{k-2} - X_{k+1} \right) - X_k + F - \left( \frac{hc}{b} \right) \sum_{j=0}^{J-1} Y_{j,k} \\ \frac{d}{dt} Y_{j,k} &= - cbY_{j+1,k} \left( Y_{j+2,k} - Y_{j-1,k} \right) - c Y_{j,k} + \frac{hc}{b} X_k \end{align}\]

This model can be visualized as follows:

https://www.researchgate.net/publication/319201436/figure/fig1/AS:869115023589376@1584224577926/Visualisation-of-a-two-scale-Lorenz-96-system-with-J-8-and-K-6-Global-scale-values.png

Fig. 8 Visualisation of a two-scale Lorenz ‘96 system with J = 8 and K = 6. Global-scale variables (\(X_k\)) are updated based on neighbouring variables and on the local-scale variables (\(Y_{j,k}\)) associated with the corresponding global-scale variable. Local-scale variables are updated based on neighbouring variables and the associated global-scale variable. The neighbourhood topology of both local and global-scale variables is circular. Image from Exploiting the chaotic behaviour of atmospheric models with reconfigurable architectures - Scientific Figure on ResearchGate.#

import os
import time
from IPython.display import display, SVG

import gplearn
import gplearn.genetic
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pydot
import pysindy as ps
import torch
import torch.nn.functional as F
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.preprocessing import PolynomialFeatures
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

from L96_model import L96
# Ensuring reproducibility
np.random.seed(14)
torch.manual_seed(14);

Dataset Generation#

Let’s generate some data using the same conditions as in the previous notebooks Introduction to Neural Networks and Using Neural Networks for L96 Parameterization.

Obtaining the Ground Truth#

time_steps = 20000
forcing, dt, T = 18, 0.01, 0.01 * time_steps

W = L96(8, 32, F=forcing)

X_true, _, _, subgrid_tend = W.run(dt, T, store=True, return_coupling=True)
X_true, subgrid_tend = X_true.astype(np.float32), subgrid_tend.astype(np.float32)

print(f"X_true shape: {X_true.shape}")
print(f"subgrid_tend shape: {subgrid_tend.shape}")
X_true shape: (20001, 8)
subgrid_tend shape: (20001, 8)

In this case, although we have 8 different dimensions in \(X\), each evolves according to the same equation, driven by its neighboring values of \(X\) (and its own local \(Y\) variables). Therefore, we can reduce the task from an 8-to-8 regression problem down to an 8-to-1 regression problem:

Dx = X_true.shape[1]
# Stack rotated copies of X_true so that each row has the column of interest at
# position 0 followed by its 7 neighbours; the target is the forcing on that column.
inputs = np.vstack([X_true[:, range(-i, Dx - i)] for i in range(Dx)])
targets = np.hstack([subgrid_tend[:, -i] for i in range(Dx)])

print(f"inputs shape: {inputs.shape}")
print(f"targets shape: {targets.shape}")
inputs shape: (160008, 8)
targets shape: (160008,)
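As a quick sanity check on this construction (not part of the original notebooks), the same dataset can be built with np.roll, rotating the column of interest into position 0:

# Sanity-check sketch: stacking np.roll'ed copies of X_true reproduces the rolled
# construction above exactly, with the target being the forcing on the column
# that was rotated into position 0.
inputs_alt = np.vstack([np.roll(X_true, i, axis=1) for i in range(Dx)])
targets_alt = np.hstack([np.roll(subgrid_tend, i, axis=1)[:, 0] for i in range(Dx)])
assert np.allclose(inputs_alt, inputs) and np.allclose(targets_alt, targets)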
plt.figure(figsize=(16, 4), dpi=150)
plt.title("Multiscale L96 trajectory", fontsize=16)
for i in range(Dx):
    plt.plot(inputs[:, i][:500], label=r"$x" + str(i) + "$")
plt.legend(ncol=2, fontsize=14)
plt.xlabel("Time", fontsize=16)
plt.ylabel("Value", fontsize=16)
plt.show()
../_images/f855d7d3566848851979c3e0d905fbdc1630b0abd51fe7f48c1ea397e63749fd.png
fig = plt.figure(figsize=(12, 6), dpi=150)
fig.suptitle("2D histograms of features vs. forcing", fontsize=16)
for i in range(Dx):
    plt.subplot(2, 4, i + 1)
    plt.hist2d(inputs[:, i], targets, bins=50)
    plt.xlabel(f"$x_{i}$", fontsize=16)
    plt.xticks(np.arange(-6, 13, 2))
    plt.xlim(-8, 14)
    if i % 4 == 0:
        plt.ylabel("Forcing", fontsize=16)
plt.tight_layout()
../_images/20def59700276516a64c0c1f0a0251f1700800791732d644369260d14404e5cf.png

Looking at these marginal histograms of each large-scale variable vs. the subgrid forcing, we can see there’s a mostly linear relationship between \(x_0\) and its forcing, suggesting that the subgrid terms tend to force small values higher and large values lower. This suggests that an extremely simple linear model only containing \(x_0\) with negative slope will do reasonably well. However, the relationship isn’t completely linear, and also contains a bifurcation at larger \(x_0\).

For the other features, the relationships are less straightforward, but there’s clearly high mutual information.
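As a quick, informal check of the claim above (not in the original notebooks), one could fit a model on \(x_0\) alone and inspect its slope and in-sample \(R^2\), using the LinearRegression class already imported above:

# Illustrative sketch: fit on x_0 only to test the "negative slope, decent fit" intuition.
lr_x0 = LinearRegression().fit(inputs[:, :1], targets)
print("slope on x_0:", lr_x0.coef_[0])  # expected to come out negative
print("in-sample R2:", lr_x0.score(inputs[:, :1], targets))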

Split the Data into Training and Test Sets#

data_idxs = np.arange(len(inputs))
np.random.shuffle(data_idxs)

num_train_samples = int(len(inputs) * 0.75)
train_idxs = data_idxs[:num_train_samples]
test_idxs = data_idxs[num_train_samples:]
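A quick check of the split sizes (sketch): 75% of the 160008 samples is 120006 for training, leaving 40002 for testing.

print(len(train_idxs), len(test_idxs))  # 120006 40002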

Build the Dataset and the Dataloaders#

BATCH_SIZE = 1024

# Training Data
train_dataset = TensorDataset(
    torch.tensor(inputs[train_idxs]), torch.tensor(targets[train_idxs])
)
loader_train = DataLoader(dataset=train_dataset, batch_size=BATCH_SIZE, shuffle=True)

# Test Data
test_dataset = TensorDataset(
    torch.tensor(inputs[test_idxs]), torch.tensor(targets[test_idxs])
)
loader_test = DataLoader(dataset=test_dataset, batch_size=BATCH_SIZE, shuffle=True)
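To confirm what the loaders yield (a quick sketch, not in the original), pull one batch and check its shapes:

# One training batch should be (BATCH_SIZE, 8) inputs and (BATCH_SIZE,) targets.
batch_x, batch_y = next(iter(loader_train))
print(batch_x.shape, batch_y.shape)  # torch.Size([1024, 8]) torch.Size([1024])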

Creating and Training the Neural Network#

Now let’s create a neural network and train it on the dataset we generated above.

Define Functions to Train and Evaluate the Model#

Note

The code in this section is taken from the notebook Introduction to Neural Networks.

def train_model(network, criterion, loader, optimizer):
    """Train the network for one epoch"""
    network.train()

    train_loss = 0
    for batch_x, batch_y in loader:
        # Get predictions
        if len(batch_x.shape) == 1:
            # This if block is needed to add a dummy dimension if our inputs are 1D
            # (where each number is a different sample)
            prediction = torch.squeeze(network(torch.unsqueeze(batch_x, 1)))
        else:
            prediction = network(batch_x)

        # Compute the loss
        loss = criterion(prediction, batch_y)
        train_loss += loss.item()

        # Clear the gradients
        optimizer.zero_grad()

        # Backpropagation to compute the gradients and update the weights
        loss.backward()
        optimizer.step()

    return train_loss / len(loader)


def test_model(network, criterion, loader):
    """Test the network"""
    network.eval()  # Evaluation mode (important when having dropout layers)

    test_loss = 0
    with torch.no_grad():
        for batch_x, batch_y in loader:
            # Get predictions
            if len(batch_x.shape) == 1:
                # This if block is needed to add a dummy dimension if our inputs are 1D
                # (where each number is a different sample)
                prediction = torch.squeeze(network(torch.unsqueeze(batch_x, 1)))
            else:
                prediction = network(batch_x)

            # Compute the loss
            loss = criterion(prediction, batch_y)
            test_loss += loss.item()

        # Get an average loss for the entire dataset
        test_loss /= len(loader)

    return test_loss


def fit_model(network, criterion, optimizer, train_loader, val_loader, n_epochs):
    """Train and validate the network"""
    train_losses, val_losses = [], []
    start_time = time.time()
    for epoch in range(1, n_epochs + 1):
        train_loss = train_model(network, criterion, train_loader, optimizer)
        val_loss = test_model(network, criterion, val_loader)
        train_losses.append(train_loss)
        val_losses.append(val_loss)
    end_time = time.time()
    print(f"Training completed in {int(end_time - start_time)} seconds.")

    return train_losses, val_losses

Create the Neural Network#

class Net_ANN(nn.Module):
    def __init__(self):
        super(Net_ANN, self).__init__()
        self.linear1 = nn.Linear(8, 16)
        self.linear2 = nn.Linear(16, 16)
        self.linear3 = nn.Linear(16, 1)

    def forward(self, x):
        x = F.relu(self.linear1(x))
        x = F.relu(self.linear2(x))
        x = self.linear3(x)[:, 0]
        return x
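For a rough sense of model size (a sketch, not in the original), this 8 → 16 → 16 → 1 MLP has only a few hundred parameters:

# Parameter count: 8*16+16 + 16*16+16 + 16*1+1 = 433
print(sum(p.numel() for p in Net_ANN().parameters()))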

Train the Network#

nn_3l = Net_ANN()

n_epochs = 20
criterion = torch.nn.MSELoss()
optimizer = optim.Adam(nn_3l.parameters(), lr=0.01)
train_loss, validation_loss = fit_model(
    nn_3l, criterion, optimizer, loader_train, loader_test, n_epochs
)
Training completed in 17 seconds.

Visualize Results#

plt.figure(dpi=150)
plt.plot(train_loss, "b", label="Training loss")
plt.plot(validation_loss, "r", label="Validation loss")
plt.legend();
../_images/7f3fa80426ba0bae2a1796f89d9ff7b721f67a90c659d0e07c900c6d36819f9c.png
preds_nn = nn_3l(torch.from_numpy(inputs[test_idxs])).detach().numpy()
print("R2 Score:", r2_score(preds_nn, targets[test_idxs]))
R2 Score: 0.9223949649031744

The neural network gets an \(R^2\) of ~0.92, which seems pretty good given the non-determinism of the problem.

Comparing with Linear Regression#

linear_reg = LinearRegression()
linear_reg.fit(inputs[train_idxs], targets[train_idxs])
print("R2 Score:", linear_reg.score(inputs[test_idxs], targets[test_idxs]))
R2 Score: 0.8160184487760795

Linear regression also does fairly well, though its \(R^2\) is about 0.1 lower. Let's look at its coefficients:

plt.figure(dpi=150)
plt.bar(range(8), linear_reg.coef_)
plt.xticks(range(8), [f"x{i}" for i in range(8)], fontsize=12)
plt.ylabel("Coefficient value", fontsize=12)
plt.show()
../_images/05b4325b1a0283a53cf754fa3c07e7fd4eed7718ef854cf168d176a68f2ffc2c.png

This matches the 2D histogram plots from earlier that showed a negative linear relationship between forcing and \(x_0\).

plt.figure(figsize=(15, 6), dpi=150)
plt.title("Neural Network vs. Linear Regression", fontsize=16)

preds_nn = nn_3l(torch.from_numpy(inputs)).detach().numpy()
preds_lr = linear_reg.predict(inputs)
random_idx = np.random.choice(len(targets) - 1000)
random_slice = slice(random_idx, random_idx + 1000)

plt.plot(preds_nn[random_slice], label="NN Predicted values")
plt.plot(preds_lr[random_slice], label="LR Predicted values")
plt.plot(
    targets[random_slice], label="True values", ls="--", color="black", alpha=0.667
)
plt.xlabel(f"Time (starting at idx {random_idx})", fontsize=16)
plt.ylabel("Subgrid forcing", fontsize=16)
plt.legend(fontsize=14)
plt.show()
../_images/92e8f274f380afbbff5642988990f667d73ecf3a2b6cc79702182989e470dd25.png

Symbolic regression with genetic programming#

Let’s try using one of the genetic programming libraries to discover a parameterization:

symbolic_reg = gplearn.genetic.SymbolicRegressor(
    population_size=5000,
    generations=25,
    p_crossover=0.7,
    p_subtree_mutation=0.1,
    p_hoist_mutation=0.05,
    p_point_mutation=0.1,
    max_samples=0.9,
    verbose=1,
    parsimony_coefficient=0.001,
    const_range=(-2, 2),
    function_set=("add", "mul", "sub", "div", "sin", "log"),
)

symbolic_reg.fit(inputs[train_idxs][:20000], targets[train_idxs][:20000])
    |   Population Average    |             Best Individual              |
---- ------------------------- ------------------------------------------ ----------
 Gen   Length          Fitness   Length          Fitness      OOB Fitness  Time Left
   0    15.09          2250.77        3          1.64883          1.66775      3.16m
   1    10.32          6.53204        5          1.62545          1.63651      3.08m
   2    12.32          16.9512        5          1.62268          1.66148      2.98m
   3     7.04          5.26195       20          1.57546           1.6044      2.21m
   4     5.07          9.58891       24          1.55363          1.55098      1.81m
   5     5.79          5.96614       21          1.51949          1.49863      1.89m
   6     9.63          4.65507       17          1.50515          1.50598      2.65m
   7    16.11          2.78142       28          1.47296          1.45776      3.65m
   8    19.77          2.69051       28           1.4555          1.46709      3.93m
   9    21.42          2.80303       41          1.45297          1.48639      3.87m
  10    22.63          2.61995       37          1.44204          1.47566      3.95m
  11    23.78          2.29202       41          1.42739           1.4853      3.92m
  12    24.47           6.6118        8          1.41676          1.43687      3.62m
  13    25.15          2.55741       22           1.4201          1.47976      3.29m
  14    25.67          2.27797       28          1.41595          1.43681      2.97m
  15    24.77          2.45272       42          1.35228          1.38372      2.59m
  16    20.61          3.97112       23          1.34902          1.34875      2.04m
  17    12.16          3.66793       24          1.33076          1.35901      1.30m
  18    14.17          3.62681       24          1.32406          1.35392      1.30m
  19    27.37          2.86559       29          1.31132          1.33767      1.83m
  20    25.82          2.40423       31          1.30332          1.33496      1.42m
  21    23.41          2.45246       38          1.30026          1.32775     59.28s
  22    23.45          2.57487       38          1.29615          1.31448     38.39s
  23    24.97          2.56582       33          1.29189          1.30837     20.07s
  24    24.11          2.33584       37          1.29417          1.32678      0.00s
print(symbolic_reg._program)
sub(sin(mul(-0.447, X0)), sub(X0, sin(sub(-1.107, log(add(X2, sub(sub(sub(-1.107, log(add(X2, sub(sin(sub(-1.107, log(mul(-0.447, X0)))), sub(X0, -0.220))))), X0), log(sin(mul(-0.447, X0))))))))))
print(
    "R2 Score:", r2_score(symbolic_reg.predict(inputs[test_idxs]), targets[test_idxs])
)
R2 Score: 0.8499308040527297

While the program does slightly better than linear regression, it’s extremely complicated (graph visualization below for slightly easier inspection):

# Build the graph
graph = pydot.graph_from_dot_data(symbolic_reg._program.export_graphviz())[0]
graph.write_svg("figs/symbolic_regression.svg")

# Display the graph
display(SVG("figs/symbolic_regression.svg"))
../_images/3c9cfd7f89e0fc228bba3dc3d6f7a497d61892eac02cc639cd4d85a7221e639e.svg
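If a cleaner view of the expression is wanted, one option (not used in the original notebook, and assuming sympy is available) is to map the program string onto sympy and let it simplify. Note that gplearn's div and log are protected versions (safe near zero), so the mapping below is only approximate.

# Hypothetical sketch: translate gplearn's function names into sympy and simplify.
import sympy

Xs = sympy.symbols("X0:8")
converter = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "div": lambda a, b: a / b,  # gplearn actually uses protected division
    "sin": sympy.sin,
    "log": sympy.log,  # gplearn actually uses a protected log
}
converter.update({f"X{i}": Xs[i] for i in range(8)})
expr = sympy.sympify(str(symbolic_reg._program), locals=converter)
print(sympy.simplify(expr))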

Explore the Pareto frontier#

We can explore the Pareto frontier of the final population of programs as well, to see if there's a simpler program that still does well.

df = pd.DataFrame.from_dict(
    [
        dict(
            idx=i,
            expr=str(p),
            fitness=p.fitness_,
            length=p.length_,
            r2_score=r2_score(p.execute(inputs[test_idxs]), targets[test_idxs]),
        )
        for i, p in enumerate(symbolic_reg._programs[-1])
    ]
)
pareto_frontier = []
used_programs = set()

for l in range(1, 40):
    programs = df[df.length <= l]
    best = programs.r2_score.argmax()
    idx = programs.idx.values[best]
    r2 = programs.r2_score.values[best]
    program = symbolic_reg._programs[-1][idx]
    if program not in used_programs:
        pareto_frontier.append(dict(length=l, r2=r2, expr=str(program)))
        used_programs.add(program)

pd.set_option("max_colwidth", 200)
pareto_frontier = pd.DataFrame.from_dict(pareto_frontier)
pareto_frontier
length r2 expr
0 1 0.000000 -0.220
1 2 0.000000 sin(-1.107)
2 3 0.787718 sub(-0.220, X0)
3 8 0.837378 sub(sin(mul(-0.447, X0)), sub(X0, -0.220))
4 11 0.841150 sub(sin(mul(-0.447, X0)), sub(X0, sin(mul(-0.447, X0))))
5 12 0.842545 sub(sin(sin(mul(-0.447, X0))), sub(X0, sin(mul(-0.447, X0))))
6 13 0.843272 sub(sin(sin(mul(-0.447, X0))), sub(X0, sin(sin(mul(-0.447, X0)))))
7 16 0.854240 sub(sin(mul(-0.447, X0)), sub(X0, sin(sub(-1.107, log(add(X2, div(X0, -1.107)))))))
8 17 0.855131 sub(sin(sub(-1.107, log(add(X2, sub(sin(X0), X0))))), sub(X0, sin(mul(-0.447, X0))))
9 19 0.856080 sub(sin(sub(-1.107, log(add(X2, sub(sub(sin(X0), X0), -1.107))))), sub(X0, sin(mul(-0.447, X0))))
10 21 0.856531 sub(sin(sub(-1.107, log(add(X2, sub(sub(-0.447, X0), log(sub(X0, -0.220))))))), sub(X0, sin(mul(-0.307, X0))))
11 23 0.856728 sub(sin(sub(-1.107, log(add(X2, sub(sub(sin(X0), X0), log(sin(mul(-0.447, X0)))))))), sub(X0, sin(mul(-0.447, X0))))
12 35 0.856754 sub(sin(sub(-1.107, log(add(X2, sub(sin(sub(-1.107, log(add(X2, sub(sub(sin(-0.605), X0), log(sub(X0, -0.220))))))), sub(X0, sin(mul(-0.447, X0)))))))), sub(X0, sin(mul(-0.447, X0))))
13 39 0.857035 sub(sin(mul(-0.447, X0)), sub(X0, sin(sub(-1.107, log(add(X2, sub(sub(sin(sub(-1.107, log(add(X2, sub(sub(-0.605, X0), log(sub(-0.447, X0))))))), sub(X0, sin(mul(-0.447, X0)))), log(mul(-0.447, X0...

Unfortunately, nothing really jumps out here as both simple and performant.

Symbolic Regression with Sparse Linear Regression and Polynomial Features#

We can also apply sparse regression methods to this problem, using polynomial features as the candidate terms:

def learned_expr(coef, names, cutoff=1e-3):
    """Format a fitted coefficient vector as a readable expression, keeping only
    terms with |coefficient| > cutoff and sorting them by magnitude."""
    coef_idx = np.argwhere(np.abs(coef) > cutoff)[:, 0]
    sort_order = reversed(np.argsort(np.abs(coef[coef_idx])))
    return " + ".join(
        [
            f"{coef[coef_idx[i]]:.4f}*{names[coef_idx[i]].replace(' ', '*')}"
            for i in sort_order
        ]
    )
pf = PolynomialFeatures(2)
feats = pf.fit_transform(inputs)
names = pf.get_feature_names_out([f"x_{i}" for i in range(8)])
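To make the feature construction concrete, here is a toy illustration (not part of the original notebook) of what PolynomialFeatures(2) generates for a two-feature input:

# Toy example: degree-2 polynomial features of [a, b] are [1, a, b, a^2, a*b, b^2].
toy = PolynomialFeatures(2)
print(toy.fit_transform(np.array([[1.0, 2.0]])))  # [[1. 1. 2. 1. 2. 4.]]
print(toy.get_feature_names_out(["a", "b"]))  # ['1' 'a' 'b' 'a^2' 'a b' 'b^2']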

Non-sparse linear regression#

linear_reg = LinearRegression()
linear_reg.fit(feats[train_idxs], targets[train_idxs])
learned_expr(linear_reg.coef_, names)
'-0.6913*x_0 + 0.4333*x_1 + 0.3393*x_7 + 0.2112*x_5 + 0.1420*x_6 + 0.1118*x_2 + -0.0399*x_0*x_1 + -0.0333*x_0*x_2 + -0.0323*x_6*x_7 + -0.0288*x_0*x_7 + -0.0283*x_1*x_2 + -0.0248*x_1*x_5 + -0.0242*x_3 + -0.0219*x_1*x_6 + -0.0209*x_5*x_7 + 0.0201*x_0^2 + -0.0185*x_1*x_4 + -0.0135*x_0*x_5 + 0.0124*x_0*x_3 + 0.0107*x_4 + 0.0099*x_1*x_3 + 0.0090*x_2*x_5 + -0.0088*x_4*x_7 + -0.0081*x_5^2 + -0.0076*x_2*x_3 + -0.0074*x_2*x_7 + 0.0071*x_3*x_4 + 0.0071*x_4*x_5 + -0.0069*x_3*x_5 + 0.0061*x_3*x_6 + 0.0059*x_1*x_7 + -0.0057*x_7^2 + 0.0050*x_3^2 + -0.0048*x_0*x_4 + -0.0045*x_1^2 + -0.0036*x_0*x_6 + -0.0036*x_3*x_7 + 0.0034*x_2^2 + 0.0024*x_4*x_6 + 0.0023*x_2*x_4 + 0.0015*x_2*x_6 + -0.0011*x_5*x_6'
print("R2 Score:", linear_reg.score(feats[test_idxs], targets[test_idxs]))
R2 Score: 0.8872654640415502

Adding polynomial features brings our \(R^2\) much closer to that of the neural network, but the resulting expression has over 40 terms. Let's try making the regression sparser.

Sequentially thresholded least squares#
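pysindy's STLSQ optimizer implements sequentially thresholded least squares: alternate a least-squares fit with zeroing out coefficients whose magnitude falls below a threshold, then refit on the surviving terms. A minimal numpy sketch of the idea (illustration only; pysindy's implementation also adds ridge regularization controlled by alpha):

def stlsq_sketch(A, b, threshold=0.01, n_iter=10):
    """Toy sequentially thresholded least squares (illustration only)."""
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    for _ in range(n_iter):
        small = np.abs(coef) < threshold
        coef[small] = 0.0  # prune small coefficients
        if small.all():
            break
        coef[~small], *_ = np.linalg.lstsq(A[:, ~small], b, rcond=None)  # refit survivors
    return coef
    # e.g. stlsq_sketch(feats[train_idxs], targets[train_idxs], threshold=0.01)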

stlsq = ps.optimizers.stlsq.STLSQ(alpha=0.001, threshold=0.01)
stlsq.fit(feats[train_idxs], targets[train_idxs])
learned_expr(stlsq.coef_[0], names)
'-3.7208*1 + -0.7842*x_0 + 0.3574*x_1 + 0.2431*x_7 + 0.1900*x_2 + 0.1778*x_5 + 0.1522*x_6 + -0.0390*x_0*x_1 + -0.0356*x_0*x_2 + -0.0335*x_6*x_7 + 0.0267*x_0^2 + -0.0265*x_1*x_5 + -0.0257*x_0*x_7 + -0.0227*x_1*x_2 + 0.0189*x_4 + -0.0164*x_1*x_6 + -0.0156*x_5*x_7 + 0.0145*x_3 + -0.0120*x_0*x_5 + 0.0110*x_0*x_3'
stlsq.score(feats[test_idxs], targets[test_idxs])
0.882294259449451

Looks like we can get a much sparser expression without sacrificing too much performance, though if we increase the threshold slightly, we will see a trade-off:

stlsq2 = ps.optimizers.stlsq.STLSQ(alpha=0.001, threshold=0.02)
stlsq2.fit(feats[train_idxs], targets[train_idxs])
learned_expr(stlsq2.coef_[0], names)
'-1.5860*1 + -0.9857*x_0 + 0.1324*x_2 + 0.0728*x_3 + -0.0725*x_7 + -0.0436*x_1 + -0.0322*x_0*x_2 + 0.0307*x_0^2'
stlsq2.score(feats[test_idxs], targets[test_idxs])
0.861645175881121

This is much simpler, but less performant. It still strongly outperformed the particular genetic programming method we used, though. Let's visualize some of the predictions:

plt.figure(figsize=(15, 6), dpi=150)
plt.title("Neural network vs. quadratic regressions", fontsize=16)

preds_nn = nn_3l(torch.from_numpy(inputs)).detach().numpy()
preds_lr_quad = linear_reg.predict(feats)
preds_st_quad = stlsq.predict(feats)
preds_st_quad2 = stlsq2.predict(feats)
random_idx = np.random.choice(len(targets) - 1000)
random_slice = slice(random_idx, random_idx + 1000)

plt.plot(preds_nn[random_slice], label="NN predicted values")
plt.plot(preds_lr_quad[random_slice], label="LR + quadratic")
plt.plot(preds_st_quad[random_slice], label=r"STLSQ + quadratic, threshold=$0.01$")
plt.plot(preds_st_quad2[random_slice], label=r"STLSQ + quadratic, threshold=$0.02$")
plt.plot(
    targets[random_slice], label="True values", ls="--", color="black", alpha=0.667
)
plt.xlabel(f"Time (starting at idx {random_idx})", fontsize=16)
plt.ylabel("Subgrid forcing", fontsize=16)
plt.legend(fontsize=14)
plt.show()
../_images/e323c198f2fa33e54b62a4be82530a22531fdaa42a2bec813ca38070e6976499.png

Add threshold features#

The 2D histogram of \(x_0\) suggests its relationship to the forcing might be well captured by a piecewise linear model:

plt.figure(figsize=(4, 4), dpi=150)
plt.hist2d(inputs[:, 0], targets, bins=50)
plt.xlim(-6, 14)
plt.xlabel("$x_0$", fontsize=16)
plt.ylabel("Forcing", fontsize=16)
plt.show()
../_images/2eb0980f89a1b692ee578497b5bb6be9d3c2443fcff29895c866de77b2c47c73.png

Sparse regression#

Let’s see if sparse regression will discover that if we provide it with thresholding features and cross-terms:

# Indicator features: 1, (x0>0), (x0>1), ..., (x0>5)
x0_thresholds = np.array(
    [np.ones_like(inputs[:, 0])] + [inputs[:, 0] > i for i in range(6)]
)
x0_threshold_names = [""] + [f"*(x0>{t})" for t in range(6)]

# Cross every polynomial feature with every indicator to allow piecewise behaviour
feats2 = np.array(
    [
        feats[:, i] * x0_thresholds[j]
        for i in range(feats.shape[1])
        for j in range(len(x0_thresholds))
    ]
).T
names2 = [
    f"{names[i]}{x0_threshold_names[j]}"
    for i in range(feats.shape[1])
    for j in range(len(x0_thresholds))
]
stlsq3 = ps.optimizers.stlsq.STLSQ(alpha=0.001, threshold=0.25)
stlsq3.fit(feats2[train_idxs], targets[train_idxs])
learned_expr(stlsq3.coef_[0], names2)
/usr/share/miniconda/envs/L96M2lines/lib/python3.9/site-packages/sklearn/utils/extmath.py:189: RuntimeWarning: invalid value encountered in matmul
  ret = a @ b
'-2.8051*1*(x0>4) + -0.9871*x_0 + -0.6571*1*(x0>2) + 0.6440*x_0*(x0>4) + -0.5681*1*(x0>0) + -0.5286*1*(x0>1) + -0.2734*1*(x0>5)'
stlsq3.score(feats2[test_idxs], targets[test_idxs])
0.8590951075317006
plt.figure(figsize=(4, 4), dpi=150)
plt.hist2d(inputs[:, 0], targets, bins=50)

# Piecewise-linear function read off from the learned expression printed above
fn = (
    lambda x0: -2.8051 * 1 * (x0 > 4)
    + -0.9871 * x0
    + -0.6571 * 1 * (x0 > 2)
    + 0.6440 * x0 * (x0 > 4)
    + -0.5681 * 1 * (x0 > 0)
    + -0.5286 * 1 * (x0 > 1)
    + -0.2734 * 1 * (x0 > 5)
)

xt = np.linspace(-6, 14, 100)
yt = fn(xt)
plt.plot(xt, yt, color="red")
plt.xlim(-6, 14)
plt.xlabel("$x_0$", fontsize=16)
plt.ylabel("Forcing", fontsize=16)
plt.show()
../_images/053e8c3c37b9a00cf8d20a19e31c676f31bf368a97da87912a091f99036bbf85.png

It looks like the model learns a piecewise linear function in a single variable, even though we didn’t explicitly require it!
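Since the learned expression depends on \(x_0\) only, one can sanity-check (a sketch, not in the original) that the plotted piecewise function by itself reproduces roughly the same test score as stlsq3:

# The learned sparse model uses only x_0 features, so evaluating the plotted
# piecewise function on the test set should roughly match stlsq3's score.
print("R2 of piecewise fn:", r2_score(targets[test_idxs], fn(inputs[test_idxs, 0])))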

Genetic programming#

We can also try using genetic programming here, but with a different function set that removes some of the nonlinear operators we included before and replaces them with min and max, which allow for piecewise behavior:
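As a tiny illustration of why these primitives suit the problem (not from the original notebook): min/max applied to affine terms built from add and mul produce exactly piecewise-linear functions, with kinks where the branches cross.

# Tiny illustration: the min of two affine branches is piecewise linear.
xs = np.linspace(-6, 14, 5)
print(np.minimum(-0.5 * xs, 1.0 - xs))  # takes whichever branch is smaller at each x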

symbolic_reg_pw = gplearn.genetic.SymbolicRegressor(
    population_size=5000,
    generations=50,
    p_crossover=0.7,
    p_subtree_mutation=0.1,
    p_hoist_mutation=0.05,
    p_point_mutation=0.1,
    max_samples=0.9,
    verbose=1,
    parsimony_coefficient=0.001,
    const_range=(-2, 2),
    function_set=("add", "mul", "max", "min"),
)
symbolic_reg_pw.fit(inputs[train_idxs][:20000], targets[train_idxs][:20000])
    |   Population Average    |             Best Individual              |
---- ------------------------- ------------------------------------------ ----------
 Gen   Length          Fitness   Length          Fitness      OOB Fitness  Time Left
   0    27.65      3.78741e+06        7          2.45249          2.54746      5.59m
   1     8.00          9.51443        5          1.85843          1.82615      4.06m
   2    10.82          7.30098        3           1.6483          1.63764      4.44m
   3     8.34          27.6089        3          1.64321          1.68342      4.11m
   4     4.92          26.8323       15          1.50401          1.50866      3.67m
   5     3.87          16.3644       15          1.44304          1.44976      3.65m
   6     6.13           148.18       11          1.40594          1.42375      3.76m
   7    12.76          15.4382       13          1.37424          1.38711      4.70m
   8    14.48          10.2082       13           1.3658          1.36791      4.90m
   9    15.67          12.0612       21          1.35567          1.37664      4.95m
  10    15.53          17917.1       23           1.3514          1.40418      4.92m
  11    15.51          7.61133       29          1.35019          1.38838      4.71m
  12    16.27          4.11042       19          1.34782          1.38404      4.74m
  13    15.22          12.9581       23          1.34734          1.38836      4.49m
  14    14.50          1540.79       17          1.34519          1.40768      4.27m
  15    13.97          5.27408       17          1.34517          1.40786      4.10m
  16    14.09          13.3842       17          1.34172          1.43891      4.10m
  17    14.21          3.51828       13          1.34363          1.37425      3.89m
  18    14.46          84.8708       19          1.34315          1.43713      3.79m
  19    14.50           8.1359       15          1.33915          1.37526      3.66m
  20    14.49          4.71403       15          1.33942          1.41217      3.50m
  21    14.12          4.65016       15          1.33662          1.39804      3.46m
  22    13.77          38.3175       19          1.33467          1.42346      3.22m
  23    13.67          9.76407       17          1.33539          1.41711      3.09m
  24    13.87          9.07149       15          1.33255          1.43467      2.98m
  25    14.14          114.952       15          1.33405          1.42116      2.87m
  26    14.53          110.894       15          1.33412          1.42058      2.77m
  27    14.70          3.25037       19          1.33509          1.41981      2.72m
  28    14.72          10.7036       13          1.33059          1.37007      2.50m
  29    14.88          3.91762       25          1.32809          1.37427      2.40m
  30    14.62          6.29094       15          1.32687          1.39063      2.27m
  31    14.57           3.5343       17           1.3249          1.40442      2.15m
  32    14.12          3.90761       21          1.32425          1.41418      2.02m
  33    13.83          5.69476       13          1.32439          1.42588      1.95m
  34    13.48          8.32606       15          1.32568          1.40132      1.77m
  35    13.25          79.5995       17          1.32476          1.40569      1.65m
  36    13.24          4.45101       13          1.32515          1.41905      1.52m
  37    13.04          7.14949       13          1.32524           1.4182      1.39m
  38    13.15          5.93211       15          1.32344          1.42146      1.33m
  39    13.15          9.69874       17          1.32478          1.40562      1.17m
  40    13.18          5.74077       13          1.32703          1.40212      1.05m
  41    12.99          9.94645       13          1.32421          1.42748     55.86s
  42    12.99          4.36878       13          1.32517          1.41883     49.24s
  43    13.12          12.7419       13          1.32593          1.41204     42.90s
  44    13.26          3.95264       13          1.32416          1.42798     36.58s
  45    13.06          13.1207       21          1.32404          1.42906     28.11s
  46    13.04          6.56359       13          1.32464          1.42364     20.94s
  47    13.19          6.23439       21          1.32029          1.34429     14.09s
  48    13.10          27.2948       21          1.32101          1.33781      7.06s
  49    13.26          5.91966       21          1.31816          1.36342      0.00s
print(symbolic_reg_pw._program)
add(max(min(mul(max(X0, -1.442), -0.938), X1), -1.937), mul(min(X0, add(max(X5, X3), max(X6, X4))), -0.720))
print(
    "R2 Score:",
    r2_score(symbolic_reg_pw.predict(inputs[test_idxs]), targets[test_idxs]),
)
R2 Score: 0.8354791875984894

This scores slightly lower than the earlier genetic-programming run (though still above plain linear regression), and the discovered model is still fairly complex:

# Build the graph
graph = pydot.graph_from_dot_data(symbolic_reg_pw._program.export_graphviz())[0]
graph.write_svg("figs/symbolic_regression_pw.svg")

# Display the graph
display(SVG("figs/symbolic_regression_pw.svg"))
../_images/7705f159d34ac422757b9c58470031c3eeae039846e3c65a3f5e4b446ba66c28.svg

Tuning the hyperparameters could help, but requires more compute.