Activity 15: Forward prediction in neural networks#
2026-03-26
Is this image a cat?#
Neural networks learn to combine raw inputs into useful internal representations. That’s what the hidden layers do, and it makes neural networks powerful for image recognition.
For this activity, we’ll use two simple input features to determine whether an image contains a cat (courtesy of Uni the Cat):

| Feature | Description |
|---|---|
| \(x_1\) | has_box: 1 if the image contains a box, 0 otherwise |
| \(x_2\) | has_whiskers: 1 if whiskers are visible in the image, 0 otherwise |
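Both features are binary: for Uni's photo the box is present and whiskers are visible, so the input vector is \(x = [1, 1]\), matching the code below.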
Part 2: Reproduce in NumPy#
Now let’s verify our hand computation with code.
import numpy as np
# Same values as the pen-and-paper exercise
x = np.array([1, 1]) # shape (2,): has_box=1, has_whiskers=1
W1 = np.array([[ 0.5, 0.3],
               [-0.4, 0.2],
               [ 0.8, 0.6]]) # shape (3, 2): weight grid, input -> hidden
v = np.array([0.3, 0.2, 0.5]) # shape (3,): weights, hidden -> output
print(f"x shape: {x.shape}")
print(f"W1 shape: {W1.shape}")
print(f"v shape: {v.shape}")
x shape: (2,)
W1 shape: (3, 2)
v shape: (3,)
The @ operator#
In the pen-and-paper exercise, we computed each hidden neuron’s value by taking a row of the weight grid and computing a weighted sum with the input:
\(z_1 = 0.5 \times 1 + 0.3 \times 1 = 0.8\)
\(z_2 = (-0.4) \times 1 + 0.2 \times 1 = -0.2\)
\(z_3 = 0.8 \times 1 + 0.6 \times 1 = 1.4\)
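Stacked together, the three sums are a single matrix-vector product (the same numbers as above, just arranged):
\[
z = W_1 x =
\begin{bmatrix} 0.5 & 0.3 \\ -0.4 & 0.2 \\ 0.8 & 0.6 \end{bmatrix}
\begin{bmatrix} 1 \\ 1 \end{bmatrix}
=
\begin{bmatrix} 0.8 \\ -0.2 \\ 1.4 \end{bmatrix}
\]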
NumPy’s @ operator (matrix multiplication) is a shorthand that computes all of these row-wise weighted sums at once:
z = W1 @ x # does all 3 weighted sums in one line (matrix multiplication)
The shape rule is: if W1 has shape (3, 2) and x has shape (2,), then W1 @ x has shape (3,). The last dimension of the left side (2) must match the dimension of the right side (2), and the result keeps the remaining dimension (3).
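As a quick illustration of the shape rule (not part of the exercise), here is what happens with matching and mismatched shapes:
A = np.zeros((3, 2))
print((A @ np.zeros(2)).shape)  # (3,): inner dimensions (2 and 2) match, outer dimension (3) remains
# A @ np.zeros(3) raises a ValueError because the inner dimensions (2 vs. 3) do not match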
We can use the @ operator to perform some of the broadcast-and-sum operations we’ve seen before:
# These two are equivalent:
z_broadcast = np.sum(W1 * x, axis=1) # broadcast x across rows, sum columns
z_matmul = W1 @ x # same thing, one operator
print(f"broadcast: {z_broadcast}")
print(f"matmul: {z_matmul}")
broadcast: [ 0.8 -0.2 1.4]
matmul: [ 0.8 -0.2 1.4]
Then, we can use np.maximum to apply the ReLU activation function. Read the docs, then complete our full forward prediction pass using @.
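If np.maximum is new to you, here is a one-liner (reusing the z values from above) showing its element-wise comparison against 0:
print(np.maximum(np.array([0.8, -0.2, 1.4]), 0))  # [0.8 0.  1.4]: negatives clipped to 0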
# Same values as the pen-and-paper exercise
x = np.array([1, 1]) # shape (2,): has_box=1, has_whiskers=1
W1 = np.array([[ 0.5, 0.3],
               [-0.4, 0.2],
               [ 0.8, 0.6]]) # shape (3, 2): weight grid, input -> hidden
v = np.array([0.3, 0.2, 0.5]) # shape (3,): weights, hidden -> output
# TODO Compute the pre-activation values using the `@` operator
z = W1 @ x
# TODO Apply ReLU (hint: np.maximum compares element-wise)
h = np.maximum(z, 0)
# TODO Compute the output using the `@` operator
y_hat = h @ v
print(f"z (pre-activation): {z}")
print(f"h (after ReLU): {h}")
print(f"y_hat (output): {y_hat}")
z (pre-activation): [ 0.8 -0.2 1.4]
h (after ReLU): [0.8 0. 1.4]
y_hat (output): 0.94
if __name__ == "__main__":
    # These should match your hand computation
    assert np.allclose(z, np.array([0.8, -0.2, 1.4])), f"z should be [0.8, -0.2, 1.4], got {z}"
    assert np.allclose(h, np.array([0.8, 0.0, 1.4])), f"h should be [0.8, 0.0, 1.4], got {h}"
    assert np.allclose(y_hat, 0.94), f"y_hat should be 0.94, got {y_hat}"
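For reference, the output-layer arithmetic behind that final assertion, written out:
\[ \hat{y} = v \cdot h = 0.3 \times 0.8 + 0.2 \times 0 + 0.5 \times 1.4 = 0.24 + 0 + 0.70 = 0.94 \]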
Part 3: Reproduce in PyTorch#
We just implemented a neural network forward pass in NumPy. PyTorch provides building blocks that let us express the same network even more concisely, and it makes training neural networks much easier.
# standard import idioms for pytorch
import torch
import torch.nn as nn
torch.tensor: PyTorch’s version of np.array#
A torch.tensor works just like a NumPy array — it holds numbers and has a shape. The key difference is that tensors can track gradients for training.
# same as our NumPy input from before
x_torch = torch.tensor([1.0, 1.0])
print(f"type: {type(x_torch)}")
print(f"shape: {x_torch.shape}")
print(f"value: {x_torch}")
type: <class 'torch.Tensor'>
shape: torch.Size([2])
value: tensor([1., 1.])
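That gradient tracking is opt-in via the requires_grad flag. A minimal illustration (not needed for the rest of this activity):
# requires_grad=True asks PyTorch to record operations on this tensor for backpropagation
t = torch.tensor([1.0, 1.0], requires_grad=True)
print(t.requires_grad)  # True; a plain NumPy array has no equivalent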
nn.Sequential: building a network#
PyTorch’s nn.Sequential lets us build our entire network — input \(\to\) hidden with ReLU \(\to\) output — by stacking layers in order:
- nn.Linear(in_features, out_features) — a layer that computes weighted sums. By default, this includes a bias term (\(w_0\)) automatically.
- nn.ReLU() — the ReLU activation function
We use bias=False below to match our pen-and-paper exercise, which omitted bias for simplicity.
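Concretely, with the default bias each hidden neuron would compute \(z_i = w_{0,i} + \sum_j W_{ij} x_j\); with bias=False it reduces to the plain weighted sum \(z_i = \sum_j W_{ij} x_j\) from our hand computation.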
model = nn.Sequential(
    nn.Linear(2, 3, bias=False), # input (2,) -> hidden (3,): the weight grid W
    nn.ReLU(),                   # apply ReLU activation
    nn.Linear(3, 1, bias=False), # hidden (3,) -> output (1,): the weights v
)
print(model)
print(model)
Sequential(
  (0): Linear(in_features=2, out_features=3, bias=False)
  (1): ReLU()
  (2): Linear(in_features=3, out_features=1, bias=False)
)
Setting the weights to match our exercise#
By default, PyTorch initializes weights randomly. To verify that this model does the same computation as the pen-and-paper exercise and the NumPy version from Part 2, the starter code below manually sets the weights to our exercise values.
Setup code
Don’t worry about the details of nn.Parameter and torch.no_grad(), as this is just the PyTorch way to manually assign specific weight values. In practice, we almost never do this; we let the training process find good weights via gradient descent.
# Setup code: set the weights to match our pen-and-paper exercise
with torch.no_grad():
    # Layer 1: the weight grid W (input -> hidden)
    model[0].weight = nn.Parameter(torch.tensor(
        [[ 0.5, 0.3],
         [-0.4, 0.2],
         [ 0.8, 0.6]]))
    # Layer 2: the weights v (hidden -> output), stored as a (1, 3) grid
    model[2].weight = nn.Parameter(torch.tensor(
        [[0.3, 0.2, 0.5]]))
Check that the weight shapes match what we expect:
# model[0] is the first nn.Linear, model[1] is nn.ReLU, model[2] is the second nn.Linear
print(f"Layer 1 weight shape: {model[0].weight.shape}") # should be (3, 2)
print(f"Layer 2 weight shape: {model[2].weight.shape}") # should be (1, 3)
Layer 1 weight shape: torch.Size([3, 2])
Layer 2 weight shape: torch.Size([1, 3])
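The (1, 3) shape is PyTorch’s convention: nn.Linear stores its weight as (out_features, in_features), so for a 1-D input it performs the same W @ x product we used in NumPy. A quick check (not in the original notebook), reusing the hidden values from Part 2:
W2 = model[2].weight.detach()            # shape (1, 3): our v stored as a row
h_torch = torch.tensor([0.8, 0.0, 1.4])  # hidden activations from the NumPy part
print(W2 @ h_torch)                      # tensor([0.9400]), matching y_hat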
Next, we pass our input through the model and check that the output matches our hand computation and the NumPy result:
# PyTorch models make predictions through a "forward" pass; calling the
# model directly runs it. This is roughly equivalent to calling
# model.predict() in scikit-learn.
output = model(x_torch)
print(f"PyTorch output: {output}")
print(f"Hand computation: 0.94")
PyTorch output: tensor([0.9400], grad_fn=<SqueezeBackward4>)
Hand computation: 0.94
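If you want a plain Python float for the comparison, .item() extracts the scalar from a one-element tensor (a small aside):
print(output.item())  # approximately 0.94, as a regular float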
Compare our NumPy implementation to the PyTorch version:
| | NumPy | PyTorch |
|---|---|---|
| Input | x = np.array([1, 1]) | x_torch = torch.tensor([1.0, 1.0]) |
| Layer 1 | z = W1 @ x | nn.Linear(2, 3, bias=False) |
| Activation | h = np.maximum(z, 0) | nn.ReLU() |
| Layer 2 | y_hat = h @ v | nn.Linear(3, 1, bias=False) |
The building blocks are the same, but PyTorch wraps them in modules that are easier to compose and that can compute gradients automatically.
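As a small glimpse of that last point (a sketch beyond this activity’s scope), one backward() call fills in a gradient for every weight in the model:
# forward pass, then ask autograd for the gradient of the output w.r.t. the weights
output = model(x_torch)
output.backward()
print(model[0].weight.grad)  # same shape as the layer-1 weights, filled in by autograd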