Overview
This notebook builds a feedforward neural network from scratch using manual backpropagation to solve the XOR problem, then scales up to MLP classifiers on the Iris dataset with varying hidden-layer widths.
You Will Learn
- Implementing a neural network without any deep learning framework
- Coding the four-step backpropagation algorithm by hand
- Training a network to solve XOR and watching convergence
- Comparing MLP performance across different hidden neuron counts (1–32)
- Visualising decision boundaries at different network capacities
Main Content
Building a Neural Network from Scratch
The notebook starts with the NeuralNetwork class that uses a list of LogisticRegression neurons (reused from week 4). The architecture for XOR is 2 inputs + bias, 2 hidden neurons, and 1 output neuron, all with sigmoid activation. You implement the forward pass by computing activations layer by layer, then implement backpropagation with the four manual steps: output deltas, hidden deltas, output weight updates, hidden weight updates. No autograd, no PyTorch backward() — just NumPy and the chain rule.
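A condensed sketch of the same forward pass and four backpropagation steps, written as a standalone 2-2-1 sigmoid network in plain NumPy (the variable names and global-weight style here are illustrative, not the notebook's LogisticRegression-based implementation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
# Illustrative 2-2-1 network; each weight matrix includes a bias column
W1 = rng.normal(size=(2, 3))   # hidden layer: 2 neurons, 2 inputs + bias
W2 = rng.normal(size=(1, 3))   # output layer: 1 neuron, 2 hidden + bias

def forward(x):
    a0 = np.append(x, 1.0)     # input with bias term
    h = sigmoid(W1 @ a0)       # hidden activations
    a1 = np.append(h, 1.0)     # hidden activations with bias term
    y = sigmoid(W2 @ a1)       # output activation
    return a0, a1, y

def backward(a0, a1, y, target, lr=1.0):
    global W1, W2
    # Step 1: output delta (for BCE + sigmoid the chain rule collapses to y - target)
    delta_out = y - target
    # Step 2: hidden deltas via the chain rule (bias column carries no error backwards)
    h = a1[:-1]
    delta_hidden = (W2[:, :-1].T @ delta_out) * h * (1 - h)
    # Step 3: output weight update
    W2 -= lr * np.outer(delta_out, a1)
    # Step 4: hidden weight update
    W1 -= lr * np.outer(delta_hidden, a0)
```

One gradient step on a sample should lower that sample's BCE loss, which is a quick sanity check for a hand-coded backward pass.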
Solving XOR
The XOR training loop iterates over the four data points repeatedly, running a forward and backward pass for each sample. With an appropriate learning rate (typically 0.5–2.0) and enough iterations (500–5000), the loss drops from ~0.7 to near zero. You plot the BCE loss curve and verify that the network outputs values close to the correct labels. The decision boundary visualisation shows how the hidden layer warps the input space to separate the diagonal XOR pattern.
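The ~0.7 starting point of the loss curve is no accident: with small random weights a sigmoid unit outputs roughly 0.5 regardless of input, and the binary cross-entropy of a 0.5 prediction is -ln(0.5):

```python
import numpy as np

# With near-zero weights the sigmoid output is about 0.5 for any input,
# so the expected per-sample BCE at initialisation is -ln(0.5)
p = 0.5
bce = -np.log(p)
print(f"initial per-sample BCE ≈ {bce:.3f}")  # ≈ 0.693, the ~0.7 seen on the curve
```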
MLP Experiments on Iris
The second part trains PyTorch MLPs with hidden layer sizes {1, 2, 4, 8, 16, 32} on the Iris dataset. For each configuration you track training and test accuracy across epochs. The results show a clear progression: 1 hidden neuron underfits (~60–70% accuracy), 4–8 neurons reach near-optimal performance (~95–97%), and 16–32 neurons achieve similar accuracy but with more variance across random seeds.
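A plausible reconstruction of the experiment loop is sketched below; the optimiser, learning rate, epoch count, and single fixed seed are assumptions, not the notebook's exact settings:

```python
import torch
import torch.nn as nn
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)
X_tr = torch.tensor(X_tr, dtype=torch.float32)
X_te = torch.tensor(X_te, dtype=torch.float32)
y_tr, y_te = torch.tensor(y_tr), torch.tensor(y_te)

accs = {}
for n_hidden in [1, 2, 4, 8, 16, 32]:
    torch.manual_seed(0)  # one seed shown; the notebook compares several
    # 4 input features -> hidden layer -> 3 classes
    model = nn.Sequential(nn.Linear(4, n_hidden), nn.Sigmoid(),
                          nn.Linear(n_hidden, 3))
    opt = torch.optim.Adam(model.parameters(), lr=0.05)
    loss_fn = nn.CrossEntropyLoss()
    for epoch in range(200):
        opt.zero_grad()
        loss_fn(model(X_tr), y_tr).backward()
        opt.step()
    with torch.no_grad():
        accs[n_hidden] = (model(X_te).argmax(dim=1) == y_te).float().mean().item()
    print(f"hidden={n_hidden:2d}  test acc={accs[n_hidden]:.3f}")
```

To reproduce the variance observation for the wider networks, wrap the inner loop in a loop over seeds and report the spread of test accuracies per width.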
Examples
XOR Training Loop
Training a hand-coded neural network on XOR data.
import numpy as np

# The four XOR samples as (input, label) pairs
xor_data = [(np.array([0, 0]), 0),
            (np.array([0, 1]), 1),
            (np.array([1, 0]), 1),
            (np.array([1, 1]), 0)]

nn = NeuralNetwork(n_inputs=2, n_hidden=2, n_outputs=1)
for iteration in range(5000):
    total_loss = 0
    for x, target in xor_data:
        output = nn.forward(x)
        # Per-sample binary cross-entropy loss
        loss = -target * np.log(output) - (1 - target) * np.log(1 - output)
        nn.backward(target, learning_rate=1.0)
        total_loss += float(loss)
    if iteration % 1000 == 0:
        print(f"Iter {iteration}, Loss: {total_loss:.4f}")
Common Mistakes
Using too small a learning rate for XOR with sigmoid
Why: The sigmoid gradient is at most 0.25, so gradients are already small. A tiny learning rate makes convergence painfully slow.
Fix: Start with learning rate 0.5–2.0 for small sigmoid networks.
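The 0.25 bound is easy to verify numerically: the sigmoid derivative is σ'(z) = σ(z)(1 − σ(z)), which peaks at z = 0 where σ(z) = 0.5:

```python
import numpy as np

# Evaluate the sigmoid derivative over a wide range of pre-activations
z = np.linspace(-10, 10, 10001)
s = 1 / (1 + np.exp(-z))
grad = s * (1 - s)
print(grad.max())  # 0.25, attained at z = 0
```

Since each layer multiplies the error signal by at most 0.25, a small learning rate compounds the shrinkage and stalls training.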
Mini Exercises
1. Run the XOR network 10 times with different random seeds. How often does it converge?