Overview
This notebook builds a feedforward neural network from scratch using manual backpropagation to solve the XOR problem, then scales up to MLP classifiers on the Iris dataset with varying hidden-layer widths.
You Will Learn
- Implementing a neural network without any deep learning framework
- Coding the four-step backpropagation algorithm by hand
- Training a network to solve XOR and watching convergence
- Comparing MLP performance across different hidden neuron counts (1–32)
- Visualising decision boundaries at different network capacities
Main Content
Building a Neural Network from Scratch
The notebook starts with the NeuralNetwork class that uses a list of LogisticRegression neurons (reused from week 4). The architecture for XOR is 2 inputs + bias, 2 hidden neurons, and 1 output neuron, all with sigmoid activation. You implement the forward pass by computing activations layer by layer, then implement backpropagation with the four manual steps: output deltas, hidden deltas, output weight updates, hidden weight updates. No autograd, no PyTorch backward() — just NumPy and the chain rule.
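A condensed sketch of the same forward pass and four backpropagation steps, written as a standalone 2-2-1 sigmoid network in plain NumPy (the variable names and global-weight style here are illustrative, not the notebook's LogisticRegression-based implementation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
# Illustrative 2-2-1 network; each weight matrix includes a bias column
W1 = rng.normal(size=(2, 3))   # hidden layer: 2 neurons, 2 inputs + bias
W2 = rng.normal(size=(1, 3))   # output layer: 1 neuron, 2 hidden + bias

def forward(x):
    a0 = np.append(x, 1.0)     # input with bias term
    h = sigmoid(W1 @ a0)       # hidden activations
    a1 = np.append(h, 1.0)     # hidden activations with bias term
    y = sigmoid(W2 @ a1)       # output activation
    return a0, a1, y

def backward(a0, a1, y, target, lr=1.0):
    global W1, W2
    # Step 1: output delta (for BCE + sigmoid the chain rule collapses to y - target)
    delta_out = y - target
    # Step 2: hidden deltas via the chain rule (bias column carries no error backwards)
    h = a1[:-1]
    delta_hidden = (W2[:, :-1].T @ delta_out) * h * (1 - h)
    # Step 3: output weight update
    W2 -= lr * np.outer(delta_out, a1)
    # Step 4: hidden weight update
    W1 -= lr * np.outer(delta_hidden, a0)
```

One gradient step on a sample should lower that sample's BCE loss, which is a quick sanity check for a hand-coded backward pass.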
Solving XOR
The XOR training loop iterates over the four data points repeatedly, running a forward and backward pass for each sample. With an appropriate learning rate (typically 0.5–2.0) and enough iterations (500–5000), the loss drops from ~0.7 to near zero. You plot the BCE loss curve and verify that the network outputs values close to the correct labels. The decision boundary visualisation shows how the hidden layer warps the input space to separate the diagonal XOR pattern.
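The ~0.7 starting point of the loss curve is no accident: with small random weights a sigmoid unit outputs roughly 0.5 regardless of input, and the binary cross-entropy of a 0.5 prediction is -ln(0.5):

```python
import numpy as np

# With near-zero weights the sigmoid output is about 0.5 for any input,
# so the expected per-sample BCE at initialisation is -ln(0.5)
p = 0.5
bce = -np.log(p)
print(f"initial per-sample BCE ≈ {bce:.3f}")  # ≈ 0.693, the ~0.7 seen on the curve
```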
MLP Experiments on Iris
The second part trains PyTorch MLPs with hidden layer sizes {1, 2, 4, 8, 16, 32} on the Iris dataset. For each configuration you track training and test accuracy across epochs. The results show a clear progression: 1 hidden neuron underfits (~60–70% accuracy), 4–8 neurons reach near-optimal performance (~95–97%), and 16–32 neurons achieve similar accuracy but with more variance across random seeds.
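A plausible reconstruction of the experiment loop is sketched below; the optimiser, learning rate, epoch count, and single fixed seed are assumptions, not the notebook's exact settings:

```python
import torch
import torch.nn as nn
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)
X_tr = torch.tensor(X_tr, dtype=torch.float32)
X_te = torch.tensor(X_te, dtype=torch.float32)
y_tr, y_te = torch.tensor(y_tr), torch.tensor(y_te)

accs = {}
for n_hidden in [1, 2, 4, 8, 16, 32]:
    torch.manual_seed(0)  # one seed shown; the notebook compares several
    # 4 input features -> hidden layer -> 3 classes
    model = nn.Sequential(nn.Linear(4, n_hidden), nn.Sigmoid(),
                          nn.Linear(n_hidden, 3))
    opt = torch.optim.Adam(model.parameters(), lr=0.05)
    loss_fn = nn.CrossEntropyLoss()
    for epoch in range(200):
        opt.zero_grad()
        loss_fn(model(X_tr), y_tr).backward()
        opt.step()
    with torch.no_grad():
        accs[n_hidden] = (model(X_te).argmax(dim=1) == y_te).float().mean().item()
    print(f"hidden={n_hidden:2d}  test acc={accs[n_hidden]:.3f}")
```

To reproduce the variance observation for the wider networks, wrap the inner loop in a loop over seeds and report the spread of test accuracies per width.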
Examples
XOR Training Loop
Training a hand-coded neural network on XOR data.
import numpy as np

# The four XOR samples as (input, label) pairs
xor_data = [(np.array([0, 0]), 0),
            (np.array([0, 1]), 1),
            (np.array([1, 0]), 1),
            (np.array([1, 1]), 0)]

nn = NeuralNetwork(n_inputs=2, n_hidden=2, n_outputs=1)
for iteration in range(5000):
    total_loss = 0
    for x, target in xor_data:
        output = nn.forward(x)
        # Per-sample binary cross-entropy loss
        loss = -target * np.log(output) - (1 - target) * np.log(1 - output)
        nn.backward(target, learning_rate=1.0)
        total_loss += float(loss)
    if iteration % 1000 == 0:
        print(f"Iter {iteration}, Loss: {total_loss:.4f}")
Common Mistakes
Using too small a learning rate for XOR with sigmoid
Why: The sigmoid gradient is at most 0.25, so gradients are already small. A tiny learning rate makes convergence painfully slow.
Fix: Start with learning rate 0.5–2.0 for small sigmoid networks.
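The 0.25 bound is easy to verify numerically: the sigmoid derivative is σ'(z) = σ(z)(1 − σ(z)), which peaks at z = 0 where σ(z) = 0.5:

```python
import numpy as np

# Evaluate the sigmoid derivative over a wide range of pre-activations
z = np.linspace(-10, 10, 10001)
s = 1 / (1 + np.exp(-z))
grad = s * (1 - s)
print(grad.max())  # 0.25, attained at z = 0
```

Since each layer multiplies the error signal by at most 0.25, a small learning rate compounds the shrinkage and stalls training.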
Mini Exercises
1. Run the XOR network 10 times with different random seeds. How often does it converge?