PRACTICE

Logistic Regression from Scratch in PyTorch

Implement sigmoid, build a custom LogisticRegression layer, code BCE loss, visualise decision boundaries on Iris, extend to one-vs-all multiclass, and watch logistic regression fail on XOR.

Download Notebook (.ipynb)

Overview

Here you implement logistic regression in PyTorch: sigmoid, linear layer, binary cross-entropy, training on Iris, decision boundary visualisation, one-vs-all multi-class, and a hands-on failure case on XOR.

You Will Learn

  • Implementing sigmoid and binary cross-entropy in PyTorch
  • Building a LogisticRegression nn.Module
  • Training and evaluating on a binary Iris subset
  • Visualising decision boundaries in 2D feature space
  • Extending logistic regression to multi-class via one-vs-all
  • Empirically observing failure on XOR

Main Content

Implementing Sigmoid and BCE

You begin by implementing the sigmoid function as a differentiable PyTorch operation: sigma = 1 / (1 + torch.exp(-z)). For numerical stability you typically compute BCE as -(y * torch.log(p + eps) + (1 - y) * torch.log(1 - p + eps)).mean(), where eps ≈ 1e-7 avoids log(0). Coding this yourself rather than relying on nn.BCELoss reinforces the connection between probabilities and loss.
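As a sketch, the two manual functions and a quick sanity check against PyTorch's built-ins might look like this (the test tensors are illustrative):

```python
import torch

def sigmoid(z: torch.Tensor) -> torch.Tensor:
    # 1 / (1 + e^(-z)); built from differentiable ops, so autograd tracks it
    return 1.0 / (1.0 + torch.exp(-z))

def bce_loss(p: torch.Tensor, y: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    # eps keeps the log away from 0 when p is exactly 0 or 1
    return -(y * torch.log(p + eps) + (1 - y) * torch.log(1 - p + eps)).mean()

# sanity check on a few hand-picked logits and labels
z = torch.tensor([-2.0, 0.0, 3.0])
y = torch.tensor([0.0, 1.0, 1.0])
p = sigmoid(z)
```

Comparing against torch.sigmoid and torch.nn.functional.binary_cross_entropy is an easy way to confirm the manual versions agree.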

LogisticRegression Module

The LogisticRegression module consists of a single nn.Linear layer producing a logit z = wᵀx + b; applying a sigmoid to the logit yields the predicted probability. When training with BCEWithLogitsLoss, however, you feed the logits directly into the loss without applying sigmoid in the forward method, leaving numerical stabilisation to the loss implementation. This pattern generalises to many other models.

Training on Iris and Plotting Boundaries

Selecting two Iris classes and two features yields a 2D problem that is easy to visualise. After training, you generate a dense grid over the feature space, pass each point through the model, and colour points by predicted class. Overlaying training data reveals how the linear boundary cuts the plane, which points are close to the boundary (uncertain), and which are far (confident).
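The training loop itself is standard full-batch gradient descent. The sketch below uses two synthetic Gaussian blobs as a stand-in for the two-feature Iris subset (the blob centres, learning rate, and epoch count are illustrative assumptions, not values from the notebook):

```python
import torch
import torch.nn as nn

# toy stand-in for the two-class, two-feature Iris subset: two Gaussian blobs
torch.manual_seed(0)
x0 = torch.randn(50, 2) + torch.tensor([2.0, 2.0])
x1 = torch.randn(50, 2) - torch.tensor([2.0, 2.0])
x_train = torch.cat([x0, x1])
y_train = torch.cat([torch.zeros(50), torch.ones(50)]).unsqueeze(1)

model = nn.Linear(2, 1)            # returns logits, like LogisticRegression
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(200):
    optimizer.zero_grad()
    loss = criterion(model(x_train), y_train)
    loss.backward()
    optimizer.step()

# threshold probabilities at 0.5 to get hard class predictions
preds = (torch.sigmoid(model(x_train)) > 0.5).float()
accuracy = (preds == y_train).float().mean().item()
```

On well-separated blobs like these, accuracy should approach 1.0 within a few hundred epochs.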

One-vs-All Multi-Class and XOR

For three-class Iris you train three binary logistic regressions, each distinguishing one class from the rest. At prediction time you take the argmax over the three predicted probabilities. Finally, you apply the same implementation to XOR and confirm that training loss stalls above zero and accuracy saturates around chance (0.5), regardless of optimisation details. This makes the linear limitation of logistic regression concrete.
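The XOR failure can be checked directly. In this sketch (learning rate and step count are arbitrary choices), convexity plus the symmetry of the XOR points drive the weights toward zero, so the loss floors near ln 2 ≈ 0.693 no matter how long you train:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# the four XOR points; no single line separates the two classes
x = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])

model = nn.Linear(2, 1)            # logistic regression as logits
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.5)

for _ in range(2000):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()

final_loss = loss.item()
preds = (torch.sigmoid(model(x)) > 0.5).float()
accuracy = (preds == y).float().mean().item()
# final_loss stalls near ln 2; no linear boundary classifies all four points
```

A useful contrast: rerun the same loop on any linearly separable dataset and watch the loss drop toward zero instead.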

Examples

PyTorch LogisticRegression Module

Minimal binary logistic regression model that returns logits.

import torch
import torch.nn as nn

class LogisticRegression(nn.Module):
    def __init__(self, in_features: int):
        super().__init__()
        self.linear = nn.Linear(in_features, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(x)  # logits; use with BCEWithLogitsLoss

Decision Boundary Plotting Skeleton

Generate a grid over 2D features and classify each point for visualisation.

import numpy as np
import torch

# assume model (returning logits), x_train, y_train exist and features are 2D
device = next(model.parameters()).device
x_min, x_max = x_train[:, 0].min() - 1, x_train[:, 0].max() + 1
y_min, y_max = x_train[:, 1].min() - 1, x_train[:, 1].max() + 1
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200),
                     np.linspace(y_min, y_max, 200))

with torch.no_grad():
    grid = torch.from_numpy(np.c_[xx.ravel(), yy.ravel()]).float().to(device)
    logits = model(grid)
    probs = torch.sigmoid(logits).cpu().numpy().reshape(xx.shape)

# contour plot on probs to show boundary at 0.5

Common Mistakes

Applying sigmoid before using BCEWithLogitsLoss

Why: BCEWithLogitsLoss expects raw logits and applies sigmoid internally; a double sigmoid harms numerical stability and training.

Fix: Return logits from the model and feed them directly to BCEWithLogitsLoss; only apply sigmoid when you explicitly need probabilities for interpretation.

Forgetting to balance classes when they are imbalanced

Why: In strongly imbalanced datasets, the model can achieve high accuracy by predicting the majority class.

Fix: Use class weights in the loss, resampling strategies, or evaluation metrics beyond accuracy (precision, recall, F1).
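For the class-weighting option, PyTorch's BCEWithLogitsLoss exposes a pos_weight argument that scales the positive-class term of the loss. A minimal sketch (the 10:1 imbalance ratio is a made-up example):

```python
import torch
import torch.nn as nn

# with roughly 10x more negatives than positives, weight positives up by 10
pos_weight = torch.tensor([10.0])
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

# a single positive example predicted at logit 0 (probability 0.5)
logits = torch.tensor([[0.0]])
target = torch.tensor([[1.0]])
loss_weighted = criterion(logits, target).item()
loss_plain = nn.BCEWithLogitsLoss()(logits, target).item()
# the weighted loss is pos_weight times the unweighted loss for positives
```

This penalises misclassified positives more heavily, pushing the boundary away from the trivial majority-class solution.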

Mini Exercises

1. Modify the implementation to use BCEWithLogitsLoss instead of a manual BCE implementation. What changes in the model’s forward method are required?

2. Train logistic regression on XOR and report training loss and accuracy. Explain the results.

Further Reading