Dimensionality Reduction — PCA, MNIST & Feature Selection
Summary
Tackling high-dimensional data with Principal Component Analysis: IncrementalPCA on the full MNIST dataset (70,000 digit images with 784 pixels each), explained variance analysis to choose how many components to keep, low-rank PCA via torch.pca_lowrank() for efficient computation, reconstructing digit images from principal components, and comparing MLP classification accuracy on raw pixels versus PCA-reduced features. Also covers feature selection using correlation coefficients and chi-squared tests on the Diabetes dataset.
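The IncrementalPCA workflow described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: a random low-rank matrix stands in for the 70,000 × 784 MNIST array, and the 95% variance threshold is just one common rule of thumb for choosing the number of components.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

# Stand-in for the MNIST pixel matrix: low-rank structure plus noise
# (random data for illustration only; the notebook uses the real 70k x 784 array)
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 50)) @ rng.normal(size=(50, 784))
X += 0.1 * rng.normal(size=(2000, 784))

# Fit PCA in mini-batches so the full matrix never has to sit in memory at once
ipca = IncrementalPCA(n_components=100, batch_size=500)
X_reduced = ipca.fit_transform(X)

# Cumulative explained variance guides how many components to keep
cumvar = np.cumsum(ipca.explained_variance_ratio_)
k = int(np.searchsorted(cumvar, 0.95)) + 1  # components for ~95% variance

# Reconstruct approximate "images" from the reduced representation
X_restored = ipca.inverse_transform(X_reduced)
print(X_reduced.shape, X_restored.shape, k)
```

The same reduced matrix `X_reduced` is what a downstream classifier (e.g. an MLP) would train on when comparing accuracy against the raw pixels.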
Materials
PCA & Dimensionality Reduction — Making Sense of High Dimensions
Why high-dimensional data is deceptively tricky, how PCA finds the most informative directions, and when to select features instead.
Dimensionality Reduction Notebook — PCA on MNIST & Feature Selection
Apply IncrementalPCA and low-rank PCA to 70k MNIST digits, reconstruct images, compare classification on raw vs reduced features, and run feature selection on Diabetes data.
Includes notebook
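The feature-selection step mentioned above pairs a simple correlation screen with a chi-squared test. A hedged sketch, using synthetic non-negative features and a binary label in place of the Diabetes data (chi-squared scoring in scikit-learn requires non-negative inputs and a class target):

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

# Synthetic stand-in for the Diabetes dataset: 8 non-negative features,
# binary target driven mostly by features 0 and 3
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(500, 8))
y = (X[:, 0] + X[:, 3] + 0.2 * rng.normal(size=500) > 1.0).astype(int)

# Correlation coefficient of each feature with the target (linear screen)
corr = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
top_by_corr = np.argsort(-np.abs(corr))[:4]

# Chi-squared test: keep the k highest-scoring features
selector = SelectKBest(chi2, k=4)
X_selected = selector.fit_transform(X, y)
print(top_by_corr, selector.get_support(indices=True))
```

Both screens should flag the informative features (0 and 3 here); comparing the two rankings is the point of running them side by side.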