Dimensionality Reduction — PCA, MNIST & Feature Selection
Summary
Tackling high-dimensional data with Principal Component Analysis: IncrementalPCA on the full MNIST dataset (70,000 digit images with 784 pixels each), explained variance analysis to choose how many components to keep, low-rank PCA via torch.pca_lowrank() for efficient computation, reconstructing digit images from principal components, and comparing MLP classification accuracy on raw pixels versus PCA-reduced features. Also covers feature selection using correlation coefficients and chi-squared tests on the Diabetes dataset.
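The IncrementalPCA workflow described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: a random low-rank matrix stands in for the 70,000 × 784 MNIST array, and the 95% variance threshold is just one common rule of thumb for choosing the number of components.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

# Stand-in for the MNIST pixel matrix: low-rank structure plus noise
# (random data for illustration only; the notebook uses the real 70k x 784 array)
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 50)) @ rng.normal(size=(50, 784))
X += 0.1 * rng.normal(size=(2000, 784))

# Fit PCA in mini-batches so the full matrix never has to sit in memory at once
ipca = IncrementalPCA(n_components=100, batch_size=500)
X_reduced = ipca.fit_transform(X)

# Cumulative explained variance guides how many components to keep
cumvar = np.cumsum(ipca.explained_variance_ratio_)
k = int(np.searchsorted(cumvar, 0.95)) + 1  # components for ~95% variance

# Reconstruct approximate "images" from the reduced representation
X_restored = ipca.inverse_transform(X_reduced)
print(X_reduced.shape, X_restored.shape, k)
```

The same reduced matrix `X_reduced` is what a downstream classifier (e.g. an MLP) would train on when comparing accuracy against the raw pixels.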
Materials
PCA & Dimensionality Reduction — Making Sense of High Dimensions
Why high-dimensional data is deceptively tricky, how PCA finds the most informative directions, and when to select features instead.
Dimensionality Reduction Notebook — PCA on MNIST & Feature Selection
Apply IncrementalPCA and low-rank PCA to 70k MNIST digits, reconstruct images, compare classification on raw vs reduced features, and run feature selection on Diabetes data.
Includes notebook
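The feature-selection step mentioned above pairs a simple correlation screen with a chi-squared test. A hedged sketch, using synthetic non-negative features and a binary label in place of the Diabetes data (chi-squared scoring in scikit-learn requires non-negative inputs and a class target):

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

# Synthetic stand-in for the Diabetes dataset: 8 non-negative features,
# binary target driven mostly by features 0 and 3
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(500, 8))
y = (X[:, 0] + X[:, 3] + 0.2 * rng.normal(size=500) > 1.0).astype(int)

# Correlation coefficient of each feature with the target (linear screen)
corr = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
top_by_corr = np.argsort(-np.abs(corr))[:4]

# Chi-squared test: keep the k highest-scoring features
selector = SelectKBest(chi2, k=4)
X_selected = selector.fit_transform(X, y)
print(top_by_corr, selector.get_support(indices=True))
```

Both screens should flag the informative features (0 and 3 here); comparing the two rankings is the point of running them side by side.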