Machine Learning

Foundations of machine learning: supervised & unsupervised learning, model evaluation, feature engineering, and practical implementations with PyTorch and Scikit-learn.

Supervised Learning · Unsupervised Learning · Regression · Classification · Clustering · Neural Networks · Deep Learning · PyTorch · Scikit-learn · PCA · Density Estimation

Summary

A comprehensive journey through core machine learning concepts — from probability theory and linear models to neural networks and deep learning. Each topic builds on the previous, combining mathematical foundations with hands-on Python implementations. Covers regression, classification, clustering, density estimation, dimensionality reduction, and deep learning, all implemented from scratch and using industry-standard libraries like PyTorch and Scikit-learn.

Topics & Practice

1. Week 1: Intro + Setup

An introduction to the ML landscape: what machine learning is, the difference between supervised and unsupervised learning, and the types of problems ML can solve (regression, classification, clustering). The practical component covers Python fundamentals (variables, data types, control flow, functions) and PyTorch basics — tensor operations, matrix multiplication, broadcasting, reshaping, nn.Module, autograd, and building simple MLP pipelines for binary classification, multi-class classification, multi-label classification, and regression tasks.
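
The tensor operations and MLP pipeline described above can be sketched as follows — a minimal, self-contained example with synthetic data standing in for a real dataset (the layer sizes and hyperparameters here are illustrative choices, not the course's):

```python
import torch
import torch.nn as nn

# Broadcasting: a (3, 1) tensor times a (1, 4) tensor yields a (3, 4) result
a = torch.arange(3.0).reshape(3, 1)
b = torch.arange(4.0).reshape(1, 4)
prod = a * b                    # shape (3, 4)

# Matrix multiplication
x = torch.randn(8, 4)
w = torch.randn(4, 2)
y = x @ w                       # shape (8, 2)

# A minimal MLP for binary classification as an nn.Module
class MLP(nn.Module):
    def __init__(self, in_dim=4, hidden=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),   # one logit, paired with BCEWithLogitsLoss
        )

    def forward(self, x):
        return self.net(x)

model = MLP()
logits = model(x)                                 # shape (8, 1)
targets = torch.randint(0, 2, (8, 1)).float()     # random synthetic labels
loss = nn.BCEWithLogitsLoss()(logits, targets)
loss.backward()                 # autograd fills .grad on every parameter
```

Swapping the final layer and loss (e.g. `nn.Linear(hidden, n_classes)` with `nn.CrossEntropyLoss`) adapts the same skeleton to multi-class or regression tasks.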

THEORY · PRACTICE · NOTEBOOK · NOTEBOOK

2. Probability & Statistics for Machine Learning

Foundational probability concepts essential for ML: Bayes' theorem with real worked examples, Gaussian distributions and their parameters, covariance and Pearson correlation from scratch, and joint/conditional/marginal probabilities. Hands-on work involves computing Bayes' rule for practical scenarios, plotting Gaussians, analysing the Iris dataset with 2D histograms and probability matrices, and building an intuition for why probability is the language every ML algorithm speaks.
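
A worked Bayes' rule computation in the spirit described above — the diagnostic-test numbers here are hypothetical, chosen only to make the base-rate effect visible:

```python
# Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B)
# Hypothetical diagnostic-test scenario (not numbers from the course notes):
p_disease = 0.01                 # prior: 1% of the population has the disease
p_pos_given_disease = 0.95       # sensitivity
p_pos_given_healthy = 0.05       # false-positive rate

# Marginal P(positive) via the law of total probability
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 4))  # → 0.161
```

Despite the 95% sensitivity, a positive result implies only ~16% probability of disease — the low prior dominates, which is exactly the intuition the worked examples build.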

THEORY · PRACTICE

3. Linear Regression

Linear regression from theory to complete PyTorch implementation: the hypothesis function ŷ = wᵀx + b, mean squared error loss, gradient descent optimisation, z-score feature normalisation, learning rate experiments, weight interpretation on the Diabetes dataset, and extending to 5th-order polynomial regression with L2 (Ridge) regularisation to control overfitting.
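
The core loop — hypothesis, MSE loss, gradient descent — can be sketched in a few lines of PyTorch. This uses a synthetic 1-D dataset rather than Diabetes, and the learning rate and iteration count are illustrative:

```python
import torch

# Synthetic data: y = 2x + 1 plus noise (a stand-in for the Diabetes dataset)
torch.manual_seed(0)
X = torch.randn(100, 1)
y = 2 * X + 1 + 0.1 * torch.randn(100, 1)

# z-score normalisation of the feature
X = (X - X.mean()) / X.std()

w = torch.zeros(1, 1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
lr = 0.1

for _ in range(200):
    y_hat = X @ w + b                   # hypothesis: y = w^T x + b
    loss = ((y_hat - y) ** 2).mean()    # mean squared error
    loss.backward()                     # autograd computes dL/dw, dL/db
    with torch.no_grad():               # gradient descent step
        w -= lr * w.grad
        b -= lr * b.grad
        w.grad.zero_()
        b.grad.zero_()
```

After training, the MSE settles near the noise variance; replacing `X` with a matrix of polynomial features and adding `lambda * (w ** 2).sum()` to the loss gives the Ridge-regularised polynomial variant.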

THEORY · PRACTICE

4. Classification I — Logistic Regression

From predicting numbers to predicting categories: the sigmoid function as a probability gate, binary cross-entropy loss, gradient descent for classification, decision boundary visualisation, one-vs-all multiclass classification on the Iris dataset, and the XOR problem — which reveals the fundamental limits of linear classifiers and motivates neural networks.
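
A small demonstration of both halves of this story — the sigmoid as a probability gate, and the XOR limit — sketched with scikit-learn rather than the from-scratch implementation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    """Squashes any real number into (0, 1) — the probability gate."""
    return 1 / (1 + np.exp(-z))

# XOR: the classic dataset no single line can separate
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

clf = LogisticRegression().fit(X, y)
acc = clf.score(X, y)
print(acc)  # a linear boundary classifies at most 3 of 4 XOR points (≤ 0.75)
```

The ceiling of 0.75 on XOR is the concrete failure that motivates hidden layers in the neural-network topic later.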

THEORY · PRACTICE

5. Classification II — Trees, Bayes & Ensemble Methods

Beyond logistic regression: multinomial logistic regression with sklearn, Decision Trees that split on entropy, Gaussian Naive Bayes for fast probabilistic classification, and practical tools for the real world — confusion matrices, handling class imbalance with class_weight='balanced' and SMOTE, all applied to Iris and the large-scale Forest Covertype dataset.
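
The sklearn workflow described here can be sketched on Iris (the Forest Covertype version is the same code with a larger dataset; the split ratio and random seeds below are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# Entropy-based splits; class_weight='balanced' reweights under-represented classes
tree = DecisionTreeClassifier(criterion="entropy",
                              class_weight="balanced",
                              random_state=0).fit(X_tr, y_tr)

# Gaussian Naive Bayes: one Gaussian per feature per class, fast to fit
nb = GaussianNB().fit(X_tr, y_tr)

cm = confusion_matrix(y_te, tree.predict(X_te))   # rows: true, cols: predicted
```

Reading the confusion matrix row by row shows which classes are confused with which — far more informative than a single accuracy number, especially under class imbalance.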

THEORY · PRACTICE

6. Neural Networks — Perceptrons, Backpropagation & MLPs

From single perceptrons to multi-layer networks: understanding how neurons compute (weights, input functions, sigmoid activation), why random weight initialization matters, and how backpropagation actually works through a manual 4-step process. Builds a feedforward neural network from scratch to solve XOR — the classic problem that single-layer networks cannot handle — then scales up to an MLP classifier on the Iris dataset, systematically varying hidden neuron counts (1, 2, 4, 8, 16, 32) to see exactly how capacity affects learning.
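
A compact PyTorch version of the XOR network — this sketch uses autograd and Adam in place of the manual 4-step backpropagation derivation, and the hidden width and learning rate are illustrative choices:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])

# Random initialisation breaks symmetry between hidden neurons;
# with identical weights, every neuron would receive the same gradient
model = nn.Sequential(
    nn.Linear(2, 8), nn.ReLU(),
    nn.Linear(8, 1), nn.Sigmoid(),
)
opt = torch.optim.Adam(model.parameters(), lr=0.05)
loss_fn = nn.BCELoss()

for _ in range(1000):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()            # backpropagation: chain rule, layer by layer
    opt.step()

preds = (model(X) > 0.5).float()   # typically matches y exactly after training
```

Varying the hidden width here (1, 2, 4, 8, ...) reproduces the capacity experiment from the Iris MLP: with a single hidden neuron XOR stays unsolvable, and capacity beyond what the problem needs stops helping.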

THEORY · NOTEBOOK

7. Clustering — K-Means, Hierarchical Methods & Customer Segmentation

Unsupervised learning through clustering: building the K-means algorithm from scratch (distance computation, E-step centroid assignment, M-step mean recomputation), determining the right number of clusters with the Elbow method, agglomerative hierarchical clustering with dendrograms for multi-scale analysis, and a real-world customer segmentation project on the iFood marketing dataset (2,206 customers). Starts with Iris as a controlled testbed, then tackles messy real data.
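
The from-scratch K-means loop (E-step assignment, M-step mean recomputation) can be sketched in NumPy — here on two synthetic blobs standing in for the Iris testbed:

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]  # init from data points
    for _ in range(iters):
        # E-step: assign each point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # M-step: recompute each centroid as the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels

# Two well-separated synthetic blobs (illustrative, not the iFood data)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)),
               rng.normal(3, 0.3, (50, 2))])
centroids, labels = kmeans(X, k=2)
```

Running this for k = 1, 2, 3, ... and plotting the total within-cluster distance against k produces the Elbow curve used to pick the number of clusters.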

THEORY · NOTEBOOK

8. Density Estimation — Gaussian Mixtures, EM & Vowel Classification

Modeling probability distributions with Mixture of Gaussians (MoG) trained via the Expectation-Maximization algorithm. Covers the full EM loop — E-step soft responsibility assignments, M-step parameter updates for means, covariances, and mixing weights — applied to the Peterson & Barney vowel formant dataset (F1, F2 frequencies). Builds a Maximum Likelihood classifier from two class-conditional GMMs, visualizes decision boundaries on a meshgrid, confronts the singularity problem with linearly dependent features, and solves it with regularization. Achieves 95.07% accuracy with K=3 and 95.72% with K=6.
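
The two-GMM maximum-likelihood classifier can be sketched with scikit-learn's EM implementation — this uses synthetic "formant-like" clusters in place of the Peterson & Barney data, and `reg_covar` plays the role of the regularisation that fixes the singularity problem:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Two synthetic clusters standing in for class-conditional (F1, F2) data
rng = np.random.default_rng(0)
X0 = rng.normal([300, 2500], [40, 150], (200, 2))   # "class 0" vowel
X1 = rng.normal([700, 1100], [60, 120], (200, 2))   # "class 1" vowel

# One GMM per class, fit by EM; reg_covar adds a small ridge to each
# covariance so near-singular components stay invertible
g0 = GaussianMixture(n_components=3, reg_covar=1e-6, random_state=0).fit(X0)
g1 = GaussianMixture(n_components=3, reg_covar=1e-6, random_state=0).fit(X1)

# Maximum-likelihood rule: pick the class whose GMM gives higher log-density
X = np.vstack([X0, X1])
pred = (g1.score_samples(X) > g0.score_samples(X)).astype(int)
true = np.array([0] * 200 + [1] * 200)
acc = (pred == true).mean()
```

Evaluating `score_samples` over a meshgrid instead of the training points yields the decision-boundary visualisation described above.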

THEORY · NOTEBOOK

9. Dimensionality Reduction — PCA, MNIST & Feature Selection

Tackling high-dimensional data with Principal Component Analysis: IncrementalPCA on the full MNIST dataset (70,000 digit images with 784 pixels each), explained variance analysis to choose how many components to keep, low-rank PCA via torch.pca_lowrank() for efficient computation, reconstructing digit images from principal components, and comparing MLP classification accuracy on raw pixels versus PCA-reduced features. Also covers feature selection using correlation coefficients and chi-squared tests on the Diabetes dataset.
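
The project → reconstruct → measure loop can be sketched on sklearn's small 8×8 digits set as a lightweight stand-in for the 784-pixel MNIST images (the choice of 16 components is illustrative):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, y = load_digits(return_X_y=True)    # (1797, 64): 8x8 digit images

pca = PCA(n_components=16).fit(X)
# Explained variance tells you how much signal 16 of 64 dimensions retain
explained = pca.explained_variance_ratio_.sum()

Z = pca.transform(X)                   # (1797, 16) compressed representation
X_rec = pca.inverse_transform(Z)       # reconstruct images from 16 components
err = np.mean((X - X_rec) ** 2)        # reconstruction error
```

Plotting `pca.explained_variance_ratio_.cumsum()` gives the curve used to decide how many components to keep; feeding `Z` instead of `X` into an MLP reproduces the raw-pixels-versus-PCA accuracy comparison.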

THEORY · NOTEBOOK

10. Deep Learning — ResNet, Transfer Learning & Contrastive Learning

Modern deep learning on CIFAR-10 (60,000 32x32 color images, 10 classes): training ResNet18 from scratch versus transfer learning with pretrained ImageNet weights (freezing convolutional layers and replacing the final fully-connected layer), data augmentation with random cropping and horizontal flips, SGD with learning rate scheduling, Supervised Contrastive Learning (SupCon loss from Khosla et al., NeurIPS 2020) for learning representations where same-class images cluster together, and evaluation via confusion matrices.

THEORY · NOTEBOOK