
SVMs & Autoencoders

Comprehensive implementation of Support Vector Machines and Deep Autoencoders for image classification and dimensionality reduction on CIFAR-10 and MNIST datasets, built from scratch using NumPy, CuPy, and PyTorch.

github.com/pompos02/NeuralNetworks-DeepLearning

Project Overview

This project consists of two major assignments demonstrating the implementation of classical machine learning algorithms and modern deep learning approaches. Assignment 2 focuses on Support Vector Machines with different kernels and optimization techniques, while Assignment 3 explores autoencoders for dimensionality reduction and image reconstruction.

  • Assignment 2: SVM implementations using primal form with gradient descent, dual form with kernel methods, and Sequential Minimal Optimization (SMO)
  • Assignment 3: Autoencoder architectures for MNIST digit reconstruction, with comparison to PCA and classification performance analysis
  • All implementations built from scratch using NumPy and CuPy for GPU acceleration
  • Comprehensive hyperparameter tuning and performance evaluation with 150+ visualization figures

Assignment 2: Support Vector Machines

Implementation of SVM algorithms from scratch for binary classification on the CIFAR-10 dataset (cats vs. dogs), using only NumPy and CuPy.

1. Linear SVM with Primal Form

Implemented with hinge loss, L2 regularization, and stochastic gradient descent (SGD). Features include GPU acceleration with CuPy, learning-rate scheduling, input standardization, and data augmentation.
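A minimal NumPy sketch of this setup, assuming labels in {-1, +1} and standardized features (function names and hyperparameter values are illustrative, not the repository's exact ones; swapping `numpy` for `cupy` gives the GPU-accelerated variant):

```python
import numpy as np

def train_linear_svm(X, y, lr=1e-3, lam=1e-4, epochs=20, batch_size=64, seed=0):
    """Primal linear SVM: minimize lam/2 * ||w||^2 + mean hinge loss.

    X: (n, d) standardized features; y: (n,) labels in {-1, +1}.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        for idx in np.array_split(rng.permutation(n), max(n // batch_size, 1)):
            Xb, yb = X[idx], y[idx]
            margins = yb * (Xb @ w + b)
            active = margins < 1.0  # samples violating the margin
            # subgradient of lam/2*||w||^2 + mean(max(0, 1 - y*(X@w + b)))
            grad_w = lam * w - (yb[active][:, None] * Xb[active]).sum(0) / len(idx)
            grad_b = -yb[active].sum() / len(idx)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b  # predict with np.sign(X @ w + b)
```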

[Figure: Initial Linear SVM results showing training progress and confusion matrix]

Performance Optimization Results:

| Configuration | Train Acc | Test Acc |
| --- | --- | --- |
| Basic Implementation | 59.06% | 58.55% |
| With LR Scheduler | 56.74% | 56.60% |
| + Data Augmentation | 60.53% | 60.55% |
| Final Optimized | 63.69% | 63.40% |
[Figure: Final optimized Linear SVM performance after hyperparameter tuning]

2. Polynomial SVM with Dual Form

Uses the polynomial kernel K(x, y) = (x·y + c)^d, implemented in the dual form and optimized with gradient ascent.
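A hedged sketch of the dual problem with this kernel, maximizing W(α) = Σᵢαᵢ − ½ΣᵢΣⱼαᵢαⱼyᵢyⱼK(xᵢ, xⱼ) by gradient ascent. For brevity it enforces only the box constraint 0 ≤ α ≤ C and omits the bias term; a full implementation would also respect Σᵢαᵢyᵢ = 0:

```python
import numpy as np

def poly_kernel(X1, X2, c=1.0, degree=2):
    # K(x, y) = (x . y + c)^d
    return (X1 @ X2.T + c) ** degree

def dual_svm_gradient_ascent(X, y, C=10.0, lr=1e-3, iters=500, c=1.0, degree=2):
    n = X.shape[0]
    Q = (y[:, None] * y[None, :]) * poly_kernel(X, X, c, degree)  # Q_ij = y_i y_j K_ij
    alpha = np.zeros(n)
    for _ in range(iters):
        grad = 1.0 - Q @ alpha                       # dW/dalpha
        alpha = np.clip(alpha + lr * grad, 0.0, C)   # project onto the box
    return alpha

def predict(alpha, X_train, y_train, X_test, c=1.0, degree=2):
    K = poly_kernel(X_test, X_train, c, degree)
    return np.sign(K @ (alpha * y_train))  # bias omitted in this sketch
```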

[Figure: Polynomial SVM training]

Performance by Configuration:

| Hyperparameters | Train Acc | Test Acc |
| --- | --- | --- |
| Degree=3, C=3.0 | 63.38% | 62.60% |
| Degree=2, C=10 (best, with tuning) | 73.49% | 64.25% |
[Figure: Comprehensive grid search results showing optimal hyperparameter combinations]

3. RBF SVM with Sequential Minimal Optimization

Uses the RBF kernel K(x, y) = exp(-γ||x - y||²) with the Sequential Minimal Optimization (SMO) algorithm for efficient training. This variant achieved the best overall test accuracy of all the SVM implementations.
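The heart of SMO is an analytic joint update of one pair of multipliers (αᵢ, αⱼ); everything else (heuristic pair selection, error caching, the outer convergence loop) is bookkeeping around it. A sketch of that single step, following the standard Platt update rules rather than the repository's exact code:

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=0.01):
    # K(x, y) = exp(-gamma * ||x - y||^2), computed pairwise
    sq = (X1**2).sum(1)[:, None] + (X2**2).sum(1)[None, :] - 2 * X1 @ X2.T
    return np.exp(-gamma * sq)

def smo_pair_update(K, y, alpha, b, i, j, C, errors):
    """One SMO step; errors[k] = f(x_k) - y_k under the current model."""
    if i == j:
        return b
    # feasible range for alpha_j given the constraint sum(alpha * y) = 0
    if y[i] != y[j]:
        L, H = max(0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
    else:
        L, H = max(0, alpha[i] + alpha[j] - C), min(C, alpha[i] + alpha[j])
    eta = K[i, i] + K[j, j] - 2 * K[i, j]  # curvature along the pair
    if L == H or eta <= 0:
        return b
    a_j = np.clip(alpha[j] + y[j] * (errors[i] - errors[j]) / eta, L, H)
    a_i = alpha[i] + y[i] * y[j] * (alpha[j] - a_j)
    # bias candidates from the KKT conditions at i and j
    b_i = b - errors[i] - y[i]*(a_i - alpha[i])*K[i, i] - y[j]*(a_j - alpha[j])*K[i, j]
    b_j = b - errors[j] - y[i]*(a_i - alpha[i])*K[i, j] - y[j]*(a_j - alpha[j])*K[j, j]
    alpha[i], alpha[j] = a_i, a_j
    if 0 < a_i < C:
        return b_i
    return b_j if 0 < a_j < C else (b_i + b_j) / 2
```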

SMO Algorithm Performance:

| Implementation | Train Acc | Test Acc | Training Time |
| --- | --- | --- | --- |
| RBF SMO (Optimized) | 89.91% | 65.70% | 102 min |
| RBF SMO (Basic) | 68.54% | 64.25% | Long |
[Figure: RBF SVM training curves showing convergence and final performance metrics]

Assignment 3: Autoencoders & Deep Learning

Implementation and comparison of autoencoder architectures for MNIST digit reconstruction, with evaluation against PCA and classification performance analysis using a CNN.

Autoencoder Architectures

1. Basic Autoencoder

Architecture: 784 → 128 → 32 → 128 → 784, with ReLU activations on the hidden layers and a sigmoid output layer, trained with MSE loss.
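A PyTorch sketch of this architecture (the optimizer and learning rate shown are assumptions, not necessarily the repository's settings):

```python
import torch
import torch.nn as nn

class BasicAutoencoder(nn.Module):
    """784 -> 128 -> 32 -> 128 -> 784, ReLU hidden layers, sigmoid output."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(784, 128), nn.ReLU(),
            nn.Linear(128, 32), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(32, 128), nn.ReLU(),
            nn.Linear(128, 784), nn.Sigmoid(),  # pixel values in [0, 1]
        )

    def forward(self, x):  # x: (batch, 784)
        return self.decoder(self.encoder(x))

model = BasicAutoencoder()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # assumed settings
```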

Results: Train Loss: 0.0103, Test Loss: 0.0099

[Figure: Basic Autoencoder training loss progression]

[Figure: Sample digit reconstructions from the Basic Autoencoder]

2. Deep Autoencoder

Enhanced architecture: 784 → 512 → 256 → 64 → 256 → 512 → 784, with deeper layers and a 64-dimensional bottleneck for better feature learning.
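Only the layer widths change relative to the basic model; a sketch of the deeper encoder/decoder stack:

```python
import torch.nn as nn

deep_encoder = nn.Sequential(
    nn.Linear(784, 512), nn.ReLU(),
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, 64), nn.ReLU(),    # 64-dimensional bottleneck
)
deep_decoder = nn.Sequential(
    nn.Linear(64, 256), nn.ReLU(),
    nn.Linear(256, 512), nn.ReLU(),
    nn.Linear(512, 784), nn.Sigmoid(),
)
```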

Improved Results: Train Loss: 0.0045, Test Loss: 0.0047 (roughly a 50% reduction in MSE versus the basic autoencoder)

[Figure: Deep Autoencoder showing superior convergence and lower loss]

Latent Space Analysis

t-SNE projections of the 32-dimensional and 64-dimensional latent spaces into 2D show clear digit clustering patterns.
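A sketch of how such a projection can be produced with scikit-learn, reusing `model` from the autoencoder sketch above (`test_images` and `test_labels` are placeholder names):

```python
import matplotlib.pyplot as plt
import torch
from sklearn.manifold import TSNE

# encode the test set into the latent space, then project to 2D
with torch.no_grad():
    latents = model.encoder(test_images.view(-1, 784)).cpu().numpy()

coords = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(latents)
plt.scatter(coords[:, 0], coords[:, 1], c=test_labels, cmap="tab10", s=3)
plt.colorbar(label="digit")
plt.show()
```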

[Figure: t-SNE visualization showing distinct digit clusters in latent space]

Reconstruction Quality Comparison

Comparison of Methods:

| Method | Test MSE | Bottleneck Size |
| --- | --- | --- |
| Deep Autoencoder | 0.0047 | 64 |
| Basic Autoencoder (64) | 0.0063 | 64 |
| PCA Reconstruction | 0.0090 | 64 |
[Figure: Side-by-side comparison of PCA and Autoencoder reconstructions]
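The PCA baseline in the table above can be reproduced along these lines with scikit-learn (`X_train` and `X_test` are placeholder names for flattened MNIST arrays scaled to [0, 1]):

```python
import numpy as np
from sklearn.decomposition import PCA

pca = PCA(n_components=64).fit(X_train)  # match the 64-dim bottleneck
X_test_rec = pca.inverse_transform(pca.transform(X_test))
mse = np.mean((X_test - X_test_rec) ** 2)
print(f"PCA (64 components) test MSE: {mse:.4f}")
```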

CNN Classification Network

Implemented a CNN to classify MNIST digits from three different inputs: the original images, autoencoder reconstructions, and PCA reconstructions.
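A plausible small CNN for this comparison; the exact architecture in the repository may differ:

```python
import torch.nn as nn

class MnistCNN(nn.Module):
    """Small CNN classifier, trained separately on each of the three inputs."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 28 -> 14
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 14 -> 7
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 128), nn.ReLU(), nn.Dropout(0.25),
            nn.Linear(128, 10),
        )

    def forward(self, x):  # x: (batch, 1, 28, 28)
        return self.classifier(self.features(x))
```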

[Figure: CNN training and validation accuracy/loss curves showing excellent convergence]

Classification Performance:

| Input Type | CNN Accuracy | Performance Drop |
| --- | --- | --- |
| Original MNIST | 98.89% | Baseline |
| Deep AE Reconstruction | 98.27% | -0.62% |
| PCA Reconstruction | 98.22% | -0.67% |

Per-Digit Analysis

Analysis of reconstruction error by digit class reveals which digits are most challenging for the models.
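The per-digit breakdown amounts to grouping per-image MSE by label; a minimal sketch:

```python
import numpy as np

def per_digit_mse(originals, reconstructions, labels):
    """Mean reconstruction MSE per digit class, for (n, 784) arrays."""
    errors = ((originals - reconstructions) ** 2).mean(axis=1)  # per-image MSE
    return {d: float(errors[labels == d].mean()) for d in range(10)}
```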

[Figure: Basic Autoencoder per-digit loss]

[Figure: Deep Autoencoder per-digit performance]

Key Findings:

  • Best Reconstructed: Digit 1 (0.001576 MSE) - Simple linear structure
  • Most Challenging: Digit 8 (0.006864 MSE) - Complex loops and overlapping regions
  • Consistent Pattern: Digits 0, 2, 5, 8 show higher reconstruction error across all models

Technical Implementation

High-Performance Computing

  • GPU Acceleration: CuPy integration achieving a 100x speedup for SVM training (see the sketch after this list)
  • Efficient Memory Management: Batch processing and vectorized operations
  • PyTorch Integration: Automatic differentiation and GPU support for deep learning
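A sketch of the drop-in backend pattern behind this: array code is written against a single `xp` namespace, which is bound to CuPy when a GPU is available and falls back to NumPy otherwise (function and variable names here are illustrative):

```python
import numpy as np
try:
    import cupy as cp
    xp = cp if cp.cuda.is_available() else np
except ImportError:
    xp = np  # CPU fallback when CuPy is not installed

def rbf_gram(X, gamma):
    """RBF Gram matrix; pass X as an `xp` array (e.g. xp.asarray(features))
    and the identical code runs on CPU (NumPy) or GPU (CuPy)."""
    sq = (X**2).sum(1)[:, None] + (X**2).sum(1)[None, :] - 2 * X @ X.T
    return xp.exp(-gamma * sq)
```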

Advanced Optimization Techniques

  • Learning Rate Scheduling: adaptive up-then-down strategies (the rate ramps up before decaying)
  • Regularization: L2 penalty, dropout, early stopping
  • Data Augmentation: Horizontal flip, random crop
  • Standardization: Z-score normalization for faster convergence

Key Results Summary

SVM Performance (CIFAR-10 Cats vs Dogs)

| Algorithm | Test Accuracy | Complexity |
| --- | --- | --- |
| RBF SVM (SMO) | 65.70% | High |
| Polynomial SVM | 64.25% | Medium |
| Linear SVM | 63.40% | Low |
| Linear SMO | 61.90% | Medium |
| K-NN (k=1) | 57.85% | Low |

Autoencoder & CNN Results (MNIST)

| Model | Metric | Performance |
| --- | --- | --- |
| CNN on Original | Accuracy | 98.89% |
| Deep Autoencoder | MSE Loss | 0.0047 |
| Basic Autoencoder | MSE Loss | 0.0099 |
| PCA (64 components) | MSE Loss | 0.0090 |
| CNN on Deep AE Reconstruction | Accuracy | 98.27% |
| CNN on PCA Reconstruction | Accuracy | 98.22% |

Key Insights

1. Architecture Impact

Deeper autoencoders significantly outperform shallow ones (a roughly 50% MSE reduction), and non-linear mappings (autoencoders) are superior to linear methods (PCA) for complex image reconstruction.

2. Classification Robustness

Minimal accuracy drop (<1%) when classifying reconstructed images demonstrates excellent information preservation across all methods.

3. Computational Trade-offs

RBF SVM achieves highest accuracy (65.70%) but requires extensive training time (102+ minutes). Linear methods provide excellent baseline performance with fast training.

4. Method Complementarity

Classical methods (PCA, Linear SVM) provide interpretable and fast baselines. Modern methods (Deep AE, RBF SVM) achieve superior performance with increased complexity.

Technical Stack

Core Libraries

  • NumPy & CuPy for GPU-accelerated computing
  • PyTorch for deep learning implementations
  • scikit-learn for baseline comparisons
  • Matplotlib & Seaborn for visualizations

Datasets

  • CIFAR-10: 60,000 RGB images, 32×32 pixels, 10 categories; used here for binary cats-vs-dogs classification
  • MNIST: 70,000 grayscale digit images, 28×28 pixels; used for 10-class digit recognition and reconstruction