A lightweight neural network library in Rust with automatic differentiation.
- Dynamic computational graph with tape-based autograd (see the tape sketch after this list)
- SIMD-optimized tensor operations (AVX/SSE/NEON)
- Neural network layers: Linear, ReLU, Sigmoid, Conv2D, MaxPool2D, AvgPool2D, Flatten
- Optimizers: SGD, Adam, AdamW with learning rate scheduling (Adam update rule sketched below)
- Loss functions: MSE, Cross-entropy, BCE
- Post-Training Quantization (PTQ): Int8/Float16 model compression with <1% accuracy loss
- MNIST dataset support with data loading utilities
- Optional BLAS acceleration for matrix operations
- Cross-platform SIMD optimizations
- Memory-efficient gradient computation
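To make the tape-based autograd concrete, here is a minimal self-contained sketch of the idea on scalars; it is illustrative only and does not reflect taper's actual types or API. During the forward pass each operation appends its inputs and local derivatives to the tape; the backward pass then walks the tape in reverse, accumulating adjoints.

```rust
// Minimal tape-based reverse-mode autodiff on scalars (illustration only;
// not taper's API). Each node records its parents and local partials.
#[derive(Clone, Copy)]
struct TapeEntry {
    parents: [usize; 2], // indices of input nodes
    grads: [f64; 2],     // local derivatives d(out)/d(parent)
}

struct Tape {
    entries: Vec<TapeEntry>,
    values: Vec<f64>,
}

impl Tape {
    fn new() -> Self {
        Tape { entries: Vec::new(), values: Vec::new() }
    }

    // Record a leaf variable (no parents, zero local gradients).
    fn var(&mut self, value: f64) -> usize {
        self.values.push(value);
        self.entries.push(TapeEntry { parents: [0, 0], grads: [0.0, 0.0] });
        self.values.len() - 1
    }

    fn add(&mut self, a: usize, b: usize) -> usize {
        self.values.push(self.values[a] + self.values[b]);
        self.entries.push(TapeEntry { parents: [a, b], grads: [1.0, 1.0] });
        self.values.len() - 1
    }

    fn mul(&mut self, a: usize, b: usize) -> usize {
        self.values.push(self.values[a] * self.values[b]);
        self.entries.push(TapeEntry {
            parents: [a, b],
            grads: [self.values[b], self.values[a]], // d(ab)/da = b, d(ab)/db = a
        });
        self.values.len() - 1
    }

    // Reverse pass: walk the tape backwards, accumulating adjoints.
    fn backward(&self, output: usize) -> Vec<f64> {
        let mut adjoints = vec![0.0; self.values.len()];
        adjoints[output] = 1.0;
        for i in (0..self.entries.len()).rev() {
            let e = self.entries[i];
            let a = adjoints[i];
            if a != 0.0 {
                adjoints[e.parents[0]] += e.grads[0] * a;
                adjoints[e.parents[1]] += e.grads[1] * a;
            }
        }
        adjoints
    }
}

fn main() {
    let mut tape = Tape::new();
    let x = tape.var(3.0);
    let y = tape.var(4.0);
    let xy = tape.mul(x, y);
    let z = tape.add(xy, x); // z = x*y + x
    let grads = tape.backward(z);
    assert_eq!(grads[x], 5.0); // dz/dx = y + 1
    assert_eq!(grads[y], 3.0); // dz/dy = x
    println!("dz/dx = {}, dz/dy = {}", grads[x], grads[y]);
}
```

The same recording/replay structure generalizes to tensors, where each tape entry computes vector-Jacobian products instead of scalar partials.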
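Similarly, as a reference for the optimizer list above, here is a standalone sketch of the Adam update rule with the usual default constants; taper's own implementation, AdamW's decoupled weight decay, and the learning rate scheduling hooks may differ in detail.

```rust
// Standalone sketch of the Adam update rule (Kingma & Ba, 2015).
struct Adam {
    lr: f32,
    beta1: f32,
    beta2: f32,
    eps: f32,
    t: i32,
    m: Vec<f32>, // first-moment (mean) estimate per parameter
    v: Vec<f32>, // second-moment (uncentered variance) estimate per parameter
}

impl Adam {
    fn new(n_params: usize, lr: f32) -> Self {
        Adam {
            lr,
            beta1: 0.9,
            beta2: 0.999,
            eps: 1e-8,
            t: 0,
            m: vec![0.0; n_params],
            v: vec![0.0; n_params],
        }
    }

    fn step(&mut self, params: &mut [f32], grads: &[f32]) {
        self.t += 1;
        // Bias-correction factors for the zero-initialized moment estimates.
        let bc1 = 1.0 - self.beta1.powi(self.t);
        let bc2 = 1.0 - self.beta2.powi(self.t);
        for i in 0..params.len() {
            self.m[i] = self.beta1 * self.m[i] + (1.0 - self.beta1) * grads[i];
            self.v[i] = self.beta2 * self.v[i] + (1.0 - self.beta2) * grads[i] * grads[i];
            let m_hat = self.m[i] / bc1;
            let v_hat = self.v[i] / bc2;
            params[i] -= self.lr * m_hat / (v_hat.sqrt() + self.eps);
        }
    }
}
```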
Note: the following numbers were measured on a MacBook Pro M4 Pro (12 cores):
- Reaches ~99% accuracy on MNIST with a simple MLP in 10 epochs, under 2 seconds total (with BLAS). The PyTorch equivalent takes ~1.5 seconds per epoch, i.e. ~15 seconds total.
- Reaches ~96% accuracy on MNIST with a simple CNN in 50 epochs, at around 13 seconds per epoch (with BLAS). The PyTorch equivalent takes ~120 seconds per epoch.
- Operation Fusion: Combines multiple ops (e.g. Conv+ReLU) into a single kernel to reduce memory traffic (see the fused-convolution sketch below)
- GEMM: Cache-blocked matrix multiplication with AVX vectorization to keep working tiles in cache (see the blocking sketch below)
- Convolution: Direct 3x3 kernels bypass im2col; 1x1 kernels reduce to pure matrix multiplication
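The fusion idea can be shown on the simplest case: a single-channel, stride-1, valid-padding direct 3x3 convolution with ReLU applied inside the same loop nest. This is a simplified stand-in for the library's real kernels, but it shows why fusion saves memory traffic: the pre-activation value stays in a register and is never written out and re-read.

```rust
// Direct 3x3 convolution with ReLU fused into the same loop nest
// (single channel, stride 1, valid padding, for brevity).
fn conv3x3_relu(input: &[f32], h: usize, w: usize, kernel: &[f32; 9], out: &mut [f32]) {
    let (oh, ow) = (h - 2, w - 2);
    for y in 0..oh {
        for x in 0..ow {
            let mut acc = 0.0f32;
            for ky in 0..3 {
                for kx in 0..3 {
                    acc += input[(y + ky) * w + (x + kx)] * kernel[ky * 3 + kx];
                }
            }
            // Fused ReLU: apply max(0, .) before the single store,
            // so the pre-activation never round-trips through memory.
            out[y * ow + x] = acc.max(0.0);
        }
    }
}
```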
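And here is the cache-blocking idea with the vectorization stripped away (a scalar sketch, not taper's actual AVX microkernel). The three loops are tiled so that one BLOCK x BLOCK tile each of A, B, and C stays cache-resident while it is reused:

```rust
// Cache-blocked matrix multiplication, C += A * B (row-major).
// A is m x k, B is k x n, C is m x n.
const BLOCK: usize = 64; // tile edge chosen so tiles fit in L1/L2 cache

fn gemm_blocked(a: &[f32], b: &[f32], c: &mut [f32], m: usize, k: usize, n: usize) {
    for ib in (0..m).step_by(BLOCK) {
        for pb in (0..k).step_by(BLOCK) {
            for jb in (0..n).step_by(BLOCK) {
                // Multiply one tile; operands are reused while cache-hot.
                for i in ib..(ib + BLOCK).min(m) {
                    for p in pb..(pb + BLOCK).min(k) {
                        let a_ip = a[i * k + p];
                        for j in jb..(jb + BLOCK).min(n) {
                            c[i * n + j] += a_ip * b[p * n + j];
                        }
                    }
                }
            }
        }
    }
}
```

The innermost loop is unit-stride over rows of B and C, which is exactly the access pattern an AVX microkernel (or the auto-vectorizer) turns into wide fused multiply-adds.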
# Basic training
cargo run --example train_mnist
# With BLAS acceleration (98% accuracy in 10 epochs)
cargo run --release --features blas-accelerate --example train_mnist
# Or MNIST CNN training (around 96% accuracy in 50 epochs)
cargo run --release --features blas-accelerate --example train_mnist_cnn

Storage-only quantization for model compression:
- Int8: 4x smaller models, ~0.5% accuracy drop
- Float16: 2x smaller models, <0.1% accuracy drop
- Use cases: Deployment, memory-constrained devices, reducing storage/transfer costs
- Note: weights are stored quantized; computation still runs in f32, so there is no inference speedup (see the int8 sketch below)
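For illustration, a minimal sketch of symmetric per-tensor int8 storage quantization; the exact scheme taper uses (symmetric vs. affine, per tensor vs. per channel) is an assumption here. Each tensor is stored as i8 values plus a single f32 scale (hence ~4x smaller), and is dequantized back to f32 before compute, matching the note above.

```rust
// Symmetric per-tensor int8 storage quantization (illustrative sketch;
// taper's actual quantization scheme may differ in detail).
fn quantize_int8(weights: &[f32]) -> (Vec<i8>, f32) {
    // Map the largest absolute weight to 127 and scale the rest linearly.
    let max_abs = weights.iter().fold(0.0f32, |m, &w| m.max(w.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let q = weights
        .iter()
        .map(|&w| (w / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (q, scale)
}

// Dequantize back to f32 before computation (storage-only quantization).
fn dequantize_int8(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}
```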
# Train and quantize a model
cargo run --package taper --release --features blas-accelerate --example ptq_quantize