# 🔥 smoltorch

A tiny autograd engine and neural network library built from first principles. Inspired by Andrej Karpathy's micrograd, built for learning.
## 🎯 What is smoltorch?

smoltorch is a minimalist deep learning library that implements automatic differentiation (autograd) and neural networks from scratch using only NumPy. It's designed to be:
- Educational: Understand how modern deep learning frameworks work under the hood
- Transparent: Every operation is visible and understandable
- Functional: Train real models on real datasets with competitive performance
- Minimal: ~500 lines of readable, well-documented Python code
### Why "smoltorch"?

"Smol" + PyTorch. It's a tiny implementation that captures the essence of modern deep learning frameworks.
## ✨ Features

### Core Engine
- ✅ Automatic differentiation with dynamic computational graphs
- ✅ NumPy-backed tensors for efficient numerical computing
- ✅ Broadcasting support with proper gradient handling
- ✅ Topological sorting for correct backpropagation
### Operations

- Arithmetic: `+`, `-`, `*`, `/`, `**`
- Matrix operations: `@` (matmul)
- Activations: ReLU, tanh, sigmoid
- Reductions: sum, mean
- Element-wise: log
### Neural Networks
- Layers: Linear (fully connected)
- Models: Multi-layer perceptron (MLP)
- Loss functions: MSE, Binary Cross-Entropy
- Optimizers: SGD (Stochastic Gradient Descent)
## 📦 Installation

### From PyPI (recommended)

```bash
uv add smoltorch
```

### From source

```bash
git clone https://github.com/kashifulhaque/smoltorch.git
cd smoltorch
uv pip install -e .
```

### Development installation

```bash
uv pip install -e ".[dev]"
```
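To sanity-check the install, here is a minimal smoke test using only the operations documented in the Quick Start below:

```python
# Smoke test: build a tensor, run one op, and backprop through it
from smoltorch import Tensor

x = Tensor([1.0, 2.0, 3.0])
loss = (x * x).sum()
loss.backward()
print(x.grad)  # gradient of sum(x^2) is 2x -> expect [2. 4. 6.]
```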
## 🚀 Quick Start

### Basic Tensor Operations
```python
from smoltorch import Tensor

# Create tensors
x = Tensor([1.0, 2.0, 3.0])
y = Tensor([4.0, 5.0, 6.0])

# Operations
z = x + y    # Element-wise addition
w = x * y    # Element-wise multiplication
a = x @ y.T  # Matrix multiplication (here a dot product of two vectors)

# Backward pass
a.backward()
print(x.grad)  # Gradients computed automatically!
```
### Training a Neural Network (Regression)
```python
from smoltorch import Tensor, MLP, SGD
from sklearn.datasets import make_regression
import numpy as np

# Generate data
X, y = make_regression(n_samples=100, n_features=5, noise=10)
y = y.reshape(-1, 1)

# Create model
model = MLP([5, 16, 16, 1])  # 5 inputs -> 16 -> 16 -> 1 output
optimizer = SGD(model.parameters(), lr=0.001)

# Training loop
for epoch in range(100):
    # Forward pass
    X_tensor = Tensor(X)
    y_tensor = Tensor(y)
    y_pred = model(X_tensor)

    # Compute loss (MSE)
    loss = ((y_pred - y_tensor) ** 2).mean()

    # Backward pass
    optimizer.zero_grad()
    loss.backward()

    # Update weights
    optimizer.step()

    if (epoch + 1) % 10 == 0:
        print(f"Epoch {epoch + 1}, Loss: {loss.data:.4f}")
```
### Binary Classification
```python
from smoltorch import Tensor, MLP, SGD, binary_cross_entropy
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler

# Load and preprocess data
data = load_breast_cancer()
X, y = data.data, data.target.reshape(-1, 1)
scaler = StandardScaler()
X = scaler.fit_transform(X)

# Create classifier with sigmoid output
class BinaryClassifier(MLP):
    def __call__(self, x):
        x = super().__call__(x)
        return x.sigmoid()  # Output probabilities

model = BinaryClassifier([30, 16, 8, 1])
optimizer = SGD(model.parameters(), lr=0.01)

# Training loop
for epoch in range(200):
    X_tensor = Tensor(X)
    y_tensor = Tensor(y)
    y_pred = model(X_tensor)
    loss = binary_cross_entropy(y_pred, y_tensor)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if (epoch + 1) % 20 == 0:
        accuracy = ((y_pred.data > 0.5) == y).mean()
        print(f"Epoch {epoch + 1}, Loss: {loss.data:.4f}, Acc: {accuracy:.4f}")

# Result: ~96% accuracy on the breast cancer dataset! 🎉
```
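Note that the loop above trains and reports accuracy on the full dataset. To measure held-out test accuracy like the number in the table below, you could split the data first; a sketch that continues the example above, assuming scikit-learn's `train_test_split`:

```python
from sklearn.model_selection import train_test_split

# Hold out 20% of the data for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = BinaryClassifier([30, 16, 8, 1])
optimizer = SGD(model.parameters(), lr=0.01)

for epoch in range(200):
    y_pred = model(Tensor(X_train))
    loss = binary_cross_entropy(y_pred, Tensor(y_train))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Evaluate on the held-out split
test_pred = model(Tensor(X_test))
test_acc = ((test_pred.data > 0.5) == y_test).mean()
print(f"Test accuracy: {test_acc:.4f}")
```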
## 📊 Real-World Performance
smoltorch achieves competitive results on standard benchmarks:
| Dataset | Task | Test Metric | Epochs |
|---|---|---|---|
| Breast Cancer | Binary Classification | 96.5% accuracy | 200 |
| Synthetic Regression | Regression | 95.7 MSE | 100 |
## 🏗️ Architecture

### Computational Graph
smoltorch builds a dynamic computational graph during the forward pass:
```python
x = Tensor([2.0])
y = Tensor([3.0])
z = (x * y) + (x ** 2)  # Graph: z -> [+] -> [*, **] -> [x, y]

z.backward()   # Backpropagate through the graph
print(x.grad)  # dz/dx = y + 2x = 3 + 4 = 7.0
```
### How Autograd Works

1. **Forward pass**: Build the computational graph, with operations as nodes
2. **Topological sort**: Order nodes for correct gradient flow
3. **Backward pass**: Apply the chain rule in reverse topological order
4. **Gradient accumulation**: Sum gradients arriving from multiple paths
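To make those four steps concrete, here is a toy scalar autograd node in the micrograd style; a minimal sketch of the recipe, with `Value` as an illustrative stand-in rather than smoltorch's actual `Tensor`:

```python
class Value:
    """Toy scalar autograd node. Illustrative only, not smoltorch's Tensor."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # set by the op that produced this node

    def __mul__(self, other):
        # Step 1: the forward pass records the result and its parents
        out = Value(self.data * other.data, parents=(self, other))

        def _backward():
            # Chain rule for z = a * b: dz/da = b, dz/db = a.
            # Step 4: "+=" accumulates gradients from multiple paths.
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Step 2: topological sort of the graph built during the forward pass
        order, visited = [], set()

        def visit(node):
            if node not in visited:
                visited.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)

        # Step 3: apply the chain rule in reverse topological order
        self.grad = 1.0
        for node in reversed(order):
            node._backward()


x = Value(2.0)
z = x * x      # both paths go through x
z.backward()
print(x.grad)  # dz/dx = 2x = 4.0, accumulated from both paths
```

smoltorch follows the same recipe, except each node carries a whole NumPy array instead of a single scalar, which is where the broadcasting rules below come in.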
Example with broadcasting:

```python
x = Tensor([[1, 2, 3]])  # shape (1, 3)
y = Tensor([[1], [2]])   # shape (2, 1)
z = x + y                # shape (2, 3) - broadcasting!

z.backward()
# x.grad sums over broadcast dimensions: shape (1, 3)
# y.grad sums over broadcast dimensions: shape (2, 1)
```
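The gradients shrink back to those shapes because every broadcast element contributes to several outputs, so its gradient is the sum over the copies. A plain-NumPy sketch of that reduction, with `unbroadcast` as a hypothetical helper name (smoltorch's internals may differ):

```python
import numpy as np

def unbroadcast(grad, shape):
    """Hypothetical helper: reduce `grad` back to `shape` after broadcasting."""
    # Sum away leading axes that broadcasting prepended
    while grad.ndim > len(shape):
        grad = grad.sum(axis=0)
    # Sum over axes that were stretched from size 1
    for axis, size in enumerate(shape):
        if size == 1:
            grad = grad.sum(axis=axis, keepdims=True)
    return grad

upstream = np.ones((2, 3))            # dL/dz for z = x + y above
print(unbroadcast(upstream, (1, 3)))  # x.grad shape (1, 3): [[2. 2. 2.]]
print(unbroadcast(upstream, (2, 1)))  # y.grad shape (2, 1): [[3.] [3.]]
```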
## 🧠 Supported Operations

### Element-wise Operations

```python
z = x + y   # Addition with broadcasting
z = x - y   # Subtraction
z = x * y   # Multiplication
z = x / y   # Division
z = x ** 2  # Power
```

### Matrix Operations

```python
z = x @ y  # Matrix multiplication (with batch support)
```

### Activation Functions

```python
z = x.relu()     # ReLU: max(0, x)
z = x.tanh()     # Tanh: (e^2x - 1) / (e^2x + 1)
z = x.sigmoid()  # Sigmoid: 1 / (1 + e^-x)
```

### Reductions

```python
z = x.sum()         # Sum all elements
z = x.sum(axis=0)   # Sum along an axis
z = x.mean()        # Mean of all elements
z = x.mean(axis=1)  # Mean along an axis
```

### Other

```python
z = x.log()  # Natural logarithm
```
## 📚 Examples

Check out the examples/ directory:

- `train_regression.py` - Train on synthetic regression data
- `train_classification.py` - Binary classification on the breast cancer dataset

Run them:

```bash
uv run examples/train_regression.py
uv run examples/train_classification.py
```
## 🧪 Testing

Run the test suite:

```bash
uv run pytest
```
Tests cover:
- ✅ Addition with broadcasting
- ✅ Multiplication with broadcasting
- ✅ Matrix multiplication
- ✅ Activation functions (ReLU, tanh, sigmoid)
- ✅ Reductions (sum, mean)
- ✅ Linear layers
- ✅ Multi-layer perceptrons
- ✅ End-to-end training
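A handy pattern for suites like this is to compare the analytic gradient against a central finite-difference estimate. A sketch of such a check (illustrative, assuming the `Tensor` API shown in the Quick Start; not necessarily how smoltorch's own tests are written):

```python
import numpy as np
from smoltorch import Tensor

# Gradient check for f(x) = sum(tanh(x))
x0 = np.array([0.5, -1.0, 2.0])

x = Tensor(x0)
y = x.tanh().sum()
y.backward()
analytic = x.grad

eps = 1e-5
numeric = np.zeros_like(x0)
for i in range(x0.size):
    bumped = x0.copy()
    bumped[i] += eps
    plus = Tensor(bumped).tanh().sum().data
    bumped[i] -= 2 * eps
    minus = Tensor(bumped).tanh().sum().data
    numeric[i] = (plus - minus) / (2 * eps)  # central difference

assert np.allclose(analytic, numeric, atol=1e-4)
```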
## 🗺️ Roadmap

### Coming Soon
- More optimizers: Adam, RMSprop with momentum
- More activations: Leaky ReLU, ELU, Softmax
- Regularization: Dropout, L2 weight decay
- Mini-batch training: Efficient batch processing
- Multi-class classification: Softmax + Cross-Entropy loss
### Future
- Convolutional layers: CNN support for images
- Model serialization: Save/load weights in safetensors format
- GPU acceleration: Explore Metal Performance Shaders for Apple Silicon
- Better initialization: He initialization for ReLU networks
- Learning rate scheduling: Decay strategies
## 🎓 Learning Resources
If you're learning from smoltorch, these resources complement it well:
- [Andrej Karpathy's micrograd](https://github.com/karpathy/micrograd) - The original inspiration
- Neural Networks: Zero to Hero - Video series by Andrej Karpathy
- [The Matrix Calculus You Need For Deep Learning](https://arxiv.org/abs/1802.01528) - Paper on the matrix calculus behind backpropagation
## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes:

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
## 📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
## 🙏 Acknowledgments
- Andrej Karpathy for micrograd and the brilliant educational content
- PyTorch team for API design inspiration
- The deep learning community for making knowledge accessible
## 📬 Contact
Created by Kashif - feel free to reach out!
- GitHub: @kashifulhaque
- Twitter: @notifkash
⭐ Star this repo if you found it helpful!
Built with ❤️ for learners and tinkerers