3 Commits

| SHA1 | Message | Date |
|------|---------|------|
| 97505b15f6 | Update README to include blog link (Added a link to the blog for smoltorch.) | 2025-11-20 13:10:30 +05:30 |
| 978a7e7751 | Release v0.1.0 | 2025-11-17 22:20:13 +05:30 |
| eb7146e578 | camera ready version | 2025-11-17 22:16:49 +05:30 |
23 changed files with 492 additions and 1191 deletions

.github/workflows/publish.yml (new file, vendored, 37 lines)

@@ -0,0 +1,37 @@
name: Publish to PyPI
on:
  push:
    tags:
      - 'v*'  # Trigger on version tags like v0.1.0, v1.0.0, etc.
  workflow_dispatch:  # Allow manual triggering
jobs:
  build-and-publish:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - name: Install build dependencies
        run: |
          python -m pip install --upgrade pip
          pip install build twine
      - name: Build package
        run: python -m build
      - name: Check package
        run: twine check dist/*
      - name: Publish to PyPI
        env:
          TWINE_USERNAME: __token__
          TWINE_PASSWORD: ${{ secrets.PYPI_API_TOKEN }}
        run: twine upload dist/*

README.md (360 lines changed)

@@ -1 +1,359 @@
# **smoltorch**
# 🔥 smoltorch • [blog](https://blog.ifkash.dev/smoltorch)
<div align="center">
**A tiny autograd engine and neural network library built from first principles**
[![PyPI version](https://badge.fury.io/py/smoltorch.svg)](https://badge.fury.io/py/smoltorch)
[![Python 3.12+](https://img.shields.io/badge/python-3.12+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
*Inspired by Andrej Karpathy's micrograd, built for learning*
</div>
---
## 🎯 What is smoltorch?
smoltorch is a minimalist deep learning library that implements automatic differentiation (autograd) and neural networks from scratch using only NumPy. It's designed to be:
- **Educational**: Understand how modern deep learning frameworks work under the hood
- **Transparent**: Every operation is visible and understandable
- **Functional**: Train real models on real datasets with competitive performance
- **Minimal**: ~500 lines of readable, well-documented Python code
### Why "smoltorch"?
"Smol" + PyTorch. It's a tiny implementation that captures the essence of modern deep learning frameworks.
---
## ✨ Features
### Core Engine
- **Automatic differentiation** with dynamic computational graphs
- **NumPy-backed tensors** for efficient numerical computing
- **Broadcasting support** with proper gradient handling
- **Topological sorting** for correct backpropagation
### Operations
- **Arithmetic**: `+`, `-`, `*`, `/`, `**`
- **Matrix operations**: `@` (matmul)
- **Activations**: ReLU, tanh, sigmoid
- **Reductions**: sum, mean
- **Element-wise**: log
### Neural Networks
- **Layers**: Linear (fully connected)
- **Models**: Multi-layer perceptron (MLP)
- **Loss functions**: MSE, Binary Cross-Entropy
- **Optimizers**: SGD (Stochastic Gradient Descent)
---
## 📦 Installation
### From PyPI (recommended)
```bash
uv add smoltorch
```
### From source
```bash
git clone https://github.com/kashifulhaque/smoltorch.git
cd smoltorch
uv pip install -e .
```
### Development installation
```bash
uv pip install -e ".[dev]"
```
---
## 🚀 Quick Start
### Basic Tensor Operations
```python
from smoltorch import Tensor
# Create tensors
x = Tensor([1.0, 2.0, 3.0])
y = Tensor([4.0, 5.0, 6.0])
# Operations
z = x + y # Element-wise addition
w = x * y # Element-wise multiplication
a = x @ y.T # Matrix multiplication
# Backward pass
a.backward()
print(x.grad) # Gradients computed automatically!
```
### Training a Neural Network (Regression)
```python
from smoltorch import Tensor, MLP, SGD
from sklearn.datasets import make_regression
import numpy as np
# Generate data
X, y = make_regression(n_samples=100, n_features=5, noise=10)
y = y.reshape(-1, 1)
# Create model
model = MLP([5, 16, 16, 1]) # 5 inputs -> 16 -> 16 -> 1 output
optimizer = SGD(model.parameters(), lr=0.001)
# Training loop
for epoch in range(100):
    # Forward pass
    X_tensor = Tensor(X)
    y_tensor = Tensor(y)
    y_pred = model(X_tensor)

    # Compute loss (MSE)
    loss = ((y_pred - y_tensor) ** 2).mean()

    # Backward pass
    optimizer.zero_grad()
    loss.backward()

    # Update weights
    optimizer.step()

    if (epoch + 1) % 10 == 0:
        print(f"Epoch {epoch + 1}, Loss: {loss.data:.4f}")
```
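After training, predictions come back as a `Tensor`, with the underlying NumPy array exposed via `.data`. A minimal evaluation sketch (assuming a held-out `X_test`/`y_test` split, which the snippet above does not create):

```python
import numpy as np

# Hypothetical held-out split; the loop above trains on all 100 samples.
y_pred_test = model(Tensor(X_test)).data
test_mse = np.mean((y_pred_test - y_test) ** 2)
print(f"Test MSE: {test_mse:.4f}")
```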
### Binary Classification
```python
from smoltorch import Tensor, MLP, SGD, binary_cross_entropy
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
# Load and preprocess data
data = load_breast_cancer()
X, y = data.data, data.target.reshape(-1, 1)
scaler = StandardScaler()
X = scaler.fit_transform(X)
# Create classifier with sigmoid output
class BinaryClassifier(MLP):
    def __call__(self, x):
        x = super().__call__(x)
        return x.sigmoid()  # Output probabilities

model = BinaryClassifier([30, 16, 8, 1])
optimizer = SGD(model.parameters(), lr=0.01)

# Training loop
for epoch in range(200):
    X_tensor = Tensor(X)
    y_tensor = Tensor(y)
    y_pred = model(X_tensor)
    loss = binary_cross_entropy(y_pred, y_tensor)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if (epoch + 1) % 20 == 0:
        accuracy = ((y_pred.data > 0.5) == y).mean()
        print(f"Epoch {epoch + 1}, Loss: {loss.data:.4f}, Acc: {accuracy:.4f}")
# Result: ~96% test accuracy on breast cancer dataset! 🎉
```
---
## 📊 Real-World Performance
smoltorch achieves competitive results on standard benchmarks:
| Dataset | Task | Test Metric | Epochs |
|---------|------|-------------|--------|
| Breast Cancer | Binary Classification | Accuracy: 96.5% | 200 |
| Synthetic Regression | Regression | MSE: 95.7 | 100 |
---
## 🏗️ Architecture
### Computational Graph
smoltorch builds a dynamic computational graph during the forward pass:
```python
x = Tensor([2.0])
y = Tensor([3.0])
z = (x * y) + (x ** 2) # Graph: z -> [+] -> [*, **] -> [x, y]
z.backward() # Backpropagate through graph
print(x.grad) # dz/dx = y + 2x = 3 + 4 = 7.0
```
### How Autograd Works
1. **Forward pass**: Build computational graph with operations as nodes
2. **Topological sort**: Order nodes for correct gradient flow
3. **Backward pass**: Apply chain rule in reverse topological order
4. **Gradient accumulation**: Sum gradients from multiple paths
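A minimal sketch of steps 2 and 3, modeled on the scalar `Value` engine this project grew out of (the shipped `Tensor.backward` may differ in details):

```python
import numpy as np

def backward(root):
    # Step 2: topological sort of every node reachable from `root`
    topo, visited = [], set()

    def build(node):
        if node not in visited:
            visited.add(node)
            for parent in node._parents:  # tensors that produced this node
                build(parent)
            topo.append(node)

    build(root)

    # Step 3: seed d(root)/d(root) = 1, then apply the chain rule in reverse order
    root.grad = np.ones_like(root.data)
    for node in reversed(topo):
        node._backward()  # each op accumulates into its parents' .grad (step 4)
```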
Example with broadcasting:
```python
x = Tensor([[1, 2, 3]]) # shape (1, 3)
y = Tensor([[1], [2]]) # shape (2, 1)
z = x + y # shape (2, 3) - broadcasting!
z.backward()
# x.grad sums over broadcast dimensions: shape (1, 3)
# y.grad sums over broadcast dimensions: shape (2, 1)
```
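The comments above describe the standard fix for broadcasting: the upstream gradient is summed back down to each operand's original shape. A standalone sketch of that reduction (a hypothetical helper, not necessarily what smoltorch calls it internally):

```python
import numpy as np

def unbroadcast(grad, shape):
    # Sum over the leading axes that broadcasting prepended...
    while grad.ndim > len(shape):
        grad = grad.sum(axis=0)
    # ...then over axes that were stretched from size 1.
    for axis, size in enumerate(shape):
        if size == 1:
            grad = grad.sum(axis=axis, keepdims=True)
    return grad

# For z = x + y above, dz/dx and dz/dy start as ones with z's shape (2, 3):
print(unbroadcast(np.ones((2, 3)), (1, 3)))  # [[2. 2. 2.]]  -> matches x
print(unbroadcast(np.ones((2, 3)), (2, 1)))  # [[3.] [3.]]   -> matches y
```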
---
## 🧠 Supported Operations
### Element-wise Operations
```python
z = x + y # Addition with broadcasting
z = x - y # Subtraction
z = x * y # Multiplication
z = x / y # Division
z = x ** 2 # Power
```
### Matrix Operations
```python
z = x @ y # Matrix multiplication (with batch support)
```
### Activation Functions
```python
z = x.relu() # ReLU: max(0, x)
z = x.tanh() # Tanh: (e^2x - 1) / (e^2x + 1)
z = x.sigmoid() # Sigmoid: 1 / (1 + e^-x)
```
### Reductions
```python
z = x.sum() # Sum all elements
z = x.sum(axis=0) # Sum along axis
z = x.mean() # Mean of all elements
z = x.mean(axis=1) # Mean along axis
```
### Other
```python
z = x.log() # Natural logarithm
```
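Because `log`, `mean`, and the arithmetic operators above are all differentiable, a loss such as binary cross-entropy can be composed directly from them. A sketch of that composition (the shipped `binary_cross_entropy` may differ, for example in how it clips inputs; the reflected scalar form `1 - t` is assumed to work on `Tensor` as it does on the original `Value` class):

```python
def bce_from_primitives(y_pred, y_true, eps=1e-7):
    # -(y * log(p) + (1 - y) * log(1 - p)), averaged over the batch.
    # eps keeps log() away from zero.
    term_pos = y_true * (y_pred + eps).log()
    term_neg = (1 - y_true) * (1 - y_pred + eps).log()
    return (term_pos + term_neg).mean() * -1.0
```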
---
## 📚 Examples
Check out the `examples/` directory:
- [`train_regression.py`](examples/train_regression.py) - Train on synthetic regression data
- [`train_classification.py`](examples/train_classification.py) - Binary classification on breast cancer dataset
Run them:
```bash
uv run examples/train_regression.py
uv run examples/train_classification.py
```
---
## 🧪 Testing
Run the test suite:
```bash
uv run pytest
```
Tests cover:
- ✅ Addition with broadcasting
- ✅ Multiplication with broadcasting
- ✅ Matrix multiplication
- ✅ Activation functions (ReLU, tanh, sigmoid)
- ✅ Reductions (sum, mean)
- ✅ Linear layers
- ✅ Multi-layer perceptrons
- ✅ End-to-end training
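Most of these checks boil down to comparing autograd output against NumPy or a finite-difference estimate. A sketch of that pattern (a hypothetical test, not one of the shipped files):

```python
import numpy as np
from smoltorch import Tensor

def test_mul_grad_matches_finite_difference():
    # Autograd gradient of sum(x * y) w.r.t. x should equal y.
    x_vals = np.array([1.5, -2.0, 3.0])
    y_vals = np.array([0.5, 4.0, -1.0])

    x, y = Tensor(x_vals), Tensor(y_vals)
    (x * y).sum().backward()

    eps = 1e-6
    numeric = np.array([
        (np.sum((x_vals + eps * e) * y_vals) - np.sum((x_vals - eps * e) * y_vals)) / (2 * eps)
        for e in np.eye(3)
    ])
    assert np.allclose(x.grad, numeric, atol=1e-4)
```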
---
## 🗺️ Roadmap
### Coming Soon
- [ ] **More optimizers**: Adam, RMSprop with momentum
- [ ] **More activations**: Leaky ReLU, ELU, Softmax
- [ ] **Regularization**: Dropout, L2 weight decay
- [ ] **Mini-batch training**: Efficient batch processing
- [ ] **Multi-class classification**: Softmax + Cross-Entropy loss
### Future
- [ ] **Convolutional layers**: CNN support for images
- [ ] **Model serialization**: Save/load weights in safetensors format
- [ ] **GPU acceleration**: Explore Metal Performance Shaders for Apple Silicon
- [ ] **Better initialization**: He initialization for ReLU networks
- [ ] **Learning rate scheduling**: Decay strategies
---
## 🎓 Learning Resources
If you're learning from smoltorch, these resources complement it well:
- [Andrej Karpathy's micrograd](https://github.com/karpathy/micrograd) - The original inspiration
- [Neural Networks: Zero to Hero](https://www.youtube.com/playlist?list=PLAqhIrjkxbuWI23v9cThsA9GvCAUhRvKZ) - Video series by Andrej Karpathy
- [The Matrix Calculus You Need For Deep Learning](https://arxiv.org/abs/1802.01528) - Paper on backpropagation math
---
## 🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes:
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
---
## 📄 License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
---
## 🙏 Acknowledgments
- **Andrej Karpathy** for [micrograd](https://github.com/karpathy/micrograd) and the brilliant educational content
- **PyTorch team** for API design inspiration
- The deep learning community for making knowledge accessible
---
## 📬 Contact
Created by Kashif - feel free to reach out!
- GitHub: [@kashifulhaque](https://github.com/kashifulhaque)
- Twitter: [@notifkash](https://twitter.com/notifkash)
---
<div align="center">
**⭐ Star this repo if you found it helpful!**
Built with ❤️ for learners and tinkerers
</div>

View File

@@ -2,8 +2,8 @@ import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from nanotorch.tensor import Tensor
from nanotorch.nn import MLP, SGD, binary_cross_entropy
from smoltorch.tensor import Tensor
from smoltorch.nn import MLP, SGD, binary_cross_entropy
# Load breast cancer dataset (binary classification)
print("Loading breast cancer dataset...")

View File

@@ -1,7 +1,7 @@
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from nanotorch.tensor import Tensor
from nanotorch.nn import MLP, SGD
from smoltorch.tensor import Tensor
from smoltorch.nn import MLP, SGD
# Generate synthetic regression data
print("Generating data...")

View File

View File

@@ -1,104 +0,0 @@
import math
class Value:
    def __init__(self, data, _parents=(), _op=''):
        self.data = data
        self._parents = _parents
        self._op = _op
        # gradient
        self.grad = 0.0  # at init, the value does not affect the output
        self._backward = lambda: None

    def __repr__(self):
        return f"Value(data={self.data})"

    def __add__(self, other: 'Value') -> 'Value':
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other), '+')
        def _backward():
            self.grad += 1.0 * out.grad
            other.grad += 1.0 * out.grad
        out._backward = _backward
        return out

    def __radd__(self, other: 'Value') -> 'Value':
        return self + other

    def __mul__(self, other: 'Value') -> 'Value':
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other), '*')
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def __neg__(self) -> 'Value':
        return -1 * self

    def __sub__(self, other: 'Value') -> 'Value':
        return self + (-other)

    def __rsub__(self, other: 'Value') -> 'Value':
        return Value(other) - self

    def __rmul__(self, other: 'Value') -> 'Value':
        return self * other

    def __pow__(self, other: 'Value') -> 'Value':
        assert isinstance(other, (int, float)), "only support int/float powers for now"
        out = Value(self.data**other, (self, ), f'**{other}')
        def _backward():
            self.grad += (other * self.data**(other - 1)) * out.grad
        out._backward = _backward
        return out

    def __truediv__(self, other: 'Value') -> 'Value':
        return self * other**-1

    def tanh(self) -> 'Value':
        x = self.data
        _tanh = (math.exp(2*x) - 1) / (math.exp(2*x) + 1)
        out = Value(_tanh, (self, ), 'tanh')
        def _backward():
            self.grad += (1 - _tanh ** 2) * out.grad
        out._backward = _backward
        return out

    def exp(self) -> 'Value':
        x = self.data
        out = Value(math.exp(x), (self, ), 'exp')
        def _backward():
            self.grad += out.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        topo = []
        visited = set()
        def build_topo(v: 'Value'):
            if v not in visited:
                visited.add(v)
                for child in v._parents:
                    build_topo(child)
                topo.append(v)
        build_topo(self)
        self.grad = 1.0
        for node in reversed(topo):
            node._backward()

View File

@@ -1,39 +0,0 @@
import random
from engine import Value
class Neuron:
    def __init__(self, n_inputs: int):
        self.w = [Value(random.uniform(-1, 1)) for _ in range(n_inputs)]
        self.b = Value(random.uniform(-1, 1))

    def __call__(self, x: list) -> Value:
        activations = sum((w_i * x_i for w_i, x_i in zip(self.w, x)), self.b)
        out = activations.tanh()
        return out

    def parameters(self):
        return self.w + [self.b]


class Layer:
    def __init__(self, n_inputs: int, n_outputs: int):
        self.neurons = [Neuron(n_inputs) for _ in range(n_outputs)]

    def __call__(self, x: list) -> list[Value]:
        outs = [n(x) for n in self.neurons]
        return outs

    def parameters(self):
        return [p for n in self.neurons for p in n.parameters()]


class MLP:
    def __init__(self, n_inputs: int, n_outputs: int):
        sz = [n_inputs] + n_outputs
        self.layers = [Layer(sz[i], sz[i + 1]) for i in range(len(n_outputs))]

    def __call__(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

    def parameters(self):
        return [p for layer in self.layers for p in layer.parameters()]

View File

File diff suppressed because one or more lines are too long

View File

@@ -1,338 +0,0 @@
#!/usr/bin/env python
# coding: utf-8
# In[1]:
import math
import mlx.core as mx
import matplotlib.pyplot as plt
get_ipython().run_line_magic('matplotlib', 'inline')
# In[2]:
def f(x):
return 3*x**2 - 4*x + 5
# In[3]:
f(3.0)
# In[4]:
xs = mx.arange(-5, 5, 0.25)
ys = f(xs)
plt.plot(xs, ys)
# **Simple refresher on differentiation**
# $$
# L = \lim_{h \rightarrow 0}\frac{f(x + h) - f(x)}{h}
# $$
# In[5]:
h = 0.0001
x = 3.0
# In[6]:
f(x), f(x + h)
# In[7]:
(f(x + h) - f(x)) / h
# ### **micrograd implementation**
# In[8]:
class Value:
def __init__(self, data, _parents=(), _op=''):
self.data = data
self._parents = _parents
self._op = _op
# gradient
self.grad = 0.0 # at init, the value does not affect the output
self._backward = lambda: None
def __repr__(self):
return f"Value(data={self.data})"
def __add__(self, other: 'Value') -> 'Value':
other = other if isinstance(other, Value) else Value(other)
out = Value(self.data + other.data, (self, other), '+')
def _backward():
self.grad += 1.0 * out.grad
other.grad += 1.0 * out.grad
out._backward = _backward
return out
def __radd__(self, other: 'Value') -> 'Value':
return self + other
def __mul__(self, other: 'Value') -> 'Value':
other = other if isinstance(other, Value) else Value(other)
out = Value(self.data * other.data, (self, other), '*')
def _backward():
self.grad += other.data * out.grad
other.grad += self.data * out.grad
out._backward = _backward
return out
def __neg__(self) -> 'Value':
return -1 * self
def __sub__(self, other: 'Value') -> 'Value':
return self + (-other)
def __rsub__(self, other: 'Value') -> 'Value':
return Value(other) - self
def __rmul__(self, other: 'Value') -> 'Value':
return self * other
def __pow__(self, other: 'Value') -> 'Value':
assert isinstance(other, (int, float)), "only support int/float powers for now"
out = Value(self.data**other, (self, ), f'**{other}')
def _backward():
self.grad += (other * self.data**(other - 1)) * out.grad
out._backward = _backward
return out
def __truediv__(self, other: 'Value') -> 'Value':
return self * other**-1
def tanh(self) -> 'Value':
x = self.data
_tanh = (math.exp(2*x) - 1) / (math.exp(2*x) + 1)
out = Value(_tanh, (self, ), 'tanh')
def _backward():
self.grad += (1 - _tanh ** 2) * out.grad
out._backward = _backward
return out
def exp(self) -> 'Value':
x = self.data
out = Value(math.exp(x), (self, ), 'exp')
def _backward():
self.grad += out.data * out.grad
out._backward = _backward
return out
def backward(self):
topo = []
visited = set()
def build_topo(v: 'Value'):
if v not in visited:
visited.add(v)
for child in v._parents:
build_topo(child)
topo.append(v)
build_topo(self)
self.grad = 1.0
for node in reversed(topo):
node._backward()
# In[9]:
# manual backprop
a = Value(2.0)
b = Value(-3.0)
c = Value(10.0)
d = a*b + c
# If we change 'a' by a small amount 'h'
# How would the gradient change?
a = Value(a.data + h)
d_ = a*b + c
print(f"Gradient: {(d_.data - d.data)/h}")
# **autograd example**
# In[10]:
x1 = Value(2.0)
x2 = Value(0.0)
w1 = Value(-3.0)
w2 = Value(1.0)
b = Value(6.8813735870195432)
x1w1 = x1*w1
x2w2 = x2*w2
x1w1x2w2 = x1w1 + x2w2
n = x1w1x2w2 + b
o = n.tanh()
# ### **Neural Network, using micrograd**
# In[11]:
import random
class Neuron:
def __init__(self, n_inputs: int):
self.w = [Value(random.uniform(-1, 1)) for _ in range(n_inputs)]
self.b = Value(random.uniform(-1, 1))
def __call__(self, x: list) -> Value:
activations = sum((w_i * x_i for w_i, x_i in zip(self.w, x)), self.b)
out = activations.tanh()
return out
def parameters(self):
return self.w + [self.b]
# In[12]:
class Layer:
def __init__(self, n_inputs: int, n_outputs: int):
self.neurons = [Neuron(n_inputs) for _ in range(n_outputs)]
def __call__(self, x: list) -> list[Value]:
outs = [n(x) for n in self.neurons]
return outs
def parameters(self):
return [p for n in self.neurons for p in n.parameters()]
# In[13]:
class MLP:
def __init__(self, n_inputs: int, n_outputs: int):
sz = [n_inputs] + n_outputs
self.layers = [Layer(sz[i], sz[i + 1]) for i in range(len(n_outputs))]
def __call__(self, x):
for layer in self.layers:
x = layer(x)
return x
def parameters(self):
return [p for layer in self.layers for p in layer.parameters()]
# In[14]:
# single neuron example
x = [2.5, 3.5]
n = Neuron(len(x))
n(x)
# In[15]:
# layer of neurons example
x = [1.5, 4.5]
nn = Layer(2, 3)
nn(x)
# In[16]:
# MLP example: input with 3 neurons, first layers with 4 neurons, second layer with 4 neurons, last output layer with 1 neuron
x = [2.0, 3.0, -1.0]
nn = MLP(3, [4, 4, 1])
nn(x)
# ### **Tune weights of our neural net**
# In[72]:
nn = MLP(3, [4, 4, 1])
# In[73]:
xs = [
[2.0, 3.0, -1.0],
[3.0, -1.0, 0.5],
[0.5, 1.0, 1.0],
[1.0, 1.0, -1.0]
]
ys = [1.0, -1.0, -1.0, 1.0]
# In[74]:
# Training loop
lr = 0.05
epochs = 50
for epoch in range(epochs):
# forward pass
y_preds = [nn(x) for x in xs]
loss = sum((y_pred[0] - y_true)**2 for y_true, y_pred in zip(ys, y_preds))
# backward pass
for p in nn.parameters(): # zero grad
p.grad = 0.0
loss.backward()
# update
for p in nn.parameters():
p.data += -lr * p.grad
print(epoch, loss.data)
# In[75]:
y_preds
# In[ ]:

View File

@@ -1,9 +1,13 @@
[project]
name = "karpathy-micrograd"
name = "smoltorch"
version = "0.1.0"
description = "Add your description here"
description = "A tiny autograd engine and neural network library built from first principles"
readme = "README.md"
requires-python = ">=3.12"
license = {text = "MIT"}
authors = [
{name = "Kashif", email = "me@ifkash.dev"}
]
dependencies = [
"ipykernel>=7.1.0",
"ipython>=9.7.0",
@@ -20,4 +24,9 @@ dev = [
]
[tool.setuptools]
packages = ["nanotorch", "micrograd"]
packages = ["smoltorch"]
[project.urls]
Homepage = "https://github.com/kashifulhaque/smoltorch"
Repository = "https://github.com/kashifulhaque/smoltorch"
Issues = "https://github.com/kashifulhaque/smoltorch/issues"

smoltorch/__init__.py (new file, 18 lines)

@@ -0,0 +1,18 @@
"""
smoltorch: A tiny autograd engine and neural network library
Built from first principles for educational purposes
"""
__version__ = "0.1.0"
from smoltorch.optim import SGD
from smoltorch.tensor import Tensor
from smoltorch.nn import Linear, MLP, binary_cross_entropy
__all__ = [
    "Tensor",
    "Linear",
    "MLP",
    "binary_cross_entropy",
    "SGD"
]

View File

@@ -1,5 +1,5 @@
import numpy as np
from nanotorch.tensor import Tensor
from smoltorch.tensor import Tensor
# helper functions
def binary_cross_entropy(y_pred, y_true):
@@ -90,21 +90,3 @@ class MLP:
for layer in self.layers:
params.extend(layer.parameters())
return params
class SGD:
    def __init__(self, parameters, lr=0.01):
        """
        Args
            parameters: list of Tensor objects to minimize
            lr: learning rate
        """
        self.parameters = parameters
        self.lr = lr

    def step(self):
        for param in self.parameters:
            param.data -= self.lr * param.grad

    def zero_grad(self):
        for param in self.parameters:
            param.grad = np.zeros_like(param.data, dtype=np.float64)

smoltorch/optim.py (new file, 19 lines)

@@ -0,0 +1,19 @@
import numpy as np
class SGD:
    def __init__(self, parameters, lr=0.01):
        """
        Args
            parameters: list of Tensor objects to minimize
            lr: learning rate
        """
        self.parameters = parameters
        self.lr = lr

    def step(self):
        for param in self.parameters:
            param.data -= self.lr * param.grad

    def zero_grad(self):
        for param in self.parameters:
            param.grad = np.zeros_like(param.data, dtype=np.float64)

View File

@@ -1,4 +1,4 @@
from nanotorch.tensor import Tensor
from smoltorch.tensor import Tensor
# Test 1: ReLU
print("Test 1 - ReLU:")

View File

@@ -1,4 +1,4 @@
from nanotorch.tensor import Tensor
from smoltorch.tensor import Tensor
# Test 1: Simple addition (no broadcasting)
a = Tensor([1.0, 2.0, 3.0])

View File

@@ -1,5 +1,5 @@
from nanotorch.nn import Linear
from nanotorch.tensor import Tensor
from smoltorch.nn import Linear
from smoltorch.tensor import Tensor
# Test 1: Single sample forward pass
print("Test 1 - Single forward pass:")

View File

@@ -1,4 +1,4 @@
from nanotorch.tensor import Tensor
from smoltorch.tensor import Tensor
# Test 1: Simple 2D matmul
print("Test 1 - Simple 2D matmul:")

View File

@@ -1,6 +1,6 @@
import numpy as np
from nanotorch.tensor import Tensor
from nanotorch.nn import MLP
from smoltorch.tensor import Tensor
from smoltorch.nn import MLP
# Test 1: MLP forward pass
print("Test 1 - MLP forward pass:")

View File

@@ -1,4 +1,4 @@
from nanotorch.tensor import Tensor
from smoltorch.tensor import Tensor
# Test 1: Simple multiplication (no broadcasting)
print("Test 1 - No broadcasting:")

View File

@@ -1,4 +1,4 @@
from nanotorch.tensor import Tensor
from smoltorch.tensor import Tensor
# Test 1: Sum all elements
print("Test 1 - Sum (all elements):")

uv.lock (generated, 66 lines changed)

@@ -471,39 +471,6 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/b1/dd/ead9d8ea85bf202d90cc513b533f9c363121c7792674f78e0d8a854b63b4/jupyterlab_pygments-0.3.0-py3-none-any.whl", hash = "sha256:841a89020971da1d8693f1a99997aefc5dc424bb1b251fd6322462a1b8842780", size = 15884, upload-time = "2023-11-23T09:26:34.325Z" },
]
[[package]]
name = "karpathy-micrograd"
version = "0.1.0"
source = { virtual = "." }
dependencies = [
{ name = "ipykernel" },
{ name = "ipython" },
{ name = "matplotlib" },
{ name = "mlx" },
{ name = "nbconvert" },
{ name = "numpy" },
{ name = "scikit-learn" },
]
[package.dev-dependencies]
dev = [
{ name = "pytest" },
]
[package.metadata]
requires-dist = [
{ name = "ipykernel", specifier = ">=7.1.0" },
{ name = "ipython", specifier = ">=9.7.0" },
{ name = "matplotlib", specifier = ">=3.10.7" },
{ name = "mlx", specifier = ">=0.29.4" },
{ name = "nbconvert", specifier = ">=7.16.6" },
{ name = "numpy", specifier = ">=2.3.4" },
{ name = "scikit-learn", specifier = ">=1.7.2" },
]
[package.metadata.requires-dev]
dev = [{ name = "pytest", specifier = ">=9.0.1" }]
[[package]]
name = "kiwisolver"
version = "1.4.9"
@@ -1347,6 +1314,39 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/b7/ce/149a00dd41f10bc29e5921b496af8b574d8413afcd5e30dfa0ed46c2cc5e/six-1.17.0-py2.py3-none-any.whl", hash = "sha256:4721f391ed90541fddacab5acf947aa0d3dc7d27b2e1e8eda2be8970586c3274", size = 11050, upload-time = "2024-12-04T17:35:26.475Z" },
]
[[package]]
name = "smoltorch"
version = "0.1.0"
source = { virtual = "." }
dependencies = [
{ name = "ipykernel" },
{ name = "ipython" },
{ name = "matplotlib" },
{ name = "mlx" },
{ name = "nbconvert" },
{ name = "numpy" },
{ name = "scikit-learn" },
]
[package.dev-dependencies]
dev = [
{ name = "pytest" },
]
[package.metadata]
requires-dist = [
{ name = "ipykernel", specifier = ">=7.1.0" },
{ name = "ipython", specifier = ">=9.7.0" },
{ name = "matplotlib", specifier = ">=3.10.7" },
{ name = "mlx", specifier = ">=0.29.4" },
{ name = "nbconvert", specifier = ">=7.16.6" },
{ name = "numpy", specifier = ">=2.3.4" },
{ name = "scikit-learn", specifier = ">=1.7.2" },
]
[package.metadata.requires-dev]
dev = [{ name = "pytest", specifier = ">=9.0.1" }]
[[package]]
name = "soupsieve"
version = "2.8"