PicoGrad is a tiny autograd engine that implements backpropagation (reverse-mode autodiff) over a dynamically built DAG. It's a supercharged version of micrograd with a few extra bells and whistles.
We called it "pico" for the same reason you might call your gaming PC a "little setup" – pure, delightful understatement.
## Features

- Implements a general-purpose Tensor class backed by NumPy
- Distributed computing support with automatic parallelization across CPU cores
- Supports dynamic computational graph construction
- Provides automatic differentiation (autograd) capabilities
- Includes basic neural network building blocks (Neuron, Layer, MLP)
- Offers graph optimization for improved performance
- Graph visualization with Graphviz
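The core mechanism — reverse-mode autodiff over a dynamically built graph — can be sketched in a few lines of plain Python. This is an illustrative toy, not PicoGrad's actual implementation (PicoGrad's Value additionally carries labels and NumPy-backed data):

```python
class Value:
    """Toy scalar value that records how it was produced."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward_fn = lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def backward_fn():
            # d(out)/d(self) = 1, d(out)/d(other) = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward_fn = backward_fn
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def backward_fn():
            # Product rule: each input's gradient scales by the other input
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward_fn = backward_fn
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward_fn()

a, b = Value(2.0), Value(3.0)
c = a * b + a * a   # c = ab + a^2
c.backward()
print(a.grad)  # dc/da = b + 2a = 7.0
print(b.grad)  # dc/db = a = 2.0
```

The key design point, shared with micrograd and PicoGrad, is that the graph is built implicitly as expressions are evaluated, so `backward()` needs no separate graph-definition step.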
## Installation

```bash
# Clone the repository
git clone https://github.com/yourusername/picograd.git
cd picograd

# Install dependencies
pip install -r requirements.txt
```

## Quick Start

```python
from picograd import Value, Tensor

# Create values and perform operations
a = Value(2.0, label='a')
b = Value(3.0, label='b')
c = a * b + a**2
c.backward()

print(f"c = {c.data.data}")      # Forward pass result
print(f"dc/da = {a.grad.data}")  # Gradient of c with respect to a
```

PicoGrad can be used to build and train neural networks and to visualize the computational graph.
```python
from picograd.nn import MLP, Value

# Create a multi-layer perceptron: 2 inputs, two hidden layers of 8 neurons, 1 output
model = MLP(2, [8, 8, 1])

# Forward pass
x = [Value(1.0), Value(2.0)]
output = model(x)
```

## Distributed Computing

PicoGrad supports distributed computing for large tensors, automatically parallelizing operations across multiple CPU cores using Python's multiprocessing.
```python
import numpy as np

from picograd import Tensor, set_distributed_config

# Enable distributed computing globally
set_distributed_config(
    enabled=True,
    num_workers=4,                    # Number of parallel workers (defaults to CPU count)
    min_elements_for_parallel=10000,  # Minimum tensor size to trigger parallelization
    use_processes=True,               # Use processes (True) or threads (False)
)

# Create a large tensor - automatically uses distributed operations
large_tensor = Tensor(np.random.randn(1000000, 100))
print(large_tensor)  # Tensor(shape=(1000000, 100), dtype=float32, distributed=True)

# Operations are automatically parallelized
result = large_tensor * 2 + 1
summed = result.sum()
```

You can also control distributed computing on a per-tensor basis:
```python
# Force a tensor to use distributed computing
tensor = Tensor(data, distributed=True)

# Disable distributed mode for a specific tensor
small_tensor = Tensor(data, distributed=False)

# Toggle distributed mode
tensor.enable_distributed()
tensor.disable_distributed()
```

Distributed operations include:
- Element-wise: `+`, `-`, `*`, `/`, `**`, negation
- Reductions: `sum()`, `mean()`, `max()`, `min()`
- Utilities: `reshape()`, `transpose()`, indexing
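The chunked parallelism behind these operations can be sketched with the standard library. The example below uses the thread backend (the `use_processes=False` analogue, which avoids pickling concerns) and a hypothetical helper name, `parallel_apply`, that is not part of PicoGrad's API:

```python
import numpy as np
from multiprocessing.pool import ThreadPool

def parallel_apply(fn, arr, num_workers=4):
    # Split the array into roughly equal chunks, map fn over the chunks
    # in parallel, then stitch the partial results back together.
    chunks = np.array_split(arr, num_workers)
    with ThreadPool(num_workers) as pool:
        parts = pool.map(fn, chunks)
    return np.concatenate(parts)

x = np.arange(1_000_000, dtype=np.float32)
y = parallel_apply(lambda chunk: chunk * 2 + 1, x)
assert np.allclose(y, x * 2 + 1)
```

Element-wise operations split cleanly this way; reductions like `sum()` additionally need a combine step over the per-chunk partial results.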
## Graph Optimization

PicoGrad includes graph optimization techniques to improve computational efficiency.
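As an illustration of the kind of rewrite such an optimizer can perform, here is a sketch of constant folding on a tiny expression tree. The tuple representation and the `fold` function are hypothetical, not PicoGrad's internal format:

```python
import operator

OPS = {'+': operator.add, '*': operator.mul}

def fold(node):
    """Recursively collapse subtrees whose inputs are all constants.

    A node is ('const', value), ('var', name), or (op, left, right).
    """
    if node[0] in ('const', 'var'):
        return node
    op, left, right = node
    left, right = fold(left), fold(right)
    if left[0] == 'const' and right[0] == 'const':
        # Both inputs are known at graph-construction time: precompute them
        return ('const', OPS[op](left[1], right[1]))
    return (op, left, right)

# x * (2 + 3): the constant subtree 2 + 3 is folded to 5 before execution
expr = ('*', ('var', 'x'), ('+', ('const', 2.0), ('const', 3.0)))
print(fold(expr))  # ('*', ('var', 'x'), ('const', 5.0))
```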
## Running Tests

```bash
python test.py
```

