Support This Project: Your sponsorship fuels innovation in CNN research and computer vision technologies. Become a sponsor to help maintain and expand this valuable resource!
Welcome to one of the most comprehensive collections of Convolutional Neural Network (CNN) techniques, architectures, and implementations for computer vision research. This repository serves as a hub for cutting-edge CNN architectures and optimization methods, designed to help researchers and practitioners achieve state-of-the-art results.
Cutting-edge Updates | Expert Insights | Production-Ready Code | Benchmark Results
Related Repositories:
- Vision Transformers (ViT) - Transformer-based computer vision
- Object Detection Techniques - YOLO, Faster R-CNN, RetinaNet
- Semantic Segmentation - U-Net, DeepLab, SegFormer
- Transfer Learning Hub - Pretrained models and fine-tuning
- 40+ CNN Techniques organized by category
- Complete Implementations with runnable code
- Benchmark Comparisons across architectures
- Training Strategies and optimization tips
- Production Deployment guides
- Evaluation Metrics and testing frameworks
- Dataset Guides for popular benchmarks
- Regular Updates with latest research
Technique Categories:
- Classic Architectures: classic CNNs that established core principles
- Modern Architectures: state-of-the-art networks with advanced features
- Attention Mechanisms: channel and spatial attention for better features
- Training Techniques: data augmentation, transfer learning, schedulers
- Optimization Methods: advanced optimizers, mixed precision, batch norm
- Specialized Applications: detection, segmentation, lightweight models
- Model Compression: pruning, quantization, knowledge distillation
- Evaluation & Benchmarking: performance measurement and benchmarking
LeNet-5: Pioneer of CNNs for digit recognition
- Paper: Gradient-Based Learning Applied to Document Recognition
- Key Innovation: Among the first CNNs trained successfully with backpropagation
- Architecture: 2 conv layers, 2 pooling layers, 3 FC layers
- Parameters: ~60K
- Best For: Learning CNN fundamentals, MNIST
Implementation: scripts/lenet5.py
```bash
# Quick Start
python scripts/lenet5.py --dataset mnist --epochs 10
```
When to Use:
- Learning CNN basics
- Teaching/educational purposes
- Simple digit/character recognition
- Proof of concept prototypes
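For orientation, here is a minimal PyTorch sketch of the LeNet-5 layout; it is adapted for 28×28 MNIST inputs (the original paper used 32×32 images), so exact parameter counts differ slightly. See scripts/lenet5.py for the full implementation.

```python
import torch
from torch import nn

class LeNet5(nn.Module):
    """Sketch of LeNet-5: 2 conv + 2 pooling + 3 fully connected layers."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5), nn.Tanh(),   # 28x28 -> 24x24
            nn.AvgPool2d(2),                             # -> 12x12
            nn.Conv2d(6, 16, kernel_size=5), nn.Tanh(),  # -> 8x8
            nn.AvgPool2d(2),                             # -> 4x4
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 4 * 4, 120), nn.Tanh(),
            nn.Linear(120, 84), nn.Tanh(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

print(LeNet5()(torch.randn(1, 1, 28, 28)).shape)  # torch.Size([1, 10])
```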
AlexNet: ImageNet champion that sparked the deep learning revolution
- Paper: ImageNet Classification with Deep Convolutional Neural Networks
- Key Innovations: ReLU, Dropout, Data Augmentation, GPU training
- Architecture: 5 conv layers, 3 FC layers
- Parameters: 60M
- ImageNet Top-5 Error: 15.3%
Implementation: scripts/alexnet.py
Modern Improvements:
- Add Batch Normalization
- Replace LRN with BatchNorm
- Use AdamW optimizer
- Apply modern data augmentation
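As a sketch of the first two suggestions, here is a hypothetical modernized first stage. Layer sizes follow the torchvision AlexNet configuration; BatchNorm stands in for the paper's Local Response Normalization.

```python
import torch
from torch import nn

# First AlexNet stage with BatchNorm in place of LRN (illustrative)
modern_stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
    nn.BatchNorm2d(64),              # replaces nn.LocalResponseNorm(5)
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),
)
optimizer = torch.optim.AdamW(modern_stem.parameters(), lr=1e-3, weight_decay=0.05)
```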
VGGNet: Simple and deep architecture with 3x3 convolutions
- Paper: Very Deep Convolutional Networks for Large-Scale Image Recognition
- Key Innovation: Depth through repeated 3x3 conv blocks
- Variants: VGG-16 (138M params), VGG-19 (144M params)
- ImageNet Top-5 Error: 7.3%
Implementation: scripts/vggnet.py
Trade-offs:
- ✅ Excellent for transfer learning
- ✅ Simple, uniform architecture
- ❌ High memory requirements (528MB for VGG-16)
- ❌ Slow inference
GoogLeNet (Inception-v1): Efficient multi-scale feature extraction
- Paper: Going Deeper with Convolutions
- Key Innovation: Inception modules with parallel convolutions
- Parameters: 7M (much less than VGG!)
- ImageNet Top-5 Error: 6.7%
Implementation: scripts/inception_v1.py
Inception Evolution:
- Inception-v2: Batch Normalization
- Inception-v3: Factorized convolutions
- Inception-v4: Residual connections
- Inception-ResNet: Hybrid architecture
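The core idea in code: a sketch of one Inception module, with four parallel branches concatenated along the channel axis. Channel counts roughly follow GoogLeNet's first inception block but should be treated as illustrative.

```python
import torch
from torch import nn

class InceptionModule(nn.Module):
    """Sketch of an Inception-v1 module: parallel 1x1 / 3x3 / 5x5 / pool branches."""
    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 64, kernel_size=1)
        self.b2 = nn.Sequential(nn.Conv2d(in_ch, 96, 1),         # 1x1 reduce
                                nn.Conv2d(96, 128, 3, padding=1))
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, 16, 1),
                                nn.Conv2d(16, 32, 5, padding=2))
        self.b4 = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(in_ch, 32, 1))

    def forward(self, x):
        # Concatenate multi-scale features along the channel dimension
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)
```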
ResNet: Skip connections enable very deep networks
- Paper: Deep Residual Learning for Image Recognition
- Key Innovation: Residual blocks solve vanishing gradients
- Variants: ResNet-18, 34, 50, 101, 152
- ImageNet Top-5 Error: 3.57% (ResNet-152)
Implementation: scripts/resnet.py
```bash
# Training ResNet-50
python scripts/resnet.py --arch resnet50 --dataset imagenet --epochs 90
```
ResNet Variants:
| Model | Layers | Params | Top-1 Acc | GFLOPs |
|---|---|---|---|---|
| ResNet-18 | 18 | 11.7M | 69.8% | 1.8 |
| ResNet-34 | 34 | 21.8M | 73.3% | 3.7 |
| ResNet-50 | 50 | 25.6M | 76.1% | 4.1 |
| ResNet-101 | 101 | 44.5M | 77.4% | 7.8 |
| ResNet-152 | 152 | 60.2M | 78.3% | 11.6 |
When to Use: Default choice for most tasks requiring depth
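The key innovation in miniature: a sketch of a basic (two-layer) residual block with an identity shortcut only; the downsampling projection variant is omitted.

```python
import torch
from torch import nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    """Sketch of a ResNet basic block with an identity skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)  # the skip connection keeps gradients flowing
```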
DenseNet: Dense connections for maximum feature reuse
- Paper: Densely Connected Convolutional Networks
- Key Innovation: Each layer connects to all subsequent layers
- Variants: DenseNet-121, 169, 201, 264
- Parameters: 8M (DenseNet-121) - very efficient!
Implementation: scripts/densenet.py
Benefits:
- Stronger gradient flow
- Feature reuse reduces parameters
- Implicit deep supervision
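A simplified dense layer showing the connectivity pattern (the real DenseNet layer adds a 1×1 bottleneck conv, omitted here for brevity):

```python
import torch
from torch import nn

class DenseLayer(nn.Module):
    """Sketch: each layer consumes all previous feature maps and appends
    growth_rate new channels to the running concatenation."""
    def __init__(self, in_ch, growth_rate=32):
        super().__init__()
        self.layer = nn.Sequential(
            nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, growth_rate, 3, padding=1, bias=False),
        )

    def forward(self, x):
        return torch.cat([x, self.layer(x)], dim=1)  # dense connectivity
```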
MobileNet: Lightweight CNNs for mobile devices
- Paper: MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
- Key Innovation: Depthwise separable convolutions
- Parameters: 4.2M (v1), 3.4M (v2), 5.4M (v3)
- Speed: 600 imgs/sec on modern GPUs
Implementation: scripts/mobilenet.py
MobileNet Family:
| Version | Key Features | Top-1 Acc | Latency (ms) |
|---|---|---|---|
| v1 | Depthwise conv | 70.6% | 113 |
| v2 | Inverted residuals | 72.0% | 75 |
| v3 | NAS + SE blocks | 75.2% | 51 |
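The building block behind these numbers, sketched below: a depthwise separable convolution factors a standard conv into a per-channel 3×3 depthwise conv plus a 1×1 pointwise conv (BatchNorm/ReLU6 placement as commonly implemented for MobileNet):

```python
from torch import nn

def depthwise_separable(in_ch, out_ch, stride=1):
    """Sketch: depthwise 3x3 conv (groups=in_ch) followed by 1x1 pointwise conv."""
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=1,
                  groups=in_ch, bias=False),          # spatial filtering per channel
        nn.BatchNorm2d(in_ch), nn.ReLU6(inplace=True),
        nn.Conv2d(in_ch, out_ch, 1, bias=False),      # channel mixing
        nn.BatchNorm2d(out_ch), nn.ReLU6(inplace=True),
    )
```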
EfficientNet: Systematically scaled for optimal efficiency
- Paper: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
- Key Innovation: Compound scaling (depth × width × resolution)
- Parameters: 5.3M (B0) to 66M (B7)
- ImageNet Top-1: 77.1% (B0) to 84.4% (B7)
Implementation: scripts/efficientnet.py
Scaling Coefficients:
```text
# EfficientNet compound scaling formula
depth      = α^φ
width      = β^φ
resolution = γ^φ
where α · β² · γ² ≈ 2
```
EfficientNetV2: Faster training with Fused-MBConv
- Paper: EfficientNetV2: Smaller Models and Faster Training
- Improvements: Progressive learning, NAS optimization
- Speed: 5-11x faster training than EfficientNet
- Top-1 Accuracy: 85.7% (EfficientNetV2-L)
ConvNeXt: Modernized CNN competing with Vision Transformers
- Paper: A ConvNet for the 2020s
- Key Innovations: Larger kernels (7x7), GELU, LayerNorm
- Top-1 Accuracy: 87.8% (ConvNeXt-XL)
- Performance: Matches Swin Transformers
Implementation: scripts/convnext.py
SENet: Channel attention for feature recalibration
- Paper: Squeeze-and-Excitation Networks
- Overhead: <1% extra computation (roughly 10% more parameters on ResNet-50)
- Improvement: 1-2% accuracy boost on ResNet
Implementation: scripts/senet.py
SE Block Code:
```python
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)      # global average pool per channel
        self.excitation = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid()
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.squeeze(x).view(b, c)               # (B, C) channel descriptor
        y = self.excitation(y).view(b, c, 1, 1)      # per-channel weights in (0, 1)
        return x * y.expand_as(x)                    # recalibrate feature maps
```
CBAM: Channel + spatial attention
- Paper: CBAM: Convolutional Block Attention Module
- Improvements: 1.5-2.5% over baselines
- Cost: Negligible overhead
Implementation: scripts/cbam.py
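For contrast with the SE block above, a sketch of CBAM's spatial-attention half (its channel half resembles SE, but pools with both average and max):

```python
import torch
from torch import nn

class SpatialAttention(nn.Module):
    """Sketch of CBAM spatial attention: channel-wise mean and max maps are
    stacked and passed through a 7x7 conv to produce a spatial mask."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg_map = x.mean(dim=1, keepdim=True)
        max_map = x.max(dim=1, keepdim=True).values
        mask = torch.sigmoid(self.conv(torch.cat([avg_map, max_map], dim=1)))
        return x * mask
```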
Data Augmentation Techniques:
- Basic: Flip, Rotate, Crop, Color Jitter
- Advanced: Mixup, CutMix, CutOut, AutoAugment
- Domain-Specific: Medical imaging, satellite imagery
Implementation: scripts/data_augmentation.py
Augmentation Comparison:
| Technique | Description | Accuracy Gain | Use Case |
|---|---|---|---|
| Random Crop | Random patches | +1-2% | General |
| Mixup | Blend two images | +1-3% | Small datasets |
| CutMix | Paste patches | +1-2% | ImageNet scale |
| AutoAugment | Learned policies | +2-4% | Large datasets |
| RandAugment | Simple random | +1-3% | Any dataset |
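As one concrete example from the table, a minimal mixup sketch; alpha parameterizes the Beta distribution the mixing weight is drawn from:

```python
import torch

def mixup(images, labels, alpha=0.2):
    """Sketch of mixup: blend each image with a randomly paired one."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(images.size(0))
    mixed = lam * images + (1 - lam) * images[perm]
    return mixed, labels, labels[perm], lam

# Training usage: interpolate the loss with the same weight
# loss = lam * criterion(outputs, y_a) + (1 - lam) * criterion(outputs, y_b)
```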
Transfer Learning Strategies:
- Feature Extraction: Freeze backbone
- Fine-tuning: Unfreeze gradually
- Domain Adaptation: Adjust for domain shift
Implementation: scripts/transfer_learning.py
Fine-tuning Guidelines:
```python
# Strategy 1: Feature extraction (freeze backbone, train only the classifier)
for param in model.parameters():
    param.requires_grad = False
model.fc = nn.Linear(2048, num_classes)

# Strategy 2: Fine-tune the last N layers
for param in model.layer4.parameters():
    param.requires_grad = True

# Strategy 3: Differential learning rates
optimizer = torch.optim.SGD([
    {'params': model.layer4.parameters(), 'lr': 1e-3},
    {'params': model.fc.parameters(), 'lr': 1e-2}
])
```
Learning Rate Schedulers:
- Step Decay: Reduce by factor every N epochs
- Cosine Annealing: Smooth decrease to zero
- OneCycle: Warm up then anneal
- ReduceLROnPlateau: Adaptive based on metrics
Implementation: scripts/lr_scheduling.py
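A minimal usage sketch with PyTorch's built-in schedulers, using a placeholder model; swap in OneCycleLR or ReduceLROnPlateau the same way, noting that OneCycleLR steps per batch rather than per epoch:

```python
import torch
from torch import nn

model = nn.Linear(10, 2)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# Cosine annealing: smoothly decay the LR over 100 epochs
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)

for epoch in range(100):
    # ... run one training epoch ...
    scheduler.step()  # update the learning rate once per epoch
```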
Optimizer Comparison:
| Optimizer | Best For | Learning Rate | Notes |
|---|---|---|---|
| SGD + Momentum | Final training | 0.1 → 0.001 | Most stable |
| Adam | Quick experiments | 1e-3 → 1e-4 | Fast convergence |
| AdamW | Most tasks | 1e-3 → 1e-5 | Decoupled weight decay |
| RAdam | Warmup-free | 1e-3 | Rectified Adam |
| LAMB | Large batch | Scaled | Distributed training |
| AdaBelief | Better generalization | 1e-3 | Recent research |
Implementation: scripts/optimizers.py
Object Detection Architectures:
| Model | Type | Speed | Accuracy | Best For |
|---|---|---|---|---|
| YOLO v8 | One-stage | ⚡⚡⚡ | 53.9 mAP | Real-time |
| Faster R-CNN | Two-stage | ⚡ | 42.0 mAP | Accuracy |
| RetinaNet | One-stage | ⚡⚡ | 40.8 mAP | Small objects |
| EfficientDet | One-stage | ⚡⚡ | 55.1 mAP | Efficiency |
| DETR | Transformer | ⚡ | 42.0 mAP | Novel approach |
Implementation: scripts/object_detection/
Semantic Segmentation Architectures:
| Model | Year | Best Use Case | Params | Speed |
|---|---|---|---|---|
| U-Net | 2015 | Medical imaging | 7.8M | Fast |
| FCN | 2015 | Scene segmentation | 134M | Medium |
| DeepLab v3+ | 2018 | High accuracy | 40M | Slow |
| PSPNet | 2017 | Context-aware | 250M | Slow |
| SegFormer | 2021 | SOTA efficiency | 3-84M | Fast |
Implementation: scripts/segmentation/
Instance Segmentation Models:
- Mask R-CNN: Extends Faster R-CNN with masks
- YOLACT: Real-time instance segmentation
- SOLOv2: Segments by object locations
Implementation: scripts/instance_segmentation/
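For a quick start, torchvision ships a pretrained Mask R-CNN; a minimal inference sketch follows (newer torchvision versions replace pretrained=True with weights="DEFAULT"):

```python
import torch
import torchvision

model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()

image = torch.rand(3, 480, 640)   # placeholder RGB image with values in [0, 1]
with torch.no_grad():
    pred = model([image])[0]      # dict with boxes, labels, scores, masks
print(pred["masks"].shape)        # (num_instances, 1, H, W)
```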
Knowledge Distillation Process:
- Train large teacher model
- Train small student with soft targets
- Combine hard and soft loss
Implementation: scripts/distillation.py
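A sketch of the standard soft-target loss from Hinton et al.; the temperature T and mixing weight alpha here are illustrative defaults, not tuned values:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """KL divergence between temperature-softened distributions,
    blended with the usual hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale gradients to match the hard loss
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```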
Results:
- 50-70% size reduction
- 1-3% accuracy drop
- 2-5x speed improvement
Pruning Types:
- Unstructured: Remove individual weights
- Structured: Remove entire filters/channels
- Iterative: Prune → Retrain → Repeat
Implementation: scripts/pruning.py
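A minimal sketch using PyTorch's torch.nn.utils.prune on a toy model; the layer choices and pruning amounts are illustrative:

```python
from torch import nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 10))

# Unstructured: zero the 50% smallest-magnitude weights of the first layer
prune.l1_unstructured(model[0], name="weight", amount=0.5)

# Structured: remove 30% of output channels of the last layer by L2 norm
prune.ln_structured(model[2], name="weight", amount=0.3, n=2, dim=0)

# Make the pruning permanent (folds the mask into the weight tensor)
prune.remove(model[0], "weight")
prune.remove(model[2], "weight")
```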
Typical Results:
- 50-90% sparsity achievable
- 2-10x speedup possible
- Minimal accuracy loss (<2%)
Quantization Methods:
| Method | Precision | Accuracy Loss | Speed | Size Reduction |
|---|---|---|---|---|
| FP32 | Full | Baseline | 1x | Baseline |
| FP16 | Half | <0.1% | 2x | 50% |
| INT8 | 8-bit | 0.5-1% | 4x | 75% |
| INT4 | 4-bit | 1-3% | 8x | 87.5% |
Implementation: scripts/quantization.py
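A post-training dynamic quantization sketch on a toy model: weights are stored as INT8 and activations are quantized on the fly, which mainly benefits CPU inference:

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 10))

quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8  # quantize all Linear layers to INT8
)
print(quantized(torch.randn(1, 256)).shape)  # torch.Size([1, 10])
```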
Classification Metrics:
| Metric | Formula | Use Case |
|---|---|---|
| Accuracy | (TP+TN)/(TP+TN+FP+FN) | Balanced datasets |
| Precision | TP/(TP+FP) | Minimize false positives |
| Recall | TP/(TP+FN) | Minimize false negatives |
| F1-Score | 2×(P×R)/(P+R) | Imbalanced data |
| AUC-ROC | Area under ROC curve | Binary classification |
| Top-5 Accuracy | Correct in top 5 predictions | ImageNet |
Implementation: scripts/evaluation_metrics.py
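As one example, top-k accuracy can be computed directly from logits; a small self-contained sketch:

```python
import torch

def topk_accuracy(logits, labels, k=5):
    """Fraction of samples whose true label is among the top-k predictions."""
    topk = logits.topk(k, dim=1).indices               # (N, k) predicted classes
    hits = topk.eq(labels.unsqueeze(1)).any(dim=1)     # per-sample hit flags
    return hits.float().mean().item()

logits = torch.randn(8, 1000)
labels = torch.randint(0, 1000, (8,))
print(topk_accuracy(logits, labels, k=5))
```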
Popular Benchmark Datasets:
| Dataset | Images | Classes | Resolution | Size | Use Case |
|---|---|---|---|---|---|
| MNIST | 70K | 10 | 28×28 | 11MB | Digits |
| CIFAR-10 | 60K | 10 | 32×32 | 163MB | Objects |
| CIFAR-100 | 60K | 100 | 32×32 | 169MB | Fine-grained |
| ImageNet | 1.2M | 1000 | 224×224 | 150GB | General |
| COCO | 330K | 80 | Variable | 25GB | Detection |
| Pascal VOC | 11K | 20 | Variable | 2GB | Segmentation |
| CelebA | 200K | 40 attrs | 178×218 | 1.4GB | Faces |
Dataset Links: docs/datasets.md
ImageNet Classification Benchmarks:
| Model | Params | FLOPs | Top-1 | Top-5 | Speed (img/s) |
|---|---|---|---|---|---|
| EfficientNetV2-L | 119M | 53B | 85.7% | 97.5% | 180 |
| ConvNeXt-XL | 350M | 60B | 87.8% | 98.5% | 120 |
| ResNet-152 | 60M | 11.6B | 78.3% | 94.2% | 220 |
| EfficientNet-B7 | 66M | 37B | 84.4% | 97.1% | 150 |
| MobileNetV3 | 5.4M | 0.22B | 75.2% | 92.2% | 600 |
| ViT-H/14 | 632M | 167B | 88.6% | 98.7% | 50 |
CIFAR-10 Benchmarks:
| Model | Params | Accuracy | Training Time |
|---|---|---|---|
| ResNet-50 | 25.6M | 95.3% | 2h (V100) |
| EfficientNet-B0 | 5.3M | 96.7% | 1.5h |
| MobileNetV3 | 5.4M | 94.8% | 1h |
| Vision Transformer | 86M | 98.1% | 4h |
Framework Comparison:
| Framework | Pros | Cons | Best For |
|---|---|---|---|
| PyTorch | Pythonic, dynamic graphs, research-friendly | Less mobile support | Research, prototyping |
| TensorFlow | Production-ready, mobile support, ecosystem | Steeper learning curve | Production deployment |
| Keras | Simple API, beginner-friendly | Less flexibility | Quick experiments |
| JAX | Fast, functional, automatic differentiation | Smaller ecosystem | Research, speed |
| ONNX | Cross-platform model format | Not for training | Model deployment |
Experiment Tracking Tools:
| Tool | Purpose | Key Features |
|---|---|---|
| Weights & Biases | Experiment tracking | Real-time logging, hyperparameter tuning |
| TensorBoard | Visualization | Graphs, histograms, embeddings |
| MLflow | ML lifecycle | Model registry, reproducibility |
| Neptune.ai | Experiment management | Team collaboration, model versioning |
| Comet.ml | Model monitoring | Production monitoring, debugging |
Data Augmentation Libraries:
| Tool | Purpose | Best For |
|---|---|---|
| Albumentations | Data augmentation | Fast, comprehensive |
| imgaug | Image augmentation | Research prototypes |
| torchvision.transforms | PyTorch augmentation | PyTorch projects |
| tf.image | TensorFlow augmentation | TensorFlow projects |
| DALI | Data loading | High-throughput training |
```bash
# Clone repository
git clone https://github.com/yourusername/CNN_Research_Techniques.git
cd CNN_Research_Techniques

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

```bash
# Train ResNet-50 on CIFAR-10
python scripts/resnet.py --dataset cifar10 --arch resnet50 --epochs 100
# Transfer learning from ImageNet
python scripts/transfer_learning.py --pretrained --dataset custom --data_path /path/to/data
# Evaluate model
python scripts/evaluate.py --model_path checkpoints/best_model.pth --dataset cifar10
# Export to ONNX
python scripts/export_onnx.py --model_path checkpoints/best_model.pth --output model.onnx
```

```python
# scripts/train_example.py
import torch
import torchvision
import torchvision.transforms as transforms
from torch import nn, optim

# Load data
train_loader = torch.utils.data.DataLoader(
    torchvision.datasets.CIFAR10(root='./data', train=True, download=True,
                                 transform=transforms.ToTensor()),
    batch_size=128, shuffle=True
)

# Create model
model = torchvision.models.resnet50(pretrained=False, num_classes=10)
criterion = nn.CrossEntropyLoss()
optimizer = optim.AdamW(model.parameters(), lr=1e-3)

# Training loop
for epoch in range(100):
    for images, labels in train_loader:
        outputs = model(images)
        loss = criterion(outputs, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

We welcome contributions! See CONTRIBUTING.md for guidelines.
Ways to Contribute:
- Report bugs
- Suggest new techniques
- Improve documentation
- Add implementations
- Share benchmarks
This project is licensed under the MIT License - see LICENSE for details.
- Research community for pioneering work
- Open-source contributors
- PyTorch and TensorFlow teams
- Author: Deyaa Khateeb
- Email: Deyaanaser88@gmail.com
- LinkedIn: https://www.linkedin.com/in/deyaa-al-khatib-090b84211/
⭐ If you find this repository helpful, please star it!
Keywords: CNN, Convolutional Neural Networks, Deep Learning, Computer Vision, ResNet, EfficientNet, Object Detection, Image Classification, PyTorch, TensorFlow, Transfer Learning, Model Optimization, Neural Architecture