Skip to content

The go to repo for CNN best practices and techniques

Notifications You must be signed in to change notification settings

deyaa1251/CNN-techniques

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

9 Commits
Β 
Β 

Repository files navigation

🧠 CNN Research Techniques

License Python PyTorch TensorFlow Contributions Welcome

🌟 Support This Project: Your sponsorship fuels innovation in CNN research and computer vision technologies. Become a sponsor to help maintain and expand this valuable resource!


πŸŽ“ Welcome

Welcome to one of the most comprehensive collections of Convolutional Neural Network (CNN) techniques, architectures, and implementations for computer vision research. This repository serves as a hub for cutting-edge CNN architectures and optimization methods, designed to help researchers and practitioners achieve state-of-the-art results.

πŸš€ Cutting-edge Updates | πŸ’‘ Expert Insights | 🎯 Production-Ready Code | πŸ“Š Benchmark Results


🀝 Community & Related Resources

Related Repositories:


πŸ“š What You'll Find Here

  • 40+ CNN Techniques organized by category
  • Complete Implementations with runnable code
  • Benchmark Comparisons across architectures
  • Training Strategies and optimization tips
  • Production Deployment guides
  • Evaluation Metrics and testing frameworks
  • Dataset Guides for popular benchmarks
  • Regular Updates with latest research

πŸ“– Technique Categories

1. πŸ—οΈ Foundational Architectures (1998-2015)

Classic CNNs that established core principles

2. πŸš€ Modern Architectures (2015-2023)

State-of-the-art networks with advanced features

3. πŸ‘οΈ Attention Mechanisms

Channel and spatial attention for better features

4. πŸŽ“ Training Techniques

Data augmentation, transfer learning, schedulers

5. ⚑ Optimization Methods

Advanced optimizers, mixed precision, batch norm

6. 🎯 Specialized Architectures

Detection, segmentation, lightweight models

7. πŸ”§ Model Compression

Pruning, quantization, knowledge distillation

8. πŸ“Š Evaluation & Metrics

Performance measurement and benchmarking


πŸ—οΈ FOUNDATIONAL ARCHITECTURES

1. LeNet-5 (1998) 🌱

Pioneer of CNNs for digit recognition

Implementation: scripts/lenet5.py

# Quick Start
python scripts/lenet5.py --dataset mnist --epochs 10

When to Use:

  • Learning CNN basics
  • Teaching/educational purposes
  • Simple digit/character recognition
  • Proof of concept prototypes

2. AlexNet (2012) πŸ†

ImageNet champion that sparked the deep learning revolution

  • Paper: ImageNet Classification with Deep CNNs
  • Key Innovations: ReLU, Dropout, Data Augmentation, GPU training
  • Architecture: 5 conv layers, 3 FC layers
  • Parameters: 60M
  • ImageNet Top-5 Error: 15.3%

Implementation: scripts/alexnet.py

Modern Improvements:

  • Add Batch Normalization
  • Replace LRN with BatchNorm
  • Use AdamW optimizer
  • Apply modern data augmentation

3. VGGNet (2014) πŸ“

Simple and deep architecture with 3x3 convolutions

  • Paper: Very Deep Convolutional Networks
  • Key Innovation: Depth through repeated 3x3 conv blocks
  • Variants: VGG-16 (138M params), VGG-19 (144M params)
  • ImageNet Top-5 Error: 7.3%

Implementation: scripts/vggnet.py

Trade-offs:

  • βœ… Excellent for transfer learning
  • βœ… Simple, uniform architecture
  • ❌ High memory requirements (528MB for VGG-16)
  • ❌ Slow inference

4. GoogLeNet / Inception-v1 (2014) 🌐

Efficient multi-scale feature extraction

  • Paper: Going Deeper with Convolutions
  • Key Innovation: Inception modules with parallel convolutions
  • Parameters: 7M (much less than VGG!)
  • ImageNet Top-5 Error: 6.7%

Implementation: scripts/inception_v1.py

Inception Evolution:

  • Inception-v2: Batch Normalization
  • Inception-v3: Factorized convolutions
  • Inception-v4: Residual connections
  • Inception-ResNet: Hybrid architecture

πŸš€ MODERN ARCHITECTURES

5. ResNet (2015) πŸ”„

Skip connections enable very deep networks

  • Paper: Deep Residual Learning
  • Key Innovation: Residual blocks solve vanishing gradients
  • Variants: ResNet-18, 34, 50, 101, 152
  • ImageNet Top-5 Error: 3.57% (ResNet-152)

Implementation: scripts/resnet.py

# Training ResNet-50
python scripts/resnet.py --arch resnet50 --dataset imagenet --epochs 90

ResNet Variants:

Model Layers Params Top-1 Acc GFLOPs
ResNet-18 18 11.7M 69.8% 1.8
ResNet-34 34 21.8M 73.3% 3.7
ResNet-50 50 25.6M 76.1% 4.1
ResNet-101 101 44.5M 77.4% 7.8
ResNet-152 152 60.2M 78.3% 11.6

When to Use: Default choice for most tasks requiring depth


6. DenseNet (2017) πŸ•ΈοΈ

Dense connections for maximum feature reuse

  • Paper: Densely Connected CNNs
  • Key Innovation: Each layer connects to all subsequent layers
  • Variants: DenseNet-121, 169, 201, 264
  • Parameters: 8M (DenseNet-121) - very efficient!

Implementation: scripts/densenet.py

Benefits:

  • Stronger gradient flow
  • Feature reuse reduces parameters
  • Implicit deep supervision

7. MobileNet (2017-2019) πŸ“±

Lightweight CNNs for mobile devices

  • Paper: MobileNets: Efficient CNNs
  • Key Innovation: Depthwise separable convolutions
  • Parameters: 4.2M (v1), 3.4M (v2), 5.4M (v3)
  • Speed: 600 imgs/sec on modern GPUs

Implementation: scripts/mobilenet.py

MobileNet Family:

Version Key Features Top-1 Acc Latency (ms)
v1 Depthwise conv 70.6% 113
v2 Inverted residuals 72.0% 75
v3 NAS + SE blocks 75.2% 51

8. EfficientNet (2019) ⚑

Systematically scaled for optimal efficiency

  • Paper: EfficientNet: Rethinking Model Scaling
  • Key Innovation: Compound scaling (depth Γ— width Γ— resolution)
  • Parameters: 5.3M (B0) to 66M (B7)
  • ImageNet Top-1: 77.1% (B0) to 84.4% (B7)

Implementation: scripts/efficientnet.py

Scaling Coefficients:

# EfficientNet scaling formula
depth = Ξ±^Ο†
width = Ξ²^Ο†
resolution = Ξ³^Ο†
where Ξ±Β·Ξ²Β²Β·Ξ³Β² β‰ˆ 2

9. EfficientNetV2 (2021) ⚑⚑

Faster training with Fused-MBConv

  • Paper: EfficientNetV2
  • Improvements: Progressive learning, NAS optimization
  • Speed: 5-11x faster training than EfficientNet
  • Top-1 Accuracy: 85.7% (EfficientNetV2-L)

10. ConvNeXt (2022) πŸ†•

Modernized CNN competing with Vision Transformers

  • Paper: A ConvNet for the 2020s
  • Key Innovations: Larger kernels (7x7), GELU, LayerNorm
  • Top-1 Accuracy: 87.8% (ConvNeXt-XL)
  • Performance: Matches Swin Transformers

Implementation: scripts/convnext.py


πŸ‘οΈ ATTENTION MECHANISMS

11. SENet (Squeeze-and-Excitation) (2018) 🎯

Channel attention for feature recalibration

Implementation: scripts/senet.py

SE Block Code:

class SEBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)
        self.excitation = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid()
        )
    
    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.squeeze(x).view(b, c)
        y = self.excitation(y).view(b, c, 1, 1)
        return x * y.expand_as(x)

12. CBAM (2018) πŸ‘οΈ

Channel + Spatial Attention

Implementation: scripts/cbam.py


πŸŽ“ TRAINING TECHNIQUES

13. Data Augmentation πŸ”„

Techniques:

  • Basic: Flip, Rotate, Crop, Color Jitter
  • Advanced: Mixup, CutMix, CutOut, AutoAugment
  • Domain-Specific: Medical imaging, satellite imagery

Implementation: scripts/data_augmentation.py

Augmentation Comparison:

Technique Description Accuracy Gain Use Case
Random Crop Random patches +1-2% General
Mixup Blend two images +1-3% Small datasets
CutMix Paste patches +1-2% ImageNet scale
AutoAugment Learned policies +2-4% Large datasets
RandAugment Simple random +1-3% Any dataset

14. Transfer Learning πŸŽ“

Strategies:

  1. Feature Extraction: Freeze backbone
  2. Fine-tuning: Unfreeze gradually
  3. Domain Adaptation: Adjust for domain shift

Implementation: scripts/transfer_learning.py

Fine-tuning Guidelines:

# Strategy 1: Feature Extraction
for param in model.parameters():
    param.requires_grad = False
model.fc = nn.Linear(2048, num_classes)  # Only train classifier

# Strategy 2: Fine-tune last N layers
for param in model.layer4.parameters():
    param.requires_grad = True

# Strategy 3: Differential learning rates
optimizer = torch.optim.SGD([
    {'params': model.layer4.parameters(), 'lr': 1e-3},
    {'params': model.fc.parameters(), 'lr': 1e-2}
])

15. Learning Rate Scheduling πŸ“Š

Schedulers:

  • Step Decay: Reduce by factor every N epochs
  • Cosine Annealing: Smooth decrease to zero
  • OneCycle: Warm up then anneal
  • ReduceLROnPlateau: Adaptive based on metrics

Implementation: scripts/lr_scheduling.py


16. Advanced Optimizers πŸš€

Optimizer Best For Learning Rate Notes
SGD + Momentum Final training 0.1 β†’ 0.001 Most stable
Adam Quick experiments 1e-3 β†’ 1e-4 Fast convergence
AdamW Most tasks 1e-3 β†’ 1e-5 Weight decay fixed
RAdam Warmup-free 1e-3 Rectified Adam
LAMB Large batch Scaled Distributed training
AdaBelief Better generalization 1e-3 Recent research

Implementation: scripts/optimizers.py


🎯 SPECIALIZED ARCHITECTURES

17. Object Detection 🎯

Architectures:

Model Type Speed Accuracy Best For
YOLO v8 One-stage ⚑⚑⚑ 53.9 mAP Real-time
Faster R-CNN Two-stage ⚑ 42.0 mAP Accuracy
RetinaNet One-stage ⚑⚑ 40.8 mAP Small objects
EfficientDet One-stage ⚑⚑ 55.1 mAP Efficiency
DETR Transformer ⚑ 42.0 mAP Novel approach

Implementation: scripts/object_detection/


18. Semantic Segmentation πŸ—ΊοΈ

Architectures:

Model Year Best Use Case Params Speed
U-Net 2015 Medical imaging 7.8M Fast
FCN 2015 Scene segmentation 134M Medium
DeepLab v3+ 2018 High accuracy 40M Slow
PSPNet 2017 Context-aware 250M Slow
SegFormer 2021 SOTA efficiency 3-84M Fast

Implementation: scripts/segmentation/


19. Instance Segmentation πŸ‘₯

Key Models:

  • Mask R-CNN: Extends Faster R-CNN with masks
  • YOLACT: Real-time instance segmentation
  • SOLOv2: Segments by object locations

Implementation: scripts/instance_segmentation/


πŸ”§ MODEL COMPRESSION

20. Knowledge Distillation πŸŽ“

Process:

  1. Train large teacher model
  2. Train small student with soft targets
  3. Combine hard and soft loss

Implementation: scripts/distillation.py

Results:

  • 50-70% size reduction
  • 1-3% accuracy drop
  • 2-5x speed improvement

21. Pruning βœ‚οΈ

Types:

  • Unstructured: Remove individual weights
  • Structured: Remove entire filters/channels
  • Iterative: Prune β†’ Retrain β†’ Repeat

Implementation: scripts/pruning.py

Typical Results:

  • 50-90% sparsity achievable
  • 2-10x speedup possible
  • Minimal accuracy loss (<2%)

22. Quantization πŸ“‰

Methods:

Method Precision Accuracy Loss Speed Size Reduction
FP32 Full Baseline 1x Baseline
FP16 Half <0.1% 2x 50%
INT8 8-bit 0.5-1% 4x 75%
INT4 4-bit 1-3% 8x 87.5%

Implementation: scripts/quantization.py


πŸ“Š EVALUATION & BENCHMARKS

23. Evaluation Metrics

Classification Metrics:

Metric Formula Use Case
Accuracy (TP+TN)/(TP+TN+FP+FN) Balanced datasets
Precision TP/(TP+FP) Minimize false positives
Recall TP/(TP+FN) Minimize false negatives
F1-Score 2Γ—(PΓ—R)/(P+R) Imbalanced data
AUC-ROC Area under ROC curve Binary classification
Top-5 Accuracy Correct in top 5 predictions ImageNet

Implementation: scripts/evaluation_metrics.py


24. Benchmark Datasets

Dataset Images Classes Resolution Size Use Case
MNIST 70K 10 28Γ—28 11MB Digits
CIFAR-10 60K 10 32Γ—32 163MB Objects
CIFAR-100 60K 100 32Γ—32 169MB Fine-grained
ImageNet 1.2M 1000 224Γ—224 150GB General
COCO 330K 80 Variable 25GB Detection
Pascal VOC 11K 20 Variable 2GB Segmentation
CelebA 200K 40 attrs 178Γ—218 1.4GB Faces

Dataset Links: docs/datasets.md


πŸ“ˆ PERFORMANCE BENCHMARKS

ImageNet Classification (2025)

Model Params FLOPs Top-1 Top-5 Speed (img/s)
EfficientNetV2-L 119M 53B 85.7% 97.5% 180
ConvNeXt-XL 350M 60B 87.8% 98.5% 120
ResNet-152 60M 11.6B 78.3% 94.2% 220
EfficientNet-B7 66M 37B 84.4% 97.1% 150
MobileNetV3 5.4M 0.22B 75.2% 92.2% 600
ViT-H/14 632M 167B 88.6% 98.7% 50

CIFAR-10 Performance

Model Params Accuracy Training Time
ResNet-50 25.6M 95.3% 2h (V100)
EfficientNet-B0 5.3M 96.7% 1.5h
MobileNetV3 5.4M 94.8% 1h
Vision Transformer 86M 98.1% 4h

πŸ› οΈ FRAMEWORKS & TOOLS

Deep Learning Frameworks

Framework Pros Cons Best For
PyTorch Pythonic, dynamic graphs, research-friendly Less mobile support Research, prototyping
TensorFlow Production-ready, mobile support, ecosystem Steeper learning curve Production deployment
Keras Simple API, beginner-friendly Less flexibility Quick experiments
JAX Fast, functional, automatic differentiation Smaller ecosystem Research, speed
ONNX Cross-platform model format Not for training Model deployment

Training Tools

Tool Purpose Key Features
Weights & Biases Experiment tracking Real-time logging, hyperparameter tuning
TensorBoard Visualization Graphs, histograms, embeddings
MLflow ML lifecycle Model registry, reproducibility
Neptune.ai Experiment management Team collaboration, model versioning
Comet.ml Model monitoring Production monitoring, debugging

Data Tools

Tool Purpose Best For
Albumentations Data augmentation Fast, comprehensive
imgaug Image augmentation Research prototypes
torchvision.transforms PyTorch augmentation PyTorch projects
tf.image TensorFlow augmentation TensorFlow projects
DALI Data loading High-throughput training

πŸš€ GETTING STARTED

Installation

# Clone repository
git clone https://github.com/yourusername/CNN_Research_Techniques.git
cd CNN_Research_Techniques

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Quick Start Examples

# Train ResNet-50 on CIFAR-10
python scripts/resnet.py --dataset cifar10 --arch resnet50 --epochs 100

# Transfer learning from ImageNet
python scripts/transfer_learning.py --pretrained --dataset custom --data_path /path/to/data

# Evaluate model
python scripts/evaluate.py --model_path checkpoints/best_model.pth --dataset cifar10

# Export to ONNX
python scripts/export_onnx.py --model_path checkpoints/best_model.pth --output model.onnx

πŸ“ CODE EXAMPLES

Training Pipeline

# scripts/train_example.py
import torch
import torchvision
from torch import nn, optim

# Load data
train_loader = torch.utils.data.DataLoader(
    torchvision.datasets.CIFAR10(root='./data', train=True, download=True,
                                  transform=transforms.ToTensor()),
    batch_size=128, shuffle=True
)

# Create model
model = torchvision.models.resnet50(pretrained=False, num_classes=10)
criterion = nn.CrossEntropyLoss()
optimizer = optim.AdamW(model.parameters(), lr=1e-3)

# Training loop
for epoch in range(100):
    for images, labels in train_loader:
        outputs = model(images)
        loss = criterion(outputs, labels)
        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

🀝 CONTRIBUTING

We welcome contributions! See CONTRIBUTING.md for guidelines.

Ways to Contribute:

  • πŸ› Report bugs
  • πŸ’‘ Suggest new techniques
  • πŸ“ Improve documentation
  • πŸ”§ Add implementations
  • πŸ“Š Share benchmarks

πŸ“„ LICENSE

This project is licensed under the MIT License - see LICENSE for details.


πŸ™ ACKNOWLEDGMENTS

  • Research community for pioneering work
  • Open-source contributors
  • PyTorch and TensorFlow teams

πŸ“ž CONTACT


⭐️ If you find this repository helpful, please star it!

Keywords: CNN, Convolutional Neural Networks, Deep Learning, Computer Vision, ResNet, EfficientNet, Object Detection, Image Classification, PyTorch, TensorFlow, Transfer Learning, Model Optimization, Neural Architecture

About

The go to repo for CNN best practices and techniques

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published