This project uses BEiT (Bidirectional Encoder representation from Image Transformers) to classify malware images. A pre-trained BEiT model is fine-tuned on labeled malware images so that it learns the visual patterns that distinguish one class from another.
- BEiT Integration: Uses a pre-trained BEiT model for robust image feature extraction and classification.
- Efficient Training Pipeline: Employs PyTorch Lightning for streamlined training.
- Custom Dataset Handling: Supports custom datasets with preprocessing and augmentation.
- Visualization: Plots model performance with Matplotlib and Seaborn.
- Programming Language: Python
- Frameworks and Libraries:
  - PyTorch: Deep learning framework.
  - PyTorch Lightning: Training abstraction for PyTorch.
  - Transformers: BEiT model implementation.
  - Pandas: Data manipulation and preprocessing.
  - NumPy: Numerical operations.
  - Matplotlib and Seaborn: Visualization libraries.
- The dataset should include labeled malware images.
- Preprocessing steps include resizing, normalizing, and augmenting the images.
- The framework splits the dataset into training, validation, and testing sets using stratified sampling.
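
The stratified split described above can be sketched with the standard library alone. This is an illustration, not the project's actual code: the function name `stratified_split`, the 70/15/15 proportions, the seed, and the example family labels are all assumptions, and the project may use a library helper instead.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, val_frac=0.15, test_frac=0.15, seed=42):
    """Return (train, val, test) index lists that keep per-class proportions."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        # Shuffle within each class, then carve out the same fractions.
        rng.shuffle(indices)
        n_val = round(len(indices) * val_frac)
        n_test = round(len(indices) * test_frac)
        val.extend(indices[:n_val])
        test.extend(indices[n_val:n_val + n_test])
        train.extend(indices[n_val + n_test:])
    return train, val, test


# Hypothetical, imbalanced label list: 100 samples of one class, 40 of another.
labels = ["family_a"] * 100 + ["family_b"] * 40
train_idx, val_idx, test_idx = stratified_split(labels)
print(Counter(labels[i] for i in val_idx))
```

Because the split is done per class, a rare malware family appears in the validation and test sets in roughly the same proportion as in the full dataset.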
- The BEiT model is fine-tuned on the preprocessed dataset.
- Each training step runs a forward pass, backpropagates the loss to compute gradients, and applies an optimizer update.
- The framework logs training metrics, including loss and accuracy.
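
A single training step of this kind might look like the sketch below. It is written in plain PyTorch rather than PyTorch Lightning so that it stays self-contained, and a tiny linear layer stands in for the BEiT classifier. The model, the optimizer choice (`AdamW`), the learning rate, and the random batch are all assumptions made for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Stand-in for the fine-tuned BEiT classifier (assumption): any nn.Module
# that maps a batch of images to per-class logits fits here.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 4))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()


def training_step(images, targets):
    """Run one optimization step and return (loss, accuracy) for logging."""
    model.train()
    optimizer.zero_grad()              # clear gradients from the previous step
    logits = model(images)             # forward pass
    loss = criterion(logits, targets)
    loss.backward()                    # backpropagation computes gradients
    optimizer.step()                   # optimizer update
    accuracy = (logits.argmax(dim=1) == targets).float().mean().item()
    return loss.item(), accuracy


# Hypothetical batch: 16 random "images" with 4 possible classes.
images = torch.randn(16, 3, 8, 8)
targets = torch.randint(0, 4, (16,))
history = [training_step(images, targets) for _ in range(20)]
print(f"loss {history[0][0]:.3f} -> {history[-1][0]:.3f}")
```

In a PyTorch Lightning module, the forward pass and loss computation would typically live in `training_step`, with `self.log` recording the loss and accuracy, while Lightning handles the backward pass and optimizer update.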
- The trained model is evaluated on the test set.
- A custom function generates classification reports, including precision, recall, and F1-score.
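
To make the reported metrics concrete, here is a minimal sketch of how per-class precision, recall, and F1-score can be computed from true and predicted labels. The function name `per_class_report` and the example labels are hypothetical; the project's own report function may differ in both interface and output format.

```python
from collections import Counter


def per_class_report(y_true, y_pred):
    """Compute precision, recall, F1 and support for every class."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": actual,
        }
    return report


# Hypothetical test-set labels and predictions.
y_true = ["trojan", "trojan", "worm", "worm", "worm", "adware"]
y_pred = ["trojan", "worm", "worm", "worm", "adware", "adware"]
report = per_class_report(y_true, y_pred)
for label, metrics in report.items():
    print(label, {k: round(v, 3) for k, v in metrics.items()})
```

Per-class metrics matter here because overall accuracy can hide poor performance on rare classes.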