A simplified implementation of DINOv2 (Self-Supervised Vision Transformers).
- Easy-to-use DINOv2 implementation
- Image similarity computation using cosine similarity
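The similarity measure in the feature list works on the model's output embeddings, not on raw pixels. A minimal sketch, using random 384-dimensional vectors as stand-ins for the ViT-S/14 feature embeddings:

```python
import torch
import torch.nn.functional as F

# Stand-in embeddings: batch of 1, 384 features (the ViT-S/14 embedding size)
emb1 = torch.randn(1, 384)
emb2 = torch.randn(1, 384)

# Cosine similarity along the feature dimension yields one value in [-1, 1]
# per batch item; 1.0 means identical direction, -1.0 opposite.
similarity = F.cosine_similarity(emb1, emb2, dim=1)

# An embedding compared with itself gives similarity 1.0
same = F.cosine_similarity(emb1, emb1, dim=1)
```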
- Clone this repository:

```bash
git clone https://github.com/yourusername/mini_dino.git
cd mini_dino
```

- Install required dependencies:

```bash
pip install torch torchvision pillow matplotlib
```

Run the model using:

```bash
python dino.py
```

Example of loading the pretrained weights and comparing two images:

```python
import torch
import torch.nn.functional as F
import torchvision.transforms as T
from PIL import Image

# Load the reference DINOv2 ViT-S/14 model from torch.hub
model_name = "dinov2_vits14"
dino = torch.hub.load("facebookresearch/dinov2", model_name)

# Build the local model and copy the pretrained weights into it
model = dino_small_vit()
model = load_dino_weights(model, dino.state_dict(), N=12)

# Preprocess both images to 224x224 tensors with a batch dimension
transforms = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor()
])

img1 = Image.open("image.png")
img1 = transforms(img1).unsqueeze(0)
out_dino = dino(img1)

img2 = Image.open("image1.png")
img2 = transforms(img2).unsqueeze(0)
out = model(img2)

# Compare the feature embeddings (not the raw image tensors) with cosine similarity
similarity = F.cosine_similarity(out_dino, out)
```

- Based on DINOv2 ViT-Small/14 architecture
- Input image size: 224x224
- Output: Feature embeddings for similarity comparison