Hi, I am Prithiv! I am a graduate engineer [UG 2024] in Information Technology from GCEE, focused on LLM enhancements, computer vision models, and improving multimodal AI capabilities.
(Open LLM Leaderboard)
Tiny VLMs Lab is a Hugging Face Space and open-source project showcasing lightweight Vision-Language Models for image captioning, OCR, reasoning, and multimodal understanding. It offers a simple Gradio interface for trying each model.
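As a flavor of what running a lightweight VLM looks like, here is a minimal captioning sketch with the `transformers` library. It assumes SmolVLM-Instruct as a stand-in for a small VLM; the Space may ship different checkpoints.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

# Assumption: HuggingFaceTB/SmolVLM-Instruct is just one example of a tiny VLM.
model_id = "HuggingFaceTB/SmolVLM-Instruct"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"  # needs `accelerate`
)

image = Image.open("photo.jpg")  # hypothetical input image
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Describe this image."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```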
This repository contains a curated collection of notebooks for implementing state-of-the-art multimodal Vision-Language Models (VLMs).
Fine-Tuning SigLIP 2 for Single/Multi-Label Image Classification: a vision-language encoder model fine-tuned for image classification tasks.
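A minimal setup sketch for this kind of fine-tune: attach a fresh classification head to a SigLIP 2 backbone via `AutoModelForImageClassification`. The checkpoint name and label set below are assumptions for illustration; the multi-label variant is handled by `problem_type`, which switches the loss to BCE-with-logits.

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

labels = ["cat", "dog", "bird"]  # hypothetical label set; use your dataset's classes
model_id = "google/siglip2-base-patch16-224"  # assumption: any SigLIP 2 checkpoint

processor = AutoImageProcessor.from_pretrained(model_id)
model = AutoModelForImageClassification.from_pretrained(
    model_id,
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={l: i for i, l in enumerate(labels)},
    # For multi-label classification, uncomment the next line:
    # problem_type="multi_label_classification",
)

# Forward pass on a dummy image to sanity-check shapes before training.
image = Image.new("RGB", (224, 224))
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, len(labels))
```

From here the model plugs into the standard `Trainer` loop; the classifier head is freshly initialized, so it needs the fine-tuning the repo describes.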
Dedicated Colab notebooks for experimenting with OCR models (Nanonets OCR, Monkey OCR, OCRFlux 3B, Typhoon OCR 3B, and more) on a T4 GPU, within the free tier.
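On a free-tier T4 the main constraints are VRAM and the lack of bfloat16 support, so float16 is the natural choice. A rough sketch of what one of these notebook cells might look like, using the `image-text-to-text` pipeline (the checkpoint and image URL are illustrative assumptions, not necessarily what the notebooks use):

```python
import torch
from transformers import pipeline

# Assumption: nanonets/Nanonets-OCR-s as one example OCR checkpoint.
ocr = pipeline(
    "image-text-to-text",
    model="nanonets/Nanonets-OCR-s",
    torch_dtype=torch.float16,  # T4 GPUs do not support bfloat16
    device_map="auto",
)
messages = [{"role": "user", "content": [
    {"type": "image", "url": "https://example.com/receipt.png"},  # hypothetical
    {"type": "text", "text": "Extract all text from this page."},
]}]
result = ocr(text=messages, max_new_tokens=512)
print(result[0]["generated_text"])
```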
Experience the power of the FLUX.1-dev diffusion model combined with a massive collection of 255+ community-created LoRAs! This Gradio application provides an easy-to-use interface for exploring diverse styles.
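Under the hood, swapping LoRAs onto FLUX.1-dev is a two-call affair in `diffusers`. A minimal sketch, assuming a FLUX-compatible LoRA repo (the one named below is illustrative):

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # trade speed for fitting on a single GPU

# Assumption: any FLUX-compatible LoRA works here; this repo is illustrative.
pipe.load_lora_weights("prithivMLmods/Canopus-LoRA-Flux-Anime")

image = pipe(
    "a serene mountain lake at dawn",
    num_inference_steps=28,   # typical FLUX.1-dev settings
    guidance_scale=3.5,
).images[0]
image.save("flux_lora.png")
```

An app like this one would call `load_lora_weights` (or `unload_lora_weights` and reload) whenever the user picks a different style from the collection.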
The Qwen2.5-VL-7B-Instruct model is a multimodal AI model developed by Alibaba Cloud that excels at understanding both text and images. It's a Vision-Language Model (VLM) designed to handle a wide range of vision-language tasks.
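A minimal inference sketch for this model with `transformers` (the input image and question are placeholders; the official examples additionally use the `qwen_vl_utils` helper, omitted here for brevity):

```python
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "Qwen/Qwen2.5-VL-7B-Instruct"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("chart.png")  # hypothetical input image
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "What does this chart show?"},
]}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

out = model.generate(**inputs, max_new_tokens=128)
# Trim the prompt tokens so only the model's answer is decoded.
trimmed = out[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```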