PRITHIVSAKTHIUR/README.md

Hey 👋 What's up?

hi, i am prithiv!

i am a graduate engineer (UG 2024) in Information Technology from GCEE,
focused on LLM enhancements, computer vision models, and improving multimodal AI capabilities.

Multimodal Models

- **Camel-Doc-OCR-080125** (HuggingFace · Collection): Advanced Qwen2.5-VL fine-tune specialized for document retrieval, content extraction, and analysis recognition, delivering superior document comprehension with Opendoc2-Analysis-Recognition training.
- **Lumian2-VLR-7B-Thinking** (HuggingFace · Collection): High-fidelity vision-language reasoning system with explicit grounded reasoning, producing structured reasoning traces aligned with visual coordinates for explainable multimodal understanding and step-by-step analysis.
- **Qwen2.5-VL-7B-Abliterated-Caption-it** (HuggingFace · Collection): Uncensored captioning specialist generating highly detailed descriptions across complex visual categories and sensitive content, optimized for comprehensive image analysis across varying aspect ratios.
- **DeepCaption-VLA-7B** (HuggingFace · Collection): Precision image captioning model focused on defining visual properties, object attributes, and scene details with high descriptive accuracy across diverse image types and dimensions.
- **Megalodon-OCR-Sync-0713** (HuggingFace · Collection): Qwen2.5-VL-3B specialist trained on 200K image pairs, including 70K Corvus-OCR-Caption-Mix samples, optimized for document OCR captioning, image reasoning, and visual analysis across varying image sizes.
- **DREX-062225-exp** (HuggingFace · Collection): Document Retrieval and Extraction eXpert built on the docscopeOCR-7B architecture, specialized for document analysis and information extraction with Opendoc2-Analysis-Recognition training.

For more, visit: HuggingFace
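The Qwen2.5-VL fine-tunes above generally accept the chat-style multimodal message format consumed by the transformers processor API (`processor.apply_chat_template`). A minimal sketch of assembling that payload; the file name, prompt, and helper function are illustrative assumptions, not part of any listed model's API:

```python
def build_vl_messages(image_ref: str, prompt: str) -> list[dict]:
    """Assemble one user turn mixing an image and a text instruction,
    in the interleaved content format used by Qwen2.5-VL-family models."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_ref},   # path or URL
                {"type": "text", "text": prompt},
            ],
        }
    ]

# Example payload for a document-OCR prompt (illustrative inputs)
messages = build_vl_messages("document_page.png", "Extract all text from this page.")
```

With transformers installed, this payload would feed `processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` before generation.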

Ranking Models: Text Gen

(Open LLM Leaderboard)

- **Galactic-Qwen-14B-Exp2** (HuggingFace · Leaderboard): Advanced reasoning model optimized for general-purpose intelligence, excelling in contextual understanding, logical deduction, and complex multi-step problem-solving.
- **Sombrero-Opus-14B-Elite5** (HuggingFace · Leaderboard): Conversational model fine-tuned with chain-of-thought reasoning methodologies and specialized datasets, delivering enhanced comprehension, structured responses, and intelligent dialogue.
- **Viper-Coder-v1.1** (HuggingFace · Leaderboard): Coding model fine-tuned on synthetic datasets leveraging coding logits and CoT methodologies, optimizing chain-of-thought reasoning and logical problem-solving for programming tasks.
- **Dinobot-Opus-14B-Exp** (HuggingFace · Leaderboard): Abliterated model based on the Qwen 2.5 architecture, engineered for strong reasoning, detailed explanations, and conversational quality with multi-step analytical capabilities.

For more, visit: HuggingFace

Other Pages

- **Stranger Zone** (HuggingFace · GitHub): Builds illustration adapters for diffusion models, focusing on fine-tuning computer vision models and specialized text-to-image LoRA adapters.
- **Stranger Guard** (HuggingFace · GitHub): Builds strict content moderation models, with a core focus on advanced computer vision tasks; the team develops precision-driven AI systems capable of detecting, classifying, and moderating visual content at scale.

Pinned Repositories

  1. Tiny-VLMs-Lab (Python)

     Tiny VLMs Lab is a Hugging Face Space and open-source project showcasing lightweight Vision-Language Models for image captioning, OCR, reasoning, and multimodal understanding. It offers a simple Gr…

  2. Multimodal-Outpost-Notebooks (Jupyter Notebook)

     A curated collection of notebooks for implementing state-of-the-art multimodal Vision-Language Models (VLMs).

  3. FineTuning-SigLIP-2 (Jupyter Notebook)

     Fine-tuning SigLIP 2, a vision-language encoder model, for single- and multi-label image classification tasks.
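The single- vs multi-label distinction above comes down to the classification head: single-label heads take a softmax over all classes and pick the argmax, while multi-label heads apply an independent sigmoid per class and threshold each one. A minimal pure-Python sketch of the two decision rules (label names and logits are illustrative):

```python
import math

def softmax(logits):
    """Normalize logits into a probability distribution (single-label)."""
    m = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def sigmoid(x):
    """Independent per-class probability (multi-label)."""
    return 1.0 / (1.0 + math.exp(-x))

def predict_single(logits, labels):
    """Single-label: exactly one class, the argmax of the softmax."""
    probs = softmax(logits)
    return labels[probs.index(max(probs))]

def predict_multi(logits, labels, threshold=0.5):
    """Multi-label: every class whose sigmoid probability clears the threshold."""
    return [lab for x, lab in zip(logits, labels) if sigmoid(x) >= threshold]

labels = ["document", "chart", "photo"]
single = predict_single([2.0, 0.5, -1.0], labels)  # "document"
multi = predict_multi([2.0, 0.5, -1.0], labels)    # ["document", "chart"]
```

The same split applies when fine-tuning: single-label training pairs the softmax head with cross-entropy loss, multi-label pairs per-class sigmoids with binary cross-entropy.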

  4. OCR-ReportLab-Notebooks (Jupyter Notebook)

     Dedicated Colab notebooks for experimenting with OCR models (Nanonets OCR, Monkey OCR, OCRFlux 3B, Typhoon OCR 3B, and more) on a free-tier T4 GPU.

  5. Flux-LoRA-DLC (Python)

     Experience the power of the FLUX.1-dev diffusion model combined with a massive collection of 255+ community-created LoRAs! This Gradio application provides an easy-to-use interface to explore diver…

  6. Qwen2.5-VL-Video-Understanding (Python)

     The Qwen2.5-VL-7B-Instruct model is a multimodal AI model developed by Alibaba Cloud that excels at understanding both text and images. It's a Vision-Language Model (VLM) designed to handle various…