Show Lab

89 repositories

Awesome-Video-Diffusion
Public
A curated list of recent diffusion models for video generation, editing, and various other applications.
awesome video-editing video-generation diffusion-models motion-customization video-generation-evaluation
244•4.2k•1•0•Updated Mar 17, 2025
Awesome-Robotics-Diffusion
Public
(In progress) A curated list of recent robot learning papers incorporating diffusion models for robotics tasks.
3•81•0•0•Updated Mar 17, 2025
MovieBench
Public
[CVPR 2025] A Hierarchical Movie Level Dataset for Long Video Generation
Python
•2•47•0•0•Updated Mar 16, 2025
PhotoDoodle
Public
Code Implementation of "PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data"
Python
•
MIT License
•20•341•7•2•Updated Mar 15, 2025
Awesome-Unified-Multimodal-Models
Public
📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.
17•407•0•0•Updated Mar 14, 2025
ROICtrl
Public
Code for [CVPR 2025] ROICtrl: Boosting Instance Control for Visual Generation
Python
•0•103•1•0•Updated Mar 14, 2025
MovieAgent
Public
MovieAgent: Automated Movie Generation via Multi-Agent CoT Planning
Python
•2•76•3•0•Updated Mar 13, 2025
ShowUI
Public
[CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.
agent vision-language-model vision-language-action computer-use gui-agent
Python
•
Apache License 2.0
•69•1.1k•3•0•Updated Mar 13, 2025
TPDiff
Public
TPDiff: Temporal Pyramid Video Diffusion Model
2•16•1•0•Updated Mar 13, 2025
VLog
Public
[CVPR 2025] Video Narration as Vocabulary & Video as Long Document
vocabulary whisper video-language chatgpt langchain large-language-model
Python
•28•560•7•0•Updated Mar 13, 2025
Show-o
Public
[ICLR 2025] Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
multimodal diffusion-models large-language-models
Python
•
Apache License 2.0
•56•1.3k•38•1•Updated Mar 12, 2025
GUI-Thinker
Public
Enable AI to control your PC. This repo includes the WorldGUI Benchmark and GUI-Thinker Agent Framework.
gui-application agents large-multimodal-models gui-agent
Python
•4•52•1•0•Updated Mar 12, 2025
MovieSeq
Public
[ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences
Jupyter Notebook
•1•36•0•0•Updated Mar 11, 2025
SMS
Public
Balanced Image Stylization with Style Matching Score
0•12•0•0•Updated Mar 11, 2025
SAM-I2V
Public
Apache License 2.0
•0•0•0•0•Updated Mar 10, 2025
computer_use_ootb
Public
Out-of-the-box (OOTB) GUI Agent for Windows and macOS
Python
•
Apache License 2.0
•139•1.4k•26•6•Updated Mar 10, 2025
Awesome-GUI-Agent
Public
💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
awesome graphical-user-interface ai-assistant llm-agent gui-agents
32•562•1•0•Updated Mar 10, 2025
VideoGUI
Public
[NeurIPS 2024 D&B] VideoGUI: A Benchmark for GUI Automation from Instructional Videos
gui video-language llm-agent
JavaScript
•2•32•0•0•Updated Mar 8, 2025
DoraCycle
Public
[CVPR 2025] DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles
1•19•2•0•Updated Mar 6, 2025
LOVA3
Public
(NeurIPS 2024) Official PyTorch implementation of LOVA3
benchmark visual-question-answering visual-question-generation multimodal-large-language-models large-multimodal-models
Python
•2•79•0•0•Updated Mar 3, 2025
Impossible-Videos
Public
JavaScript
•0•1•1•0•Updated Feb 25, 2025
MakeAnything
Public
Official code of "MakeAnything: Harnessing Diffusion Transformers for Multi-Domain Procedural Sequence Generation"
Python
•
MIT License
•8•162•0•0•Updated Feb 24, 2025
InterFeedback
Public
0•0•0•0•Updated Feb 24, 2025
DiffSim
Public
Official repository of DiffSim: Taming Diffusion Models for Evaluating Visual Similarity
Python
•1•11•0•0•Updated Feb 21, 2025
whisperV
Public
video speech-recognition face-detection speech-to-text whisper asr
Jupyter Notebook
•0•2•0•0•Updated Feb 18, 2025
UniMoD
Public
The code repository of UniMoD
0•8•1•0•Updated Feb 10, 2025
LayerTracer
Public
Official code of "LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion Transformer"
Python
•
MIT License
•2•35•2•0•Updated Feb 8, 2025
FQGAN
Public
FQGAN: Factorized Visual Tokenization and Generation
Python
•
Other
•1•44•0•0•Updated Jan 5, 2025
Tune-An-Ellipse
Public
[CVPR 2024] Tune-An-Ellipse: CLIP Has Potential to Find What You Want
Python
•1•10•2•0•Updated Jan 5, 2025
VideoLISA
Public
[NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos
Python
•
Apache License 2.0
•3•108•8•0•Updated Dec 26, 2024