Commit 69cca7e

Revert "rebrand"
This reverts commit dcc47a3.
1 parent 37d10ec commit 69cca7e

File tree

1 file changed: +33 −18 lines


index.html

@@ -9,9 +9,7 @@
     <script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
     <meta charset="UTF-8">
     <meta name="viewport" content="width=device-width, initial-scale=1.0">
-    <title>
-        NVIDIA Cosmos Nemotron: Efficient Vision Language Models
-    </title>
+    <title>NVILA: Efficient Frontiers of Visual Language Models</title>
     <style>
         :root {
             color-scheme: light;
@@ -638,8 +636,8 @@
 <!-- </div>-->
 <div class="hero">
     <h2>
-        <!-- <img src="asset/NVILA.png" alt="Logo" style="width: 160px; height: auto; margin-right: 3px;">: Efficient Frontier Visual Language Models -->
-        NVIDIA Cosmos Nemotron: Efficient Vision Language Models
+        <img src="asset/NVILA.png" alt="Logo" style="width: 160px; height: auto; margin-right: 3px;">: Efficient
+        Frontier Visual Language Models
     </h2>
     <p>Train Cheaper, Run Faster, Perform Better!</p>


@@ -649,16 +647,16 @@ <h2>
     <a href="https://zhijianliu.com" target="_blank" style="color: #76b900;">Zhijian Liu</a><sup>1,†</sup>,
     <a href="https://lzhu.me" target="_blank" style="color: #76b900;">Ligeng Zhu</a><sup>1,†</sup>,
     <a href="#" target="_blank" style="color: #76b900;">Baifeng Shi</a><sup>1,3</sup>,
-    <a href="#" target="_blank" style="color: #76b900;">Zhuoyang Zhang</a><sup>2</sup>,
-    <a href="#" target="_blank" style="color: #76b900;">Yuming Lou</a><sup>6</sup>,
-    <a href="#" target="_blank" style="color: #76b900;">Shang Yang</a><sup>2</sup>,
+    <a href="#" target="_blank" style="color: #76b900;">Zhuoyang Zhang</a><sup>1,2</sup>,
+    <a href="#" target="_blank" style="color: #76b900;">Yuming Lou</a><sup>1,6</sup>,
+    <a href="#" target="_blank" style="color: #76b900;">Shang Yang</a><sup>1,2</sup>,
     <a href="https://xijiu9.github.io" target="_blank" style="color: #76b900;">Haocheng
-        Xi</a><sup>3</sup>,
-    <a href="#" target="_blank" style="color: #76b900;">Shiyi Cao</a><sup>3</sup>,
+        Xi</a><sup>1,3</sup>,
+    <a href="#" target="_blank" style="color: #76b900;">Shiyi Cao</a><sup>1,3</sup>,
     <a href="#" target="_blank" style="color: #76b900;">Yuxian Gu</a><sup>2,6</sup>,
-    <a href="#" target="_blank" style="color: #76b900;">Dacheng Li</a><sup>3</sup>,
-    <a href="#" target="_blank" style="color: #76b900;">Xiuyu Li</a><sup>3</sup>,
-    <a href="#" target="_blank" style="color: #76b900;">Yunhao Fang</a><sup>4</sup>,
+    <a href="#" target="_blank" style="color: #76b900;">Dacheng Li</a><sup>1,3</sup>,
+    <a href="#" target="_blank" style="color: #76b900;">Xiuyu Li</a><sup>1,3</sup>,
+    <a href="#" target="_blank" style="color: #76b900;">Yunhao Fang</a><sup>1,4</sup>,
     <a href="#" target="_blank" style="color: #76b900;">Yukang Chen</a><sup>1</sup>,
     <a href="#" target="_blank" style="color: #76b900;">Cheng-Yu Hsieh</a><sup>5</sup>,
     <a href="#" target="_blank" style="color: #76b900;">De-An Huang</a><sup>1</sup>,
@@ -881,10 +879,20 @@ <h2>
 
     <section class="description">
         <div class="description-content">
-            <h1>About Cosmos Nemotron</h1>
+            <h1>About NVILA</h1>
             <p style="margin-bottom: 20px;">
-                Visual language models (VLMs) have made significant advances in accuracy in recent years. However, their efficiency has received much less attention. This paper introduces <strong>Cosmos Nemotron</strong>, a family of open VLMs designed to optimize both efficiency and accuracy. Building on top of research from <strong>NVIDIA including NVILA and VILA</strong>, we improve its model architecture by first scaling up the spatial and temporal resolutions, and then compressing visual tokens. This <strong>"scale-then-compress" approach</strong> enables these VLMs to efficiently process high-resolution images and long videos. We also conduct a systematic investigation to enhance the efficiency of VLMs throughout its entire lifecycle, from training and fine-tuning to deployment.
-                In this paper, we’ll look at the latest NVILA research that serves as a foundation for Cosmos Nemotron and show how it matches or surpasses the accuracy of many leading open and proprietary VLMs across a wide range of image and video benchmarks. At the same time, it reduces training costs by 4.5×, fine-tuning memory usage by 3.4×, pre-filling latency by 1.6-2.2×, and decoding latency by 1.2-2.8×. We make our code and models available to facilitate reproducibility.
+                Visual language models (VLMs) have made significant advances in accuracy in recent years. However, their
+                efficiency has received much less attention. This paper introduces <strong>NVILA</strong>, a family of
+                open VLMs designed to optimize both efficiency and accuracy. Building on top of VILA, we improve its
+                model architecture by first <strong>scaling up</strong> the spatial and temporal resolutions, and then
+                <strong>compressing</strong> visual tokens. This "scale-then-compress" approach enables NVILA to
+                efficiently process high-resolution images and long videos. We also conduct a systematic investigation
+                to enhance the efficiency of NVILA throughout its entire lifecycle, from training and fine-tuning to
+                deployment. NVILA matches or surpasses the accuracy of many leading open and proprietary VLMs across a
+                wide range of image and video benchmarks. At the same time, it reduces training costs by
+                <strong>4.5×</strong>, fine-tuning memory usage by <strong>3.4×</strong>, pre-filling latency by
+                <strong>1.6-2.2×</strong>, and decoding latency by <strong>1.2-2.8×</strong>. We make our code and
+                models available to facilitate reproducibility.
             </p>
         </div>
     </section>
@@ -955,9 +963,16 @@ <h1>About Cosmos Nemotron</h1>
     <!-- -->
 
     <div class="description-content">
-        <h2>Cosmos Nemotron core design concept</h2>
+        <h2>NVILA's core design concept</h2>
         <p>
-            In this paper, we introduce Cosmos Nemotron, a family of open VLMs designed to optimize both efficiency and accuracy. Building on NVILA and VILA, we improve its model architecture by first scaling up the spatial and temporal resolution, followed by compressing visual tokens. "Scaling" preserves more details from visual inputs, raising the accuracy upper bound, while "compression" squeezes visual information to fewer tokens, improving computational efficiency. This "scale-then-compress" strategy allows VLMs to process high-resolution images and long videos both effectively and efficiently. In addition, we conduct a systematic study to optimize the efficiency of VLMs throughout its entire lifecycle, including training, fine-tuning, and deployment.
+            In this paper, we introduce <strong>NVILA</strong>, a family of open VLMs designed to optimize both
+            efficiency and accuracy. Building on VILA, we improve its model architecture by first scaling up the
+            spatial and temporal resolution, followed by compressing visual tokens. "Scaling" preserves more details
+            from visual inputs, raising the accuracy upper bound, while "compression" squeezes visual information to
+            fewer tokens, improving computational efficiency. This "<em>scale-then-compress</em>" strategy allows
+            NVILA to process high-resolution images and long videos both effectively and efficiently. In addition,
+            we conduct a systematic study to optimize the efficiency of NVILA throughout its entire lifecycle,
+            including training, fine-tuning, and deployment.
         </p>
     </div>
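For readers skimming the diff, the "scale-then-compress" token accounting described in the restored text can be sketched numerically. This is a minimal NumPy illustration under assumed parameters (14-pixel ViT patches, 2×2 average pooling of the token grid); the function names and shapes are hypothetical and do not reflect NVILA's actual token-compression operator.

```python
# Hypothetical sketch of "scale-then-compress": scale up spatial resolution
# (more patches, hence more visual tokens), then compress the token sequence
# with 2x2 average pooling. All numbers here are illustrative assumptions.
import numpy as np

def tokens_for_resolution(side_px: int, patch: int = 14) -> int:
    """Number of ViT patch tokens for a square image of side_px pixels."""
    grid = side_px // patch
    return grid * grid

def spatial_pool(tokens: np.ndarray, grid: int, pool: int = 2) -> np.ndarray:
    """Average-pool a (grid*grid, dim) token sequence over pool x pool windows."""
    dim = tokens.shape[1]
    t = tokens.reshape(grid, grid, dim)
    t = t.reshape(grid // pool, pool, grid // pool, pool, dim).mean(axis=(1, 3))
    return t.reshape(-1, dim)

# Scale: a 448px input instead of 224px quadruples the token count ...
low = tokens_for_resolution(224)   # 16 x 16 grid -> 256 tokens
high = tokens_for_resolution(448)  # 32 x 32 grid -> 1024 tokens

# ... then compress: 2x2 pooling restores the original token budget,
# but each pooled token was computed from the higher-resolution input.
feats = np.random.randn(high, 64)
compressed = spatial_pool(feats, grid=32, pool=2)
print(low, high, compressed.shape[0])  # → 256 1024 256
```

Scaling first raises the accuracy ceiling (more visual detail survives into the features), while the compression step keeps the sequence length, and hence inference cost, near the low-resolution baseline, which is the tradeoff the restored paragraphs describe.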
