|
| 1 | +<!DOCTYPE html> |
| 2 | +<html lang="en"> |
| 3 | +<head> |
| 4 | + <meta charset="UTF-8"> |
| 5 | + <meta name="viewport" content="width=device-width, initial-scale=1.0"> |
| 6 | + <title>Semanticist: PCA-Guided Visual Tokenization</title> |
| 7 | + <style> |
| 8 | + body { |
| 9 | + font-family: Arial, sans-serif; |
| 10 | + margin: 40px; |
| 11 | + line-height: 1.6; |
| 12 | + max-width: 800px; |
| 13 | + margin: 40px auto; |
| 14 | + } |
| 15 | + h1, h2, h3 { |
| 16 | + color: #333; |
| 17 | + } |
| 18 | + a { |
| 19 | + color: #007bff; |
| 20 | + text-decoration: none; |
| 21 | + } |
| 22 | + a:hover { |
| 23 | + text-decoration: underline; |
| 24 | + } |
| 25 | + .section { |
| 26 | + margin-bottom: 40px; |
| 27 | + } |
| 28 | + </style> |
| 29 | +</head> |
| 30 | +<body> |
| 31 | + <h1>Semanticist: PCA-Guided Visual Tokenization with Structured Latents</h1> |
| 32 | + <h3>A New Paradigm for Compact and Interpretable Image Representations</h3> |
| 33 | + <p> |
| 34 | + <a href="#">[Read the Paper]</a> | |
| 35 | + <a href="#">[GitHub]</a> | |
| 36 | + <a href="#">[Colab Demo]</a> |
| 37 | + </p> |
| 38 | + |
| 39 | + <div class="section"> |
| 40 | + <h2>Introduction &amp; Motivation</h2> |
| 41 | + <p>Deep generative models have revolutionized image synthesis, but how best to tokenize visual data remains an open question. Classical methods like <b>Principal Component Analysis (PCA)</b> produce compact, structured representations, whereas modern <b>visual tokenizers</b>, from <b>VQ-VAE</b> to <b>latent diffusion models</b>, often prioritize <b>reconstruction fidelity</b> at the cost of interpretability and efficiency.</p> |
| 42 | + <h3>The Problem</h3> |
| 43 | + <ul> |
| 44 | + <li><b>Lack of Structure:</b> Tokens are learned without any inherent ordering, so nothing ensures that the most important visual features are captured first.</li> |
| 45 | + <li><b>Semantic-Spectrum Coupling:</b> Tokens entangle <i>high-level semantics</i> with <i>low-level spectral details</i>, leading to inefficiencies in downstream applications.</li> |
| 46 | + </ul> |
| 47 | + <p>Can we design a <b>compact, structured tokenizer</b> that retains the benefits of PCA while leveraging modern generative techniques?</p> |
| 48 | + </div> |
| 49 | + |
| 50 | + <div class="section"> |
| 51 | + <h2>Key Contributions (What’s New?)</h2> |
| 52 | + <ul> |
| 53 | + <li><b>📌 PCA-Guided Tokenization:</b> Introduces a <i>causal ordering</i> where earlier tokens capture the most important visual details, reducing redundancy.</li> |
| 54 | + <li><b>⚡ Semantic-Spectrum Decoupling:</b> Removes semantic-spectrum coupling so that tokens focus on high-level semantic information.</li> |
| 55 | + <li><b>🌀 Diffusion-Based Decoding:</b> Uses a <i>spectral autoregressive diffusion decoder</i> to naturally separate semantic and spectral content.</li> |
| 56 | + <li><b>🚀 Compact &amp; Interpretable:</b> Enables <i>flexible token selection</i>, where fewer tokens can still yield high-quality reconstructions.</li> |
| 57 | + </ul> |
| 58 | + </div> |
| 59 | + |
| 60 | + <div class="section"> |
| 61 | + <h2>Visualizing the Problem: Semantic-Spectrum Coupling</h2> |
| 62 | + <p>Existing methods fail to separate <b>semantics from spectral details</b>, leading to inefficiencies in token usage.</p> |
| 63 | + <ul> |
| 64 | + <li><b>🚨 Current Tokenizers:</b> Adding tokens increases <i>semantic content</i> and <i>low-level spectral detail</i> at the same time, making compression inefficient.</li> |
| 65 | + <li><b>✅ Our Approach:</b> Tokens capture <i>semantics first</i>, ensuring a <i>coarse-to-fine</i> hierarchical structure.</li> |
| 66 | + </ul> |
| 67 | + <p><b>📊 Power Spectrum Analysis (Visual)</b><br>➡️ <i>[Figure placeholder: power spectrum analysis plot]</i></p> |
| 68 | + <p><b>🖼 Comparison of Reconstructions</b><br>➡️ <i>[Figure placeholder: VQ-VAE, TiTok, and Semanticist reconstructions at different token counts]</i></p> |
| 69 | + </div> |
| 70 | +</body> |
| 71 | +</html> |
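
The page describes a PCA-like ordering in which earlier tokens carry the most important information, so that a prefix of the tokens still gives a usable reconstruction. The sketch below illustrates that property with plain PCA on synthetic data only; it is not the Semanticist tokenizer or its diffusion decoder. The function names (`pca_fit`, `reconstruct`, `reconstruction_errors`) and the synthetic data are assumptions made for illustration.

```python
import numpy as np


def pca_fit(X):
    """Return the data mean and principal directions, ordered by explained variance."""
    mean = X.mean(axis=0)
    # SVD of the centered data: rows of Vt are principal directions,
    # and NumPy returns singular values in descending order.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt


def reconstruct(X, mean, Vt, k):
    """Rebuild X from only its first k PCA coefficients (the ordered 'tokens')."""
    coeffs = (X - mean) @ Vt[:k].T   # project onto the top-k directions
    return coeffs @ Vt[:k] + mean    # map back to the input space


def reconstruction_errors(X, ks):
    """Mean squared reconstruction error for each prefix length in ks."""
    mean, Vt = pca_fit(X)
    return [float(np.mean((X - reconstruct(X, mean, Vt, k)) ** 2)) for k in ks]


# Synthetic stand-in for image features (hypothetical data):
# 200 samples, 64 dimensions, with variance decaying across latent directions.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 64)) * np.linspace(5.0, 0.1, 64)
X = latent @ rng.normal(size=(64, 64))

ks = [1, 4, 16, 64]
errors = reconstruction_errors(X, ks)
for k, err in zip(ks, errors):
    print(f"first {k:2d} components -> MSE {err:.4f}")
```

The error can only shrink as more components are kept, and it reaches numerical zero once all 64 are used. This is the coarse-to-fine behavior the page attributes to its token ordering, shown here for the linear PCA case only.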