TokenBridge: Bridging The Gap Between Continuous And Discrete Token Representations In Visual Generation


Autoregressive visual generation models have emerged as a groundbreaking approach to image synthesis, drawing inspiration from the token-prediction mechanisms of language models. These models use image tokenizers to transform visual content into either discrete or continuous tokens. This facilitates flexible multimodal integration and allows architectural innovations from LLM research to be adapted. However, the field faces a critical challenge: determining the optimal token representation strategy. The choice between discrete and continuous token representations remains a fundamental dilemma, significantly impacting model complexity and generation quality.

Existing work on visual tokenization explores two primary approaches: continuous and discrete token representations. Variational autoencoders establish continuous latent spaces that maintain high visual fidelity and have become foundational in diffusion model development. Discrete methods such as VQ-VAE and VQGAN enable straightforward autoregressive modeling but encounter significant limitations, including codebook collapse and information loss. Autoregressive image generation has evolved from computationally intensive pixel-based methods to more efficient token-based strategies. While models like DALL-E show promising results, hybrid methods such as GIVT and MAR introduce complex architectural modifications to improve generation quality, complicating the traditional autoregressive modeling pipeline.

Researchers from the University of Hong Kong, ByteDance Seed, Ecole Polytechnique, and Peking University have proposed TokenBridge to bridge the critical gap between continuous and discrete token representations in visual generation. It exploits the strong representation capacity of continuous tokens while maintaining the modeling simplicity of discrete tokens. TokenBridge decouples the discretization process from the initial tokenizer training by introducing a novel post-training quantization technique. Moreover, it implements a dimension-wise quantization strategy that independently discretizes each feature dimension, complemented by a lightweight autoregressive prediction mechanism. This design efficiently manages the expanded token space while preserving high-quality visual generation capabilities. A rough sketch of the resulting pipeline is given below.
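
The following is a minimal, illustrative view of that decoupled pipeline, not the authors' code: the `vae`, `ar_model`, `quantize_fn`, and `dequantize_fn` names and their interfaces are assumptions, and a possible post-training quantizer is sketched after the next paragraph.

```python
# Illustrative sketch only: a pretrained continuous VAE stays frozen, its
# features are discretized after the fact, and a separate autoregressive
# model predicts the resulting discrete indices.

def tokenize_for_training(vae, quantize_fn, images):
    latents = vae.encode(images)   # continuous features from the frozen tokenizer
    return quantize_fn(latents)    # discrete indices; no tokenizer retraining involved

def generate(vae, ar_model, dequantize_fn):
    indices = ar_model.sample()    # lightweight autoregressive prediction of indices
    latents = dequantize_fn(indices)  # map indices back to continuous feature values
    return vae.decode(latents)     # frozen decoder turns features into an image
```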

TokenBridge introduces a training-free, dimension-wise quantization method that operates independently on each feature channel, effectively addressing the limitations of previous token representations. The approach capitalizes on two important properties of Variational Autoencoder features: their bounded nature, owing to KL constraints, and their near-Gaussian distribution. The autoregressive model adopts a Transformer architecture with two primary configurations: a default L model comprising 32 blocks with a width of 1024 (approximately 400 million parameters) for initial studies, and a larger H model with 40 blocks and a width of 1280 (around 910 million parameters) for final evaluations. This design allows a detailed exploration of the proposed quantization strategy across different model scales. The sketch below illustrates the per-dimension quantization idea.
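
Below is a minimal sketch of how such a training-free, dimension-wise quantizer could look, assuming simple uniform bins over a clipped range; the bin count, clipping value, and use of bin centers for de-quantization are illustrative assumptions, not the paper's exact recipe.

```python
import torch

NUM_BINS = 64   # assumed number of discrete levels per feature dimension
CLIP = 3.0      # assumed clipping range; KL-regularized features are roughly bounded

def make_bin_edges(num_bins: int = NUM_BINS, clip: float = CLIP) -> torch.Tensor:
    # Uniform bin edges spanning the (approximately bounded) feature range.
    return torch.linspace(-clip, clip, num_bins + 1)

def quantize(features: torch.Tensor, edges: torch.Tensor) -> torch.Tensor:
    # Map continuous VAE features (e.g., shape (B, C, H, W)) to integer bin
    # indices, treating every feature dimension independently.
    clipped = features.clamp(edges[0].item(), edges[-1].item())
    return torch.bucketize(clipped, edges[1:-1])   # values in [0, NUM_BINS - 1]

def dequantize(indices: torch.Tensor, edges: torch.Tensor) -> torch.Tensor:
    # Replace each index with its bin center, recovering a continuous
    # approximation that a frozen VAE decoder can consume.
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers[indices]

# Usage (hypothetical): indices = quantize(vae.encode(images), make_bin_edges())
```

Because each dimension is discretized on its own, the autoregressive model can treat prediction as a series of small per-dimension classification problems rather than one prediction over a huge joint vocabulary.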

The results show that TokenBridge outperforms conventional discrete token models, achieving superior Frechet Inception Distance (FID) scores with significantly fewer parameters. For instance, TokenBridge-L reaches an FID of 1.76 with only 486 million parameters, compared to LlamaGen's 2.18 with 3.1 billion parameters. When benchmarked against continuous approaches, TokenBridge-L also outperforms GIVT, achieving an FID of 1.76 versus 3.35. The H-model configuration further validates the method's effectiveness, matching MAR-H in FID (1.55) while delivering superior Inception Score and Recall metrics with marginally fewer parameters. These results demonstrate TokenBridge's ability to bridge discrete and continuous token representations.

In conclusion, the researchers introduced TokenBridge, which bridges the longstanding gap between discrete and continuous token representations. It achieves high-quality visual generation with remarkable efficiency by introducing a post-training quantization approach and dimension-wise autoregressive decomposition. The research demonstrates that discrete token approaches using a standard cross-entropy loss can compete with state-of-the-art continuous methods, eliminating the need for complex distribution-modeling techniques. The approach provides a promising pathway for future research, potentially transforming how researchers conceptualize and implement token-based visual synthesis technologies.


Check out the Paper, GitHub Page and Project. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don't forget to join our 85k+ ML SubReddit.

Sajjad Ansari is a final-year undergraduate at IIT Kharagpur. As a tech enthusiast, he delves into the practical applications of AI with a focus on understanding the impact of AI technologies and their real-world implications. He aims to articulate complex AI concepts in a clear and accessible manner.
