An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels
Vanilla Transformers can achieve high performance in computer vision by treating individual pixels as tokens, challenging the necessity of locality bias in modern architectures.
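To make the contrast between patch tokens and pixel tokens concrete, here is a minimal sketch in plain Python. It is not taken from the paper: the helper names `patch_tokens` and `pixel_tokens`, the toy 32x32 single-channel image, and the nested-list representation are all assumptions chosen for illustration.

```python
def patch_tokens(image, patch):
    """Split an H x W image (a list of rows) into flattened patch tokens.

    Hypothetical helper for illustration; real models also apply a learned
    projection and position embeddings to each token.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by patch size")
    tokens = []
    # Walk the image patch by patch, in row-major order.
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tokens.append([image[top + dy][left + dx]
                           for dy in range(patch)
                           for dx in range(patch)])
    return tokens


def pixel_tokens(image):
    """Treat every pixel as its own token (a patch size of 1)."""
    return patch_tokens(image, 1)


# Toy 32x32 single-channel image (assumed size, chosen to keep numbers small).
image = [[row * 32 + col for col in range(32)] for row in range(32)]

as_patches = patch_tokens(image, 16)
as_pixels = pixel_tokens(image)

print(len(as_patches), len(as_patches[0]))  # 4 tokens, 256 values each
print(len(as_pixels), len(as_pixels[0]))    # 1024 tokens, 1 value each
```

With pixels as tokens the sequence grows from 4 to 1024 entries for this toy image, and no neighborhood grouping is built into the input. Because self-attention cost grows quadratically with sequence length, the pixel-level setup is considerably more expensive than the patch-level one.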
https://arxiv.org/abs//2406.09415
YouTube: https://www.youtube.com/@ArxivPapers
TikTok: https://www.tiktok.com/@arxiv_papers
Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016
Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers
--- Support this podcast: https://podcasters.spotify.com/pod/show/arxiv-papers/support
1545 episodes