Running out of time to catch up with new arXiv papers? We take the most impactful papers and present them as convenient podcasts. If you're a visual learner, we offer these papers in an engaging video format. Our service fills the gap between overly brief paper summaries and time-consuming full paper reads. You gain academic insights in a time-efficient, digestible format. Code behind this work: https://github.com/imelnyk/ArxivPapers
Tina models achieve strong reasoning performance cost-effectively using minimal resources and efficient reinforcement learning techniques, surpassing existing models while significantly reducing post-training costs. https://arxiv.org/abs//2504.15777 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Pod…

[QA] LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities (8:09)
https://arxiv.org/abs//2504.16078 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers

LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities (15:38)
https://arxiv.org/abs//2504.16078 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers
UFO2 is a multiagent AgentOS for Windows that enhances desktop automation using CUAs, featuring robust task execution, deep OS integration, and improved accuracy across various applications. https://arxiv.org/abs//2504.14603 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.a…

[QA] NEMOTRON-CROSSTHINK: Scaling Self-Learning beyond Math Reasoning (8:52)
NEMOTRON-CROSSTHINK enhances reasoning in Large Language Models by integrating diverse data sources and structured templates, improving accuracy and efficiency across various reasoning tasks beyond mathematics. https://arxiv.org/abs//2504.13941 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts…

NEMOTRON-CROSSTHINK: Scaling Self-Learning beyond Math Reasoning (31:19)
NEMOTRON-CROSSTHINK enhances reasoning in Large Language Models by integrating diverse data sources and structured templates, improving accuracy and efficiency across various reasoning tasks beyond mathematics. https://arxiv.org/abs//2504.13941 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts…

[QA] Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning (7:54)
PODS decouples reinforcement learning phases by parallelizing rollouts and selectively updating, using max-variance down-sampling to enhance performance on the GSM8K benchmark compared to standard GRPO. https://arxiv.org/abs//2504.13818 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https:…

Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning (7:09)
PODS decouples reinforcement learning phases by parallelizing rollouts and selectively updating, using max-variance down-sampling to enhance performance on the GSM8K benchmark compared to standard GRPO. https://arxiv.org/abs//2504.13818 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https:…
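The max-variance down-sampling step can be pictured concretely. The sketch below is our own toy illustration based only on the one-sentence summary above, not the authors' code: given many parallel rollouts, keep the small subset whose rewards have maximal variance, so the update sees both clear successes and clear failures. For scalar rewards, a maximal-variance subset of size m can be found among the m+1 candidates formed by taking the k largest and m-k smallest rewards, so only those need scanning.

```python
# Toy sketch of PODS-style max-variance down-sampling (our reading of the
# summary, not the paper's implementation).

def max_variance_subset(rewards, m):
    """Return indices of an m-subset of rewards with maximal variance.

    Only "extremes" subsets (k from the top, m-k from the bottom) need to
    be checked, because variance is maximized at the extremes.
    """
    order = sorted(range(len(rewards)), key=lambda i: rewards[i])
    best, best_var = None, -1.0
    for k in range(m + 1):  # k rewards from the top, m-k from the bottom
        idx = order[:m - k] + order[len(order) - k:]
        vals = [rewards[i] for i in idx]
        mu = sum(vals) / m
        var = sum((v - mu) ** 2 for v in vals) / m
        if var > best_var:
            best, best_var = idx, var
    return best

rewards = [0.1, 0.9, 0.15, 0.85, 0.5, 0.05, 0.95, 0.2]  # 8 parallel rollouts
keep = max_variance_subset(rewards, 4)                   # down-sample to 4
print(sorted(rewards[i] for i in keep))  # → [0.05, 0.1, 0.9, 0.95]
```

The kept rollouts mix the best and worst trajectories, which is the informative subset a GRPO-style update would then be computed on.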

[QA] Let Me Grok for You: Accelerating Grokking via Embedding Transfer from a Weaker Model (7:38)
The paper presents a method to accelerate "grokking" in neural networks by using learned embeddings from a weaker model, enabling direct generalization without delay across various tasks. https://arxiv.org/abs//2504.13292 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.appl…

Let Me Grok for You: Accelerating Grokking via Embedding Transfer from a Weaker Model (16:13)
The paper presents a method to accelerate "grokking" in neural networks by using learned embeddings from a weaker model, enabling direct generalization without delay across various tasks. https://arxiv.org/abs//2504.13292 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.appl…
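Mechanically, embedding transfer amounts to initializing a fresh model with the weaker model's learned token-embedding table before training. The snippet below is a deliberately tiny illustration of that one step under our reading of the summary; the parameter layout is invented for the example.

```python
# Toy sketch of embedding transfer (our illustration, not the paper's code):
# inherit the weak model's embedding table, keep the rest of the fresh init.

def transfer_embeddings(weak_embeddings, strong_model_init):
    """Return a fresh parameter dict whose 'embed' slot is inherited."""
    params = dict(strong_model_init)                       # shallow copy of init
    params["embed"] = [row[:] for row in weak_embeddings]  # deep-copy the table
    return params

weak = [[0.2, -0.1], [0.0, 0.3]]                  # 2 tokens, dim-2 embeddings
fresh = {"embed": [[0.0, 0.0], [0.0, 0.0]], "head": [[1.0, 0.0]]}
params = transfer_embeddings(weak, fresh)
print(params["embed"])  # → [[0.2, -0.1], [0.0, 0.3]]
```

Training then proceeds normally from `params`; per the summary, starting from inherited embeddings is what removes the long delay before generalization.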

[QA] Reasoning Models Can Be Effective Without Thinking (7:29)
This paper challenges the necessity of lengthy reasoning processes in LLMs, showing that simple prompting (NoThinking) can outperform traditional methods in various reasoning tasks, especially in low-budget scenarios. https://arxiv.org/abs//2504.09858 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple P…

Reasoning Models Can Be Effective Without Thinking (20:05)
This paper challenges the necessity of lengthy reasoning processes in LLMs, showing that simple prompting (NoThinking) can outperform traditional methods in various reasoning tasks, especially in low-budget scenarios. https://arxiv.org/abs//2504.09858 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple P…
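The "simple prompting" here can be sketched in a few lines. As we understand the abstract, NoThinking pre-fills the model's thinking span with a trivial placeholder so decoding jumps straight to the answer; the tag names and placeholder string below are illustrative assumptions, not necessarily the paper's exact tokens.

```python
# Toy sketch of a NoThinking-style prompt (tag names are assumptions).

def build_prompt(question: str, nothinking: bool) -> str:
    prompt = f"<|user|>{question}<|assistant|>"
    if nothinking:
        # Pre-filled, effectively empty reasoning block: generation resumes
        # after it, so the token budget is spent on the answer itself.
        prompt += "<think>Okay, I think I have finished thinking.</think>"
    return prompt

print(build_prompt("What is 17 * 24?", nothinking=True))
```

Under a fixed token budget, skipping the long chain of thought is what the summary credits for the low-budget wins.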

[QA] A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce (8:27)
This paper analyzes GRPO in reinforcement learning for language models, revealing that a simple rejection sampling method, RAFT, performs competitively and suggesting improvements for future reward-based training approaches. https://arxiv.org/abs//2504.11343 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers …

A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce (14:38)
This paper analyzes GRPO in reinforcement learning for language models, revealing that a simple rejection sampling method, RAFT, performs competitively and suggesting improvements for future reward-based training approaches. https://arxiv.org/abs//2504.11343 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers …
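For readers unfamiliar with RAFT (reward-ranked fine-tuning), one iteration can be sketched as: sample several candidates per prompt, reject those the reward function scores at zero, and fine-tune on the survivors. This is our own toy stand-in for the loop; the sampler and reward below are placeholders, not a real policy or verifier.

```python
# Toy sketch of one RAFT-style rejection-sampling round (our illustration).
import random

def raft_round(prompts, sample, reward, n_candidates=8, keep_per_prompt=1):
    """Collect the best accepted candidate(s) per prompt as SFT pairs."""
    sft_data = []
    for p in prompts:
        candidates = [sample(p) for _ in range(n_candidates)]
        scored = sorted(candidates, key=reward, reverse=True)
        accepted = [c for c in scored if reward(c) > 0][:keep_per_prompt]
        sft_data.extend((p, c) for c in accepted)
    return sft_data  # pairs to fine-tune on with ordinary cross-entropy

# Stand-ins: "responses" are integers, the reward accepts even ones.
random.seed(0)
data = raft_round(["q1", "q2"],
                  sample=lambda p: random.randint(0, 9),
                  reward=lambda r: 1 if r % 2 == 0 else 0)
print(data)
```

The minimalism the title points at is visible here: no advantage estimates or KL terms, just filtering followed by supervised fine-tuning.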

[QA] CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training (7:14)
https://arxiv.org/abs//2504.13161 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers

CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training (20:35)
https://arxiv.org/abs//2504.13161 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers
Antidistillation sampling modifies token probability distributions to weaken reasoning traces for model distillation, enhancing model security while maintaining performance. https://arxiv.org/abs//2504.13146 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podca…

[QA] Position: The Most Expensive Part of an LLM should be its Training Data (7:16)
This paper argues that compensating human labor for training data is the largest cost in developing Large Language Models, significantly exceeding model training expenses, and suggests fairer practices for the future. https://arxiv.org/abs//2504.12427 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple P…

Position: The Most Expensive Part of an LLM should be its Training Data (20:05)
This paper argues that compensating human labor for training data is the largest cost in developing Large Language Models, significantly exceeding model training expenses, and suggests fairer practices for the future. https://arxiv.org/abs//2504.12427 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple P…

[QA] Activated LoRA: Fine-tuned LLMs for Intrinsics (8:16)
Activated LoRA (aLoRA) enhances LoRA by adapting weights only for relevant tokens, allowing instant activation without recomputing the KV cache, improving efficiency in multiturn settings. https://arxiv.org/abs//2504.12397 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.app…

Activated LoRA: Fine-tuned LLMs for Intrinsics (18:55)
Activated LoRA (aLoRA) enhances LoRA by adapting weights only for relevant tokens, allowing instant activation without recomputing the KV cache, improving efficiency in multiturn settings. https://arxiv.org/abs//2504.12397 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.app…

[QA] COLORBENCH: Can VLMs See and Understand the Colorful World? (7:49)
The paper presents COLORBENCH, a benchmark to evaluate vision-language models' color understanding, revealing limitations and emphasizing the need for improved color comprehension in multimodal AI. https://arxiv.org/abs//2504.10514 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://pod…

COLORBENCH: Can VLMs See and Understand the Colorful World? (20:40)
The paper presents COLORBENCH, a benchmark to evaluate vision-language models' color understanding, revealing limitations and emphasizing the need for improved color comprehension in multimodal AI. https://arxiv.org/abs//2504.10514 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://pod…

[QA] ReTool: Reinforcement Learning for Strategic Tool Use in LLMs (8:33)
https://arxiv.org/abs//2504.11536 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers

ReTool: Reinforcement Learning for Strategic Tool Use in LLMs (14:57)
https://arxiv.org/abs//2504.11536 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers
The paper presents TRELAWNEY, a method for rearranging training data to improve causal language models' performance in planning and reasoning without altering architecture, enhancing goal generation capabilities. https://arxiv.org/abs//2504.11336 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcas…

[QA] How to Predict Best Pretraining Data with Small Experiments (8:16)
The paper introduces DATADECIDE, a suite for evaluating data selection methods, revealing that small-scale model rankings effectively predict larger model performance, enhancing cost-efficient pretraining decisions. https://arxiv.org/abs//2504.11393 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Pod…

How to Predict Best Pretraining Data with Small Experiments (20:22)
The paper introduces DATADECIDE, a suite for evaluating data selection methods, revealing that small-scale model rankings effectively predict larger model performance, enhancing cost-efficient pretraining decisions. https://arxiv.org/abs//2504.11393 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Pod…

[QA] Have we unified image generation and understanding yet? An empirical study of GPT-4o's image generation ability (7:18)
This study evaluates OpenAI's GPT-4o, revealing limitations in semantic synthesis, instruction adherence, and reasoning, challenging assumptions about its multimodal capabilities and calling for improved benchmarks and training strategies. https://arxiv.org/abs//2504.08003 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com…

Have we unified image generation and understanding yet? An empirical study of GPT-4o's image generation ability (7:07)
This study evaluates OpenAI's GPT-4o, revealing limitations in semantic synthesis, instruction adherence, and reasoning, challenging assumptions about its multimodal capabilities and calling for improved benchmarks and training strategies. https://arxiv.org/abs//2504.08003 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com…

[QA] DUMP: Automated Distribution-Level Curriculum Learning for RL-based LLM Post-training (7:39)
This paper introduces a distribution-level curriculum learning framework for RL-based post-training of LLMs, enhancing reasoning capabilities by adaptively scheduling training across diverse data distributions. https://arxiv.org/abs//2504.09710 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts…

DUMP: Automated Distribution-Level Curriculum Learning for RL-based LLM Post-training (10:11)
This paper introduces a distribution-level curriculum learning framework for RL-based post-training of LLMs, enhancing reasoning capabilities by adaptively scheduling training across diverse data distributions. https://arxiv.org/abs//2504.09710 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts…
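"Adaptively scheduling training across distributions" can be made concrete with a small scheduler. The sketch below is our own bandit-flavored simplification of the idea in the summary, not the paper's algorithm: weight each training distribution by a softmax over its recent learning progress, so the fastest-improving distribution is favored.

```python
# Toy sketch of a distribution-level curriculum scheduler (our simplification).
import math

def pick_distribution(progress, temperature=0.5):
    """Softmax over per-distribution learning-progress estimates.

    Returns the argmax (deterministic for the sketch) and the full
    probability vector a real scheduler would sample from.
    """
    weights = [math.exp(p / temperature) for p in progress]
    total = sum(weights)
    probs = [w / total for w in weights]
    return max(range(len(probs)), key=lambda i: probs[i]), probs

# Three data distributions with different recent reward improvements.
arm, probs = pick_distribution([0.02, 0.30, 0.10])
print(arm)  # → 1: distribution 1 is improving fastest, so it is favored
```

A real scheduler would sample from `probs` rather than take the argmax, and refresh the progress estimates as training advances.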

[QA] Steering CLIP's vision transformer with sparse autoencoders (8:11)
This study explores sparse autoencoders in vision models, revealing unique processing patterns and enhancing steerability, leading to improved performance in vision disentanglement tasks and defense strategies. https://arxiv.org/abs//2504.08729 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts…

Steering CLIP's vision transformer with sparse autoencoders (17:53)
This study explores sparse autoencoders in vision models, revealing unique processing patterns and enhancing steerability, leading to improved performance in vision disentanglement tasks and defense strategies. https://arxiv.org/abs//2504.08729 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts…

[QA] Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning (7:58)
Genius is an unsupervised self-training framework that enhances LLM reasoning without external supervision, using stepwise foresight re-sampling and advantage-calibrated optimization to improve performance. https://arxiv.org/abs//2504.08672 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: ht…

Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning (18:11)
Genius is an unsupervised self-training framework that enhances LLM reasoning without external supervision, using stepwise foresight re-sampling and advantage-calibrated optimization to improve performance. https://arxiv.org/abs//2504.08672 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: ht…
The study reveals that language models develop self-correcting abilities during pre-training, enhancing their problem-solving skills, as demonstrated by the OLMo-2-7B model's performance on self-reflection tasks. https://arxiv.org/abs//2504.04022 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcas…
DISCIPL enables language models to generate task-specific inference programs, improving reasoning efficiency and verifiability, and outperforming larger models on constrained generation tasks without requiring finetuning. https://arxiv.org/abs//2504.07081 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers App…

[QA] Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill? (7:45)
The study reveals that reasoning LLMs struggle with ill-posed questions, leading to excessive, ineffective responses, while non-reasoning LLMs perform better, highlighting flaws in current training methods. https://arxiv.org/abs//2504.06514 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https:…

Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill? (16:23)
The study reveals that reasoning LLMs struggle with ill-posed questions, leading to excessive, ineffective responses, while non-reasoning LLMs perform better, highlighting flaws in current training methods. https://arxiv.org/abs//2504.06514 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https:…
The proposed Decoupled Diffusion Transformer (DDT) improves generation quality and inference speed by decoupling semantic encoding and high-frequency decoding, achieving state-of-the-art performance on ImageNet with faster training convergence. https://arxiv.org/abs//2504.05741 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_…

[QA] Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory (7:56)
Dynamic Cheatsheet (DC) enhances language models with persistent memory, improving performance on various tasks by enabling test-time learning and efficient reuse of problem-solving insights without altering model parameters. https://arxiv.org/abs//2504.07952 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers…

Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory (15:48)
Dynamic Cheatsheet (DC) enhances language models with persistent memory, improving performance on various tasks by enabling test-time learning and efficient reuse of problem-solving insights without altering model parameters. https://arxiv.org/abs//2504.07952 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers…
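The "persistent memory without altering model parameters" idea can be sketched as a small curated note store that is prepended to every new query. This is our own minimal illustration of the loop described in the summary; the class name, prompt layout, and eviction rule are all invented for the example, and the solve/extract steps of a real system (LLM calls) are left out.

```python
# Toy sketch of a Dynamic-Cheatsheet-style test-time memory (our illustration).

class Cheatsheet:
    def __init__(self, max_entries=5):
        self.entries = []
        self.max_entries = max_entries

    def prompt_for(self, query: str) -> str:
        """Prepend the accumulated notes to a new query."""
        notes = "\n".join(f"- {e}" for e in self.entries)
        return f"Cheatsheet:\n{notes}\n\nQuestion: {query}"

    def update(self, insight: str):
        """Add a new insight; keep the memory small by dropping the oldest."""
        if insight and insight not in self.entries:
            self.entries.append(insight)
            self.entries = self.entries[-self.max_entries:]

cs = Cheatsheet()
cs.update("For 24-game puzzles, try factor pairs of 24 first.")
print(cs.prompt_for("Make 24 from 3, 8, 1, 1."))
```

In a full system, an LLM would both consume `prompt_for(...)` and propose the insights fed to `update(...)`; the model's weights never change, only this memory does.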