Best Igor Melnyk Podcasts (2025)

[QA] Tina: Tiny Reasoning Models via LoRA 7:48

about 12 hours ago

Tina models achieve strong reasoning performance cost-effectively using minimal resources and efficient reinforcement learning techniques, surpassing existing models while significantly reducing post-training costs. https://arxiv.org/abs//2504.15777 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Pod…

Tina: Tiny Reasoning Models via LoRA 17:19

about 12 hours ago

Tina models achieve strong reasoning performance cost-effectively using minimal resources and efficient reinforcement learning techniques, surpassing existing models while significantly reducing post-training costs. https://arxiv.org/abs//2504.15777 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Pod…

[QA] LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities 8:09

about 12 hours ago

https://arxiv.org/abs//2504.16078 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers

LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities 15:38

about 12 hours ago

https://arxiv.org/abs//2504.16078 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers

[QA] UFO2: The Desktop AgentOS 8:33

a day ago

UFO2 is a multiagent AgentOS for Windows that enhances desktop automation using CUAs, featuring robust task execution, deep OS integration, and improved accuracy across various applications. https://arxiv.org/abs//2504.14603 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.a…

UFO2: The Desktop AgentOS 57:07

a day ago

UFO2 is a multiagent AgentOS for Windows that enhances desktop automation using CUAs, featuring robust task execution, deep OS integration, and improved accuracy across various applications. https://arxiv.org/abs//2504.14603 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.a…

[QA] NEMOTRON-CROSSTHINK: Scaling Self-Learning beyond Math Reasoning 8:52

a day ago

NEMOTRON-CROSSTHINK enhances reasoning in Large Language Models by integrating diverse data sources and structured templates, improving accuracy and efficiency across various reasoning tasks beyond mathematics. https://arxiv.org/abs//2504.13941 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts…

NEMOTRON-CROSSTHINK: Scaling Self-Learning beyond Math Reasoning 31:19

a day ago

NEMOTRON-CROSSTHINK enhances reasoning in Large Language Models by integrating diverse data sources and structured templates, improving accuracy and efficiency across various reasoning tasks beyond mathematics. https://arxiv.org/abs//2504.13941 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts…

[QA] Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning 7:54

3 days ago

PODS decouples reinforcement learning phases by parallelizing rollouts and selectively updating, using max-variance down-sampling to enhance performance on the GSM8K benchmark compared to standard GRPO. https://arxiv.org/abs//2504.13818 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https:…

Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning 7:09

3 days ago

PODS decouples reinforcement learning phases by parallelizing rollouts and selectively updating, using max-variance down-sampling to enhance performance on the GSM8K benchmark compared to standard GRPO. https://arxiv.org/abs//2504.13818 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https:…

[QA] Let Me Grok for You: Accelerating Grokking via Embedding Transfer from a Weaker Model 7:38

3 days ago

The paper presents a method to accelerate "grokking" in neural networks by using learned embeddings from a weaker model, enabling direct generalization without delay across various tasks. https://arxiv.org/abs//2504.13292 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.appl…

Let Me Grok for You: Accelerating Grokking via Embedding Transfer from a Weaker Model 16:13

3 days ago

The paper presents a method to accelerate "grokking" in neural networks by using learned embeddings from a weaker model, enabling direct generalization without delay across various tasks. https://arxiv.org/abs//2504.13292 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.appl…

[QA] Reasoning Models Can Be Effective Without Thinking 7:29

4 days ago

This paper challenges the necessity of lengthy reasoning processes in LLMs, showing that simple prompting (NoThinking) can outperform traditional methods in various reasoning tasks, especially in low-budget scenarios. https://arxiv.org/abs//2504.09858 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple P…

Reasoning Models Can Be Effective Without Thinking 20:05

4 days ago

This paper challenges the necessity of lengthy reasoning processes in LLMs, showing that simple prompting (NoThinking) can outperform traditional methods in various reasoning tasks, especially in low-budget scenarios. https://arxiv.org/abs//2504.09858 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple P…

[QA] A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce 8:27

4 days ago

This paper analyzes GRPO in reinforcement learning for language models, revealing that a simple rejection sampling method, RAFT, performs competitively and suggesting improvements for future reward-based training approaches. https://arxiv.org/abs//2504.11343 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers …

A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce 14:38

4 days ago

This paper analyzes GRPO in reinforcement learning for language models, revealing that a simple rejection sampling method, RAFT, performs competitively and suggesting improvements for future reward-based training approaches. https://arxiv.org/abs//2504.11343 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers …

[QA] CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training 7:14

5 days ago

https://arxiv.org/abs//2504.13161 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers

CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training 20:35

5 days ago

https://arxiv.org/abs//2504.13161 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers

[QA] Antidistillation Sampling 7:21

5 days ago

Antidistillation sampling modifies token probability distributions to weaken reasoning traces for model distillation, enhancing model security while maintaining performance. https://arxiv.org/abs//2504.13146 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podca…

Antidistillation Sampling 10:44

5 days ago

Antidistillation sampling modifies token probability distributions to weaken reasoning traces for model distillation, enhancing model security while maintaining performance. https://arxiv.org/abs//2504.13146 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podca…

[QA] Position: The Most Expensive Part of an LLM should be its Training Data 7:16

6 days ago

This paper argues that compensating human labor for training data is the largest cost in developing Large Language Models, significantly exceeding model training expenses, and suggests fairer practices for the future. https://arxiv.org/abs//2504.12427 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple P…

Position: The Most Expensive Part of an LLM should be its Training Data 20:05

6 days ago

This paper argues that compensating human labor for training data is the largest cost in developing Large Language Models, significantly exceeding model training expenses, and suggests fairer practices for the future. https://arxiv.org/abs//2504.12427 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple P…

[QA] Activated LoRA: Fine-tuned LLMs for Intrinsics 8:16

6 days ago

Activated LoRA (aLoRA) enhances LoRA by adapting weights only for relevant tokens, allowing instant activation without recomputing the KV cache, improving efficiency in multiturn settings. https://arxiv.org/abs//2504.12397 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.app…

Activated LoRA: Fine-tuned LLMs for Intrinsics 18:55

6 days ago

Activated LoRA (aLoRA) enhances LoRA by adapting weights only for relevant tokens, allowing instant activation without recomputing the KV cache, improving efficiency in multiturn settings. https://arxiv.org/abs//2504.12397 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.app…

[QA] COLORBENCH: Can VLMs See and Understand the Colorful World? 7:49

6 days ago

The paper presents COLORBENCH, a benchmark to evaluate vision-language models' color understanding, revealing limitations and emphasizing the need for improved color comprehension in multimodal AI. https://arxiv.org/abs//2504.10514 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://pod…

COLORBENCH: Can VLMs See and Understand the Colorful World? 20:40

6 days ago

The paper presents COLORBENCH, a benchmark to evaluate vision-language models' color understanding, revealing limitations and emphasizing the need for improved color comprehension in multimodal AI. https://arxiv.org/abs//2504.10514 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://pod…

[QA] ReTool: Reinforcement Learning for Strategic Tool Use in LLMs 8:33

6 days ago

https://arxiv.org/abs//2504.11536 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers

ReTool: Reinforcement Learning for Strategic Tool Use in LLMs 14:57

6 days ago

https://arxiv.org/abs//2504.11536 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers

[QA] Looking beyond the next token 7:22

7 days ago

The paper presents TRELAWNEY, a method for rearranging training data to improve causal language models' performance in planning and reasoning without altering architecture, enhancing goal generation capabilities. https://arxiv.org/abs//2504.11336 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcas…

Looking beyond the next token 16:58

7 days ago

The paper presents TRELAWNEY, a method for rearranging training data to improve causal language models' performance in planning and reasoning without altering architecture, enhancing goal generation capabilities. https://arxiv.org/abs//2504.11336 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcas…

[QA] How to Predict Best Pretraining Data with Small Experiments 8:16

7 days ago

The paper introduces DATADECIDE, a suite for evaluating data selection methods, revealing that small-scale model rankings effectively predict larger model performance, enhancing cost-efficient pretraining decisions. https://arxiv.org/abs//2504.11393 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Pod…

How to Predict Best Pretraining Data with Small Experiments 20:22

7 days ago

The paper introduces DATADECIDE, a suite for evaluating data selection methods, revealing that small-scale model rankings effectively predict larger model performance, enhancing cost-efficient pretraining decisions. https://arxiv.org/abs//2504.11393 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Pod…

[QA] Have we unified image generation and understanding yet? An empirical study of GPT-4o's image generation ability 7:18

9 days ago

This study evaluates OpenAI's GPT-4o, revealing limitations in semantic synthesis, instruction adherence, and reasoning, challenging assumptions about its multimodal capabilities and calling for improved benchmarks and training strategies. https://arxiv.org/abs//2504.08003 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com…

Have we unified image generation and understanding yet? An empirical study of GPT-4o's image generation ability 7:07

9 days ago

This study evaluates OpenAI's GPT-4o, revealing limitations in semantic synthesis, instruction adherence, and reasoning, challenging assumptions about its multimodal capabilities and calling for improved benchmarks and training strategies. https://arxiv.org/abs//2504.08003 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com…

[QA] DUMP: Automated Distribution-Level Curriculum Learning for RL-based LLM Post-training 7:39

9 days ago

This paper introduces a distribution-level curriculum learning framework for RL-based post-training of LLMs, enhancing reasoning capabilities by adaptively scheduling training across diverse data distributions. https://arxiv.org/abs//2504.09710 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts…

DUMP: Automated Distribution-Level Curriculum Learning for RL-based LLM Post-training 10:11

9 days ago

This paper introduces a distribution-level curriculum learning framework for RL-based post-training of LLMs, enhancing reasoning capabilities by adaptively scheduling training across diverse data distributions. https://arxiv.org/abs//2504.09710 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts…

[QA] Steering CLIP's vision transformer with sparse autoencoders 8:11

9 days ago

This study explores sparse autoencoders in vision models, revealing unique processing patterns and enhancing steerability, leading to improved performance in vision disentanglement tasks and defense strategies. https://arxiv.org/abs//2504.08729 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts…

Steering CLIP's vision transformer with sparse autoencoders 17:53

9 days ago

This study explores sparse autoencoders in vision models, revealing unique processing patterns and enhancing steerability, leading to improved performance in vision disentanglement tasks and defense strategies. https://arxiv.org/abs//2504.08729 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts…

[QA] Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning 7:58

9 days ago

Genius is an unsupervised self-training framework that enhances LLM reasoning without external supervision, using stepwise foresight re-sampling and advantage-calibrated optimization to improve performance. https://arxiv.org/abs//2504.08672 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: ht…

Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning 18:11

9 days ago

Genius is an unsupervised self-training framework that enhances LLM reasoning without external supervision, using stepwise foresight re-sampling and advantage-calibrated optimization to improve performance. https://arxiv.org/abs//2504.08672 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: ht…

[QA] Rethinking Reflection in Pre-Training 8:18

11 days ago

The study reveals that language models develop self-correcting abilities during pre-training, enhancing their problem-solving skills, as demonstrated by the OLMo-2-7B model's performance on self-reflection tasks. https://arxiv.org/abs//2504.04022 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcas…

Rethinking Reflection in Pre-Training 17:47

11 days ago

The study reveals that language models develop self-correcting abilities during pre-training, enhancing their problem-solving skills, as demonstrated by the OLMo-2-7B model's performance on self-reflection tasks. https://arxiv.org/abs//2504.04022 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcas…

[QA] Self-Steering Language Models 7:21

11 days ago

DISCIPL enables language models to generate task-specific inference programs, improving reasoning efficiency and verifiability, and outperforming larger models on constrained generation tasks without requiring finetuning. https://arxiv.org/abs//2504.07081 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers App…

Self-Steering Language Models 8:43

11 days ago

DISCIPL enables language models to generate task-specific inference programs, improving reasoning efficiency and verifiability, and outperforming larger models on constrained generation tasks without requiring finetuning. https://arxiv.org/abs//2504.07081 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers App…

[QA] Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill? 7:45

12 days ago

The study reveals that reasoning LLMs struggle with ill-posed questions, leading to excessive, ineffective responses, while non-reasoning LLMs perform better, highlighting flaws in current training methods. https://arxiv.org/abs//2504.06514 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https:…

Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill? 16:23

12 days ago

The study reveals that reasoning LLMs struggle with ill-posed questions, leading to excessive, ineffective responses, while non-reasoning LLMs perform better, highlighting flaws in current training methods. https://arxiv.org/abs//2504.06514 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https:…

DDT: Decoupled Diffusion Transformer 8:07

12 days ago

The proposed Decoupled Diffusion Transformer (DDT) improves generation quality and inference speed by decoupling semantic encoding and high-frequency decoding, achieving state-of-the-art performance on ImageNet with faster training convergence. https://arxiv.org/abs//2504.05741 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_…

[QA] Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory 7:56

12 days ago

Dynamic Cheatsheet (DC) enhances language models with persistent memory, improving performance on various tasks by enabling test-time learning and efficient reuse of problem-solving insights without altering model parameters. https://arxiv.org/abs//2504.07952 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers…

Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory 15:48

12 days ago

Dynamic Cheatsheet (DC) enhances language models with persistent memory, improving performance on various tasks by enabling test-time learning and efficient reuse of problem-solving insights without altering model parameters. https://arxiv.org/abs//2504.07952 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers…

