Running out of time to catch up with new arXiv papers? We take the most impactful papers and present them as convenient podcasts. If you're a visual learner, we offer these papers in an engaging video format. Our service fills the gap between overly brief paper summaries and time-consuming full paper reads. You gain academic insights in a time-efficient, digestible format. Code behind this work: https://github.com/imelnyk/ArxivPapers Support this podcast: https://podcasters.spotify.com/pod/s ...
[QA] Bootstrapping Language Models with DPO Implicit Rewards
8:41
The paper introduces DICE, a method for aligning large language models using implicit rewards from DPO. DICE outperforms Gemini Pro on AlpacaEval 2 with 8B parameters and no external feedback. https://arxiv.org/abs//2406.09760 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts…
Bootstrapping Language Models with DPO Implicit Rewards
16:02
The paper introduces DICE, a method for aligning large language models using implicit rewards from DPO. DICE outperforms Gemini Pro on AlpacaEval 2 with 8B parameters and no external feedback. https://arxiv.org/abs//2406.09760 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts…
[QA] Ad Auctions for LLMs via Retrieval Augmented Generation
10:03
Novel auction mechanisms for ad allocation and pricing in large language models (LLMs) are proposed, maximizing social welfare and ensuring fairness. Empirical evaluation supports the approach's feasibility and effectiveness. https://arxiv.org/abs//2406.09459 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers…
Ad Auctions for LLMs via Retrieval Augmented Generation
14:18
Novel auction mechanisms for ad allocation and pricing in large language models (LLMs) are proposed, maximizing social welfare and ensuring fairness. Empirical evaluation supports the approach's feasibility and effectiveness. https://arxiv.org/abs//2406.09459 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers…
[QA] An Empirical Study of Mamba-based Language Models
10:36
Mamba models challenge Transformers at larger scales, with Mamba-2-Hybrid surpassing Transformers on various tasks, showing potential for efficient token generation. https://arxiv.org/abs//2406.07887 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv…
An Empirical Study of Mamba-based Language Models
28:32
Mamba models challenge Transformers at larger scales, with Mamba-2-Hybrid surpassing Transformers on various tasks, showing potential for efficient token generation. https://arxiv.org/abs//2406.07887 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv…
[QA] Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback
7:22
Preference-based learning for language models is crucial for enhancing generation quality. This study explores key components' impact and suggests strategies for effective learning. https://arxiv.org/abs//2406.09279 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/…
Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback
9:29
Preference-based learning for language models is crucial for enhancing generation quality. This study explores key components' impact and suggests strategies for effective learning. https://arxiv.org/abs//2406.09279 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/…
[QA] What If We Recaption Billions of Web Images with LLaMA-3?
10:11
The paper introduces Recap-DataComp-1B, an enhanced dataset created using LLaMA-3-8B to improve vision-language model training, showing benefits in performance across various tasks. https://arxiv.org/abs//2406.08478 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/…
What If We Recaption Billions of Web Images with LLaMA-3?
12:27
The paper introduces Recap-DataComp-1B, an enhanced dataset created using LLaMA-3-8B to improve vision-language model training, showing benefits in performance across various tasks. https://arxiv.org/abs//2406.08478 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/…
[QA] SAMBA: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
9:34
SAMBA is a hybrid model combining Mamba and Sliding Window Attention for efficient sequence modeling with infinite context length, outperforming existing models. https://arxiv.org/abs//2406.07522 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-pap…
SAMBA: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
13:02
SAMBA is a hybrid model combining Mamba and Sliding Window Attention for efficient sequence modeling with infinite context length, outperforming existing models. https://arxiv.org/abs//2406.07522 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-pap…
[QA] Why Warmup the Learning Rate? Underlying Mechanisms and Improvements
6:44
The paper explores the benefits of warmup in deep learning, showing how it improves performance by allowing networks to handle larger learning rates and suggesting alternative initialization methods. https://arxiv.org/abs//2406.09405 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://p…
Why Warmup the Learning Rate? Underlying Mechanisms and Improvements
21:40
The paper explores the benefits of warmup in deep learning, showing how it improves performance by allowing networks to handle larger learning rates and suggesting alternative initialization methods. https://arxiv.org/abs//2406.09405 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://p…
[QA] An Image is Worth More Than 16×16 Patches: Exploring Transformers on Individual Pixels
9:36
Vanilla Transformers can achieve high performance in computer vision by treating individual pixels as tokens, challenging the necessity of locality bias in modern architectures. https://arxiv.org/abs//2406.09415 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/p…
An Image is Worth More Than 16×16 Patches: Exploring Transformers on Individual Pixels
12:30
Vanilla Transformers can achieve high performance in computer vision by treating individual pixels as tokens, challenging the necessity of locality bias in modern architectures. https://arxiv.org/abs//2406.09415 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/p…
[QA] Large Language Models Must Be Taught to Know What They Don't Know
10:14
Prompting alone is insufficient for reliable uncertainty estimation in large language models. Fine-tuning on a small dataset of correct and incorrect answers can provide better calibration with low computational cost. https://arxiv.org/abs//2406.08391 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple P…
Large Language Models Must Be Taught to Know What They Don't Know
14:43
Prompting alone is insufficient for reliable uncertainty estimation in large language models. Fine-tuning on a small dataset of correct and incorrect answers can provide better calibration with low computational cost. https://arxiv.org/abs//2406.08391 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple P…
[QA] State Soup: In-Context Skill Learning, Retrieval and Mixing
7:06
Gated-linear recurrent neural networks excel in sequence modeling due to efficient handling of long sequences. Internal states as task vectors enable fast model merging, improving performance. https://arxiv.org/abs//2406.08423 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts…
State Soup: In-Context Skill Learning, Retrieval and Mixing
4:47
Gated-linear recurrent neural networks excel in sequence modeling due to efficient handling of long sequences. Internal states as task vectors enable fast model merging, improving performance. https://arxiv.org/abs//2406.08423 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts…
[QA] Estimating the Hallucination Rate of Generative AI
9:01
The paper introduces a method to estimate hallucination rates in in-context learning with Generative AI, focusing on Bayesian interpretation and empirical evaluations. https://arxiv.org/abs//2406.07457 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arx…
Estimating the Hallucination Rate of Generative AI
14:06
The paper introduces a method to estimate hallucination rates in in-context learning with Generative AI, focusing on Bayesian interpretation and empirical evaluations. https://arxiv.org/abs//2406.07457 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arx…
[QA] Beyond Model Collapse: Scaling Up with Synthesized Data Requires Reinforcement
8:27
Generative models are used to fine-tune Large Language Models, but model collapse can occur. Feedback on synthesized data can prevent this, as shown in theoretical analysis and practical applications. https://arxiv.org/abs//2406.07515 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://…
Beyond Model Collapse: Scaling Up with Synthesized Data Requires Reinforcement
14:46
Generative models are used to fine-tune Large Language Models, but model collapse can occur. Feedback on synthesized data can prevent this, as shown in theoretical analysis and practical applications. https://arxiv.org/abs//2406.07515 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://…
Transformers can generalize to novel compositions by using a low-dimensional latent code in multi-head attention, enhancing compositional generalization on abstract reasoning tasks. https://arxiv.org/abs//2406.05816 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/…
[QA] Distributional Preference Alignment of LLMs via Optimal Transport
10:14
The paper introduces Alignment via Optimal Transport for distributional preference alignment of LLMs, achieving state-of-the-art results on various datasets and LLMs. https://arxiv.org/abs//2406.05882 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxi…
Distributional Preference Alignment of LLMs via Optimal Transport
13:10
The paper introduces Alignment via Optimal Transport for distributional preference alignment of LLMs, achieving state-of-the-art results on various datasets and LLMs. https://arxiv.org/abs//2406.05882 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxi…
[QA] How Far Can Transformers Reason? The Locality Barrier and Inductive Scratchpad
9:53
The paper explores the learnability of new syllogisms by Transformers, introducing the concept of distribution locality to determine efficient learning, showing limitations in composing syllogisms on long chains. https://arxiv.org/abs//2406.06467 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcas…
How Far Can Transformers Reason? The Locality Barrier and Inductive Scratchpad
17:48
The paper explores the learnability of new syllogisms by Transformers, introducing the concept of distribution locality to determine efficient learning, showing limitations in composing syllogisms on long chains. https://arxiv.org/abs//2406.06467 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcas…
[QA] Mixture-of-Agents Enhances Large Language Model Capabilities
10:43
The paper introduces a Mixture-of-Agents (MoA) approach to combine strengths of multiple large language models, outperforming GPT-4 Omni on various tasks. https://arxiv.org/abs//2406.04692 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1…
Mixture-of-Agents Enhances Large Language Model Capabilities
12:46
The paper introduces a Mixture-of-Agents (MoA) approach to combine strengths of multiple large language models, outperforming GPT-4 Omni on various tasks. https://arxiv.org/abs//2406.04692 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1…
[QA] Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?
7:56
The paper explores challenges in predicting downstream capabilities of scaled AI systems, identifying factors degrading the relationship between performance and scale, focusing on multiple-choice benchmarks. https://arxiv.org/abs//2406.04391 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: h…
Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?
10:54
The paper explores challenges in predicting downstream capabilities of scaled AI systems, identifying factors degrading the relationship between performance and scale, focusing on multiple-choice benchmarks. https://arxiv.org/abs//2406.04391 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: h…
[QA] Improving Alignment and Robustness with Short Circuiting
11:25
A novel approach "short-circuits" AI models to prevent harmful outputs, outperforming refusal and adversarial training. It is effective for text and multimodal models, even against powerful attacks. https://arxiv.org/abs//2406.04313 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.a…
Improving Alignment and Robustness with Short Circuiting
13:01
A novel approach "short-circuits" AI models to prevent harmful outputs, outperforming refusal and adversarial training. It is effective for text and multimodal models, even against powerful attacks. https://arxiv.org/abs//2406.04313 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.a…
[QA] Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models
9:29
State-of-the-art large language models exhibit a dramatic breakdown in reasoning capabilities when faced with simple common sense problems, raising concerns about their claimed capabilities. https://arxiv.org/abs//2406.02061 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.a…
Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models
15:55
State-of-the-art large language models exhibit a dramatic breakdown in reasoning capabilities when faced with simple common sense problems, raising concerns about their claimed capabilities. https://arxiv.org/abs//2406.02061 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.a…
[QA] Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
10:07
Buffer of Thoughts (BoT) enhances large language models with thought-augmented reasoning, achieving significant performance improvements on reasoning tasks with superior generalization and robustness. https://arxiv.org/abs//2406.04271 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://…
Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
16:14
Buffer of Thoughts (BoT) enhances large language models with thought-augmented reasoning, achieving significant performance improvements on reasoning tasks with superior generalization and robustness. https://arxiv.org/abs//2406.04271 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://…
[QA] Block Transformer: Global-to-Local Language Modeling for Fast Inference
9:22
The paper introduces the Block Transformer architecture, utilizing global-to-local modeling to improve autoregressive transformers and enhance inference throughput by 10-20x compared to vanilla transformers. https://arxiv.org/abs//2406.02657 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: h…
Block Transformer: Global-to-Local Language Modeling for Fast Inference
11:34
The paper introduces the Block Transformer architecture, utilizing global-to-local modeling to improve autoregressive transformers and enhance inference throughput by 10-20x compared to vanilla transformers. https://arxiv.org/abs//2406.02657 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: h…
[CHAT] The Brain's Bitter Lesson: Scaling Speech Decoding With Self-Supervised Learning
4:27
Advances in speech decoding from brain activity have been hindered by individual differences and varied data sources. A new approach using self-supervised learning shows promise for generalization and improved performance. https://arxiv.org/abs//2406.04328 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Ap…
[QA] The Brain's Bitter Lesson: Scaling Speech Decoding With Self-Supervised Learning
7:42
Advances in speech decoding from brain activity have been hindered by individual differences and varied data sources. A new approach using self-supervised learning shows promise for generalization and improved performance. https://arxiv.org/abs//2406.04328 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Ap…
The Brain's Bitter Lesson: Scaling Speech Decoding With Self-Supervised Learning
12:36
Advances in speech decoding from brain activity have been hindered by individual differences and varied data sources. A new approach using self-supervised learning shows promise for generalization and improved performance. https://arxiv.org/abs//2406.04328 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Ap…
[CHAT] Verbalized Machine Learning: Revisiting Machine Learning with Language Models
4:36
The paper introduces Verbalized Machine Learning (VML), a framework where machine learning models are optimized over human-interpretable natural language, offering inductive bias encoding and automatic model selection. https://arxiv.org/abs//2406.04344 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple …
[QA] Verbalized Machine Learning: Revisiting Machine Learning with Language Models
9:48
The paper introduces Verbalized Machine Learning (VML), a framework where machine learning models are optimized over human-interpretable natural language, offering inductive bias encoding and automatic model selection. https://arxiv.org/abs//2406.04344 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple …
Verbalized Machine Learning: Revisiting Machine Learning with Language Models
19:18
The paper introduces Verbalized Machine Learning (VML), a framework where machine learning models are optimized over human-interpretable natural language, offering inductive bias encoding and automatic model selection. https://arxiv.org/abs//2406.04344 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple …
[QA] How Truncating Weights Improves Reasoning in Language Models
9:35
Large language models excel at basic logical reasoning tasks. Removing specific components from weight matrices in pre-trained models can enhance reasoning capabilities by eliminating detrimental global associations. https://arxiv.org/abs//2406.03068 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Po…
How Truncating Weights Improves Reasoning in Language Models
19:43
Large language models excel at basic logical reasoning tasks. Removing specific components from weight matrices in pre-trained models can enhance reasoning capabilities by eliminating detrimental global associations. https://arxiv.org/abs//2406.03068 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Po…
[QA] Choice of PEFT Technique in Continual Learning: Prompt Tuning is Not All You Need
5:57
Research challenges the use of prompt tuning in Continual Learning (CL) methods, finding it hinders performance. Replacing it with LoRA improves accuracy across benchmarks, emphasizing the need for rigorous ablations. https://arxiv.org/abs//2406.03216 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple P…