Running out of time to catch up with new arXiv papers? We take the most impactful papers and present them as convenient podcasts. If you're a visual learner, we offer these papers in an engaging video format. Our service fills the gap between overly brief paper summaries and time-consuming full paper reads. You gain academic insights in a time-efficient, digestible format. Code behind this work: https://github.com/imelnyk/ArxivPapers Support this podcast: https://podcasters.spotify.com/pod/s ...
[QA] Bootstrapping Language Models with DPO Implicit Rewards
8:41
The paper introduces DICE, a method for aligning large language models using implicit rewards from DPO. DICE outperforms Gemini Pro on AlpacaEval 2 with 8B parameters and no external feedback. https://arxiv.org/abs//2406.09760 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts…
Bootstrapping Language Models with DPO Implicit Rewards
16:02
The paper introduces DICE, a method for aligning large language models using implicit rewards from DPO. DICE outperforms Gemini Pro on AlpacaEval 2 with 8B parameters and no external feedback. https://arxiv.org/abs//2406.09760 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts…
[QA] Ad Auctions for LLMs via Retrieval Augmented Generation
10:03
Novel auction mechanisms for ad allocation and pricing in large language models (LLMs) are proposed, maximizing social welfare and ensuring fairness. Empirical evaluation supports the approach's feasibility and effectiveness. https://arxiv.org/abs//2406.09459 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers…
Ad Auctions for LLMs via Retrieval Augmented Generation
14:18
Novel auction mechanisms for ad allocation and pricing in large language models (LLMs) are proposed, maximizing social welfare and ensuring fairness. Empirical evaluation supports the approach's feasibility and effectiveness. https://arxiv.org/abs//2406.09459 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers…
[QA] An Empirical Study of Mamba-based Language Models
10:36
Mamba models challenge Transformers at larger scales, with Mamba-2-Hybrid surpassing Transformers on various tasks, showing potential for efficient token generation. https://arxiv.org/abs//2406.07887 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv…
An Empirical Study of Mamba-based Language Models
28:32
Mamba models challenge Transformers at larger scales, with Mamba-2-Hybrid surpassing Transformers on various tasks, showing potential for efficient token generation. https://arxiv.org/abs//2406.07887 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv…
[QA] Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback
7:22
Preference-based learning for language models is crucial for enhancing generation quality. This study explores key components' impact and suggests strategies for effective learning. https://arxiv.org/abs//2406.09279 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/…
Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback
9:29
Preference-based learning for language models is crucial for enhancing generation quality. This study explores key components' impact and suggests strategies for effective learning. https://arxiv.org/abs//2406.09279 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/…
[QA] What If We Recaption Billions of Web Images with LLaMA-3?
10:11
The paper introduces Recap-DataComp-1B, an enhanced dataset created using LLaMA-3-8B to improve vision-language model training, showing benefits in performance across various tasks. https://arxiv.org/abs//2406.08478 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/…
What If We Recaption Billions of Web Images with LLaMA-3?
12:27
The paper introduces Recap-DataComp-1B, an enhanced dataset created using LLaMA-3-8B to improve vision-language model training, showing benefits in performance across various tasks. https://arxiv.org/abs//2406.08478 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/…
[QA] SAMBA: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
9:34
SAMBA is a hybrid model combining Mamba and Sliding Window Attention for efficient sequence modeling with infinite context length, outperforming existing models. https://arxiv.org/abs//2406.07522 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-pap…
SAMBA: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
13:02
SAMBA is a hybrid model combining Mamba and Sliding Window Attention for efficient sequence modeling with infinite context length, outperforming existing models. https://arxiv.org/abs//2406.07522 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-pap…
[QA] Why Warmup the Learning Rate? Underlying Mechanisms and Improvements
6:44
The paper explores the benefits of warmup in deep learning, showing how it improves performance by allowing networks to handle larger learning rates and suggesting alternative initialization methods. https://arxiv.org/abs//2406.09405 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://p…
Why Warmup the Learning Rate? Underlying Mechanisms and Improvements
21:40
The paper explores the benefits of warmup in deep learning, showing how it improves performance by allowing networks to handle larger learning rates and suggesting alternative initialization methods. https://arxiv.org/abs//2406.09405 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://p…
[QA] An Image is Worth More Than 16×16 Patches: Exploring Transformers on Individual Pixels
9:36
Vanilla Transformers can achieve high performance in computer vision by treating individual pixels as tokens, challenging the necessity of locality bias in modern architectures. https://arxiv.org/abs//2406.09415 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/p…
An Image is Worth More Than 16×16 Patches: Exploring Transformers on Individual Pixels
12:30
Vanilla Transformers can achieve high performance in computer vision by treating individual pixels as tokens, challenging the necessity of locality bias in modern architectures. https://arxiv.org/abs//2406.09415 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/p…
[QA] Large Language Models Must Be Taught to Know What They Don't Know
10:14
Prompting alone is insufficient for reliable uncertainty estimation in large language models. Fine-tuning on a small dataset of correct and incorrect answers can provide better calibration with low computational cost. https://arxiv.org/abs//2406.08391 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple P…
Large Language Models Must Be Taught to Know What They Don't Know
14:43
Prompting alone is insufficient for reliable uncertainty estimation in large language models. Fine-tuning on a small dataset of correct and incorrect answers can provide better calibration with low computational cost. https://arxiv.org/abs//2406.08391 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple P…
[QA] State Soup: In-Context Skill Learning, Retrieval and Mixing
7:06
Gated-linear recurrent neural networks excel in sequence modeling due to efficient handling of long sequences. Internal states as task vectors enable fast model merging, improving performance. https://arxiv.org/abs//2406.08423 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts…
State Soup: In-Context Skill Learning, Retrieval and Mixing
4:47
Gated-linear recurrent neural networks excel in sequence modeling due to efficient handling of long sequences. Internal states as task vectors enable fast model merging, improving performance. https://arxiv.org/abs//2406.08423 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts…
[QA] Estimating the Hallucination Rate of Generative AI
9:01
The paper introduces a method to estimate hallucination rates in in-context learning with Generative AI, focusing on Bayesian interpretation and empirical evaluations. https://arxiv.org/abs//2406.07457 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arx…
Estimating the Hallucination Rate of Generative AI
14:06
The paper introduces a method to estimate hallucination rates in in-context learning with Generative AI, focusing on Bayesian interpretation and empirical evaluations. https://arxiv.org/abs//2406.07457 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arx…
[QA] Beyond Model Collapse: Scaling Up with Synthesized Data Requires Reinforcement
8:27
Generative models are used to fine-tune Large Language Models, but model collapse can occur. Feedback on synthesized data can prevent this, as shown in theoretical analysis and practical applications. https://arxiv.org/abs//2406.07515 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://…
Beyond Model Collapse: Scaling Up with Synthesized Data Requires Reinforcement
14:46
Generative models are used to fine-tune Large Language Models, but model collapse can occur. Feedback on synthesized data can prevent this, as shown in theoretical analysis and practical applications. https://arxiv.org/abs//2406.07515 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://…
Transformers can generalize to novel compositions by using a low-dimensional latent code in multi-head attention, enhancing compositional generalization on abstract reasoning tasks. https://arxiv.org/abs//2406.05816 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/…
[QA] Distributional Preference Alignment of LLMs via Optimal Transport
10:14
The paper introduces Alignment via Optimal Transport for distributional preference alignment of LLMs, achieving state-of-the-art results on various datasets and LLMs. https://arxiv.org/abs//2406.05882 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxi…
Distributional Preference Alignment of LLMs via Optimal Transport
13:10
The paper introduces Alignment via Optimal Transport for distributional preference alignment of LLMs, achieving state-of-the-art results on various datasets and LLMs. https://arxiv.org/abs//2406.05882 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxi…
[QA] How Far Can Transformers Reason? The Locality Barrier and Inductive Scratchpad
9:53
The paper explores the learnability of new syllogisms by Transformers, introducing the concept of distribution locality to determine efficient learning, showing limitations in composing syllogisms on long chains. https://arxiv.org/abs//2406.06467 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcas…
How Far Can Transformers Reason? The Locality Barrier and Inductive Scratchpad
17:48
The paper explores the learnability of new syllogisms by Transformers, introducing the concept of distribution locality to determine efficient learning, showing limitations in composing syllogisms on long chains. https://arxiv.org/abs//2406.06467 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcas…
[QA] Mixture-of-Agents Enhances Large Language Model Capabilities
10:43
The paper introduces a Mixture-of-Agents (MoA) approach to combine strengths of multiple large language models, outperforming GPT-4 Omni on various tasks. https://arxiv.org/abs//2406.04692 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1…
Mixture-of-Agents Enhances Large Language Model Capabilities
12:46
The paper introduces a Mixture-of-Agents (MoA) approach to combine strengths of multiple large language models, outperforming GPT-4 Omni on various tasks. https://arxiv.org/abs//2406.04692 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1…
[QA] Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?
7:56
The paper explores challenges in predicting downstream capabilities of scaled AI systems, identifying factors degrading the relationship between performance and scale, focusing on multiple-choice benchmarks. https://arxiv.org/abs//2406.04391 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: h…
Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?
10:54
The paper explores challenges in predicting downstream capabilities of scaled AI systems, identifying factors degrading the relationship between performance and scale, focusing on multiple-choice benchmarks. https://arxiv.org/abs//2406.04391 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: h…
[QA] Improving Alignment and Robustness with Short Circuiting
11:25
A novel approach "short-circuits" AI models to prevent harmful outputs, outperforming refusal and adversarial training. It is effective for text and multimodal models, even against powerful attacks. https://arxiv.org/abs//2406.04313 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.a…
Improving Alignment and Robustness with Short Circuiting
13:01
A novel approach "short-circuits" AI models to prevent harmful outputs, outperforming refusal and adversarial training. It is effective for text and multimodal models, even against powerful attacks. https://arxiv.org/abs//2406.04313 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.a…
[QA] Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models
9:29
State-of-the-art large language models exhibit a dramatic breakdown in reasoning capabilities when faced with simple common sense problems, raising concerns about their claimed capabilities. https://arxiv.org/abs//2406.02061 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.a…
Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models
15:55
State-of-the-art large language models exhibit a dramatic breakdown in reasoning capabilities when faced with simple common sense problems, raising concerns about their claimed capabilities. https://arxiv.org/abs//2406.02061 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.a…
[QA] Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
10:07
Buffer of Thoughts (BoT) enhances large language models with thought-augmented reasoning, achieving significant performance improvements on reasoning tasks with superior generalization and robustness. https://arxiv.org/abs//2406.04271 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://…
Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
16:14
Buffer of Thoughts (BoT) enhances large language models with thought-augmented reasoning, achieving significant performance improvements on reasoning tasks with superior generalization and robustness. https://arxiv.org/abs//2406.04271 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://…
[QA] Block Transformer: Global-to-Local Language Modeling for Fast Inference
9:22
The paper introduces the Block Transformer architecture, utilizing global-to-local modeling to improve autoregressive transformers and enhance inference throughput by 10-20x compared to vanilla transformers. https://arxiv.org/abs//2406.02657 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: h…
Block Transformer: Global-to-Local Language Modeling for Fast Inference
11:34
The paper introduces the Block Transformer architecture, utilizing global-to-local modeling to improve autoregressive transformers and enhance inference throughput by 10-20x compared to vanilla transformers. https://arxiv.org/abs//2406.02657 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: h…
[CHAT] The Brain's Bitter Lesson: Scaling Speech Decoding With Self-Supervised Learning
4:27
Advances in speech decoding from brain activity have been hindered by individual differences and varied data sources. A new approach using self-supervised learning shows promise for generalization and improved performance. https://arxiv.org/abs//2406.04328 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Ap…
[QA] The Brain's Bitter Lesson: Scaling Speech Decoding With Self-Supervised Learning
7:42
Advances in speech decoding from brain activity have been hindered by individual differences and varied data sources. A new approach using self-supervised learning shows promise for generalization and improved performance. https://arxiv.org/abs//2406.04328 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Ap…
The Brain's Bitter Lesson: Scaling Speech Decoding With Self-Supervised Learning
12:36
Advances in speech decoding from brain activity have been hindered by individual differences and varied data sources. A new approach using self-supervised learning shows promise for generalization and improved performance. https://arxiv.org/abs//2406.04328 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Ap…
[CHAT] Verbalized Machine Learning: Revisiting Machine Learning with Language Models
4:36
The paper introduces Verbalized Machine Learning (VML), a framework where machine learning models are optimized over human-interpretable natural language, offering inductive bias encoding and automatic model selection. https://arxiv.org/abs//2406.04344 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple …
[QA] Verbalized Machine Learning: Revisiting Machine Learning with Language Models
9:48
The paper introduces Verbalized Machine Learning (VML), a framework where machine learning models are optimized over human-interpretable natural language, offering inductive bias encoding and automatic model selection. https://arxiv.org/abs//2406.04344 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple …
Verbalized Machine Learning: Revisiting Machine Learning with Language Models
19:18
The paper introduces Verbalized Machine Learning (VML), a framework where machine learning models are optimized over human-interpretable natural language, offering inductive bias encoding and automatic model selection. https://arxiv.org/abs//2406.04344 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple …
[QA] How Truncating Weights Improves Reasoning in Language Models
9:35
Large language models excel at basic logical reasoning tasks. Removing specific components from weight matrices in pre-trained models can enhance reasoning capabilities by eliminating detrimental global associations. https://arxiv.org/abs//2406.03068 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Po…
How Truncating Weights Improves Reasoning in Language Models
19:43
Large language models excel at basic logical reasoning tasks. Removing specific components from weight matrices in pre-trained models can enhance reasoning capabilities by eliminating detrimental global associations. https://arxiv.org/abs//2406.03068 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Po…
[QA] Choice of PEFT Technique in Continual Learning: Prompt Tuning is Not All You Need
5:57
Research challenges the use of prompt tuning in Continual Learning (CL) methods, finding it hinders performance. Replacing it with LoRA improves accuracy across benchmarks, emphasizing the need for rigorous ablations. https://arxiv.org/abs//2406.03216 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple P…