Activated LoRA: Fine-tuned LLMs For Intrinsics Arxiv Papers podcast

Arxiv Papers

[QA] Log-Linear Attention 7:50

9 hours ago

This paper introduces log-linear attention, enhancing linear attention's efficiency by using a logarithmically growing set of hidden states, improving sequence modeling while maintaining computational efficiency. https://arxiv.org/abs/2506.04761 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers…

Log-Linear Attention 21:59

9 hours ago

This paper introduces log-linear attention, enhancing linear attention's efficiency by using a logarithmically growing set of hidden states, improving sequence modeling while maintaining computational efficiency. https://arxiv.org/abs/2506.04761

[QA] Rewarding the Unlikely: Lifting GRPO Beyond Distribution Sharpening 7:45

9 hours ago

This paper critiques GRPO's bias in training language models for theorem proving and introduces the unlikeliness reward to enhance performance and sample diversity, achieving competitive results. https://arxiv.org/abs/2506.02355

Rewarding the Unlikely: Lifting GRPO Beyond Distribution Sharpening 16:56

9 hours ago

This paper critiques GRPO's bias in training language models for theorem proving and introduces the unlikeliness reward to enhance performance and sample diversity, achieving competitive results. https://arxiv.org/abs/2506.02355

[QA] Self-Challenging Language Model Agents 7:26

20 hours ago

The Self-Challenging framework enables agents to generate and train on high-quality tasks autonomously, achieving significant performance improvements using self-generated data in tool-use benchmarks. https://arxiv.org/abs/2506.01716

Self-Challenging Language Model Agents 22:33

20 hours ago

The Self-Challenging framework enables agents to generate and train on high-quality tasks autonomously, achieving significant performance improvements using self-generated data in tool-use benchmarks. https://arxiv.org/abs/2506.01716

[QA] Why Gradients Rapidly Increase Near the End of Training 7:00

20 hours ago

https://arxiv.org/abs/2506.02285

Why Gradients Rapidly Increase Near the End of Training 11:24

20 hours ago

https://arxiv.org/abs/2506.02285

[QA] GEM: Empowering LLM for both Embedding Generation and Language Understanding 7:41

2 days ago

The paper introduces GEM, a self-supervised method enabling decoder-only LLMs to generate high-quality text embeddings, enhancing performance on embedding benchmarks while preserving original text generation capabilities. https://arxiv.org/abs/2506.04344

GEM: Empowering LLM for both Embedding Generation and Language Understanding 20:38

2 days ago

The paper introduces GEM, a self-supervised method enabling decoder-only LLMs to generate high-quality text embeddings, enhancing performance on embedding benchmarks while preserving original text generation capabilities. https://arxiv.org/abs/2506.04344

[QA] HYPERSTEER: Activation Steering at Scale with Hypernetworks 7:49

3 days ago

HYPERSTEER introduces hypernetwork architectures for generating effective steering vectors in language models, outperforming existing methods and achieving strong performance on unseen prompts. Code available at GitHub. https://arxiv.org/abs/2506.03292

HYPERSTEER: Activation Steering at Scale with Hypernetworks 9:15

3 days ago

HYPERSTEER introduces hypernetwork architectures for generating effective steering vectors in language models, outperforming existing methods and achieving strong performance on unseen prompts. Code available at GitHub. https://arxiv.org/abs/2506.03292

[QA] Data Recipes for Reasoning Models 8:06

3 days ago

The OpenThoughts project creates open-source datasets for reasoning models, achieving state-of-the-art results with OpenThinker3-7B, trained on 1.2M examples, available at openthoughts.ai. https://arxiv.org/abs/2506.04178

Data Recipes for Reasoning Models 18:07

3 days ago

The OpenThoughts project creates open-source datasets for reasoning models, achieving state-of-the-art results with OpenThinker3-7B, trained on 1.2M examples, available at openthoughts.ai. https://arxiv.org/abs/2506.04178

[QA] Accelerating Diffusion LLMs via Adaptive Parallel Decoding 8:08

4 days ago

The paper introduces adaptive parallel decoding (APD), enhancing diffusion large language models' speed by dynamically adjusting token sampling, improving throughput while maintaining quality compared to autoregressive models. https://arxiv.org/abs/2506.00413

[QA] Let Me Think! A Long Chain-of-Thought Can Be Worth Exponentially Many Short Ones 7:26

10 days ago

This paper explores optimal inference-time computation for large language models, revealing scenarios where sequential scaling significantly outperforms parallel scaling, particularly in graph connectivity problems. https://arxiv.org/abs/2505.21825

Let Me Think! A Long Chain-of-Thought Can Be Worth Exponentially Many Short Ones 24:00

10 days ago

This paper explores optimal inference-time computation for large language models, revealing scenarios where sequential scaling significantly outperforms parallel scaling, particularly in graph connectivity problems. https://arxiv.org/abs/2505.21825

[QA] Maximizing Confidence Alone Improves Reasoning 7:08

10 days ago

The paper introduces RENT, an unsupervised reinforcement learning method using entropy minimization as intrinsic reward, enhancing reasoning abilities in language models without external supervision across various benchmarks. https://arxiv.org/abs/2505.22660

Maximizing Confidence Alone Improves Reasoning 13:21

10 days ago

The paper introduces RENT, an unsupervised reinforcement learning method using entropy minimization as intrinsic reward, enhancing reasoning abilities in language models without external supervision across various benchmarks. https://arxiv.org/abs/2505.22660

[QA] Hardware-Efficient Attention for Fast Decoding 7:57

11 days ago

This paper presents Grouped-Tied Attention and Grouped Latent Attention to enhance LLM decoding efficiency, reducing memory transfers and latency while maintaining model quality and improving throughput. https://arxiv.org/abs/2505.21487

Hardware-Efficient Attention for Fast Decoding 30:59

11 days ago

This paper presents Grouped-Tied Attention and Grouped Latent Attention to enhance LLM decoding efficiency, reducing memory transfers and latency while maintaining model quality and improving throughput. https://arxiv.org/abs/2505.21487

[QA] Reinforcing General Reasoning without Verifiers 7:08

11 days ago

The paper introduces VeriFree, a verifier-free reinforcement learning method that enhances large language models' reasoning capabilities, outperforming verifier-based methods while reducing computational demands. https://arxiv.org/abs/2505.21493

Reinforcing General Reasoning without Verifiers 17:11

11 days ago

The paper introduces VeriFree, a verifier-free reinforcement learning method that enhances large language models' reasoning capabilities, outperforming verifier-based methods while reducing computational demands. https://arxiv.org/abs/2505.21493

[QA] ENIGMATA: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles 8:16

12 days ago

https://arxiv.org/abs/2505.19914

ENIGMATA: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles 23:54

12 days ago

https://arxiv.org/abs/2505.19914

[QA] Temporal Sampling for Forgotten Reasoning in LLMs 7:04

12 days ago

The paper introduces "Temporal Forgetting," where LLMs lose previously learned problem-solving skills, and proposes "Temporal Sampling" to recover these abilities, enhancing reasoning performance without retraining. https://arxiv.org/abs/2505.20196

Temporal Sampling for Forgotten Reasoning in LLMs 10:43

12 days ago

The paper introduces "Temporal Forgetting," where LLMs lose previously learned problem-solving skills, and proposes "Temporal Sampling" to recover these abilities, enhancing reasoning performance without retraining. https://arxiv.org/abs/2505.20196

[QA] Are Large Language Models Reliable AI Scientists? Assessing Reverse-Engineering of Black-Box Systems 10:15

13 days ago

This paper examines how large language models (LLMs) can better identify black-box functions through active data collection, improving their reverse-engineering capabilities and aiding scientific discovery. https://arxiv.org/abs/2505.17968

Are Large Language Models Reliable AI Scientists? Assessing Reverse-Engineering of Black-Box Systems 17:21

13 days ago

This paper examines how large language models (LLMs) can better identify black-box functions through active data collection, improving their reverse-engineering capabilities and aiding scientific discovery. https://arxiv.org/abs/2505.17968

[QA] Generative Distribution Embeddings 7:54

13 days ago

The paper introduces generative distribution embeddings (GDE), a framework for learning representations of distributions, demonstrating superior performance in various computational biology applications. https://arxiv.org/abs/2505.18150

Accelerating Diffusion LLMs via Adaptive Parallel Decoding 21:09

4 days ago

The paper introduces adaptive parallel decoding (APD), enhancing diffusion large language models' speed by dynamically adjusting token sampling, improving throughput while maintaining quality compared to autoregressive models. https://arxiv.org/abs/2506.00413

[QA] Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning 7:34

4 days ago

This paper presents a self-reflection and reinforcement learning method that enhances large language models' performance on complex tasks, achieving significant improvements even with limited feedback. https://arxiv.org/abs/2505.24726

Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning 16:44

4 days ago

This paper presents a self-reflection and reinforcement learning method that enhances large language models' performance on complex tasks, achieving significant improvements even with limited feedback. https://arxiv.org/abs/2505.24726

[QA] Esoteric Language Models 8:08

5 days ago

Eso-LMs combine autoregressive and masked diffusion models, improving perplexity and inference efficiency with KV caching, achieving state-of-the-art performance and significantly faster inference rates. Code and checkpoints available online. https://arxiv.org/abs/2506.01928

Esoteric Language Models 34:16

5 days ago

Eso-LMs combine autoregressive and masked diffusion models, improving perplexity and inference efficiency with KV caching, achieving state-of-the-art performance and significantly faster inference rates. Code and checkpoints available online. https://arxiv.org/abs/2506.01928

[QA] Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning 8:08

5 days ago

This study explores Reinforcement Learning with Verifiable Rewards (RLVR) through token entropy patterns, revealing that high-entropy tokens significantly enhance reasoning performance in Large Language Models. https://arxiv.org/abs/2506.01939

Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning 23:02

5 days ago

This study explores Reinforcement Learning with Verifiable Rewards (RLVR) through token entropy patterns, revealing that high-entropy tokens significantly enhance reasoning performance in Large Language Models. https://arxiv.org/abs/2506.01939

[QA] ALPHAONE: Reasoning Models Thinking Slow and Fast at Test Time 7:21

6 days ago

ALPHAONE is a framework that enhances reasoning in large models by dynamically modulating thinking phases, improving efficiency and performance across various challenging benchmarks. https://arxiv.org/abs/2505.24863

ALPHAONE: Reasoning Models Thinking Slow and Fast at Test Time 17:12

6 days ago

ALPHAONE is a framework that enhances reasoning in large models by dynamically modulating thinking phases, improving efficiency and performance across various challenging benchmarks. https://arxiv.org/abs/2505.24863

[QA] ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models 7:40

6 days ago

This paper introduces ProRL, a training method that enhances reasoning in language models through reinforcement learning, revealing novel strategies and outperforming base models in various evaluations. https://arxiv.org/abs/2505.24864

ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models 23:32

6 days ago

This paper introduces ProRL, a training method that enhances reasoning in language models through reinforcement learning, revealing novel strategies and outperforming base models in various evaluations. https://arxiv.org/abs/2505.24864

[QA] Are Reasoning Models More Prone to Hallucination? 7:52

9 days ago

This paper investigates hallucination in large reasoning models, analyzing post-training effects, cognitive behaviors, and model uncertainty, revealing insights into their impact on factual accuracy. https://arxiv.org/abs/2505.23646

Are Reasoning Models More Prone to Hallucination? 20:24

9 days ago

This paper investigates hallucination in large reasoning models, analyzing post-training effects, cognitive behaviors, and model uncertainty, revealing insights into their impact on factual accuracy. https://arxiv.org/abs/2505.23646

[QA] How does Transformer Learn Implicit Reasoning? 8:56

9 days ago

This paper explores implicit multi-hop reasoning in large language models, revealing a developmental trajectory and introducing diagnostic tools to enhance interpretability and understanding of reasoning processes. https://arxiv.org/abs/2505.23653

How does Transformer Learn Implicit Reasoning? 23:21

9 days ago

This paper explores implicit multi-hop reasoning in large language models, revealing a developmental trajectory and introducing diagnostic tools to enhance interpretability and understanding of reasoning processes. https://arxiv.org/abs/2505.23653
