Best Olympiad Podcasts (2024)

1
#06 Science Olympiad 13:31

10h ago13:31

13:31

Three of our 5th graders, Abby, Elly, and Jerrick, recently competed at the Science Olympiad. They sit down to talk about their experiences in this hands-on Science fun!By Kenneth Blum

1
[QA] Can Language Models Solve Olympiad Programming? 11:43

11h ago11:43

11:43

This paper introduces the USACO benchmark for evaluating language models on computing olympiad problems, highlighting challenges and proposing novel inference methods. https://arxiv.org/abs//2404.10952 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arx…

1
[QA] OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework 8:13

1h ago8:13

8:13

OpenELM, a state-of-the-art open language model, enhances accuracy using layer-wise scaling. Released with complete training framework, it empowers open research community. Available on GitHub and HuggingFace. https://arxiv.org/abs//2404.14619 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts:…

1
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework 8:58

1h ago8:58

8:58

OpenELM, a state-of-the-art open language model, enhances accuracy using layer-wise scaling. Released with complete training framework, it empowers open research community. Available on GitHub and HuggingFace. https://arxiv.org/abs//2404.14619 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts:…

1
[QA] Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Perfect Reasoners 7:30

35m ago7:30

7:30

The paper introduces DUP prompting strategy to improve Large Language Models' performance on complex reasoning tasks, outperforming Zero-Shot CoT on diverse datasets, achieving state-of-the-art results. https://arxiv.org/abs//2404.14963 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https:…

1
Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Perfect Reasoners 10:56

35m ago10:56

10:56

The paper introduces DUP prompting strategy to improve Large Language Models' performance on complex reasoning tasks, outperforming Zero-Shot CoT on diverse datasets, achieving state-of-the-art results. https://arxiv.org/abs//2404.14963 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https:…

1
[QA] SnapKV: LLM Knows What You are Looking for Before Generation 10:00

39m ago10:00

10:00

SnapKV is a fine-tuning-free method that efficiently reduces Key-Value cache size in Large Language Models, maintaining performance while enhancing memory and time efficiency for long input sequences. https://arxiv.org/abs//2404.14469 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://…

1
SnapKV: LLM Knows What You are Looking for Before Generation 17:09

40m ago17:09

17:09

SnapKV is a fine-tuning-free method that efficiently reduces Key-Value cache size in Large Language Models, maintaining performance while enhancing memory and time efficiency for long input sequences. https://arxiv.org/abs//2404.14469 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://…

1
[QA] Multi-Head Mixture-of-Experts 8:05

11m ago8:05

8:05

MH-MoE addresses low expert activation and lack of fine-grained analysis in SMoE by using a multi-head mechanism to enhance context understanding and expert activation. https://arxiv.org/abs//2404.15045 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/ar…

1
Multi-Head Mixture-of-Experts 14:15

12m ago14:15

14:15

MH-MoE addresses low expert activation and lack of fine-grained analysis in SMoE by using a multi-head mechanism to enhance context understanding and expert activation. https://arxiv.org/abs//2404.15045 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/ar…

1
[QA] The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions 9:21

1h ago9:21

9:21

LLMs are vulnerable to attacks due to equal priority given to all prompts. Proposed instruction hierarchy teaches models to ignore lower-priority instructions, enhancing robustness with minimal impact on capabilities. https://arxiv.org/abs//2404.13208 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple P…

1
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions 13:06

1h ago13:06

13:06

LLMs are vulnerable to attacks due to equal priority given to all prompts. Proposed instruction hierarchy teaches models to ignore lower-priority instructions, enhancing robustness with minimal impact on capabilities. https://arxiv.org/abs//2404.13208 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple P…

1
[QA] Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data 9:37

1h ago9:37

9:37

https://arxiv.org/abs//2404.14367 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers --- Support this podcast: https://podcasters.spotify.com/pod/show/arxiv-papers/supp…

1
Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data 18:47

1h ago18:47

18:47

https://arxiv.org/abs//2404.14367 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers --- Support this podcast: https://podcasters.spotify.com/pod/show/arxiv-papers/supp…

1
[QA] Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone 9:02

1h ago9:02

9:02

Introducing phi-3-mini, a high-performing language model trained on a large dataset, with smaller versions phi-3-small and phi-3-medium showing even better performance. https://arxiv.org/abs//2404.14219 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/ar…

1
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone 5:58

1h ago5:58

5:58

Introducing phi-3-mini, a high-performing language model trained on a large dataset, with smaller versions phi-3-small and phi-3-medium showing even better performance. https://arxiv.org/abs//2404.14219 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/ar…

1
[QA] Towards Reliable Latent Knowledge Estimation in LLMs: In-Context Learning vs. Prompting Based Factual Knowledge Extraction 8:22

7h ago8:22

8:22

Approach estimates latent knowledge in large language models using in-context learning, showing differences in factual knowledge across models and sizes. https://arxiv.org/abs//2404.12957 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id16…

1
Towards Reliable Latent Knowledge Estimation in LLMs: In-Context Learning vs. Prompting Based Factual Knowledge Extraction 23:51

8m ago23:51

23:51

Approach estimates latent knowledge in large language models using in-context learning, showing differences in factual knowledge across models and sizes. https://arxiv.org/abs//2404.12957 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id16…

1
[QA] HalluciBot: Is There No Such Thing as a Bad Question? 11:45

13m ago11:45

11:45

HalluciBot predicts hallucination probability before generation in Large Language Models, aiding in query quality assessment and user accountability, potentially reducing computational waste. https://arxiv.org/abs//2404.12535 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.…

1
HalluciBot: Is There No Such Thing as a Bad Question? 14:33

13m ago14:33

14:33

HalluciBot predicts hallucination probability before generation in Large Language Models, aiding in query quality assessment and user accountability, potentially reducing computational waste. https://arxiv.org/abs//2404.12535 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.…

1
[QA] Stronger Random Baselines for In-Context Learning 9:48

13m ago9:48

9:48

Evaluating language models' in-context learning performance faces challenges. A stronger random baseline is proposed, improving evaluation accuracy and predicting held-out performance effectively. https://arxiv.org/abs//2404.13020 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podc…

1
Stronger Random Baselines for In-Context Learning 14:58

14m ago14:58

14:58

Evaluating language models' in-context learning performance faces challenges. A stronger random baseline is proposed, improving evaluation accuracy and predicting held-out performance effectively. https://arxiv.org/abs//2404.13020 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podc…

1
[QA] Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study 6:51

31m ago6:51

6:51

The paper compares Direct Preference Optimization (DPO) and Proximal Policy Optimization (PPO) in aligning large language models with human feedback, showing PPO outperforms DPO in various RLHF testbeds. https://arxiv.org/abs//2404.10719 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https…

1
Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study 12:38

1h ago12:38

12:38

The paper compares Direct Preference Optimization (DPO) and Proximal Policy Optimization (PPO) in aligning large language models with human feedback, showing PPO outperforms DPO in various RLHF testbeds. https://arxiv.org/abs//2404.10719 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https…

1
[QA] The Illusion of State in State-Space Models 7:50

1h ago7:50

7:50

State-space models (SSMs) are not more expressive than transformers for state tracking due to limitations in computational complexity, as shown through analysis and experiments. https://arxiv.org/abs//2404.08819 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/p…

1
The Illusion of State in State-Space Models 19:21

1h ago19:21

19:21

State-space models (SSMs) are not more expressive than transformers for state tracking due to limitations in computational complexity, as shown through analysis and experiments. https://arxiv.org/abs//2404.08819 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/p…

1
[QA] Chinchilla Scaling: A replication attempt 7:55

1h ago7:55

7:55

Hoffmann et al. (2022) propose three methods for estimating a compute-optimal scaling law. Replication of their third method reveals inconsistencies and implausibly narrow confidence intervals. https://arxiv.org/abs//2404.10102 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcast…

1
Chinchilla Scaling: A replication attempt 8:35

1h ago8:35

8:35

Hoffmann et al. (2022) propose three methods for estimating a compute-optimal scaling law. Replication of their third method reveals inconsistencies and implausibly narrow confidence intervals. https://arxiv.org/abs//2404.10102 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcast…

1
[QA] From R to Q: Your Language Model is Secretly a Q-Function 8:06

28m ago8:06

8:06

The paper addresses the mismatch between Direct Preference Optimization (DPO) and standard Reinforcement Learning From Human Feedback (RLHF) setups, proposing a token-level approach for improved performance. https://arxiv.org/abs//2404.12358 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: h…

1
From R to Q: Your Language Model is Secretly a Q-Function 15:17

28m ago15:17

15:17

The paper addresses the mismatch between Direct Preference Optimization (DPO) and standard Reinforcement Learning From Human Feedback (RLHF) setups, proposing a token-level approach for improved performance. https://arxiv.org/abs//2404.12358 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: h…

1
[QA] Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing 9:38

33m ago9:38

9:38

ALPHALLM integrates Monte Carlo Tree Search with Large Language Models for self-improvement, enhancing reasoning abilities without additional annotations, addressing challenges in complex tasks. https://arxiv.org/abs//2404.12253 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcas…

1
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing 22:01

34m ago22:01

22:01

ALPHALLM integrates Monte Carlo Tree Search with Large Language Models for self-improvement, enhancing reasoning abilities without additional annotations, addressing challenges in complex tasks. https://arxiv.org/abs//2404.12253 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcas…

1
[QA] Dynamic Typography: Bringing Text to Life via Video Diffusion Prior 11:18

38m ago11:18

11:18

Automated Dynamic Typography scheme deforms letters to convey meaning and adds vibrant movements based on user prompts, maintaining legibility and coherence in text animations. https://arxiv.org/abs//2404.11614 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/po…

1
Dynamic Typography: Bringing Text to Life via Video Diffusion Prior 16:35

38m ago16:35

16:35

Automated Dynamic Typography scheme deforms letters to convey meaning and adds vibrant movements based on user prompts, maintaining legibility and coherence in text animations. https://arxiv.org/abs//2404.11614 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/po…

1
[QA] TRIFORCE: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding 8:12

41m ago8:12

8:12

TRIFORCE introduces a hierarchical speculative decoding system to improve efficiency in long-sequence generation with large language models, achieving impressive speedups and scalability while maintaining generation quality. https://arxiv.org/abs//2404.11912 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers …

1
TRIFORCE: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding 15:07

41m ago15:07

15:07

TRIFORCE introduces a hierarchical speculative decoding system to improve efficiency in long-sequence generation with large language models, achieving impressive speedups and scalability while maintaining generation quality. https://arxiv.org/abs//2404.11912 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers …

1
[QA] Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models 8:55

1h ago8:55

8:55

Reka introduces powerful multimodal language models - Core, Flash, and Edge - outperforming larger models in various tasks, approaching state-of-the-art performance. https://arxiv.org/abs//2404.12387 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv…

1
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models 14:17

1h ago14:17

14:17

Reka introduces powerful multimodal language models - Core, Flash, and Edge - outperforming larger models in various tasks, approaching state-of-the-art performance. https://arxiv.org/abs//2404.12387 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv…

1
[QA] BLINK: Multimodal Large Language Models Can See but Not Perceive 10:54

1h ago10:54

10:54

BLINK introduces a benchmark for multimodal language models focusing on visual perception tasks challenging for current models, with human accuracy significantly outperforming existing LLMs. https://arxiv.org/abs//2404.12390 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.a…

1
BLINK: Multimodal Large Language Models Can See but Not Perceive 13:10

1h ago13:10

13:10

BLINK introduces a benchmark for multimodal language models focusing on visual perception tasks challenging for current models, with human accuracy significantly outperforming existing LLMs. https://arxiv.org/abs//2404.12390 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.a…

1
[QA] Fewer Truncations Improve Language Modeling 7:40

11h ago7:40

7:40

Best-fit Packing method optimizes large language model training by packing documents into training sequences without unnecessary truncations, improving model coherence and performance significantly. https://arxiv.org/abs//2404.10830 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://po…

1
Fewer Truncations Improve Language Modeling 12:14

11h ago12:14

12:14

Best-fit Packing method optimizes large language model training by packing documents into training sequences without unnecessary truncations, improving model coherence and performance significantly. https://arxiv.org/abs//2404.10830 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://po…

1
[QA] Many-Shot In-Context Learning 8:40

3d ago8:40

8:40

https://arxiv.org/abs//2404.11018 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers --- Support this podcast: https://podcasters.spotify.com/pod/show/arxiv-papers/supp…

1
Many-Shot In-Context Learning 21:06

20h ago21:06

21:06

https://arxiv.org/abs//2404.11018 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers --- Support this podcast: https://podcasters.spotify.com/pod/show/arxiv-papers/supp…

1
Can Language Models Solve Olympiad Programming? 14:18

1d ago14:18

14:18

This paper introduces the USACO benchmark for evaluating language models on computing olympiad problems, highlighting challenges and proposing novel inference methods. https://arxiv.org/abs//2404.10952 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arx…

1
[QA] Social Choice for Al Alignment: Dealing with Diverse Human Feedback 8:29

1d ago8:29

8:29

The paper explores fine-tuning foundation models like GPT-4 to avoid problematic behavior, focusing on aggregating human input for collective preferences using social choice theory. https://arxiv.org/abs//2404.10271 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/…

1
[short] Social Choice for Al Alignment: Dealing with Diverse Human Feedback 2:21

1d ago2:21

2:21

The paper explores fine-tuning foundation models like GPT-4 to avoid problematic behavior, focusing on aggregating human input for collective preferences using social choice theory. https://arxiv.org/abs//2404.10271 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/…

1
Social Choice for Al Alignment: Dealing with Diverse Human Feedback 23:07

1d ago23:07

23:07

The paper explores fine-tuning foundation models like GPT-4 to avoid problematic behavior, focusing on aggregating human input for collective preferences using social choice theory. https://arxiv.org/abs//2404.10271 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/…

1
[QA] Self-playing Adversarial Language Game Enhances LLM Reasoning 9:41

1d ago9:41

9:41

The paper explores self-play training of large language models in an adversarial language game to enhance reasoning ability, showing performance improvement on reasoning benchmarks. https://arxiv.org/abs//2404.10642 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/…

1
[short] Self-playing Adversarial Language Game Enhances LLM Reasoning 2:33

1d ago2:33

2:33

The paper explores self-play training of large language models in an adversarial language game to enhance reasoning ability, showing performance improvement on reasoning benchmarks. https://arxiv.org/abs//2404.10642 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/…

Podcasts Worth a Listen

Olympiad Podcasts

Podcasts Worth a Listen

Quick Reference Guide