Running out of time to catch up with new arXiv papers? We take the most impactful papers and present them as convenient podcasts. If you're a visual learner, we offer these papers in an engaging video format. Our service fills the gap between overly brief paper summaries and time-consuming full paper reads. You gain academic insights in a time-efficient, digestible format. Code behind this work: https://github.com/imelnyk/ArxivPapers Support this podcast: https://podcasters.spotify.com/pod/s ...
 
https://arxiv.org/abs/2408.17324
 
https://arxiv.org/abs/2408.16737
 
Dolphin is a decoder-decoder architecture for processing long contexts in language models, achieving significant improvements in energy efficiency and latency while maintaining response quality. https://arxiv.org/abs/2408.15518
 
This paper proposes three modifications to CycleGAN's pixel-level cycle consistency, improving image quality and reducing artifacts in unpaired image-to-image translation tasks. https://arxiv.org/abs/2408.15374
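For orientation, below is a minimal sketch of the standard pixel-level cycle-consistency loss that the paper modifies; the three proposed modifications themselves are not reproduced here, and the generator names are illustrative.

```python
import torch.nn.functional as F

# Standard CycleGAN pixel-level cycle-consistency loss (the baseline the
# paper modifies). G maps domain A -> B; F_ba maps domain B -> A.
def cycle_consistency_loss(G, F_ba, real_a, real_b, lam=10.0):
    rec_a = F_ba(G(real_a))  # A -> B -> A round trip
    rec_b = G(F_ba(real_b))  # B -> A -> B round trip
    return lam * (F.l1_loss(rec_a, real_a) + F.l1_loss(rec_b, real_b))
```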
 
This paper demonstrates distilling large Transformer models into efficient linear RNNs, achieving competitive performance on language tasks while improving deployment efficiency and inference speed under limited resources. https://arxiv.org/abs/2408.15237
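As background, here is a minimal sketch of the generic logit-matching distillation objective that cross-architecture work of this kind typically builds on; the paper's exact losses, staged procedure, and RNN parameterization are not reproduced, and the temperature value is an illustrative assumption.

```python
import torch.nn.functional as F

# Generic teacher-student logit distillation: the student (e.g., a linear
# RNN) is trained to match the softened output distribution of the teacher
# (e.g., a large Transformer). T is a softmax temperature.
def distillation_loss(student_logits, teacher_logits, T=2.0):
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    # The T*T factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T
```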
 
https://arxiv.org/abs/2408.15240
 
This paper explores how learning rate, batch size, and the number of training tokens interact, proposing a new Power scheduler that optimizes performance across various model sizes and architectures. https://arxiv.org/abs/2408.13359
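To make the idea concrete, here is a hedged sketch of a power-law learning-rate rule of the general form the name suggests; the coefficient `a`, exponent `b`, and cap are illustrative placeholders, not the paper's fitted values.

```python
# Illustrative power-law learning-rate schedule: the rate is a function of
# tokens seen, capped at a maximum value. All constants are placeholders,
# not the values fitted in the paper.
def power_lr(tokens_seen, a=4.6, b=-0.51, max_lr=2e-4):
    return min(max_lr, a * max(tokens_seen, 1) ** b)

# Example: the schedule decays smoothly as training consumes more tokens.
for n in (1e6, 1e9, 1e12):
    print(f"{n:.0e} tokens -> lr {power_lr(n):.2e}")
```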
 
This paper presents a quantitative law governing contextualized token embeddings in LLMs, showing that all layers contribute equally to prediction accuracy, which deepens understanding of LLMs and can guide development practices. https://arxiv.org/abs/2408.13442
 
This paper presents a framework in which a small language model performs fast initial hallucination detection and a large language model then generates detailed explanations, enabling real-time, interpretable detection. https://arxiv.org/abs/2408.12748
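A minimal sketch of the two-stage cascade described above, under the assumption that the small model emits a hallucination score and the large model is invoked only for flagged answers; `small_detector`, `large_explainer`, and the threshold are hypothetical stand-ins, not the paper's components.

```python
# Two-stage detection cascade: a cheap model screens every answer; the
# expensive model explains only the flagged ones. All components here are
# hypothetical stand-ins for the paper's models.
def detect_hallucination(question, answer, small_detector, large_explainer,
                         threshold=0.5):
    score = small_detector(question, answer)  # fast, runs on every answer
    if score < threshold:
        return {"hallucinated": False, "score": score}
    explanation = large_explainer(question, answer)  # slow, rare path
    return {"hallucinated": True, "score": score, "explanation": explanation}
```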
 
This study uses controlled experiments to explore how diffusion models learn compositional representations, revealing that they can encode individual features but interpolate only weakly over unseen feature values, and suggesting ways to improve training efficiency. https://arxiv.org/abs/2408.13256
 
FERRET enhances adversarial prompt generation for large language models, improving attack success rates and efficiency over RAINBOW TEAMING while producing prompts that remain effective across various model sizes. https://arxiv.org/abs/2408.10701
 
AiM is an autoregressive image generation model based on the Mamba architecture, achieving superior quality and speed in image generation while retaining efficient long-sequence modeling capabilities. https://arxiv.org/abs/2408.12245
 
This paper investigates the challenges LLMs face with real-world tabular data, proposing the TableBench benchmark and the TABLELLM model and highlighting significant gaps between academic benchmarks and industrial applications. https://arxiv.org/abs/2408.09174
 
FocusLLM enhances decoder-only LLMs by efficiently processing long contexts, improving performance on long-context tasks while reducing training costs and maintaining strong language modeling capabilities. https://arxiv.org/abs/2408.11745
 
Sapiens is a versatile model family for human-centric vision tasks, achieving state-of-the-art performance through self-supervised pretraining and a scalable design, and excelling at pose estimation, segmentation, depth estimation, and surface-normal prediction. https://arxiv.org/abs/2408.12569
 
Show-o is a unified transformer model that integrates multimodal understanding and generation, outperforming existing models on various vision-language tasks while supporting diverse input and output modalities. https://arxiv.org/abs/2408.12528
 
Jamba-1.5 introduces instruction-tuned large language models with high throughput, low memory usage, and extensive context length, outperforming competitors while being publicly available under an open model license. https://arxiv.org/abs/2408.12570
 
Hermes 3 is a neutrally aligned, instruction-tuned model with strong reasoning and creative abilities, achieving state-of-the-art performance on several benchmarks, with weights available on Hugging Face. https://arxiv.org/abs/2408.11857
 
https://arxiv.org/abs/2408.11796
 
This paper studies the spectral dynamics of weights in deep learning, revealing optimization biases, showing how weight decay strengthens them, and distinguishing memorizing from generalizing networks across various tasks. https://arxiv.org/abs/2408.11804
 
This paper challenges the Linear Representation Hypothesis, showing that gated recurrent neural networks encode token sequences using magnitude rather than direction, and suggesting that interpretability research should consider a broader range of representations. https://arxiv.org/abs/2408.10920
 
Transfusion is a multimodal training method that combines language modeling and diffusion, achieving superior performance in generating images and text with models of up to 7B parameters. https://arxiv.org/abs/2408.11039
 
This paper presents MOHAWK, a method for distilling Transformers into state-space models that achieves strong performance with significantly less training data and compute. https://arxiv.org/abs/2408.10189
 
This paper proposes using canonical codecs for image and video generation in autoregressive models, demonstrating improved efficiency and effectiveness over traditional pixel-based and vector-quantization approaches. https://arxiv.org/abs/2408.08459
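To illustrate the core idea, here is a hedged sketch of serializing an image with a canonical codec (JPEG) and treating the resulting bytes as an autoregressive token sequence; the byte-level vocabulary and quality setting are illustrative assumptions, not the paper's exact pipeline.

```python
import io
from PIL import Image

# Encode an image with a canonical codec and expose the byte stream as a
# token sequence (one token per byte, vocabulary size 256). The quality
# setting is an illustrative assumption.
def image_to_codec_tokens(path, quality=75):
    buf = io.BytesIO()
    Image.open(path).convert("RGB").save(buf, format="JPEG", quality=quality)
    return list(buf.getvalue())

# An autoregressive model would then be trained to predict these byte tokens
# left to right; decoding sampled bytes with the same codec yields an image.
```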
 
TextCAVs is a novel method for generating concept activation vectors from text descriptions, reducing the need for labeled image data in interpreting deep learning models, particularly in medical applications. https://arxiv.org/abs/2408.08652
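A rough sketch of the text-derived concept-vector idea: embed concept and contrast phrases with a text encoder, take the difference of means, and map it into the vision model's feature space. Both `text_encoder` and `to_vision_space` are hypothetical stand-ins for the paper's components, not its actual API.

```python
import numpy as np

# Build a unit-norm concept activation vector from text alone. The
# `text_encoder` embeds phrases; `to_vision_space` maps text embeddings
# into the target vision model's activation space. Both are hypothetical.
def text_cav(concept_texts, contrast_texts, text_encoder, to_vision_space):
    pos = np.mean([text_encoder(t) for t in concept_texts], axis=0)
    neg = np.mean([text_encoder(t) for t in contrast_texts], axis=0)
    v = to_vision_space(pos - neg)
    return v / np.linalg.norm(v)
```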
 