Best Patches Stitches Podcasts (2024)

Patch-Level Training for Large Language Models 24:02

1h ago

As Large Language Models (LLMs) achieve remarkable progress in language understanding and generation, their training efficiency has become a critical concern. Traditionally, LLMs are trained to predict the next token in a sequence. Despite the success of token-level training, it suffers from considerable computational costs due to the need to proce…

Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models 35:12

9h ago

We study how to apply large language models to write grounded and organized long-form articles from scratch, with comparable breadth and depth to Wikipedia pages. This underexplored problem poses new challenges at the pre-writing stage, including how to research the topic and prepare an outline prior to writing. We propose STORM, a writing system f…

IMAGDressing-v1: Customizable Virtual Dressing 27:37

1h ago

Latest advances have achieved realistic virtual try-on (VTON) through localized garment inpainting using latent diffusion models, significantly enhancing consumers' online shopping experience. However, existing VTON technologies neglect the need for merchants to showcase garments comprehensively, including flexible control over garments, optional f…

A Comprehensive Survey on Human Video Generation: Challenges, Methods, and Insights 36:34

1h ago

Human video generation is a dynamic and rapidly evolving task that aims to synthesize 2D human body video sequences with generative models given control conditions such as text, audio, and pose. With the potential for wide-ranging applications in film, gaming, and virtual communication, the ability to generate natural and realistic human video is c…

Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence 49:58

3h ago

The rapid advancement of large language models (LLMs) has paved the way for the development of highly capable autonomous agents. However, existing multi-agent frameworks often struggle with integrating diverse capable third-party agents due to reliance on agents defined within their own ecosystems. They also face challenges in simulating distribute…

SEED-Story: Multimodal Long Story Generation with Large Language Model 22:27

8d ago

With the remarkable advancements in image generation and open-form text generation, the creation of interleaved image-text content has become an increasingly intriguing field. Multimodal story generation, characterized by producing narrative texts and vivid images in an interleaved manner, has emerged as a valuable and practical task with broad app…

Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models 39:20

3d ago

While language models (LMs) have shown potential across a range of decision-making tasks, their reliance on simple acting processes limits their broad deployment as autonomous agents. In this paper, we introduce Language Agent Tree Search (LATS) -- the first general framework that synergizes the capabilities of LMs in reasoning, acting, and plannin…

LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control 39:35

1d ago

Portrait Animation aims to synthesize a lifelike video from a single source image, using it as an appearance reference, with motion (i.e., facial expressions and head pose) derived from a driving video, audio, text, or generation. Instead of following mainstream diffusion-based methods, we explore and extend the potential of the implicit-keypoint-b…

Agentless: Demystifying LLM-based Software Engineering Agents 35:54

2d ago

Recent advancements in large language models (LLMs) have significantly advanced the automation of software development tasks, including code synthesis, program repair, and test generation. More recently, researchers and industry practitioners have developed various autonomous LLM agents to perform end-to-end software development tasks. These agents…

Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? 36:47

4d ago

Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases. Leveraging LCLMs' ability to natively ingest and process entire corpora of information offers numerous advantages. It enhances user-friendliness by eliminating the need for speci…

ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code 27:24

5d ago

Despite Large Language Models (LLMs) like GPT-4 achieving impressive results in function-level code generation, they struggle with repository-scale code understanding (e.g., coming up with the right arguments for calling routines), requiring a deeper comprehension of complex file interactions. Also, recently, people have developed LLM agents that a…

Unique3D: High-Quality and Efficient 3D Mesh Generation from a Single Image 22:25

8d ago

In this work, we introduce Unique3D, a novel image-to-3D framework for efficiently generating high-quality 3D meshes from single-view images, featuring state-of-the-art generation fidelity and strong generalizability. Previous methods based on Score Distillation Sampling (SDS) can produce diversified 3D results by distilling 3D knowledge from large…

DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence 37:18

9d ago

We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Co…

Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time 38:01

15d ago

The conventional recipe for maximizing model accuracy is to (1) train multiple models with various hyperparameters and (2) pick the individual model which performs best on a held-out validation set, discarding the remainder. In this paper, we revisit the second step of this procedure in the context of fine-tuning large pre-trained models, where fin…

RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture 1:06:40

16d ago

There are two common ways in which developers are incorporating proprietary and domain-specific data when building applications of Large Language Models (LLMs): Retrieval-Augmented Generation (RAG) and Fine-Tuning. RAG augments the prompt with the external data, while fine-tuning incorporates the additional knowledge into the model itself. However,…

Seven Failure Points When Engineering a Retrieval Augmented Generation System 21:27

17d ago

Software engineers are increasingly adding semantic search capabilities to applications using a strategy known as Retrieval Augmented Generation (RAG). A RAG system involves finding documents that semantically match a query and then passing the documents to a large language model (LLM) such as ChatGPT to extract the right answer. RAG s…

Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning 42:09

18d ago

Language agents perform complex tasks by using tools to execute each step precisely. However, most existing agents are based on proprietary models or designed to target specific tasks, such as mathematics or multi-hop question answering. We introduce Husky, a holistic, open-source language agent that learns to reason over a unified action space to …

Recurrent Context Compression: Efficiently Expanding the Context Window of LLM 38:11

19d ago

To extend the context length of Transformer-based large language models (LLMs) and improve comprehension capabilities, we often face limitations due to computational resources and bounded memory storage capacity. This work introduces a method called Recurrent Context Compression (RCC), designed to efficiently expand the context window length of LLM…

Multi-Head RAG: Solving Multi-Aspect Problems with LLMs 33:39

22d ago

Retrieval Augmented Generation (RAG) enhances the abilities of Large Language Models (LLMs) by enabling the retrieval of documents into the LLM context to provide more accurate and relevant responses. Existing RAG solutions do not focus on queries that may require fetching multiple documents with substantially different contents. Such queries occur…

StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning 40:44

23d ago

Simultaneous speech-to-speech translation (Simul-S2ST, a.k.a. streaming speech translation) outputs target speech while receiving streaming speech inputs, which is critical for real-time communication. Beyond accomplishing translation between speech, Simul-S2ST requires a policy to control the model to generate corresponding target speech at the opp…

VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time 39:55

24d ago

We introduce VASA, a framework for generating lifelike talking faces with appealing visual affective skills (VAS) given a single static image and a speech audio clip. Our premiere model, VASA-1, is capable of not only producing lip movements that are exquisitely synchronized with the audio, but also capturing a large spectrum of facial nuances and …

"Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models 54:48

25d ago

The misuse of large language models (LLMs) has drawn significant attention from the general public and LLM vendors. One particular type of adversarial prompt, known as jailbreak prompt, has emerged as the main attack vector to bypass the safeguards and elicit harmful content from LLMs. In this paper, employing our new framework JailbreakHub, we con…

Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models 41:58

1M ago

Large Language Models (LLMs) are often described as being instances of foundation models - that is, models that transfer strongly across various tasks and conditions in few-shot or zero-shot manner, while exhibiting scaling laws that predict function improvement when increasing the pre-training scale. These claims of excelling in different function…

Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models 33:37

1M ago

We introduce Buffer of Thoughts (BoT), a novel and versatile thought-augmented reasoning approach for enhancing accuracy, efficiency and robustness of large language models (LLMs). Specifically, we propose meta-buffer to store a series of informative high-level thoughts, namely thought-template, distilled from the problem-solving processes across v…

GNN-RAG: Graph Neural Retrieval for Large Language Model Reasoning 33:57

1M ago

Knowledge Graphs (KGs) represent human-crafted factual knowledge in the form of triplets (head, relation, tail), which collectively form a graph. Question Answering over KGs (KGQA) is the task of answering natural questions grounding the reasoning to the information provided by the KG. Large Language Models (LLMs) are the state-of-the-art models fo…

AutoCoder: Enhancing Code Large Language Model with AIEV-Instruct 28:24

1M ago

We introduce AutoCoder, the first Large Language Model to surpass GPT-4 Turbo (April 2024) and GPT-4o in pass@1 on the HumanEval benchmark test (90.9% vs. 90.2%). In addition, AutoCoder offers a more versatile code interpreter compared to GPT-4 Turbo and GPT-4o. Its code interpreter can install external packages instead of limiting to built-in pac…

From Sora What We Can See: A Survey of Text-to-Video Generation 1:27:32

2M ago

With impressive achievements made, artificial intelligence is on the path forward to artificial general intelligence. Sora, developed by OpenAI and capable of minute-level world-simulative abilities, can be considered a milestone on this developmental path. However, despite its notable successes, Sora still encounters various obstacles that…

The Future of Large Language Model Pre-training is Federated 34:55

2M ago

Generative pre-trained large language models (LLMs) have demonstrated impressive performance over a wide range of tasks, thanks to the unprecedented amount of data they have been trained on. As established scaling laws indicate, LLMs' future performance improvement depends on the amount of computing and data sources we can leverage for pre-training…

Long-form factuality in large language models 37:52

2M ago

Large language models (LLMs) often generate content that contains factual errors when responding to fact-seeking prompts on open-ended topics. To benchmark a model's long-form factuality in open domains, we first use GPT-4 to generate LongFact, a prompt set comprising thousands of questions spanning 38 topics. We then propose that LLM agents can be…

Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head 42:15

2M ago

End-to-end transformer-based detectors (DETRs) have shown exceptional performance in both closed-set and open-vocabulary object detection (OVD) tasks through the integration of language modalities. However, their demanding computational requirements have hindered their practical application in real-time object detection (OD) scenarios. In this pape…

Retrieval-Augmented Generation for AI-Generated Content: A Survey 1:13:57

2M ago

Advancements in model algorithms, the growth of foundational models, and access to high-quality datasets have propelled the evolution of Artificial Intelligence Generated Content (AIGC). Despite its notable successes, AIGC still faces hurdles such as updating knowledge, handling long-tail data, mitigating data leakage, and managing high training an…

MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning 26:36

2M ago

Low-rank adaptation is a popular parameter-efficient fine-tuning method for large language models. In this paper, we analyze the impact of low-rank updating, as implemented in LoRA. Our findings suggest that the low-rank updating mechanism may limit the ability of LLMs to effectively learn and memorize new knowledge. Inspired by this observation, w…

LightAutoML: AutoML Solution for a Large Financial Services Ecosystem 54:50

2M ago

We present an AutoML system called LightAutoML developed for a large European financial services company and its ecosystem satisfying the set of idiosyncratic requirements that this ecosystem has for AutoML solutions. Our framework was piloted and deployed in numerous applications and performed at the level of the experienced data scientists while …

Efficient Multimodal Large Language Models: A Survey 1:12:40

2M ago

In the past year, Multimodal Large Language Models (MLLMs) have demonstrated remarkable performance in tasks such as visual question answering, visual understanding and reasoning. However, the extensive model size and high training and inference costs have hindered the widespread application of MLLMs in academia and industry. Thus, studying efficie…

The Platonic Representation Hypothesis 45:05

2M ago

We argue that representations in AI models, particularly deep networks, are converging. First, we survey many examples of convergence in the literature: over time and across multiple domains, the ways by which different neural networks represent data are becoming more aligned. Next, we demonstrate convergence across data modalities: as vision model…

RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment 33:04

2M ago

Generative foundation models are susceptible to implicit biases that can arise from extensive unsupervised training data. Such biases can produce suboptimal samples, skewed outcomes, and unfairness, with potentially serious consequences. Consequently, aligning these models with human ethics and preferences is an essential step toward ensuring their…

LLMs as Hackers: Autonomous Linux Privilege Escalation Attacks 52:21

2M ago

Penetration testing, an essential component of software security testing, allows organizations to proactively identify and remediate vulnerabilities in their systems, thus bolstering their defense mechanisms against potential cyberattacks. One recent advancement in the realm of penetration testing is the utilization of Large Language Models (LLMs). We ex…

CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval 36:53

2M ago

State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and usability since additional labeled data is needed to specify any other visual concept. Learning directly from raw text about images is a promising alternative which leverages…

A decoder-only foundation model for time-series forecasting 19:41

2M ago

Motivated by recent advances in large language models for Natural Language Processing (NLP), we design a time-series foundation model for forecasting whose out-of-the-box zero-shot performance on a variety of public datasets comes close to the accuracy of state-of-the-art supervised forecasting models for each individual dataset. Our model is based…

Autonomous LLM-driven research from data to human-verifiable research papers 31:11

2M ago

As AI promises to accelerate scientific discovery, it remains unclear whether fully AI-driven research is possible and whether it can adhere to key scientific values, such as transparency, traceability and verifiability. Mimicking human scientific practices, we built data-to-paper, an automation platform that guides interacting LLM agents through a…

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model 41:56

2M ago

We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) an…

Granite Code Models: A Family of Open Foundation Models for Code Intelligence 58:32

2M ago

Large Language Models (LLMs) trained on code are revolutionizing the software development process. Increasingly, code LLMs are being integrated into software development environments to improve the productivity of human programmers, and LLM-based agents are beginning to show promise for handling complex tasks autonomously. Realizing the full potent…

Improving Diffusion Models for Virtual Try-on 27:25

3M ago

This paper considers image-based virtual try-on, which renders an image of a person wearing a curated garment, given a pair of images depicting the person and the garment, respectively. Previous works adapt existing exemplar-based inpainting diffusion models for virtual try-on to improve the naturalness of the generated visuals compared to other me…

StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation 34:04

3M ago

For recent diffusion-based generative models, maintaining consistent content across a series of generated images, especially those containing subjects and complex details, presents a significant challenge. In this paper, we propose a new way of self-attention calculation, termed Consistent Self-Attention, that significantly boosts the consistency b…

RAG and RAU: A Survey on Retrieval-Augmented Language Model in Natural Language Processing 1:16:45

3M ago

Large Language Models (LLMs) have catalyzed significant advancements in Natural Language Processing (NLP), yet they encounter challenges such as hallucination and the need for domain-specific knowledge. To mitigate these, recent methodologies have integrated information retrieved from external resources with LLMs, substantially enhancing their perf…

KAN: Kolmogorov-Arnold Networks 1:33:54

3M ago

Inspired by the Kolmogorov-Arnold representation theorem, we propose Kolmogorov-Arnold Networks (KANs) as promising alternatives to Multi-Layer Perceptrons (MLPs). While MLPs have fixed activation functions on nodes ("neurons"), KANs have learnable activation functions on edges ("weights"). KANs have no linear weights at all -- every weight paramet…

Make Your LLM Fully Utilize the Context 20:48

3M ago

While many contemporary large language models (LLMs) can process lengthy input, they still struggle to fully utilize information within the long context, known as the lost-in-the-middle challenge. We hypothesize that it stems from insufficient explicit supervision during the long-context training, which fails to emphasize that any position in a lon…

How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites 43:03

3M ago

In this report, we introduce InternVL 1.5, an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary commercial models in multimodal understanding. We introduce three simple improvements: (1) Strong Vision Encoder: we explored a continuous learning strategy for the large-scale vision foun…

Dynamic Generation of Personalities with Large Language Models 24:33

3M ago

In the realm of mimicking human deliberation, large language models (LLMs) show promising performance, thereby amplifying the importance of this research area. Deliberation is influenced by both logic and personality. However, previous studies predominantly focused on the logic of LLMs, neglecting the exploration of personality aspects. In this wor…

Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length 27:30

3M ago

The quadratic complexity and weak length extrapolation of Transformers limit their ability to scale to long sequences, and while sub-quadratic solutions like linear attention and state space models exist, they empirically underperform Transformers in pretraining efficiency and downstream task accuracy. We introduce Megalodon, a neural architecture…
