Best Codec Podcasts (2024)

Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

20h ago · 29:24

Recent advancements in audio generation have been significantly propelled by the capabilities of Large Language Models (LLMs). The existing research on audio LLMs has primarily focused on enhancing the architecture and scale of audio language models, as well as leveraging larger datasets, and generally, acoustic codecs, such as EnCodec, are used for…

S11|E17 Crystal Clear Audio for Every Call and Meeting with Webex AI Codec

24d ago · 39:11

Hearing and being heard is essential in effective collaboration, but with today’s distributed, hybrid workforce, challenges like background noise and network disruptions can result in poor communication. With Webex, you can have crystal clear interactions despite poor network conditions or while in a noisy environment. The new Webex AI Codec (now a…

LLaMA-Omni: Seamless Speech Interaction with Large Language Models

2h ago · 32:15

Models like GPT-4o enable real-time interaction with large language models (LLMs) through speech, significantly enhancing user experience compared to traditional text-based interaction. However, there is still a lack of exploration on how to build speech interaction models based on open-source LLMs. To address this, we propose LLaMA-Omni, a novel m…

S11|E19 WebexOne 2024 Unveiled: Exploring Cisco’s Premier Collaboration Event

6h ago · 32:33

In this episode of Cisco Champion Radio, we dive into all the exciting details of the upcoming Webex One 2024 conference, Cisco's flagship event dedicated to hybrid work collaboration and AI innovation. Join us as we explore the key highlights, from inspirational keynote speeches by Fareed Zakaria and Erik Brynjolfsson to hands-on product demos and…

GeoCalib: Learning Single-image Calibration with Geometric Optimization

2h ago · 19:16

From a single image, visual cues can help deduce intrinsic and extrinsic camera parameters like the focal length and the gravity direction. This single-image calibration can benefit various downstream applications like image editing and 3D mapping. Current approaches to this problem are based on either classical geometry with lines and vanishing po…

Artificial Immune System of Secure Face Recognition Against Adversarial Attacks

15m ago · 1:10:54

Insect production for food and feed presents a promising supplement to ensure food safety and address the adverse impacts of agriculture on climate and environment in the future. However, optimisation is required for insect production to realise its full potential. This can be by targeted improvement of traits of interest through selective breeding…

rerankers: A Lightweight Python Library to Unify Ranking Methods

2d ago · 15:39

This paper presents rerankers, a Python library which provides an easy-to-use interface to the most commonly used re-ranking approaches. Re-ranking is an integral component of many retrieval pipelines; however, there exist numerous approaches to it, relying on different implementation methods. rerankers unifies these methods into a single user-frie…

Automated Design of Agentic Systems

3d ago · 23:55

Researchers are investing substantial effort in developing powerful general-purpose agents, wherein Foundation Models are used as modules within agentic systems (e.g. Chain-of-Thought, Self-Reflection, Toolformer). However, the history of machine learning teaches us that hand-designed solutions are eventually replaced by learned solutions. We formu…

Text2SQL is Not Enough: Unifying AI and Databases with TAG

4d ago · 42:53

AI systems that serve natural language questions over databases promise to unlock tremendous value. Such systems would allow users to leverage the powerful reasoning and knowledge capabilities of language models (LMs) alongside the scalable computational power of data management systems. These combined capabilities would empower users to ask arbitr…

Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

8d ago · 35:05

The ability to accurately interpret complex visual information is a crucial topic of multimodal large language models (MLLMs). Recent work indicates that enhanced visual perception significantly reduces hallucinations and improves performance on resolution-sensitive tasks, such as optical character recognition and document analysis. A number of rec…

Sapiens: Foundation for Human Vision Models

9d ago · 25:58

We present Sapiens, a family of models for four fundamental human-centric vision tasks -- 2D pose estimation, body-part segmentation, depth estimation, and surface normal prediction. Our models natively support 1K high-resolution inference and are extremely easy to adapt for individual tasks by simply fine-tuning models pretrained on over 300 milli…

S11|E18 Modernize Data Center Networks with Cisco Innovations

10d ago · 54:39

In this episode, we delve into the latest innovations in data center networking unveiled at Cisco Live US. Discover how Cisco is revolutionizing data center operations with new products and solutions aimed at simplifying operations, enhancing security, and ensuring a consistent user experience across various infrastructure architectures. Join us as…

OctFusion: Octree-based Diffusion Models for 3D Shape Generation

10d ago · 33:00

Diffusion models have emerged as a popular method for 3D generation. However, it is still challenging for diffusion models to efficiently generate diverse and high-quality 3D shapes. In this paper, we introduce OctFusion, which can generate 3D shapes with arbitrary resolutions in 2.5 seconds on a single Nvidia 4090 GPU, and the extracted meshes are…

Writing in the Margins: Better Inference Pattern for Long Context Retrieval

11d ago · 29:22

In this paper, we introduce Writing in the Margins (WiM), a new inference pattern for Large Language Models designed to optimize the handling of long input sequences in retrieval-oriented tasks. This approach leverages the chunked prefill of the key-value cache to perform segment-wise inference, which enables efficient processing of extensive conte…

Fact Finder -- Enhancing Domain Expertise of Large Language Models by Incorporating Knowledge Graphs

14d ago · 19:53

Recent advancements in Large Language Models (LLMs) have showcased their proficiency in answering natural language queries. However, their effectiveness is hindered by limited domain-specific knowledge, raising concerns about the reliability of their responses. We introduce a hybrid system that augments LLMs with domain-specific knowledge graphs (K…

RAGLAB: A Modular and Research-Oriented Unified Framework for Retrieval-Augmented Generation

15d ago · 18:01

Large Language Models (LLMs) demonstrate human-level capabilities in dialogue, reasoning, and knowledge retention. However, even the most advanced LLMs face challenges such as hallucinations and real-time updating of their knowledge. Current research addresses this bottleneck by equipping LLMs with external knowledge, a technique known as Retrieval…

RAGChecker: A Fine-grained Framework for Diagnosing Retrieval-Augmented Generation

16d ago · 27:28

Despite Retrieval-Augmented Generation (RAG) showing promising capability in leveraging external knowledge, a comprehensive evaluation of RAG systems is still challenging due to the modular nature of RAG, evaluation of long-form responses and reliability of measurements. In this paper, we propose a fine-grained evaluation framework, RAGChecker, tha…

DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search

21d ago · 47:39

We introduce DeepSeek-Prover-V1.5, an open-source language model designed for theorem proving in Lean 4, which enhances DeepSeek-Prover-V1 by optimizing both training and inference processes. Pre-trained on DeepSeekMath-Base with specialization in formal mathematical languages, the model undergoes supervised fine-tuning using an enhanced formal the…

LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs

26d ago · 38:53

Current long context large language models (LLMs) can process inputs up to 100,000 tokens, yet struggle to generate outputs exceeding even a modest length of 2,000 words. Through controlled experiments, we find that the model's effective generation length is inherently bounded by the samples it has seen during supervised fine-tuning (SFT). In other …

ControlNeXt: Powerful and Efficient Control for Image and Video Generation

27d ago · 26:50

Diffusion models have demonstrated remarkable and robust abilities in both image and video generation. To achieve greater control over generated results, researchers introduce additional architectures, such as ControlNet, Adapters and ReferenceNet, to integrate conditioning controls. However, current controllable generation methods often require su…

OpenResearcher: Unleashing AI for Accelerated Scientific Research

1M ago · 29:59

The rapid growth of scientific literature imposes significant challenges for researchers endeavoring to stay updated with the latest advancements in their fields and delve into new areas. We introduce OpenResearcher, an innovative platform that leverages Artificial Intelligence (AI) techniques to accelerate the research process by answering diverse…

Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation

1M ago · 33:50

While Large Language Models (LLMs) are the dominant models for generative tasks in language, they do not perform as well as diffusion models on image and video generation. To effectively use LLMs for visual generation, one crucial component is the visual tokenizer that maps pixel-space inputs to discrete tokens appropriate for LLM learning. In this…

S11|E CC Unfiltered: Let's Talk AI

1M ago · 42:31

In this episode, we explore the intersection of AI and our daily lives, posing questions that prompt introspection and discussion. Join us as we examine how AI has become seamlessly integrated into our routines, enhancing productivity and convenience. We delve into the myriad of AI tools and applications that have revolutionized our day-to-day activ…

AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls

1M ago · 41:29

We introduce AnyTool, a large language model agent designed to revolutionize the utilization of a vast array of tools in addressing user queries. We utilize over 16,000 APIs from Rapid API, operating under the assumption that a subset of these APIs could potentially resolve the queries. AnyTool primarily incorporates three elements: an API retrieve…

Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads

1M ago · 38:55

Large Language Models (LLMs) employ auto-regressive decoding that requires sequential computation, with each step reliant on the previous one's output. This creates a bottleneck as each step necessitates moving the full model parameters from High-Bandwidth Memory (HBM) to the accelerator's cache. While methods such as speculative decoding have been…

LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders

1M ago · 29:11

Large decoder-only language models (LLMs) are the state-of-the-art models on most of today's NLP tasks and benchmarks. Yet, the community is only slowly adopting these models for text embedding tasks, which require rich contextualized representations. In this work, we introduce LLM2Vec, a simple unsupervised approach that can transform any decoder-…

CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers

1M ago · 31:47

Large-scale pretrained transformers have created milestones in text (GPT-3) and text-to-image (DALL-E and CogView) generation. Its application to video generation is still facing many challenges: the potentially huge computation cost makes training from scratch unaffordable; the scarcity and weak relevance of text-video datasets hinder the model …

S11|E16 The Quantum Leap: Networking in the Quantum Era

1M ago · 47:29

Join us for an exciting episode of Cisco Champion Radio as we delve into the fascinating world of quantum networking. In this episode, host Amilee San Juan, Cisco Champions Mark Sibering, Marius Hole, Michael Witte, and Paul Giblin, and Cisco guest Tim Szigeti discuss the transformative potential of quantum communication technologies. We'll explore…

MindSearch: Mimicking Human Minds Elicits Deep AI Searcher

1M ago · 26:22

Information seeking and integration is a complex cognitive task that consumes enormous time and effort. Inspired by the remarkable progress of Large Language Models, recent works attempt to solve this task by combining LLMs and search engines. However, these methods still obtain unsatisfying performance due to three challenges: (1) complex requests…

TVV EP 28 - Romain Bouqueau: Championing Open-Source in the Streaming Industry

1M ago · 1:05:13

In this episode, Romain Bouqueau, CEO and Founder of Motion Spell, gives us a deep look into the contributions of the open-source community in the world of video streaming. Romain also shares his insights into how open-source works, how GPAC/Motion Spell has remained ahead of the curve with its focus on R&D, and how open-source and commercial entiti…

Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models

2M ago · 34:03

Diffusion models have achieved great progress in image animation thanks to their powerful generative capabilities. However, maintaining spatio-temporal consistency with detailed information from the input static image over time (e.g., style, background, and object of the input static image) and ensuring smoothness in animated video narratives guided by text…

FinanceBench: A New Benchmark for Financial Question Answering

2M ago · 41:34

FinanceBench is a first-of-its-kind test suite for evaluating the performance of LLMs on open book financial question answering (QA). It comprises 10,231 questions about publicly traded companies, with corresponding answers and evidence strings. The questions in FinanceBench are ecologically valid and cover a diverse set of scenarios. They are inte…

TVV EP 27 - Flavio Ribeiro: The Past, Present & Future of Video Streaming

2M ago · 1:01:37

Tune in to hear Flavio Ribeiro, Sr. Engineering Manager of Netflix’s Live Streaming Technologies, discuss all things video streaming. Starting in the streets of Campina Grande, Flavio shares his journey from contributing to the recreation of Brazil’s digital television system and working on Globo’s live streaming platform for the 2014 FIFA World Cu…

Stable-Hair: Real-World Hair Transfer via Diffusion Model

2M ago · 30:25

Current hair transfer methods struggle to handle diverse and intricate hairstyles, thus limiting their applicability in real-world scenarios. In this paper, we propose a novel diffusion-based hair transfer framework, named Stable-Hair, which robustly transfers a wide range of real-world hairstyles onto user-provided faces for virtual hair …

The Last Of The Podcast

2M ago · 1:18:12

It's with a heavy heart we're announcing this, but everything comes to an end and it's no different with Codec Moments. For many reasons we've decided to put the adventure out of its underutilised misery, but it doesn't have to be a mournful thing... we've decided to record The Last Of The Podcast to detail why it's coming to a stop and remind our…

Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?

2M ago · 31:03

Data science and engineering workflows often span multiple stages, from warehousing to orchestration, using tools like BigQuery, dbt, and Airbyte. As vision language models (VLMs) advance in multimodal understanding and code generation, VLM-based agents could potentially automate these workflows by generating SQL queries, Python code, and GUI opera…

FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs

2M ago · 34:06

This report introduces FunAudioLLM, a model family designed to enhance natural voice interactions between humans and large language models (LLMs). At its core are two innovative models: SenseVoice, which handles multilingual speech recognition, emotion recognition, and audio event detection; and CosyVoice, which facilitates natural speech generatio…

Patch-Level Training for Large Language Models

2M ago · 24:02

As Large Language Models (LLMs) achieve remarkable progress in language understanding and generation, their training efficiency has become a critical concern. Traditionally, LLMs are trained to predict the next token in a sequence. Despite the success of token-level training, it suffers from considerable computational costs due to the need to proce…

S11|E15 Get Ready, Set, Automate with Meraki Terraform

2M ago · 47:08

Are you a network provider looking to save time? Of course you are! This session will introduce you to the Meraki Terraform Provider, a tool that enables teams and individuals to automate their workflows and manage Cisco Meraki network infrastructure using Terraform. With this provider, you can now define and manage Meraki organizations, networks, …

Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models

2M ago · 35:12

We study how to apply large language models to write grounded and organized long-form articles from scratch, with comparable breadth and depth to Wikipedia pages. This underexplored problem poses new challenges at the pre-writing stage, including how to research the topic and prepare an outline prior to writing. We propose STORM, a writing system f…

IMAGDressing-v1: Customizable Virtual Dressing

2M ago · 27:37

Recent advances have achieved realistic virtual try-on (VTON) through localized garment inpainting using latent diffusion models, significantly enhancing consumers' online shopping experience. However, existing VTON technologies neglect the need for merchants to showcase garments comprehensively, including flexible control over garments, optional f…

A Comprehensive Survey on Human Video Generation: Challenges, Methods, and Insights

2M ago · 36:34

Human video generation is a dynamic and rapidly evolving task that aims to synthesize 2D human body video sequences with generative models given control conditions such as text, audio, and pose. With the potential for wide-ranging applications in film, gaming, and virtual communication, the ability to generate natural and realistic human video is c…

Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence

2M ago · 49:58

The rapid advancement of large language models (LLMs) has paved the way for the development of highly capable autonomous agents. However, existing multi-agent frameworks often struggle with integrating diverse capable third-party agents due to reliance on agents defined within their own ecosystems. They also face challenges in simulating distribute…

SEED-Story: Multimodal Long Story Generation with Large Language Model

2M ago · 22:27

With the remarkable advancements in image generation and open-form text generation, the creation of interleaved image-text content has become an increasingly intriguing field. Multimodal story generation, characterized by producing narrative texts and vivid images in an interleaved manner, has emerged as a valuable and practical task with broad app…

Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models

2M ago · 39:20

While language models (LMs) have shown potential across a range of decision-making tasks, their reliance on simple acting processes limits their broad deployment as autonomous agents. In this paper, we introduce Language Agent Tree Search (LATS) -- the first general framework that synergizes the capabilities of LMs in reasoning, acting, and plannin…

LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control

2M ago · 39:35

Portrait Animation aims to synthesize a lifelike video from a single source image, using it as an appearance reference, with motion (i.e., facial expressions and head pose) derived from a driving video, audio, text, or generation. Instead of following mainstream diffusion-based methods, we explore and extend the potential of the implicit-keypoint-b…
