All the best in gaming, technology and tenuous quizzes from the codecmoments.com team.
Entertainment news and review group
A podcast focusing on the Metal Gear Solid franchise and the fan community of www.reddit.com/r/MetalGearSolid
Listen to video experts and engineers speak about all things video. From UGC to OTT to Broadcast, we discuss the approaches and algorithms they use to deliver the ultimate video experience, spanning capture, encoding, processing, distribution, streaming, and playback.
Listen as two experienced programmers go behind the scenes as they develop Briefs.fm, a new service for publishing tiny podcasts.
Welcome to the Year Of Shame Challenge 6 where 4 hapless individuals, after taking a blood oath, swear never to hit the buy button or open their wallet for a game for a whole 365 days...
Tastier than a bucket of kittens
Keeping you up to date with the latest trends and best performing architectures in this fast evolving field in computer science. Selecting papers by comparative results, citations and influence we educate you on the latest research. Consider supporting us on Patreon.com/PapersRead for feedback and ideas.
Join The Video Insiders hosted by Mark Donnigan and Dror Gill as they wrestle with the hottest topics on the minds of streaming video professionals. Nothing is off limits - video compression, codecs, encoding, transcoding, workflows, technology trends and business models - The Video Insiders and their guests cover it all.
Into the Zone is a podcast about opposites, and how borders are never as clear as we think. With a novelist’s eye for the unexpected, host Hari Kunzru takes the listener around the world, meeting philosophers and punk musicians, New Age gurus and space explorers, to investigate the gray zone between life and death, public and private, black and white, and more. iHeartMedia is the exclusive podcast partner of Pushkin Industries.
Get your weekly roundup of technology news with The Two Techies. Released every Saturday, the show goes over some of the most notable and interesting technology stories of the week. Occasionally joined by guests, well known to the technology world!
Who am I? My name is Stephen Scott and I’m the Founder & CEO of Travel Hub 365, a premium and luxury travel brand that helps consumers plan unique travel experiences around the world. I’ve worked in the travel industry for over 15 years, in multiple corporate sales roles at United Airlines and then in consumer and agency sales with Royal Caribbean International. I’ve gained great perspective on the ins and outs of major travel brands, and am now taking on the journey of bui ...
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
29:24
Recent advancements in audio generation have been significantly propelled by the capabilities of Large Language Models (LLMs). The existing research on audio LLM has primarily focused on enhancing the architecture and scale of audio language models, as well as leveraging larger datasets, and generally, acoustic codecs, such as EnCodec, are used for…
S11|E17 Crystal Clear Audio for Every Call and Meeting with Webex AI Codec
39:11
Hearing and being heard is essential in effective collaboration, but with today’s distributed, hybrid workforce, challenges like background noise and network disruptions can result in poor communication. With Webex, you can have crystal clear interactions despite poor network conditions or while in a noisy environment. The new Webex AI Codec (now a…
LLaMA-Omni: Seamless Speech Interaction with Large Language Models
32:15
Models like GPT-4o enable real-time interaction with large language models (LLMs) through speech, significantly enhancing user experience compared to traditional text-based interaction. However, there is still a lack of exploration on how to build speech interaction models based on open-source LLMs. To address this, we propose LLaMA-Omni, a novel m…
S11|E19 WebexOne 2024 Unveiled: Exploring Cisco’s Premier Collaboration Event
32:33
In this episode of Cisco Champion Radio, we dive into all the exciting details of the upcoming Webex One 2024 conference, Cisco's flagship event dedicated to hybrid work collaboration and AI innovation. Join us as we explore the key highlights, from inspirational keynote speeches by Fareed Zakaria and Erik Brynjolfsson to hands-on product demos and…
GeoCalib: Learning Single-image Calibration with Geometric Optimization
19:16
From a single image, visual cues can help deduce intrinsic and extrinsic camera parameters like the focal length and the gravity direction. This single-image calibration can benefit various downstream applications like image editing and 3D mapping. Current approaches to this problem are based on either classical geometry with lines and vanishing po…
Artificial Immune System of Secure Face Recognition Against Adversarial Attacks
1:10:54
Insect production for food and feed presents a promising supplement to ensure food safety and address the adverse impacts of agriculture on climate and environment in the future. However, optimisation is required for insect production to realise its full potential. This can be by targeted improvement of traits of interest through selective breeding…
rerankers: A Lightweight Python Library to Unify Ranking Methods
15:39
This paper presents rerankers, a Python library which provides an easy-to-use interface to the most commonly used re-ranking approaches. Re-ranking is an integral component of many retrieval pipelines; however, there exist numerous approaches to it, relying on different implementation methods. rerankers unifies these methods into a single user-frie…
Researchers are investing substantial effort in developing powerful general-purpose agents, wherein Foundation Models are used as modules within agentic systems (e.g. Chain-of-Thought, Self-Reflection, Toolformer). However, the history of machine learning teaches us that hand-designed solutions are eventually replaced by learned solutions. We formu…
Text2SQL is Not Enough: Unifying AI and Databases with TAG
42:53
AI systems that serve natural language questions over databases promise to unlock tremendous value. Such systems would allow users to leverage the powerful reasoning and knowledge capabilities of language models (LMs) alongside the scalable computational power of data management systems. These combined capabilities would empower users to ask arbitr…
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
35:05
The ability to accurately interpret complex visual information is a crucial topic of multimodal large language models (MLLMs). Recent work indicates that enhanced visual perception significantly reduces hallucinations and improves performance on resolution-sensitive tasks, such as optical character recognition and document analysis. A number of rec…
Sapiens: Foundation for Human Vision Models
25:58
We present Sapiens, a family of models for four fundamental human-centric vision tasks -- 2D pose estimation, body-part segmentation, depth estimation, and surface normal prediction. Our models natively support 1K high-resolution inference and are extremely easy to adapt for individual tasks by simply fine-tuning models pretrained on over 300 milli…
S11|E18 Modernize Data Center Networks with Cisco Innovations
54:39
In this episode, we delve into the latest innovations in data center networking unveiled at Cisco Live US. Discover how Cisco is revolutionizing data center operations with new products and solutions aimed at simplifying operations, enhancing security, and ensuring a consistent user experience across various infrastructure architectures. Join us as…
OctFusion: Octree-based Diffusion Models for 3D Shape Generation
33:00
Diffusion models have emerged as a popular method for 3D generation. However, it is still challenging for diffusion models to efficiently generate diverse and high-quality 3D shapes. In this paper, we introduce OctFusion, which can generate 3D shapes with arbitrary resolutions in 2.5 seconds on a single Nvidia 4090 GPU, and the extracted meshes are…
Writing in the Margins: Better Inference Pattern for Long Context Retrieval
29:22
In this paper, we introduce Writing in the Margins (WiM), a new inference pattern for Large Language Models designed to optimize the handling of long input sequences in retrieval-oriented tasks. This approach leverages the chunked prefill of the key-value cache to perform segment-wise inference, which enables efficient processing of extensive conte…
Fact Finder -- Enhancing Domain Expertise of Large Language Models by Incorporating Knowledge Graphs
19:53
Recent advancements in Large Language Models (LLMs) have showcased their proficiency in answering natural language queries. However, their effectiveness is hindered by limited domain-specific knowledge, raising concerns about the reliability of their responses. We introduce a hybrid system that augments LLMs with domain-specific knowledge graphs (K…
RAGLAB: A Modular and Research-Oriented Unified Framework for Retrieval-Augmented Generation
18:01
Large Language Models (LLMs) demonstrate human-level capabilities in dialogue, reasoning, and knowledge retention. However, even the most advanced LLMs face challenges such as hallucinations and real-time updating of their knowledge. Current research addresses this bottleneck by equipping LLMs with external knowledge, a technique known as Retrieval…
RAGChecker: A Fine-grained Framework for Diagnosing Retrieval-Augmented Generation
27:28
Despite Retrieval-Augmented Generation (RAG) showing promising capability in leveraging external knowledge, a comprehensive evaluation of RAG systems is still challenging due to the modular nature of RAG, evaluation of long-form responses and reliability of measurements. In this paper, we propose a fine-grained evaluation framework, RAGChecker, tha…
DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search
47:39
We introduce DeepSeek-Prover-V1.5, an open-source language model designed for theorem proving in Lean 4, which enhances DeepSeek-Prover-V1 by optimizing both training and inference processes. Pre-trained on DeepSeekMath-Base with specialization in formal mathematical languages, the model undergoes supervised fine-tuning using an enhanced formal the…
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs
38:53
Current long context large language models (LLMs) can process inputs up to 100,000 tokens, yet struggle to generate outputs exceeding even a modest length of 2,000 words. Through controlled experiments, we find that the model's effective generation length is inherently bounded by the sample it has seen during supervised fine-tuning (SFT). In other …
ControlNeXt: Powerful and Efficient Control for Image and Video Generation
26:50
Diffusion models have demonstrated remarkable and robust abilities in both image and video generation. To achieve greater control over generated results, researchers introduce additional architectures, such as ControlNet, Adapters and ReferenceNet, to integrate conditioning controls. However, current controllable generation methods often require su…
OpenResearcher: Unleashing AI for Accelerated Scientific Research
29:59
The rapid growth of scientific literature imposes significant challenges for researchers endeavoring to stay updated with the latest advancements in their fields and delve into new areas. We introduce OpenResearcher, an innovative platform that leverages Artificial Intelligence (AI) techniques to accelerate the research process by answering diverse…
Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation
33:50
While Large Language Models (LLMs) are the dominant models for generative tasks in language, they do not perform as well as diffusion models on image and video generation. To effectively use LLMs for visual generation, one crucial component is the visual tokenizer that maps pixel-space inputs to discrete tokens appropriate for LLM learning. In this…
In this episode, we explore the intersection of AI and our daily lives, posing questions that prompt introspection and discussion. Join us as we examine how AI has become seamlessly integrated into our routines, enhancing productivity and convenience. We delve into the myriad of AI tools and applications that have revolutionized our day-to-day activ…
AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls
41:29
We introduce AnyTool, a large language model agent designed to revolutionize the utilization of a vast array of tools in addressing user queries. We utilize over 16,000 APIs from Rapid API, operating under the assumption that a subset of these APIs could potentially resolve the queries. AnyTool primarily incorporates three elements: an API retrieve…
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
38:55
Large Language Models (LLMs) employ auto-regressive decoding that requires sequential computation, with each step reliant on the previous one's output. This creates a bottleneck as each step necessitates moving the full model parameters from High-Bandwidth Memory (HBM) to the accelerator's cache. While methods such as speculative decoding have been…
LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders
29:11
Large decoder-only language models (LLMs) are the state-of-the-art models on most of today's NLP tasks and benchmarks. Yet, the community is only slowly adopting these models for text embedding tasks, which require rich contextualized representations. In this work, we introduce LLM2Vec, a simple unsupervised approach that can transform any decoder-…
CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers
31:47
Large-scale pretrained transformers have created milestones in text (GPT-3) and text-to-image (DALL-E and CogView) generation. Its application to video generation is still facing many challenges: The potential huge computation cost makes the training from scratch unaffordable; The scarcity and weak relevance of text-video datasets hinder the model …
S11|E16 The Quantum Leap: Networking in the Quantum Era
47:29
Join us for an exciting episode of Cisco Champion Radio as we delve into the fascinating world of quantum networking. In this episode, host Amilee San Juan, Cisco Champions Mark Sibering, Marius Hole, Michael Witte, and Paul Giblin, and Cisco guest Tim Szigeti discuss the transformative potential of quantum communication technologies. We'll explore…
MindSearch: Mimicking Human Minds Elicits Deep AI Searcher
26:22
Information seeking and integration is a complex cognitive task that consumes enormous time and effort. Inspired by the remarkable progress of Large Language Models, recent works attempt to solve this task by combining LLMs and search engines. However, these methods still obtain unsatisfying performance due to three challenges: (1) complex requests…
TVV EP 28 - Romain Bouqueau: Championing Open-Source in the Streaming Industry
1:05:13
In this episode, Romain Bouqueau, CEO and Founder of Motion Spell gives us a deep look into the contributions of the open-source community in the world of video streaming. Romain also shares his insights into how open-source works, how GPAC/Motion Spell has remained ahead of the curve with its focus on R&D, and how open-source and commercial entiti…
Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models
34:03
Diffusion models have achieved great progress in image animation due to powerful generative capabilities. However, maintaining spatio-temporal consistency with detailed information from the input static image over time (e.g., style, background, and object of the input static image) and ensuring smoothness in animated video narratives guided by text…
FinanceBench: A New Benchmark for Financial Question Answering
41:34
FinanceBench is a first-of-its-kind test suite for evaluating the performance of LLMs on open book financial question answering (QA). It comprises 10,231 questions about publicly traded companies, with corresponding answers and evidence strings. The questions in FinanceBench are ecologically valid and cover a diverse set of scenarios. They are inte…
TVV EP 27 - Flavio Ribeiro: The Past, Present & Future of Video Streaming
1:01:37
Tune in to hear Flavio Ribeiro, Sr. Engineering Manager of Netflix’s Live Streaming Technologies, discuss all things video streaming. Starting in the streets of Campina Grande, Flavio shares his journey from contributing to the recreation of Brazil’s digital television system and working on Globo’s live streaming platform for the 2014 FIFA World Cu…
Stable-Hair: Real-World Hair Transfer via Diffusion Model
30:25
Current hair transfer methods struggle to handle diverse and intricate hairstyles, thus limiting their applicability in real-world scenarios. In this paper, we propose a novel diffusion-based hair transfer framework, named Stable-Hair, which robustly transfers a wide range of real-world hairstyles onto user-provided faces for virtual hair …
It's with a heavy heart that we're announcing this, but everything comes to an end and it's no different with Codec Moments. For many reasons we've decided to put the adventure out of its underutilised misery, but it doesn't have to be a mournful thing... we've decided to record The Last Of The Podcast to detail why it's coming to a stop and remind our…
Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?
31:03
Data science and engineering workflows often span multiple stages, from warehousing to orchestration, using tools like BigQuery, dbt, and Airbyte. As vision language models (VLMs) advance in multimodal understanding and code generation, VLM-based agents could potentially automate these workflows by generating SQL queries, Python code, and GUI opera…
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs
34:06
This report introduces FunAudioLLM, a model family designed to enhance natural voice interactions between humans and large language models (LLMs). At its core are two innovative models: SenseVoice, which handles multilingual speech recognition, emotion recognition, and audio event detection; and CosyVoice, which facilitates natural speech generatio…
Patch-Level Training for Large Language Models
24:02
As Large Language Models (LLMs) achieve remarkable progress in language understanding and generation, their training efficiency has become a critical concern. Traditionally, LLMs are trained to predict the next token in a sequence. Despite the success of token-level training, it suffers from considerable computational costs due to the need to proce…
S11|E15 Get Ready, Set, Automate with Meraki Terraform
47:08
Are you a network provider looking to save time? Of course you are! This session will introduce you to the Meraki Terraform Provider, a tool that enables teams and individuals to automate their workflows and manage Cisco Meraki network infrastructure using Terraform. With this provider, you can now define and manage Meraki organizations, networks, …
Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models
35:12
We study how to apply large language models to write grounded and organized long-form articles from scratch, with comparable breadth and depth to Wikipedia pages. This underexplored problem poses new challenges at the pre-writing stage, including how to research the topic and prepare an outline prior to writing. We propose STORM, a writing system f…
IMAGDressing-v1: Customizable Virtual Dressing
27:37
Latest advances have achieved realistic virtual try-on (VTON) through localized garment inpainting using latent diffusion models, significantly enhancing consumers' online shopping experience. However, existing VTON technologies neglect the need for merchants to showcase garments comprehensively, including flexible control over garments, optional f…
A Comprehensive Survey on Human Video Generation: Challenges, Methods, and Insights
36:34
Human video generation is a dynamic and rapidly evolving task that aims to synthesize 2D human body video sequences with generative models given control conditions such as text, audio, and pose. With the potential for wide-ranging applications in film, gaming, and virtual communication, the ability to generate natural and realistic human video is c…
Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence
49:58
The rapid advancement of large language models (LLMs) has paved the way for the development of highly capable autonomous agents. However, existing multi-agent frameworks often struggle with integrating diverse capable third-party agents due to reliance on agents defined within their own ecosystems. They also face challenges in simulating distribute…
SEED-Story: Multimodal Long Story Generation with Large Language Model
22:27
With the remarkable advancements in image generation and open-form text generation, the creation of interleaved image-text content has become an increasingly intriguing field. Multimodal story generation, characterized by producing narrative texts and vivid images in an interleaved manner, has emerged as a valuable and practical task with broad app…
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models
39:20
While language models (LMs) have shown potential across a range of decision-making tasks, their reliance on simple acting processes limits their broad deployment as autonomous agents. In this paper, we introduce Language Agent Tree Search (LATS) -- the first general framework that synergizes the capabilities of LMs in reasoning, acting, and plannin…
LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control
39:35
Portrait Animation aims to synthesize a lifelike video from a single source image, using it as an appearance reference, with motion (i.e., facial expressions and head pose) derived from a driving video, audio, text, or generation. Instead of following mainstream diffusion-based methods, we explore and extend the potential of the implicit-keypoint-b…