 
Listen to video experts and engineers speak about all things video. From UGC to OTT to Broadcast, we discuss the approaches and algorithms they use to deliver the ultimate video experience, spanning capture, encoding, processing, distribution, streaming, and playback.
 
Year Of Shame Challenge 6

Lapsed Gamer Radio, Codec Moments & Film Guff

Monthly
 
Welcome to the Year Of Shame Challenge 6 where 4 hapless individuals, after taking a blood oath, swear never to hit the buy button or open their wallet for a game for a whole 365 days...
 
Keeping you up to date with the latest trends and best-performing architectures in this fast-evolving field of computer science. Selecting papers by comparative results, citations, and influence, we educate you on the latest research. Consider supporting us on Patreon.com/PapersRead for feedback and ideas.
 
Join The Video Insiders hosted by Mark Donnigan and Dror Gill as they wrestle with the hottest topics on the minds of streaming video professionals. Nothing is off limits - video compression, codecs, encoding, transcoding, workflows, technology trends and business models - The Video Insiders and their guests cover it all.
 
Into the Zone

Pushkin Industries

Monthly
 
Into the Zone is a podcast about opposites, and how borders are never as clear as we think. With a novelist’s eye for the unexpected, host Hari Kunzru takes the listener around the world, meeting philosophers and punk musicians, New Age gurus and space explorers, to investigate the gray zone between life and death, public and private, black and white, and more. iHeartMedia is the exclusive podcast partner of Pushkin Industries.
 
Who am I? My name is Stephen Scott and I’m the Founder & CEO of Travel Hub 365, a premium and luxury travel brand that helps consumers plan unique travel experiences around the world. I’ve worked in the travel industry for over 15 years in multiple roles: corporate sales at United Airlines, and then consumer and agency sales with Royal Caribbean International. I’ve gained great perspective on the ins and outs of major travel brands, and am now taking on the journey of bui ...
 
 
Recent advancements in audio generation have been significantly propelled by the capabilities of Large Language Models (LLMs). The existing research on audio LLM has primarily focused on enhancing the architecture and scale of audio language models, as well as leveraging larger datasets, and generally, acoustic codecs, such as EnCodec, are used for…
 
Hearing and being heard is essential in effective collaboration, but with today’s distributed, hybrid workforce, challenges like background noise and network disruptions can result in poor communication. With Webex, you can have crystal clear interactions despite poor network conditions or while in a noisy environment. The new Webex AI Codec (now a…
 
Models like GPT-4o enable real-time interaction with large language models (LLMs) through speech, significantly enhancing user experience compared to traditional text-based interaction. However, there is still a lack of exploration on how to build speech interaction models based on open-source LLMs. To address this, we propose LLaMA-Omni, a novel m…
 
In this episode of Cisco Champion Radio, we dive into all the exciting details of the upcoming Webex One 2024 conference, Cisco's flagship event dedicated to hybrid work collaboration and AI innovation. Join us as we explore the key highlights, from inspirational keynote speeches by Fareed Zakaria and Erik Brynjolfsson to hands-on product demos and…
 
From a single image, visual cues can help deduce intrinsic and extrinsic camera parameters like the focal length and the gravity direction. This single-image calibration can benefit various downstream applications like image editing and 3D mapping. Current approaches to this problem are based on either classical geometry with lines and vanishing po…
 
Insect production for food and feed presents a promising supplement to ensure food safety and address the adverse impacts of agriculture on climate and environment in the future. However, optimisation is required for insect production to realise its full potential. This can be by targeted improvement of traits of interest through selective breeding…
 
This paper presents rerankers, a Python library which provides an easy-to-use interface to the most commonly used re-ranking approaches. Re-ranking is an integral component of many retrieval pipelines; however, there exist numerous approaches to it, relying on different implementation methods. rerankers unifies these methods into a single user-frie…
 
Researchers are investing substantial effort in developing powerful general-purpose agents, wherein Foundation Models are used as modules within agentic systems (e.g. Chain-of-Thought, Self-Reflection, Toolformer). However, the history of machine learning teaches us that hand-designed solutions are eventually replaced by learned solutions. We formu…
 
AI systems that serve natural language questions over databases promise to unlock tremendous value. Such systems would allow users to leverage the powerful reasoning and knowledge capabilities of language models (LMs) alongside the scalable computational power of data management systems. These combined capabilities would empower users to ask arbitr…
 
The ability to accurately interpret complex visual information is a crucial topic of multimodal large language models (MLLMs). Recent work indicates that enhanced visual perception significantly reduces hallucinations and improves performance on resolution-sensitive tasks, such as optical character recognition and document analysis. A number of rec…
 
We present Sapiens, a family of models for four fundamental human-centric vision tasks -- 2D pose estimation, body-part segmentation, depth estimation, and surface normal prediction. Our models natively support 1K high-resolution inference and are extremely easy to adapt for individual tasks by simply fine-tuning models pretrained on over 300 milli…
 
In this episode, we delve into the latest innovations in data center networking unveiled at Cisco Live US. Discover how Cisco is revolutionizing data center operations with new products and solutions aimed at simplifying operations, enhancing security, and ensuring a consistent user experience across various infrastructure architectures. Join us as…
 
Diffusion models have emerged as a popular method for 3D generation. However, it is still challenging for diffusion models to efficiently generate diverse and high-quality 3D shapes. In this paper, we introduce OctFusion, which can generate 3D shapes with arbitrary resolutions in 2.5 seconds on a single Nvidia 4090 GPU, and the extracted meshes are…
 
In this paper, we introduce Writing in the Margins (WiM), a new inference pattern for Large Language Models designed to optimize the handling of long input sequences in retrieval-oriented tasks. This approach leverages the chunked prefill of the key-value cache to perform segment-wise inference, which enables efficient processing of extensive conte…
 
Recent advancements in Large Language Models (LLMs) have showcased their proficiency in answering natural language queries. However, their effectiveness is hindered by limited domain-specific knowledge, raising concerns about the reliability of their responses. We introduce a hybrid system that augments LLMs with domain-specific knowledge graphs (K…
 
Large Language Models (LLMs) demonstrate human-level capabilities in dialogue, reasoning, and knowledge retention. However, even the most advanced LLMs face challenges such as hallucinations and real-time updating of their knowledge. Current research addresses this bottleneck by equipping LLMs with external knowledge, a technique known as Retrieval…
 
Despite Retrieval-Augmented Generation (RAG) showing promising capability in leveraging external knowledge, a comprehensive evaluation of RAG systems is still challenging due to the modular nature of RAG, evaluation of long-form responses and reliability of measurements. In this paper, we propose a fine-grained evaluation framework, RAGChecker, tha…
 
We introduce DeepSeek-Prover-V1.5, an open-source language model designed for theorem proving in Lean 4, which enhances DeepSeek-Prover-V1 by optimizing both training and inference processes. Pre-trained on DeepSeekMath-Base with specialization in formal mathematical languages, the model undergoes supervised fine-tuning using an enhanced formal the…
 
Current long context large language models (LLMs) can process inputs up to 100,000 tokens, yet struggle to generate outputs exceeding even a modest length of 2,000 words. Through controlled experiments, we find that the model's effective generation length is inherently bounded by the sample it has seen during supervised fine-tuning (SFT). In other …
 
Diffusion models have demonstrated remarkable and robust abilities in both image and video generation. To achieve greater control over generated results, researchers introduce additional architectures, such as ControlNet, Adapters and ReferenceNet, to integrate conditioning controls. However, current controllable generation methods often require su…
 
 
The rapid growth of scientific literature imposes significant challenges for researchers endeavoring to stay updated with the latest advancements in their fields and delve into new areas. We introduce OpenResearcher, an innovative platform that leverages Artificial Intelligence (AI) techniques to accelerate the research process by answering diverse…
 
While Large Language Models (LLMs) are the dominant models for generative tasks in language, they do not perform as well as diffusion models on image and video generation. To effectively use LLMs for visual generation, one crucial component is the visual tokenizer that maps pixel-space inputs to discrete tokens appropriate for LLM learning. In this…
 
In this episode, we explore the intersection of AI and our daily lives, posing questions that prompt introspection and discussion. Join us as we examine how AI has become seamlessly integrated into our routines, enhancing productivity and convenience. We delve into the myriad of AI tools and applications that have revolutionized our day-to-day activ…
 
We introduce AnyTool, a large language model agent designed to revolutionize the utilization of a vast array of tools in addressing user queries. We utilize over 16,000 APIs from Rapid API, operating under the assumption that a subset of these APIs could potentially resolve the queries. AnyTool primarily incorporates three elements: an API retrieve…
 
Large Language Models (LLMs) employ auto-regressive decoding that requires sequential computation, with each step reliant on the previous one's output. This creates a bottleneck as each step necessitates moving the full model parameters from High-Bandwidth Memory (HBM) to the accelerator's cache. While methods such as speculative decoding have been…
 
Large decoder-only language models (LLMs) are the state-of-the-art models on most of today's NLP tasks and benchmarks. Yet, the community is only slowly adopting these models for text embedding tasks, which require rich contextualized representations. In this work, we introduce LLM2Vec, a simple unsupervised approach that can transform any decoder-…
 
Large-scale pretrained transformers have created milestones in text (GPT-3) and text-to-image (DALL-E and CogView) generation. Their application to video generation still faces many challenges: the potentially huge computation cost makes training from scratch unaffordable; the scarcity and weak relevance of text-video datasets hinder the model …
 
Join us for an exciting episode of Cisco Champion Radio as we delve into the fascinating world of quantum networking. In this episode, host Amilee San Juan, Cisco Champions Mark Sibering, Marius Hole, Michael Witte, and Paul Giblin, and Cisco guest Tim Szigeti discuss the transformative potential of quantum communication technologies. We'll explore…
 
 
Are you a network provider looking to save time? Of course you are! This session will introduce you to the Meraki Terraform Provider, a tool that enables teams and individuals to automate their workflows and manage Cisco Meraki network infrastructure using Terraform. With this provider, you can now define and manage Meraki organizations, networks, de…
 
Information seeking and integration is a complex cognitive task that consumes enormous time and effort. Inspired by the remarkable progress of Large Language Models, recent works attempt to solve this task by combining LLMs and search engines. However, these methods still obtain unsatisfying performance due to three challenges: (1) complex requests…
 
In this episode, Romain Bouqueau, CEO and Founder of Motion Spell gives us a deep look into the contributions of the open-source community in the world of video streaming. Romain also shares his insights into how open-source works, how GPAC/Motion Spell has remained ahead of the curve with its focus on R&D, and how open-source and commercial entiti…
 
Diffusion models have achieved great progress in image animation due to powerful generative capabilities. However, maintaining spatio-temporal consistency with detailed information from the input static image over time (e.g., style, background, and object of the input static image) and ensuring smoothness in animated video narratives guided by text…
 
FinanceBench is a first-of-its-kind test suite for evaluating the performance of LLMs on open book financial question answering (QA). It comprises 10,231 questions about publicly traded companies, with corresponding answers and evidence strings. The questions in FinanceBench are ecologically valid and cover a diverse set of scenarios. They are inte…
 
Tune in to hear Flavio Ribeiro, Sr. Engineering Manager of Netflix’s Live Streaming Technologies, discuss all things video streaming. Starting in the streets of Campina Grande, Flavio shares his journey from contributing to the recreation of Brazil’s digital television system and working on Globo’s live streaming platform for the 2014 FIFA World Cu…
 
Current hair transfer methods struggle to handle diverse and intricate hairstyles, thus limiting their applicability in real-world scenarios. In this paper, we propose a novel diffusion-based hair transfer framework, named Stable-Hair, which robustly transfers a wide range of real-world hairstyles onto user-provided faces for virtual hair …
 
It's with a heavy heart we're announcing this, but everything comes to an end and it's no different with Codec Moments. For many reasons we've decided to put the adventure out of its underutilised misery, but it doesn't have to be a mournful thing... we've decided to record The Last Of The Podcast to detail why it's coming to a stop and remind our…
 
Data science and engineering workflows often span multiple stages, from warehousing to orchestration, using tools like BigQuery, dbt, and Airbyte. As vision language models (VLMs) advance in multimodal understanding and code generation, VLM-based agents could potentially automate these workflows by generating SQL queries, Python code, and GUI opera…
 
This report introduces FunAudioLLM, a model family designed to enhance natural voice interactions between humans and large language models (LLMs). At its core are two innovative models: SenseVoice, which handles multilingual speech recognition, emotion recognition, and audio event detection; and CosyVoice, which facilitates natural speech generatio…
 
As Large Language Models (LLMs) achieve remarkable progress in language understanding and generation, their training efficiency has become a critical concern. Traditionally, LLMs are trained to predict the next token in a sequence. Despite the success of token-level training, it suffers from considerable computational costs due to the need to proce…
 
 
We study how to apply large language models to write grounded and organized long-form articles from scratch, with comparable breadth and depth to Wikipedia pages. This underexplored problem poses new challenges at the pre-writing stage, including how to research the topic and prepare an outline prior to writing. We propose STORM, a writing system f…
 
Latest advances have achieved realistic virtual try-on (VTON) through localized garment inpainting using latent diffusion models, significantly enhancing consumers' online shopping experience. However, existing VTON technologies neglect the need for merchants to showcase garments comprehensively, including flexible control over garments, optional f…
 
Human video generation is a dynamic and rapidly evolving task that aims to synthesize 2D human body video sequences with generative models given control conditions such as text, audio, and pose. With the potential for wide-ranging applications in film, gaming, and virtual communication, the ability to generate natural and realistic human video is c…
 
The rapid advancement of large language models (LLMs) has paved the way for the development of highly capable autonomous agents. However, existing multi-agent frameworks often struggle with integrating diverse capable third-party agents due to reliance on agents defined within their own ecosystems. They also face challenges in simulating distribute…
 
With the remarkable advancements in image generation and open-form text generation, the creation of interleaved image-text content has become an increasingly intriguing field. Multimodal story generation, characterized by producing narrative texts and vivid images in an interleaved manner, has emerged as a valuable and practical task with broad app…
 
While language models (LMs) have shown potential across a range of decision-making tasks, their reliance on simple acting processes limits their broad deployment as autonomous agents. In this paper, we introduce Language Agent Tree Search (LATS) -- the first general framework that synergizes the capabilities of LMs in reasoning, acting, and plannin…
 
Portrait Animation aims to synthesize a lifelike video from a single source image, using it as an appearance reference, with motion (i.e., facial expressions and head pose) derived from a driving video, audio, text, or generation. Instead of following mainstream diffusion-based methods, we explore and extend the potential of the implicit-keypoint-b…
 