 
On the In Culture podcast, we go behind the scenes with artists, gamers, musicians, designers, athletes, and visionaries in their fields to share a real-world look at how they’re shaping culture. In our latest podcast series, Variations on a theme, we explore the life and legacy of Sol LeWitt. We’ll cover key themes in LeWitt’s work and look at how his approach still influences some of the creative pioneers shaping the 21st century. Variations on a theme is a companion to the Sol LeWitt App, ...
 
A daily update on the latest AI research papers. We provide a high-level overview of a handful of papers each day and link all papers in the description for further reading. This podcast is created entirely with AI by PocketPod. Head over to https://pocketpod.app to learn more.
 
In this original podcast from T-Mobile for Business and iHeartRadio, Jonathan Strickland connects with the world’s most unconventional thinkers, the leaders at the intersection of technology and business, to understand how they continue to thrive in a world of complex organizations and lightning-fast technology. How do these executives innovate and enable change, both inside and outside their companies, and what are they looking forward to tackling next? Let’s find out…
 
The FaithTech Podcast aims to bring together the world’s leading thinkers and communicators in the faith-and-technology ecosystem for your learning. FaithTech is a community of Christians in tech who gather once a month, all around the world, to meet, learn, and build projects together. If you are a Christian in tech, we want to see you join our community! Just head over to our website to learn more about how you can get plugged in: https://faithtech.com
 
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies
Scaling Retrieval-Based Language Models with a Trillion-Token Datastore
Shape of Motion: 4D Reconstruction from a Single Video
Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion
Understanding Reference Policies in Direct Preference Opti…
 
Qwen2 Technical Report
Learning to Refuse: Towards Mitigating Privacy Risks in LLMs
The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism
Q-Sparse: All Large Language Models can be Fully Sparsely-Activated
GRUtopia: Dream General Robots in a City at Scale
 
Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On
Video Diffusion Alignment via Reward Gradients
Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients
MAVIS: Math…
 
Unveiling Encoder-Free Vision-Language Models
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs
AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents
RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models
ChartGemma: Visual Instruction-…
 
Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion
Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models
Planetarium: A Rigorous Benchmark for Translating Text to Structured Planning Languages
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Co…
 
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?
ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning
MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation
LiteSearch: Efficacious Tree Search for LLM
Wavelets Are All You Need for Autoregressive Image…
 
Scaling Synthetic Data Creation with 1,000,000,000 Personas
HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale
LLaRA: Supercharging Robot Learning Data for Vision-Language Policy
Direct Preference Knowledge Distillation for Large Language Models
GaussianDreamerPro: Text to Manipulable 3D Gaussians with Highly Enh…
 
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs
MUMU: Bootstrapping Multimodal Image Generation from Text-to-Image Data
Simulating Classroom Education with LLM-Empowered Agents
SeaKR: Self-aware Knowledge Retrieval for Adaptive Retrieval …
 
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale
YouDream: Generating Anatomically Controllable Consistent Text-to-3D Animals
DiffusionPDE: Generative PDE-Solving Under Partial Observation
Aligning Diffusion Models with Noise-Conditioned Perception
Unlocking Continual Learning Abilities in Language Models…
 
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs
Evaluating D-MERIT of Partial-annotation on Information Retrieval
Long Context Transfer from Language to Vision…
 
LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges
Complexity of Symbolic Representation in Working Memory of Transformer Correlates with the Complexity of a Task
Towards Retrieval Augmented Generation over Large Video Libraries
Stylebreeder: Exploring …
 
XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning
Make It Count: Text-to-Image Generation with an Accurate Number of Objects
ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation
Needle In A Multimodal Haystack
BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Hay…
 
Depth Anything V2
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels
Transformers meet Neural Algorithmic Reasoners
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
OpenVLA: An Open-Source Vision-Language-Action Model
Alleviating Distortion in Image Generation via Multi-Resolut…
 
NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing
MotionClone: Training-Free Motion Cloning for Controllable Video Generation
What If We Recaption Billions of Web Images with LLaMA-3?
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
PowerInfer-2: Fast Large Language Model I…
 
An Image is Worth 32 Tokens for Reconstruction and Generation
McEval: Massively Multilingual Code Evaluation
Zero-shot Image Editing with Reference Imitation
The Prompt Report: A Systematic Survey of Prompting Techniques
TextGrad: Automatic "Differentiation" via Text
 
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning
Vript: A Video Is Worth Thousands of Words
Lighting Every Darkness with 3DGS: Fast Training and Real-Time Rendering for HDR View Synthesis
VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text …
 
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
BitsFusion: 1.99 bits Weight Quantization of Diffusion Model
Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step
Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
SF-V: Single Forward Video Generation Mo…
 
Apple announced new Siri features and Apple Intelligence today. Interestingly, Apple had already released a paper, titled "Ferret-UI," on how it all works: a multimodal vision-language model capable of understanding widgets, icons, and text on an iOS mobile screen, and reasoning about their spatial relationships and functional meanings. https://arxiv.…
 
Block Transformer: Global-to-Local Language Modeling for Fast Inference
Parrot: Multilingual Visual Instruction Tuning
Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration
Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion
LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autore…
 
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models
To Believe or Not to Believe Your LLM
I4VGen: Image as Stepping Stone for Text-to-Video Generation
Self-Improving Robust Preference Optimization
Guiding a Diffusion Model with a Bad Version of Itself
 
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark
Learning Temporally Consistent Video Depth from Video Diffusion Priors
Show, Don't Tell: Aligning Language Models with Demonstrated Feedback
Artificial Generational Intelligence: Cultural Accumulation in Reinforcement Learning
ZeroSmooth: Training-free Diffuser Adaptati…
 
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models
Kaleido Diffusion: Improving Conditional Diffusion Models with Au…
 
AI Papers Podcast for 06/04/2024
DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation
GECO: Generative Image-to-3D within a SECOnd
PLA4D: Pixel-Level Alignments for Text-to-4D Gaussian Splatting
DevEval: A Manually-Annotated Code Generation Benchmark Aligned with Real-World Code Repositories
Parrot: Efficient Serving of LLM-b…
 
AI Papers Podcast for 06/03/2024
Jina CLIP: Your CLIP Model Is Also Your Text Retriever
Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts
MotionLLM: Understanding Human Behaviors from Human Motions and Videos
Xwin-LM: Strong and Scalable Alignment Practice for LLMs
MOFA-Video: Controllable Image Animati…
 
MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series
T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback
LLMs achieve adult human performance on higher-order theory of mind tasks
Nearest Neighbor Speculative Decoding for LLM Generation and Attribution
Zipper: A Multi-Tower Decoder Ar…
 
An Introduction to Vision-Language Modeling
Transformers Can Do Arithmetic with the Right Embeddings
Matryoshka Multimodal Models
I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models
Zamba: A Compact 7B SSM Hybrid Model
Looking Backward: Streaming Video-to-Video Translation with Feature Banks…
 
ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization
Aya 23: Open Weight Releases to Further Multilingual Progress
Stacking Your Transformers: A Close…
 
ReVideo: Remake a Video with Motion and Content Control
Not All Language Model Features Are Linear
RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance
Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
Dense Connector for MLLMs…
 
Your Transformer is Secretly Linear
Diffusion for World Modeling: Visual Details Matter in Atari
Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
OmniGlue: Generalizable Feature Matching with Foundation Model Guidance
Personalized Residuals for C…
 
FIFO-Diffusion: Generating Infinite Videos from Text without Training
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
Imp: Highly Capable Large Multimodal Models for Mobile Devices
Octo: An Open-Source Generalist Robot Policy
Towards Modular LLMs by Building and Reusing …
 
INDUS: Effective and Efficient Language Models for Scientific Applications
Observational Scaling Laws and the Predictability of Language Model Performance
Grounded 3D-LLM with Referent Tokens
Layer-Condensed KV Cache for Efficient Inference of Large Language Models
Dynamic data sampler for cross-language transfer learning in large language models…
 
Chameleon: Mixed-Modal Early-Fusion Foundation Models
LoRA Learns Less and Forgets Less
Many-Shot In-Context Learning in Multimodal Foundation Models
CAT3D: Create Anything in 3D with Multi-View Diffusion Models
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
Dual3D: Efficient and Consistent Text-to-3D Generation with Dual-mode…
 
ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models
Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model
BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation
Naturalistic Music Decoding from EEG Data via Latent Diffusion Models
No Time to Waste: Squeeze Time into Channel for Mobile Vide…
 
VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory
Coin3D: Controllable and Interactive 3D Assets Generation with Proxy-Guided Conditioning
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Unde…
 
What matters when building vision-language models?
RLHF Workflow: From Reward Modeling to Online RLHF
SUTRA: Scalable Multilingual Language Model Architecture
SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts
Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from …
 
BlenderAlchemy: Editing 3D Graphics with Vision-Language Models
Stylus: Automatic Adapter Selection for Diffusion Models
Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations
DressCode: Autoregressively Sewing and Generating Garments from Text Guidance
PLLaVA : Parameter-free LLaVA Extension from Images to V…
 
MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation
GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting
SAGS: Structure-Aware 3D Gaussian Splatting
Invisible Stitch: Generating Smooth 3D Scenes with Depth Inpainting…
 
Paint by Inpaint: Learning to Add Image Objects by Removing Them First
Self-Play Preference Optimization for Language Model Alignment
Automatic Creative Selection with Cross-Modal Matching
STT: Stateful Tracking with Transformers for Autonomous Driving
Octopus v4: Graph of language models
 
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
WildChat: 1M ChatGPT Interaction Logs in the Wild
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report
LLM-AD: Large Language Model based Audio Description System…
 
Is Bigger Edit Batch Size Always Better? -- An Empirical Study on Model Editing with Llama-3
A Careful Examination of Large Language Model Performance on Grade School Arithmetic
Spectrally Pruned Gaussian Fields with Neural Compensation
SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound
Clover: Regressive Lightweight Speculative …
 
Octopus v4: Graph of language models
InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation
Better & Faster Large Language Models via Multi-token Prediction
GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting
Iterative Reasoning Preference Optimization
 
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models
LEGENT: Open Platform for Embodied Agents
Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations
Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting
BlenderAlchemy: Editing 3D Graphics with Vision-Languag…
 
AI Papers Podcast for 04/29/2024
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning
AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs
HaLo-NeRF: Learning Geometry-Guided Semantics for Exploring Unconstrained Photo Collections
MaPa: Text-driven Photorealistic Material Painting for 3D Shapes…
 
AI Papers Podcast for 04/26/2024
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites
Interactive3D: Create What You Want by Interactive 3D Generation
Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding
Tele-FLM Technical Report
SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Mo…
 
AI Papers Podcast for 04/25/2024
Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis
A Multimodal Automated Interpretability Agent
SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation
MultiBooth: Towards Generating All Your Concepts in an Image from Text
Learning H-Infinity Locomotion Control…
 
AI Papers Podcast for 04/24/2024
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
Multi-Head Mixture-of-Experts
Pegasus-v1 Technical Report
Align Your Steps: Optimizing Sampling Schedules in Diffusion Models
SnapKV: LLM Knows What You are Looking for Before Generation…
 