PocketPod is a silly, song-filled, weekly Animal Crossing podcast about Pocket Camp and New Horizons. Join JoeyBoey, Rar, and Leesh on their journey through these delightful games.
A daily update on the latest AI research papers. We provide a high-level overview of a handful of papers each day and link all papers in the description for further reading. This podcast is created entirely with AI by PocketPod. Head over to https://pocketpod.app to learn more.
Animal Crossing #232 - Wild Wonderful and Totally Twisted
1:11:23
This week we hope (not) to see natural disasters from a safe, social distance while wearing bike helmets. We dig deep into the next #AnimalCrossing #Lego sets and the new Super Mario Land in Orlando. We also talk about #birds, like, a lot. --- Patreon Members Only: View this episode as a Vodcast! --- Join our Patreon! https://patreon.com/the…
Advancing AI's Mathematical Reasoning: WE-MATH, ROS-LLM Framework, Autoregressive Image Generation
10:36
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?; ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning; MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation; LiteSearch: Efficacious Tree Search for LLM; Wavelets Are All You Need for Autoregressive Image…
Persona-Driven Data Synthesis, Enhancing Medical MLLMs, Robot Learning, Knowledge Distillation in LLMs, Text to 3D Gaussian Revolution
11:24
Scaling Synthetic Data Creation with 1,000,000,000 Personas; HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale; LLaRA: Supercharging Robot Learning Data for Vision-Language Policy; Direct Preference Knowledge Distillation for Large Language Models; GaussianDreamerPro: Text to Manipulable 3D Gaussians with Highly Enh…
OMG-LLaVA: Unifying Vision and Language Understanding, Step-DPO for LLMs Mathematical Reasoning, MUMU's Multimodal Image Generation
12:15
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding; Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs; MUMU: Bootstrapping Multimodal Image Generation from Text-to-Image Data; Simulating Classroom Education with LLM-Empowered Agents; SeaKR: Self-aware Knowledge Retrieval for Adaptive Retrieval …
FineWeb Datasets, YouDream's 3D Animals, PDE-Solving Breakthrough, Noise-Conditioned Perception Alignment, Language Models' Continual Learning
11:02
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale; YouDream: Generating Anatomically Controllable Consistent Text-to-3D Animals; DiffusionPDE: Generative PDE-Solving Under Partial Observation; Aligning Diffusion Models with Noise-Conditioned Perception; Unlocking Continual Learning Abilities in Language Models…
BigCodeBench Challenges, Cambrian-1 Leap, D-MERIT's Evaluation, Long Context Breakthrough in Vision
11:06
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation; BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions; Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs; Evaluating D-MERIT of Partial-annotation on Information Retrieval; Long Context Transfer from Language to Vision…
LongRAG Breakthrough, LLMs as Judges, Transformer Memory Insights, Video Library AI, Democratizing Art Styles
10:14
LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs; Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges; Complexity of Symbolic Representation in Working Memory of Transformer Correlates with the Complexity of a Task; Towards Retrieval Augmented Generation over Large Video Libraries; Stylebreeder: Exploring …
Scaling In-Context Reinforcement Learning, ChartMimic's AI Benchmark, Multimodal Document Comprehension, Long Context Reasoning Challenges
10:36
XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning; Make It Count: Text-to-Image Generation with an Accurate Number of Objects; ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation; Needle In A Multimodal Haystack; BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Hay…
Revolutionizing Vision and Language Models: Depth Prediction Breakthroughs, Pixel-Level Transformers, and Robotic Skill Learning
13:20
Depth Anything V2; An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels; Transformers meet Neural Algorithmic Reasoners; Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling; OpenVLA: An Open-Source Vision-Language-Action Model; Alleviating Distortion in Image Generation via Multi-Resolut…
NaRCan Revolutionizes Video Editing, Training-Free Video Generation, Recaptioning Web Images with LLaMA-3, Novel Data Synthesis Approach, Smartphone LLM Inference
11:33
NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing; MotionClone: Training-Free Motion Cloning for Controllable Video Generation; What If We Recaption Billions of Web Images with LLaMA-3?; Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing; PowerInfer-2: Fast Large Language Model I…
Revolutionizing Image Synthesis with TiTok, Multilingual Code Benchmark, Exploring GenAI Prompting Techniques
10:53
An Image is Worth 32 Tokens for Reconstruction and Generation; McEval: Massively Multilingual Code Evaluation; Zero-shot Image Editing with Reference Imitation; The Prompt Report: A Systematic Survey of Prompting Techniques; TextGrad: Automatic "Differentiation" via Text
LlamaGen's Image Revolution, Husky: The Multi-Step Reasoner, Vript's Video Breakthrough, VALL-E 2 Achieves Human Parity
10:46
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation; Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning; Vript: A Video Is Worth Thousands of Words; Lighting Every Darkness with 3DGS: Fast Training and Real-Time Rendering for HDR View Synthesis; VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text …
Mixture-of-Agents, Benchmarking LLMs, and GenAI Arena Evaluation
11:06
Mixture-of-Agents Enhances Large Language Model Capabilities; WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild; CRAG -- Comprehensive RAG Benchmark; GenAI Arena: An Open Evaluation Platform for Generative Models; Large Language Model Confidence Estimation via Black-Box Access
Enhancing AI Video and Image Generation, BitsFusion Quantization, Step-aware Optimization, Thought-Augmented Reasoning, and Single Forward Video Generation
11:39
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions; BitsFusion: 1.99 bits Weight Quantization of Diffusion Model; Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step; Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models; SF-V: Single Forward Video Generation Mo…
AI Papers Podcast Special Edition: Apple Intelligence & Ferret-UI
1:52
Apple announced new Siri features and Apple Intelligence today. Interestingly, Apple had already released a paper, titled "Ferret-UI," on how it all works: a multimodal vision-language model capable of understanding widgets, icons, and text on an iOS mobile screen, and reasoning about their spatial relationships and functional meanings. https://arxiv.…
Block Transformers: Faster Inference, Mobile Device AI Agents, 3D-Image Generation, Low Latency TTS
10:41
Block Transformer: Global-to-Local Language Modeling for Fast Inference; Parrot: Multilingual Visual Instruction Tuning; Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration; Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion; LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autore…
Seed-TTS, Decoding LLMs, Innovations in Text-to-Video, Self-Improving AI Preferences, and Refining Diffusion Models
11:10
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models; To Believe or Not to Believe Your LLM; I4VGen: Image as Stepping Stone for Text-to-Video Generation; Self-Improving Robust Preference Optimization; Guiding a Diffusion Model with a Bad Version of Itself
MMLU-Pro: Next-Level Language Understanding, Tailored LLMs, High FPS Video Generation Innovation
11:30
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark; Learning Temporally Consistent Video Depth from Video Diffusion Priors; Show, Don't Tell: Aligning Language Models with Demonstrated Feedback; Artificial Generational Intelligence: Cultural Accumulation in Reinforcement Learning; ZeroSmooth: Training-free Diffuser Adaptati…
Transformers and State-Space Models Unite, Multi-modal LLM Benchmark, Perplexity in Data Pruning, Advancing 4D Content Generation
10:23
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality; Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis; Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models; Kaleido Diffusion: Improving Conditional Diffusion Models with Au…
DITTO-2 Speeds Up Music AI, GECO's Quick 3D Generation, PLA4D's 4D Advances, DevEval's Real-World Code Benchmark, Parrot's LLM Application Efficiency
10:47
AI Papers Podcast for 06/04/2024. DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation; GECO: Generative Image-to-3D within a SECOnd; PLA4D: Pixel-Level Alignments for Text-to-4D Gaussian Splatting; DevEval: A Manually-Annotated Code Generation Benchmark Aligned with Real-World Code Repositories; Parrot: Efficient Serving of LLM-b…
Boosting Text Retrieval with CLIP Models, Rethinking Retrieval Augmented Generation, and Deciphering Human Behavior through MotionLLM
10:42
AI Papers Podcast for 06/03/2024. Jina CLIP: Your CLIP Model Is Also Your Text Retriever; Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts; MotionLLM: Understanding Human Behaviors from Human Motions and Videos; Xwin-LM: Strong and Scalable Alignment Practice for LLMs; MOFA-Video: Controllable Image Animati…
Bilingual LLM Transparency, T2V-Turbo's Video Generation, LLMs Surpassing Human Theory of Mind Performance, Advancements in LLM Attribution
8:47
MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series; T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback; LLMs achieve adult human performance on higher-order theory of mind tasks; Nearest Neighbor Speculative Decoding for LLM Generation and Attribution; Zipper: A Multi-Tower Decoder Ar…
Phased Consistency Model, 2-Stage Backpropagation, and the Future of 4D World Reconstruction
8:09
Phased Consistency Model; 2BP: 2-Stage Backpropagation; GFlow: Recovering 4D World from Monocular Video; Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning; LLaMA-NAS: Efficient Neural Architecture Search for Large Language Models
Vision-Language Models, Arithmetic Transformers, Next-Gen Video Editing
10:20
An Introduction to Vision-Language Modeling; Transformers Can Do Arithmetic with the Right Embeddings; Matryoshka Multimodal Models; I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models; Zamba: A Compact 7B SSM Hybrid Model; Looking Backward: Streaming Video-to-Video Translation with Feature Banks…
ConvLLaVA's Visual Compression, Efficient LLVM, Multilingual Aya 23, and AutoCoder's Code Mastery
11:11
ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models; Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models; Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization; Aya 23: Open Weight Releases to Further Multilingual Progress; Stacking Your Transformers: A Close…
Revolution in Image Generation, Thermodynamic Gradient Descent, DMD2 for Fast Synthesis, Distributed Speculative Inference
10:56
Language Model Mysteries, Personalized Image Generation, Audio-Visual Transformer Innovations, DeepSeek-Prover, Dense Connector: MLLM Potential
10:31
ReVideo: Remake a Video with Motion and Content Control; Not All Language Model Features Are Linear; RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance; Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation; DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data; Dense Connector for MLLMs…
Transformer Linearity, Face-Adapter Diffusion Models, Cross-Layer Attention Shrinks LLMs, Image Generation Breakthrough
10:14
Your Transformer is Secretly Linear; Diffusion for World Modeling: Visual Details Matter in Atari; Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control; Reducing Transformer Key-Value Cache Size with Cross-Layer Attention; OmniGlue: Generalizable Feature Matching with Foundation Model Guidance; Personalized Residuals for C…
Infinite Video Generation, High-Rank Fine-Tuning, Modular LLMs with LoRA Libraries
9:18
FIFO-Diffusion: Generating Infinite Videos from Text without Training; MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning; OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework; Imp: Highly Capable Large Multimodal Models for Mobile Devices; Octo: An Open-Source Generalist Robot Policy; Towards Modular LLMs by Building and Reusing …
Tailoring Language Models for Science, Scaling Laws in NLP, Grounded 3D-LLM Innovations, Efficient Large Model Inference
9:30
INDUS: Effective and Efficient Language Models for Scientific Applications; Observational Scaling Laws and the Predictability of Language Model Performance; Grounded 3D-LLM with Referent Tokens; Layer-Condensed KV Cache for Efficient Inference of Large Language Models; Dynamic data sampler for cross-language transfer learning in large language models…
Chameleon's Multimodal Breakthrough, LoRA's Learning Efficiency, Many-Shot In-Context Learning, Object Detection Innovation, Text-to-3D Generation
10:29
Chameleon: Mixed-Modal Early-Fusion Foundation Models; LoRA Learns Less and Forgets Less; Many-Shot In-Context Learning in Multimodal Foundation Models; CAT3D: Create Anything in 3D with Multi-View Diffusion Models; Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection; Dual3D: Efficient and Consistent Text-to-3D Generation with Dual-mode…
Efficient Multimodality, Vision Suite's Custom Data, EEG Music Decoding Advances, Mobile Video Breakthrough
8:44
ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models; Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model; BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation; Naturalistic Music Decoding from EEG Data via Latent Diffusion Models; No Time to Waste: Squeeze Time into Channel for Mobile Vide…
Transformer Models Beyond Scaling, Multilingual Image Synthesis, Advanced Text-to-Image Control
9:28
VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models; Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory; Coin3D: Controllable and Interactive 3D Assets Generation with Proxy-Guided Conditioning; Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Unde…
Vision-Language Model Design, Online RLHF Workflow, Multilingual AI, AI Memory Solution
9:41
What matters when building vision-language models?; RLHF Workflow: From Reward Modeling to Online RLHF; SUTRA: Scalable Multilingual Language Model Architecture; SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts; Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from …
BlenderAlchemy Revolution, Stylus Adapter Magic, DressCode Digital Fashion
10:04
BlenderAlchemy: Editing 3D Graphics with Vision-Language Models; Stylus: Automatic Adapter Selection for Diffusion Models; Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations; DressCode: Autoregressively Sewing and Generating Garments from Text Guidance; PLLaVA : Parameter-free LLaVA Extension from Images to V…
Animal Crossing #231 - Papa Has to See This Me
1:06:39
This week, we're catching up on our spring break activities, including pottery, celestial events, and Passover. We also hop into #AnimalCrossing #NewHorizons to ponder a koala and play vacuuming. And we learn... how to type? --- Patreon Members Only: View this episode as a Vodcast! --- Join our Patreon! https://patreon.com/thepocketpod Visit our We…
Real-Time Motion Control, Next-Gen Visual Captions, 3D Scene Reconstruction Innovations
11:30
MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model; Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation; GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting; SAGS: Structure-Aware 3D Gaussian Splatting; Invisible Stitch: Generating Smooth 3D Scenes with Depth Inpainting…
Kolmogorov-Arnold Networks, Iterative Reasoning Optimization, Extending Llama-3 Context Length
11:24
KAN: Kolmogorov-Arnold Networks; InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation; Better & Faster Large Language Models via Multi-token Prediction; Iterative Reasoning Preference Optimization; Extending Llama-3's Context Ten-Fold Overnight
Innovative Image Editing, Advanced Autonomous Tracking, and the Evolution of Open-Source AI
12:10
Paint by Inpaint: Learning to Add Image Objects by Removing Them First; Self-Play Preference Optimization for Language Model Alignment; Automatic Creative Selection with Cross-Modal Matching; STT: Stateful Tracking with Transformers for Autonomous Driving; Octopus v4: Graph of language models
GPT-4 Rival Models, Revolutionizing Open Source LM Evaluation, StoryDiffusion's Visual Narrative Breakthrough
11:31
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models; WildChat: 1M ChatGPT Interaction Logs in the Wild; StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation; LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report; LLM-AD: Large Language Model based Audio Description System…
Model Editing Insights with Llama-3, Rethinking Large Language Models in Math, 3D Rendering and Audio Compression
11:52
Is Bigger Edit Batch Size Always Better? -- An Empirical Study on Model Editing with Llama-3; A Careful Examination of Large Language Model Performance on Grade School Arithmetic; Spectrally Pruned Gaussian Fields with Neural Compensation; SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound; Clover: Regressive Lightweight Speculative …
Advancing LLMs with Multi-Token Prediction, Octopus v4 Revolution in Open-Source Language Models, Enhancing Reasoning with Iterative Preference Optimization
11:55
Octopus v4: Graph of language models; InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation; Better & Faster Large Language Models via Multi-token Prediction; GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting; Iterative Reasoning Preference Optimization
Evaluating LLMs with Diverse Models, Novel Robotic Skills Framework, Editing 3D Graphics with VLMs
11:02
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models; LEGENT: Open Platform for Embodied Agents; Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations; Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting; BlenderAlchemy: Editing 3D Graphics with Vision-Languag…
PLLaVA Breakthrough in Video-Language Modeling, Exploring Landmarks with HaLo-NeRF, and MaPa's Text-driven 3D Material Painting
9:19
AI Papers Podcast for 04/29/2024. PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning; AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs; HaLo-NeRF: Learning Geometry-Guided Semantics for Exploring Unconstrained Photo Collections; MaPa: Text-driven Photorealistic Material Painting for 3D Shapes…
Bridging the Gap to GPT-4V, Interactive 3D Generation, Accelerating LLM Inference
12:13
AI Papers Podcast for 04/26/2024. How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites; Interactive3D: Create What You Want by Interactive 3D Generation; Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding; Tele-FLM Technical Report; SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Mo…
Hyper-SD Breakthrough, MAIA's Neural Understanding, SEED-X Multimodal Innovation
11:23
AI Papers Podcast for 04/25/2024. Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis; A Multimodal Automated Interpretability Agent; SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation; MultiBooth: Towards Generating All Your Concepts in an Image from Text; Learning H-Infinity Locomotion Control…
Enhancing AI with Multi-Head MoEs, Pegasus-1's Video Mastery, Optimizing Diffusion Models
11:08
AI Papers Podcast for 04/24/2024. OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework; Multi-Head Mixture-of-Experts; Pegasus-v1 Technical Report; Align Your Steps: Optimizing Sampling Schedules in Diffusion Models; SnapKV: LLM Knows What You are Looking for Before Generation…
Model Efficiency, Instruction Prioritization, and Workflow Automation
11:51
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone; The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions; FlowMind: Automatic Workflow Generation with LLMs; Music Consistency Models; How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study
Physics-Based Video, Text-Centric Visuals, Gaussian Splatting, Program Repair, Progressive Web Crawling
11:47
AI Papers Podcast for 04/23/2024. PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation; TextSquare: Scaling up Text-Centric Visual Instruction Tuning; Does Gaussian Splatting need SFM Initialization?; How Far Can We Go with Practical Function-Level Program Repair?; AutoCrawler: A Progressive Understanding Web Agent for Web Crawler…
Adapting Diverse Controls: Ctrl-Adapter, HQ-Edit, Tango 2
11:48
AI Papers Podcast for 04/21/2024. Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model; HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing; Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization; TextHawk: Exploring Efficient Fine-Grained Percept…