A no-fluff Roadmap for launching, loving and living a HIGHER VERSION OF YOURSELF!! The insights here can activate mindset re-engineering and conscious metamorphosis for men and women seeking a life of peace, purpose and prosperity!! I share my fearless framework for catalyzing an UPGRADE in all significant areas of your life! You can radically grow in your vision, experience and influence!
In the world of celebrity mummy bloggers, the voice of the Dad often goes unheard. Dads know stuff too, right? Or do they? Dads Don't Know takes you deep into the minds of fathers, to get their insights and real perspectives on all facets of parenting. With discussions ranging from childbirth to raising toddlers, education and much more, it is more than likely you will be shocked, appalled and perhaps offended by the revelations. The team at Dads Don't Know also believe that producers of c ...
A daily update on the latest AI Research Papers. We provide a high level overview of a handful of papers each day and will link all papers in the description for further reading. This podcast is created entirely with AI by PocketPod. Head over to https://pocketpod.app to learn more.
Goldenboy is an award winning esports broadcaster who brings you a show that covers ALL video games from Overwatch, Valorant, League of Legends, Halo, Fortnite, CSGO and so much more. You'll learn all the latest news and conversations around the incredible world of esports from a source who's right in the middle of the action.
Improving Agent Design, JPEG-LM's Visual Breakthrough, TurboEdit's Real-Time Image Edits, Video Segmentation Advances, LLMs Learning Like Humans, RL Benchmarks
16:00
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models; JPEG-LM: LLMs as Image Generators with Canonical Codec Representations; Automated Design of Agentic Systems; TurboEdit: Instant text-based image editing; Surgical SAM 2: Real-time Segment Anything in Surgical Video by Efficient Frame Pruning; Fine-tuning Large Language Models with Human-inspired Lea…
Science & Clinical LLMs Leaps, Enhancing Small Model Reasoning, New Frontiers in Controlled Media Generation
14:24
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery; Med42-v2: A Suite of Clinical LLMs; Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers; ControlNeXt: Powerful and Efficient Control for Image and Video Generation; CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer; FruitNeRF: A Unified Neural Radiance Fiel…
Enjoy Your Significant Life of Vision (No Burnout, No Overwhelm)
23:15
In this coaching conversation with visionaries, DDK shares Seven Strategies to support your pursuit of a harmonious, empowered and successful life, where your vision thrives and you do too!! Please find below some of the tools I mentioned during this Conversation. Make sure you leave a comment for what resonated the most with you, and what you hope…
Multimodal Benchmarks, Visual Task Transfer, and 3D Object Generation
14:15
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models; LLaVA-OneVision: Easy Visual Task Transfer; An Object is Worth 64x64 Pixels: Generating 3D Object via Image Diffusion; MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine; IPAdapter-Instruct: Resolving Ambiguity in Image-based Co…
Image and Video Segmentation with SAM 2, Gemma 2 for Efficient Language Models, Boosting Small Models with Contrastive Fine-Tuning, and MM-Vet v2 Challenges Large Multimodal Models
13:40
SAM 2: Segment Anything in Images and Videos; Gemma 2: Improving Open Language Models at a Practical Size; Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model; Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning; OmniParser for Pure Vision Based GUI Agent; SF3D: Stable Fast 3D Mesh Reconstructi…
Text-Guided Image Inpainting, AMEX for Mobile GUI Agents, AgentScope's Multi-Agent Simulation
14:29
Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model; LAMBDA: A Large Model Based Data Agent; AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents; BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation; Very Large-Scale Multi-Agent Simulation in AgentScope; Data Mixture Inference: What do BPE Tok…
OpenDevin & AI Software Development, Enhancing Visual Language Models, DDK: Refining Large Language Model Efficiency through Domain Knowledge
13:45
OpenDevin: An Open Platform for AI Software Developers as Generalist Agents; VILA^2: VILA Augmented VILA; HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation; PERSONA: A Reproducible Testbed for Pluralistic Alignment; SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency; Scalify: scale propagation for…
Vocabulary Expansion for Large Models, Big Data Enhancing LMs, 4D Reconstruction Progress, AI Cityscape Generation, DPO Policy Analysis, Expanding Code Models, Multimodal LM Trust Evaluation
14:55
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies; Scaling Retrieval-Based Language Models with a Trillion-Token Datastore; Shape of Motion: 4D Reconstruction from a Single Video; Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion; Understanding Reference Policies in Direct Preference Opti…
Qwen2 Language Model, Mitigating Privacy Risks in LLMs, Exploring Non-Determinism, Increased Efficiency with Q-Sparse, GRUtopia for Embodied AI
10:38
Qwen2 Technical Report; Learning to Refuse: Towards Mitigating Privacy Risks in LLMs; The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism; Q-Sparse: All Large Language Models can be Fully Sparsely-Activated; GRUtopia: Dream General Robots in a City at Scale
Skywork-Math's Reasoning, Video Diffusion Model Innovations, Multimodal Learning, Q-GaLore's Memory Efficiency, MAVIS: Visual Math Instruction
12:11
Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On; Video Diffusion Alignment via Reward Gradients; Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model; Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients; MAVIS: Math…
Beyond Encoders in Vision-Language Models, Revolutionizing Human-LLM Interaction, and Advancing Knowledge Graphs
12:05
Unveiling Encoder-Free Vision-Language Models; FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs; AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents; RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models; ChartGemma: Visual Instruction-…
Diffusion Forcing to Expert Tuning, Structured Planning, Vision-Language Models, and Tabular ML Benchmarks
11:34
Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion; Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models; Planetarium: A Rigorous Benchmark for Translating Text to Structured Planning Languages; InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Co…
Advancing AI's Mathematical Reasoning: WE-MATH, ROS-LLM Framework, Autoregressive Image Generation
10:36
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?; ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning; MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation; LiteSearch: Efficacious Tree Search for LLM; Wavelets Are All You Need for Autoregressive Image…
Persona-Driven Data Synthesis, Enhancing Medical MLLMs, Robot Learning, Knowledge Distillation in LLMs, Text to 3D Gaussian Revolution
11:24
Scaling Synthetic Data Creation with 1,000,000,000 Personas; HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale; LLaRA: Supercharging Robot Learning Data for Vision-Language Policy; Direct Preference Knowledge Distillation for Large Language Models; GaussianDreamerPro: Text to Manipulable 3D Gaussians with Highly Enh…
OMG-LLaVA: Unifying Vision and Language Understanding, Step-DPO for LLMs Mathematical Reasoning, MUMU's Multimodal Image Generation
12:15
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding; Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs; MUMU: Bootstrapping Multimodal Image Generation from Text-to-Image Data; Simulating Classroom Education with LLM-Empowered Agents; SeaKR: Self-aware Knowledge Retrieval for Adaptive Retrieval …
FineWeb Datasets, YouDream's 3D Animals, PDE-Solving Breakthrough, Noise-Conditioned Perception Alignment, Language Models' Continual Learning
11:02
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale; YouDream: Generating Anatomically Controllable Consistent Text-to-3D Animals; DiffusionPDE: Generative PDE-Solving Under Partial Observation; Aligning Diffusion Models with Noise-Conditioned Perception; Unlocking Continual Learning Abilities in Language Models…
BigCodeBench Challenges, Cambrian-1 Leap, D-MERIT's Evaluation, Long Context Breakthrough in Vision
11:06
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation; BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions; Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs; Evaluating D-MERIT of Partial-annotation on Information Retrieval; Long Context Transfer from Language to Vision…
LongRAG Breakthrough, LLMs as Judges, Transformer Memory Insights, Video Library AI, Democratizing Art Styles
10:14
LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs; Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges; Complexity of Symbolic Representation in Working Memory of Transformer Correlates with the Complexity of a Task; Towards Retrieval Augmented Generation over Large Video Libraries; Stylebreeder: Exploring …
Scaling In-Context Reinforcement Learning, ChartMimic's AI Benchmark, Multimodal Document Comprehension, Long Context Reasoning Challenges
10:36
XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning; Make It Count: Text-to-Image Generation with an Accurate Number of Objects; ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation; Needle In A Multimodal Haystack; BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Hay…
Revolutionizing Vision and Language Models: Depth Prediction Breakthroughs, Pixel-Level Transformers, and Robotic Skill Learning
13:20
Depth Anything V2; An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels; Transformers meet Neural Algorithmic Reasoners; Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling; OpenVLA: An Open-Source Vision-Language-Action Model; Alleviating Distortion in Image Generation via Multi-Resolut…
NaRCan Revolutionizes Video Editing, Training-Free Video Generation, Recaptioning Web Images with LLaMA-3, Novel Data Synthesis Approach, Smartphone LLM Inference
11:33
NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing; MotionClone: Training-Free Motion Cloning for Controllable Video Generation; What If We Recaption Billions of Web Images with LLaMA-3?; Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing; PowerInfer-2: Fast Large Language Model I…
Revolutionizing Image Synthesis with TiTok, Multilingual Code Benchmark, Exploring GenAI Prompting Techniques
10:53
An Image is Worth 32 Tokens for Reconstruction and Generation; McEval: Massively Multilingual Code Evaluation; Zero-shot Image Editing with Reference Imitation; The Prompt Report: A Systematic Survey of Prompting Techniques; TextGrad: Automatic "Differentiation" via Text
LlamaGen's Image Revolution, Husky: The Multi-Step Reasoner, Vript's Video Breakthrough, VALL-E 2 Achieves Human Parity
10:46
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation; Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning; Vript: A Video Is Worth Thousands of Words; Lighting Every Darkness with 3DGS: Fast Training and Real-Time Rendering for HDR View Synthesis; VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text …
Mixture-of-Agents, Benchmarking LLMs, and GenAI Arena Evaluation
11:06
Mixture-of-Agents Enhances Large Language Model Capabilities; WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild; CRAG -- Comprehensive RAG Benchmark; GenAI Arena: An Open Evaluation Platform for Generative Models; Large Language Model Confidence Estimation via Black-Box Access
Enhancing AI Video and Image Generation, BitsFusion Quantization, Step-aware Optimization, Thought-Augmented Reasoning, and Single Forward Video Generation
11:39
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions; BitsFusion: 1.99 bits Weight Quantization of Diffusion Model; Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step; Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models; SF-V: Single Forward Video Generation Mo…
AI Papers Podcast Special Edition: Apple Intelligence & Ferret-UI
1:52
Apple announced new Siri features and Apple Intelligence today. Interestingly, Apple had already released a paper, titled "Ferret-UI," on how it all works: a multimodal vision-language model capable of understanding widgets, icons, and text on an iOS mobile screen, and reasoning about their spatial relationships and functional meanings. https://arxiv.…
Block Transformers: Faster Inference, Mobile Device AI Agents, 3D-Image Generation, Low Latency TTS
10:41
Block Transformer: Global-to-Local Language Modeling for Fast Inference; Parrot: Multilingual Visual Instruction Tuning; Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration; Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion; LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autore…
Seed-TTS, Decoding LLMs, Innovations in Text-to-Video, Self-Improving AI Preferences, and Refining Diffusion Models
11:10
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models; To Believe or Not to Believe Your LLM; I4VGen: Image as Stepping Stone for Text-to-Video Generation; Self-Improving Robust Preference Optimization; Guiding a Diffusion Model with a Bad Version of Itself
MMLU-Pro: Next-Level Language Understanding, Tailored LLMs, High FPS Video Generation Innovation
11:30
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark; Learning Temporally Consistent Video Depth from Video Diffusion Priors; Show, Don't Tell: Aligning Language Models with Demonstrated Feedback; Artificial Generational Intelligence: Cultural Accumulation in Reinforcement Learning; ZeroSmooth: Training-free Diffuser Adaptati…
Transformers and State-Space Models Unite, Multi-modal LLM Benchmark, Perplexity in Data Pruning, Advancing 4D Content Generation
10:23
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality; Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis; Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models; Kaleido Diffusion: Improving Conditional Diffusion Models with Au…
DITTO-2 Speeds Up Music AI, GECO's Quick 3D Generation, PLA4D's 4D Advances, DevEval's Real-World Code Benchmark, Parrot's LLM Application Efficiency
10:47
AI Papers Podcast for 06/04/2024: DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation; GECO: Generative Image-to-3D within a SECOnd; PLA4D: Pixel-Level Alignments for Text-to-4D Gaussian Splatting; DevEval: A Manually-Annotated Code Generation Benchmark Aligned with Real-World Code Repositories; Parrot: Efficient Serving of LLM-b…
Boosting Text Retrieval with CLIP Models, Rethinking Retrieval Augmented Generation, and Deciphering Human Behavior through MotionLLM
10:42
AI Papers Podcast for 06/03/2024: Jina CLIP: Your CLIP Model Is Also Your Text Retriever; Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts; MotionLLM: Understanding Human Behaviors from Human Motions and Videos; Xwin-LM: Strong and Scalable Alignment Practice for LLMs; MOFA-Video: Controllable Image Animati…
Bilingual LLM Transparency, T2V-Turbo's Video Generation, LLMs Surpassing Human Theory of Mind Performance, Advancements in LLM Attribution
8:47
MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series; T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback; LLMs achieve adult human performance on higher-order theory of mind tasks; Nearest Neighbor Speculative Decoding for LLM Generation and Attribution; Zipper: A Multi-Tower Decoder Ar…
Phased Consistency Model, 2-Stage Backpropagation, and the Future of 4D World Reconstruction
8:09
Phased Consistency Model; 2BP: 2-Stage Backpropagation; GFlow: Recovering 4D World from Monocular Video; Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning; LLaMA-NAS: Efficient Neural Architecture Search for Large Language Models
Vision-Language Models, Arithmetic Transformers, Next-Gen Video Editing
10:20
An Introduction to Vision-Language Modeling; Transformers Can Do Arithmetic with the Right Embeddings; Matryoshka Multimodal Models; I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models; Zamba: A Compact 7B SSM Hybrid Model; Looking Backward: Streaming Video-to-Video Translation with Feature Banks…
ConvLLaVA's Visual Compression, Efficient LLVM, Multilingual Aya 23, and AutoCoder's Code Mastery
11:11
ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models; Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models; Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization; Aya 23: Open Weight Releases to Further Multilingual Progress; Stacking Your Transformers: A Close…
Revolution in Image Generation, Thermodynamic Gradient Descent, DMD2 for Fast Synthesis, Distributed Speculative Inference
10:56
Language Model Mysteries, Personalized Image Generation, Audio-Visual Transformer Innovations, DeepSeek-Prover, Dense Connector: MLLM Potential
10:31
ReVideo: Remake a Video with Motion and Content Control; Not All Language Model Features Are Linear; RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance; Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation; DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data; Dense Connector for MLLMs…
Transformer Linearity, Face-Adapter Diffusion Models, Cross-Layer Attention Shrinks LLMs, Image Generation Breakthrough
10:14
Your Transformer is Secretly Linear; Diffusion for World Modeling: Visual Details Matter in Atari; Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control; Reducing Transformer Key-Value Cache Size with Cross-Layer Attention; OmniGlue: Generalizable Feature Matching with Foundation Model Guidance; Personalized Residuals for C…
Infinite Video Generation, High-Rank Fine-Tuning, Modular LLMs with LoRA Libraries
9:18
FIFO-Diffusion: Generating Infinite Videos from Text without Training; MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning; OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework; Imp: Highly Capable Large Multimodal Models for Mobile Devices; Octo: An Open-Source Generalist Robot Policy; Towards Modular LLMs by Building and Reusing …
Tailoring Language Models for Science, Scaling Laws in NLP, Grounded 3D-LLM Innovations, Efficient Large Model Inference
9:30
INDUS: Effective and Efficient Language Models for Scientific Applications; Observational Scaling Laws and the Predictability of Language Model Performance; Grounded 3D-LLM with Referent Tokens; Layer-Condensed KV Cache for Efficient Inference of Large Language Models; Dynamic data sampler for cross-language transfer learning in large language models…
Chameleon's Multimodal Breakthrough, LoRA's Learning Efficiency, Many-Shot In-Context Learning, Object Detection Innovation, Text-to-3D Generation
10:29
Chameleon: Mixed-Modal Early-Fusion Foundation Models; LoRA Learns Less and Forgets Less; Many-Shot In-Context Learning in Multimodal Foundation Models; CAT3D: Create Anything in 3D with Multi-View Diffusion Models; Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection; Dual3D: Efficient and Consistent Text-to-3D Generation with Dual-mode…
Efficient Multimodality, Vision Suite's Custom Data, EEG Music Decoding Advances, Mobile Video Breakthrough
8:44
ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models; Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model; BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation; Naturalistic Music Decoding from EEG Data via Latent Diffusion Models; No Time to Waste: Squeeze Time into Channel for Mobile Vide…
Transformer Models Beyond Scaling, Multilingual Image Synthesis, Advanced Text-to-Image Control
9:28
VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models; Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory; Coin3D: Controllable and Interactive 3D Assets Generation with Proxy-Guided Conditioning; Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Unde…
Vision-Language Model Design, Online RLHF Workflow, Multilingual AI, AI Memory Solution
9:41
What matters when building vision-language models?; RLHF Workflow: From Reward Modeling to Online RLHF; SUTRA: Scalable Multilingual Language Model Architecture; SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts; Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from …
BlenderAlchemy Revolution, Stylus Adapter Magic, DressCode Digital Fashion
10:04
BlenderAlchemy: Editing 3D Graphics with Vision-Language Models; Stylus: Automatic Adapter Selection for Diffusion Models; Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations; DressCode: Autoregressively Sewing and Generating Garments from Text Guidance; PLLaVA: Parameter-free LLaVA Extension from Images to V…
Real-Time Motion Control, Next-Gen Visual Captions, 3D Scene Reconstruction Innovations
11:30
MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model; Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation; GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting; SAGS: Structure-Aware 3D Gaussian Splatting; Invisible Stitch: Generating Smooth 3D Scenes with Depth Inpainting…
Kolmogorov-Arnold Networks, Iterative Reasoning Optimization, Extending Llama-3 Context Length
11:24
KAN: Kolmogorov-Arnold Networks; InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation; Better & Faster Large Language Models via Multi-token Prediction; Iterative Reasoning Preference Optimization; Extending Llama-3's Context Ten-Fold Overnight
Innovative Image Editing, Advanced Autonomous Tracking, and the Evolution of Open-Source AI
12:10
Paint by Inpaint: Learning to Add Image Objects by Removing Them First; Self-Play Preference Optimization for Language Model Alignment; Automatic Creative Selection with Cross-Modal Matching; STT: Stateful Tracking with Transformers for Autonomous Driving; Octopus v4: Graph of language models
GPT-4 Rival Models, Revolutionizing Open Source LM Evaluation, StoryDiffusion's Visual Narrative Breakthrough
11:31
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models; WildChat: 1M ChatGPT Interaction Logs in the Wild; StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation; LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report; LLM-AD: Large Language Model based Audio Description System…