Content provided by Nicolay Gerold. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Nicolay Gerold or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://player.fm/legal.

#055 Embedding Intelligence: AI's Move to the Edge

1:05:35
 

Nicolay here,

While everyone races to cloud-scale LLMs, Pete Warden is solving AI problems by going completely offline. No network connectivity required.

Today I have the chance to talk to Pete Warden, CEO of Useful Sensors and author of the TinyML book.

His philosophy: if you can't explain to users exactly what happens to their data, your privacy model is broken.

Key Insight: The Real World Action Gap

LLMs excel at text-to-text transformations but fail catastrophically at connecting language to physical actions. There's nothing in the web corpus that teaches a model how "turn on the light" maps to sending a pin high on a microcontroller.

This explains why every AI agent demo focuses on booking flights and API calls - those actions are documented in text. The moment you step off the web into real-world device control, even simple commands become impossible without custom training on action-to-outcome data.

Pete's company builds speech-to-intent systems that skip text entirely, going directly from audio to device actions using embeddings trained on limited action sets.
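
The embedding approach can be sketched in miniature: embed the utterance, then nearest-neighbor match against a small canonical action set. Everything below is a toy illustration under stated assumptions; a bag-of-words `embed` stands in for a learned audio embedding, and none of these names reflect Useful Sensors' actual API:

```python
import math

# Toy stand-in for a learned embedding; a real speech-to-intent system
# would embed audio directly. All names here are illustrative.
VOCAB = ["turn", "on", "off", "the", "light", "lights", "stop", "music"]

def embed(phrase):
    """Bag-of-words vector, L2-normalized, as a stand-in for a learned embedding."""
    words = phrase.lower().split()
    v = [float(words.count(w)) for w in VOCAB]
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else v

def cosine(a, b):
    """Cosine similarity of two already-normalized vectors."""
    return sum(x * y for x, y in zip(a, b))

# Constrained action set: each canonical device action gets one embedding.
ACTIONS = {
    "LIGHT_ON": embed("turn on the light"),
    "LIGHT_OFF": embed("turn off the light"),
    "MUSIC_STOP": embed("stop the music"),
}

def classify(utterance):
    """Map a natural-language variation onto the nearest canonical action."""
    q = embed(utterance)
    return max(ACTIONS, key=lambda a: cosine(q, ACTIONS[a]))

print(classify("turn the lights on"))  # LIGHT_ON
```

The key property is the constrained output space: the model never has to generate free text, only pick the closest of a handful of actions, which is why phrasing variations still land on the right command.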

💡 Core Concepts

Speech-to-Intent: Direct audio-to-action mapping that bypasses text conversion, preserving ambiguity until final classification

ML Sensors: Self-contained circuit boards processing sensitive data locally, outputting only simple signals without exposing raw video/audio

Embedding-Based Action Matching: Vector representations mapping natural language variations to canonical device actions within constrained domains
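
The ML-sensor idea is as much an interface contract as a model: raw video or audio stays on the board, and only a simple signal crosses the boundary. A hypothetical sketch of that contract (the `detector` here is a stand-in for an on-board model, not any real firmware API):

```python
def ml_sensor_output(raw_frame, detector):
    """Run detection locally; only a simple signal leaves the sensor.

    The raw frame is consumed here and never returned, mirroring an
    ML sensor whose raw pixels never leave the circuit board.
    """
    person_present = detector(raw_frame)
    return {"person_present": bool(person_present)}  # no pixels exposed

# Toy detector standing in for an on-board model: "any nonzero pixel".
reading = ml_sensor_output(
    [[0, 255], [255, 0]],
    lambda frame: any(any(row) for row in frame),
)
print(reading)
```

Because the host system only ever sees the boolean, the privacy guarantee is explainable in one sentence, which is exactly the property Pete argues for.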

⏱ Important Moments

Real World Action Problem: [06:27] LLMs discuss turning on lights but lack training data connecting text commands to device control

Apple Intelligence Challenges: [04:07] Design-led culture clashes with AI accuracy limitations

Speech-to-Intent vs Speech-to-Text: [12:01] Breaking audio into text loses critical ambiguity information

Limited Action Set Strategy: [15:30] Smart speakers succeed by constraining to ~3 functions rather than infinite commands

8-Bit Quantization: [33:12] Remains the deployment sweet spot; processor instruction support matters more than compression

On-Device Privacy: [47:00] Complete local processing provides explainable guarantees vs confusing hybrid systems
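
The 8-bit point at [33:12] refers to schemes like symmetric per-tensor quantization, the common baseline: store int8 values plus one float scale, and lean on the processor's int8 instructions at inference time. A generic sketch of that baseline, not Pete's specific pipeline:

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: w ≈ scale * q, q in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # avoid div-by-zero
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [scale * v for v in q]

w = [0.50, -1.27, 0.02, 0.91]
q, s = quantize_int8(w)
print(q, s)  # the largest-magnitude weight maps to ±127
```

The scale is the only float the runtime needs per tensor; everything else moves and multiplies as int8, which is why hardware instruction support (not the 4x size reduction) is what makes 8-bit the practical sweet spot.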

🛠 Tools & Tech

Whisper: github.com/openai/whisper

Moonshine: github.com/usefulsensors/moonshine

TinyML Book: oreilly.com/library/view/tinyml/9781492052036

Stanford Edge ML: github.com/petewarden/stanford-edge-ml

📚 Resources

Looking to Listen Paper: looking-to-listen.github.io

Lottery Ticket Hypothesis: arxiv.org/abs/1803.03635

Connect: [email protected] | petewarden.com | usefulsensors.com

Beta Opportunity: Moonshine browser implementation for client-side speech processing in JavaScript
