Reward Models | Data Brew | Episode 40
In this episode, Brandon Cui, Research Scientist at MosaicML and Databricks, dives into cutting-edge advancements in AI model optimization, focusing on Reward Models and Reinforcement Learning from Human Feedback (RLHF).
Highlights include:
- How synthetic data and RLHF enable fine-tuning models to generate preferred outcomes.
- Techniques like Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO) for enhancing response quality.
- The role of reward models in improving coding, math, reasoning, and other NLP tasks.
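
For listeners who want a concrete picture of the ideas named in the highlights, here is a minimal sketch of two commonly used preference objectives: the pairwise (Bradley-Terry style) loss often used to train reward models, and the DPO loss for a single preference pair. This is not taken from the episode; the function names, the `beta` value, and the example numbers are illustrative assumptions, and a real setup would compute the log-probabilities with a language model.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def reward_pair_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise reward-model loss: -log sigmoid(r_chosen - r_rejected).

    The loss is small when the reward model scores the preferred
    response higher than the rejected one.
    """
    return softplus(-(reward_chosen - reward_rejected))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair (beta=0.1 is an assumed default).

    Inputs are summed log-probabilities of each response under the
    policy being trained and under a frozen reference model.
    """
    # How much more the policy likes each response than the reference does
    chosen_log_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_log_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_log_ratio - rejected_log_ratio)
    # -log sigmoid(margin) == softplus(-margin)
    return softplus(-margin)


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only
    print(reward_pair_loss(2.0, 0.0))
    print(dpo_loss(-10.0, -14.0, -12.0, -12.0))
```

In this sketch, DPO needs no separate reward model at training time: the preference signal comes directly from comparing policy and reference log-probabilities, whereas a PPO-style pipeline would typically optimize against scores from a trained reward model.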
Connect with Brandon Cui:
https://www.linkedin.com/in/bcui19/
42 episodes