RLHF Workflow: From Reward Modeling to Online RLHF
The paper introduces an online iterative Reinforcement Learning from Human Feedback (RLHF) workflow for large language models. It reports superior performance using only open-source datasets and proxy human feedback.
https://arxiv.org/abs/2405.07863
YouTube: https://www.youtube.com/@ArxivPapers
TikTok: https://www.tiktok.com/@arxiv_papers
Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016
Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers
--- Support this podcast: https://podcasters.spotify.com/pod/show/arxiv-papers/support
1168 episodes