Rethinking the Role of PPO in RLHF – The Berkeley Artificial Intelligence Research Blog
By TheCryptocurrencyPost, October 16, 2023

TL;DR: In RLHF, there’s tension between the reward learning phase, which uses human preference in…