An unexpected RL Renaissance

New talk! Forecasting the Alpaca moment for reasoning models, and why the new style of RL training is a far bigger deal than the emergence of RLHF.

YouTube: https://lnkd.in/gEEnN9UN
Slides: https://lnkd.in/g9rcQ4jh
More info: