Unsloth AI reposted
We teamed up with Hugging Face to release a free GRPO notebook that fine-tunes Gemma 3 into a powerful reasoning model! Using Unsloth AI, OpenAI's math dataset and custom reward functions, we fine-tune Google's Gemma 3 (1B) to generate chain-of-thought reasoning.

Free Colab Notebook: https://lnkd.in/e94SKJz4

Summary of what you'll learn:
- Implement chain-of-thought reasoning in Google's Gemma 3 (1B) using 16-bit LoRA
- Make tiny LLMs benefit from GRPO
- Understand reward functions
- Prepare your data + evaluate your LLM

Join HF's Course: https://lnkd.in/e_PhX4tc

Thank you Ben Burtenshaw for being patient and working with us on this collab!
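For a rough idea of what "custom reward functions" can look like in GRPO, here is a minimal sketch in plain Python. It is not the notebook's code: the `<reasoning>`/`<answer>` tag format, the function names, the reward values and the `answers` argument are all assumptions made for illustration. The `(completions, **kwargs) -> list of floats` shape is simply one common convention for reward functions. The last helper shows the group-relative idea behind GRPO: each sampled completion is scored against the mean and standard deviation of its own group.

```python
import re

# Hypothetical output format: reasoning first, then a final answer.
FORMAT_RE = re.compile(
    r"<reasoning>.+?</reasoning>\s*<answer>(.+?)</answer>", re.DOTALL
)

def format_reward(completions, **kwargs):
    """Small reward when a completion follows the assumed tag format."""
    return [0.5 if FORMAT_RE.search(c) else 0.0 for c in completions]

def correctness_reward(completions, answers, **kwargs):
    """Larger reward when the extracted answer matches the reference."""
    rewards = []
    for completion, gold in zip(completions, answers):
        match = FORMAT_RE.search(completion)
        predicted = match.group(1).strip() if match else None
        rewards.append(2.0 if predicted == gold.strip() else 0.0)
    return rewards

def group_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of samples for the same prompt."""
    mean = sum(rewards) / len(rewards)
    variance = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = variance ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

if __name__ == "__main__":
    samples = [
        "<reasoning>2 + 2 is 4</reasoning><answer>4</answer>",
        "The answer is 4",
    ]
    total = [
        f + c
        for f, c in zip(
            format_reward(samples),
            correctness_reward(samples, answers=["4", "4"]),
        )
    ]
    print(total)
    print(group_advantages(total))
```

Completions that follow the format and get the answer right end up with a positive advantage relative to the rest of their group, which is the signal the policy is trained on.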