AI This Week: Karpathy’s Game-Changing GPT-2 Training, Codestral’s Language Mastery, and More!
Karpathy has released a way to train GPT-2 models quickly and cheaply. Training the smallest GPT-2 (124M parameters) takes about 90 minutes and roughly $20 on a single 8xA100 node. The 350M version requires about 14 hours and around $200, and the full 1.6B model takes about one week and roughly $2.5k.
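For context, those quoted figures imply a fairly ordinary GPU rental rate. A quick back-of-envelope check, using only the numbers above:

```python
# Implied GPU rental rate from the quoted 124M run (figures from the article).
hours = 1.5        # ~90 minutes of wall-clock training
total_cost = 20.0  # quoted cost in USD
gpus = 8           # one 8xA100 node

node_rate = total_cost / hours  # ~$13.33 per node-hour
gpu_rate = node_rate / gpus     # ~$1.67 per A100-hour

print(f"~${node_rate:.2f}/node-hour, ~${gpu_rate:.2f}/GPU-hour")
```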
This release is part of Karpathy’s llm.c repository, which focuses on LLM training in simple, pure C/CUDA. It avoids heavyweight dependencies such as PyTorch (245MB) and CPython (107MB).
The training uses the FineWeb dataset, which is considered higher quality than the original WebText. FineWeb consists mostly of plain English text with little math or code, which lets the model use its capacity more efficiently and helps address the diminishing returns seen in the original GPT-2’s 100B tokens.
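If you want to eyeball the data yourself, FineWeb is published on the Hugging Face Hub. A minimal sketch using the datasets library (assuming the HuggingFaceFW/fineweb dataset and its sample-10BT subset; note that llm.c preprocesses the data with its own script rather than this route):

```python
# Stream a few FineWeb documents from the Hugging Face Hub.
# Assumes the "HuggingFaceFW/fineweb" dataset and its "sample-10BT" subset.
from datasets import load_dataset

ds = load_dataset("HuggingFaceFW/fineweb", name="sample-10BT",
                  split="train", streaming=True)

for i, row in enumerate(ds):
    print(row["text"][:200], "...")  # each row is one cleaned web document
    if i >= 2:
        break
```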
Trending Signals
Top Repos
PR-Agent is a collection of LLM-powered tools for code review and development workflows in collaborative software projects.
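As a rough idea of the workflow, here is a sketch of driving PR-Agent from a script. The pr_agent.cli module and the --pr_url/review arguments follow the project’s documented CLI usage at the time of writing, but treat the exact invocation as an assumption and verify it against the current README:

```python
# Hypothetical driver for a PR-Agent review run. Assumes pr-agent is
# installed and LLM/Git-provider credentials are configured per its docs.
import subprocess
import sys

pr_url = "https://github.com/org/repo/pull/123"  # placeholder PR URL

subprocess.run(
    [sys.executable, "-m", "pr_agent.cli", "--pr_url", pr_url, "review"],
    check=True,
)
```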
Tarsier is a tool suite that tackles three problems in using LLMs for web interaction:
- feeding the webpage to the LLM (as HTML, an accessibility tree, or a screenshot),
- mapping LLM responses back to concrete web elements, and
- informing a text-only LLM about the page’s visual structure.
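A minimal sketch of that loop, based on Tarsier’s README (Tarsier, GoogleVisionOCRService, and page_to_text are the library’s documented API; the target URL and credentials path are placeholders):

```python
# Render a page for an LLM and keep a tag -> XPath map for grounding.
import asyncio
import json

from playwright.async_api import async_playwright
from tarsier import Tarsier, GoogleVisionOCRService

async def main():
    # Placeholder: a Google Cloud service-account file for the OCR backend.
    with open("service_account.json") as f:
        credentials = json.load(f)

    tarsier = Tarsier(GoogleVisionOCRService(credentials))

    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.goto("https://example.com")

        # Returns a tagged text rendering of the page plus a mapping from
        # each numeric tag back to the element's XPath.
        page_text, tag_to_xpath = await tarsier.page_to_text(page)
        print(page_text[:500])
        print(tag_to_xpath)

asyncio.run(main())
```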
The RT-DETR repository is the official implementation of the paper “DETRs Beat YOLOs on Real-time Object Detection.” It presents the Real-Time DEtection TRansformer (RT-DETR, aka RTDETR), the first real-time end-to-end object detector, which outperforms previously advanced YOLO models in both speed and accuracy.
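The official repo ships its own PyTorch code, but arguably the quickest way to try RT-DETR is through the third-party ultralytics package, which distributes pretrained RT-DETR weights. A sketch, assuming ultralytics is installed and the rtdetr-l.pt checkpoint is available:

```python
# Run RT-DETR inference via ultralytics (not the official RT-DETR codebase).
from ultralytics import RTDETR

model = RTDETR("rtdetr-l.pt")       # pretrained COCO weights
results = model.predict("bus.jpg")  # placeholder local image

for r in results:
    # xyxy boxes, confidence scores, and class ids for each detection
    print(r.boxes.xyxy, r.boxes.conf, r.boxes.cls)
```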
Subscribe to the newsletter: https://lnkd.in/guxfrUSM