AI This Week: Karpathy’s Game-Changing GPT-2 Training, Codestral’s Language Mastery, and More!

Language Models - Karpathy unveils a guide to training GPT-2 in about 90 minutes for $20

Andrej Karpathy has released a method for training GPT-2 models quickly and cheaply. Training the smallest GPT-2 (124M parameters) takes about 90 minutes and $20 on a single 8xA100 GPU node. The 350M version requires 14 hours and around $200. Training the full 1.6B model takes about one week and $2.5k.
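As a quick sanity check on these figures, the sketch below divides each quoted price by its quoted duration to get the implied hourly cost of the 8xA100 node. The durations and prices come from the paragraph above; treating "one week" as 168 hours of continuous training is an assumption, and the function name is purely illustrative.

```python
# Back-of-the-envelope check of the quoted training costs.
# Durations and prices are the figures quoted in the text;
# "one week" is ASSUMED to mean 7 * 24 = 168 hours of continuous training.
RUNS = {
    "124M": {"hours": 1.5, "quoted_usd": 20},
    "350M": {"hours": 14, "quoted_usd": 200},
    "1.6B": {"hours": 7 * 24, "quoted_usd": 2500},
}


def implied_hourly_rate(hours, cost_usd):
    """Return the node cost per hour implied by a quoted duration and price."""
    return cost_usd / hours


# Implied hourly rate of the 8-GPU node for each model size.
rates = {
    name: implied_hourly_rate(run["hours"], run["quoted_usd"])
    for name, run in RUNS.items()
}

for name, rate in rates.items():
    print(f"GPT-2 {name}: about ${rate:.2f} per node-hour")
```

All three runs imply a rate of roughly $13 to $15 per hour for the whole node, so the quoted prices are consistent with each other and cost scales roughly linearly with training time.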

This release is part of Karpathy’s llm.c repository, which implements LLM training in simple, pure C/CUDA. There is no need for large dependencies such as PyTorch (245MB) or CPython (107MB).

The training uses the FineWeb dataset, which is considered higher quality than the original WebText. FineWeb consists mostly of plain English text with little math or code, so the model can use its capacity more efficiently. This helps counter the diminishing returns seen with the original GPT-2’s 100B tokens.

Trending Signals

Top Repos

Coding Tools

PR-Agent is a collection of LLM-based tools that assist with code review and related development workflows in collaborative software projects.

Web Agents

Tarsier is a tool suite that addresses three problems in using LLMs for web interaction: feeding the webpage to the LLM (as HTML, an accessibility tree, or a screenshot), mapping LLM responses back to web elements, and informing a text-only LLM about the page’s visual structure.

Object Detection

The RT-DETR repository is the official implementation of the paper “DETRs Beat YOLOs on Real-Time Object Detection”. It presents the Real-Time DEtection TRansformer (RT-DETR, also written RTDETR), which the authors describe as the first real-time end-to-end object detector, outperforming earlier advanced YOLO models in both speed and accuracy.


Subscribe to the newsletter: https://lnkd.in/guxfrUSM
