AI This Week: Karpathy’s Game-Changing GPT-2 Training, Codestral’s Language Mastery, and More!

Jerome Fernandes

Experimenting with AI in Digital Marketing

Published: May 30, 2024

Language Models - Karpathy unveils a guide to train GPT-2 in just 90 minutes with a budget of $20

Karpathy has released a guide for training GPT-2 models quickly and cheaply. Training the smallest GPT-2 (124M parameters) takes about 90 minutes and $20 on a node with eight A100 GPUs (8xA100). The 350M version takes about 14 hours and around $200. Training the full 1.6B model takes about one week and $2.5k.
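To put the three price points on a common scale, the short Python sketch below works out the implied price per GPU-hour for each run. It assumes, which the text above states only for the 124M run, that all three runs use the same eight-GPU node; the names `RUNS`, `NUM_GPUS`, and `implied_rate_per_gpu_hour` are made up for this illustration.

```python
# Rough arithmetic on the figures quoted above.
# Assumption: every run uses the same 8-GPU node (only stated for the 124M run).
NUM_GPUS = 8

RUNS = {
    "124M": {"hours": 1.5, "cost_usd": 20},
    "350M": {"hours": 14, "cost_usd": 200},
    "1.6B": {"hours": 7 * 24, "cost_usd": 2500},  # "one week" taken as 168 hours
}


def implied_rate_per_gpu_hour(hours, cost_usd, num_gpus=NUM_GPUS):
    """Return the implied price of one GPU for one hour, in USD."""
    return cost_usd / (hours * num_gpus)


for name, run in RUNS.items():
    gpu_hours = run["hours"] * NUM_GPUS
    rate = implied_rate_per_gpu_hour(run["hours"], run["cost_usd"])
    print(f"{name}: {gpu_hours:g} GPU-hours, ~${rate:.2f} per GPU-hour")
```

Under that assumption the three runs come out at roughly $1.67, $1.79, and $1.86 per GPU-hour, so the quoted costs scale almost linearly with training time.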

This release is part of Karpathy’s llm.c repository, which focuses on LLM training in simple, pure C/CUDA, with no dependency on PyTorch (245MB) or CPython (107MB).

The training uses the FineWeb dataset, considered higher quality than the original WebText. FineWeb consists mostly of plain English text with little math or code, which lets the model use its capacity more efficiently and helps address the diminishing returns seen with the original GPT-2’s 100B tokens.

Trending Signals

Top Repos

Coding Tools

PR-Agent is a collection of LLM-powered tools that assist with code review and other development workflow tasks in collaborative software projects.

Web Agents

Tarsier is a tool suite that addresses three problems in using LLMs for web interaction: feeding the webpage to the LLM (as HTML, an accessibility tree, or a screenshot), mapping LLM responses back to web elements, and informing a text-only LLM about the page’s visual structure.
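The "mapping LLM responses back to web elements" problem can be illustrated with a toy Python sketch: give every interactable element a numeric ID, show the model a text rendering that includes those IDs, and parse the ID out of its reply. This is not Tarsier's API; `Element`, `tag_elements`, `resolve_action`, the `[id]` notation, and the selectors are all hypothetical names chosen for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class Element:
    tag: str       # element type, e.g. "button" or "input"
    text: str      # visible label shown to the model
    selector: str  # hypothetical locator a browser driver could use


def tag_elements(elements):
    """Number each element; return the text shown to the LLM and an ID -> element map."""
    id_map = {}
    lines = []
    for idx, element in enumerate(elements):
        id_map[idx] = element
        lines.append(f"[{idx}] <{element.tag}> {element.text}")
    return "\n".join(lines), id_map


def resolve_action(llm_reply, id_map):
    """Parse a reply such as 'click [1]' and return (verb, element)."""
    match = re.search(r"(\w+)\s*\[(\d+)\]", llm_reply)
    if match is None:
        raise ValueError(f"no element reference found in reply: {llm_reply!r}")
    verb, idx = match.group(1), int(match.group(2))
    if idx not in id_map:
        raise KeyError(f"reply refers to unknown element id {idx}")
    return verb, id_map[idx]


# Hypothetical page with two interactable elements.
page = [
    Element("input", "Search", "#query"),
    Element("button", "Submit", "#submit"),
]
page_text, id_map = tag_elements(page)
verb, target = resolve_action("click [1]", id_map)
print(page_text)
print(verb, target.selector)
```

A real tool would also need to handle the visual layout and elements that change between steps; the sketch only covers the ID round trip.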

Object Detection

The RT-DETR repository is the official implementation of the paper “DETRs Beat YOLOs on Real-Time Object Detection”. It presents the Real-Time DEtection TRansformer (RT-DETR, aka RTDETR), which the authors describe as the first real-time end-to-end object detector, outperforming previous advanced YOLO models in both speed and accuracy.

Subscribe to the newsletter: https://lnkd.in/guxfrUSM
