How to teach an old model new tricks


Welcome to Continual Learnings

A weekly newsletter for practitioners building ML-powered products.

What we're reading this week

Open-source language models are finally starting to work:

  • For the past couple of years, one of the big stories in LLMs has been how hard it is for open source to catch up with OpenAI and other closed-source competitors like Anthropic and Cohere. This week, open-source LLMs started to show signs of life.
  • Large language models are having their Stable Diffusion moment: Simon Willison argues that two factors, (i) the leak of Facebook’s LLaMA model weights and (ii) llama.cpp, a library that lets you run LLaMA relatively efficiently on a MacBook, are creating the conditions for a Stable Diffusion-like explosion of creativity in LLMs.
  • OpenChatKit: a new open-source model designed for building chatbots. Notably, it is fine-tuned on a large dataset of instructions, which was one of the two most important factors (alongside RLHF) behind ChatGPT’s performance.


Cost-Effective Hyperparameter Optimization for Large Language Model Generation Inference

Shows that you can meaningfully reduce the cost of LLM inference by running hyperparameter optimization over the inference-time hyperparameters


ChatAug: Leveraging ChatGPT for Text Data Augmentation

Given some examples of the behavior you want a model to exhibit, you can use ChatGPT to generate similar examples. Those can then be used to train a smaller language model that achieves similar performance on the target task


Automatically Auditing Large Language Models via Discrete Optimization

Evaluating language models is hard because it’s difficult to achieve “coverage” of all of the cases that might cause problems for the model. This paper proposes an automatic approach to finding examples with negative properties, like derogatory completions about celebrities or French inputs that generate English completions.

Production ML papers to know

In this series, we cover important papers to know if you build ML-powered products.

How to teach an old model new tricks

If you want a single machine learning model that can solve a variety of image classification tasks, you might look to an open-vocabulary model like CLIP.

CLIP achieves near-state-of-the-art zero-shot performance on certain classification tasks, but not all of them (it struggles even on MNIST). Ideally, we’d be able to use a small amount of data to adapt the model to new tasks as we encounter them. But naively fine-tuning it on a new task degrades performance on older ones (the “catastrophic forgetting” problem).

Today’s paper proposes a solution based on the idea of model patching.

Painting with Interpolation

The goal of patching is to update the weights of your model so that they are better suited to the new task, while retaining performance on the original task.

The paper introduces a patching method called Patching with Interpolation (PAINT) and links to a repo with a helpful Python implementation.

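The core of the method fits in a few lines. Below is a minimal PyTorch-style sketch of the interpolation step; the name paint_patch is our own illustration rather than the repo's API, and it assumes the two models share the same architecture:

    import copy

    def paint_patch(zeroshot_model, finetuned_model, alpha):
        """Interpolate element-wise between zero-shot and fine-tuned weights:
        theta_patched = (1 - alpha) * theta_zeroshot + alpha * theta_finetuned
        """
        patched = copy.deepcopy(zeroshot_model)
        zs_state = zeroshot_model.state_dict()
        ft_state = finetuned_model.state_dict()
        patched.load_state_dict({
            key: (1 - alpha) * zs_state[key] + alpha * ft_state[key]
            for key in zs_state
        })
        return patched

Setting alpha = 0 recovers the zero-shot model and alpha = 1 the fully fine-tuned one; everything in between trades new-task accuracy against old-task accuracy.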

To summarize, PAINT fine-tunes a model on the new task as usual. But rather than using the fine-tuned weights directly, it uses an interpolation between those weights and the original ones. The interpolation coefficient alpha is chosen by cross-validation.
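As a toy illustration of that selection step, reusing paint_patch from above (zs_model, ft_model, and the two validation loaders are assumed to exist, and the coarse grid with summed accuracies is our simplification, not the paper's exact protocol):

    import torch

    def accuracy(model, loader):
        # Fraction of correctly classified examples in a DataLoader.
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for images, labels in loader:
                preds = model(images).argmax(dim=-1)
                correct += (preds == labels).sum().item()
                total += labels.numel()
        return correct / total

    # Sweep alpha over a coarse grid and keep the value with the best
    # combined held-out accuracy on the new task and the original task.
    candidates = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0
    best_alpha = max(
        candidates,
        key=lambda a: accuracy(paint_patch(zs_model, ft_model, a), new_task_val)
                      + accuracy(paint_patch(zs_model, ft_model, a), orig_task_val),
    )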

This process is for patching on a single task. The paper provides three ways to patch on multiple tasks: joint patching, where all patching tasks are merged into a single task before the above procedure is run; sequential patching, where the patching procedure is applied to each new task in turn; and parallel patching, where the fine-tuning step for each task is run independently (in parallel) before the weights are combined. The latter two are sketched below.
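To make the difference concrete, here is a hedged sketch building on paint_patch above. fine_tune stands in for your training loop, and the single shared alpha and the uniform average in parallel_patch are simplifications; the paper tunes its mixing coefficients on held-out data:

    def sequential_patch(model, tasks, alpha, fine_tune):
        # Run the single-task procedure once per task, feeding each
        # patched model into the next round of fine-tuning.
        for task in tasks:
            finetuned = fine_tune(model, task)
            model = paint_patch(model, finetuned, alpha)
        return model

    def parallel_patch(zeroshot_model, tasks, alpha, fine_tune):
        # Fine-tune the same zero-shot model on each task independently
        # (these runs can execute in parallel), then interpolate the
        # zero-shot weights with the average of the fine-tuned weights.
        states = [fine_tune(zeroshot_model, t).state_dict() for t in tasks]
        zs_state = zeroshot_model.state_dict()
        patched = copy.deepcopy(zeroshot_model)
        patched.load_state_dict({
            key: (1 - alpha) * zs_state[key]
                 + alpha * sum(s[key] for s in states) / len(states)
            for key in zs_state
        })
        return patched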

Large models are easier to patch

The authors tested PAINT on a range of image classification tasks, including supported tasks (which a model, typically a CLIP pre-trained Vision Transformer (ViT), has been trained on) and patching tasks (on which a zero-shot CLIP model performs poorly compared to a specialized model).

Performance for patching models on a single task is summarized in the chart below.

[Chart: single-task patching results]

On nine tasks where zero-shot CLIP performs poorly, PAINT increases accuracy by 15 to 60 percentage points while preserving accuracy on ImageNet within one percentage point of the zero-shot model.

PAINT works better with larger models. They end up closer in accuracy to specialized models than smaller ones do (left chart), require less interpolation toward the fine-tuned weights to fit new data (middle chart), and show higher cosine similarity between the weights of the unpatched and fine-tuned models (right chart).

[Charts: effect of model scale on accuracy gap (left), interpolation required (middle), and weight similarity (right)]

Performance for patching models on multiple tasks is summarized in the chart below, which shows accuracies for two different ViT models, patched using the methods outlined above, across a range of tasks.

A single CLIP model, patched on nine image classification tasks, is “competitive” with specialized models for each task. Joint patching is the best-performing method on average, and parallel patching the worst.

[Chart: multi-task patching accuracy for two ViT models]

The paper also demonstrates how PAINT enables broad transfer: a ViT patched on one half of a dataset improves in accuracy on the other half, even though the two halves contain disjoint classes.

The Upshot

At Continual Learnings, we love a simple technique that works well. PAINT appears to be one, though there are some clear limitations (for example, accuracy on old tasks can still decrease, especially for smaller models).

The paper also describes applications for PAINT beyond the experiments covered above, such as patching CLIP models against typographic attacks (where text superimposed on an image leads to misclassification).

If you work with open-vocabulary models, or are interested more generally in how models can be adapted to new tasks without retraining, then this paper is well worth checking out.

You can find the paper here.
