Interesting Content in AI, Software, Business, and Tech - 6/21/2023
Devansh
Chocolate Milk Cult Leader | Machine Learning Engineer | Writer | AI Researcher | Computational Math, Data Science, Software Engineering, Computer Science
A lot of people reach out to me for reading recommendations. I figured I'd start sharing whatever AI papers/publications, interesting books, videos, etc. I came across each week. Some will be technical, others not really. I will add whatever content I found really informative (and remembered throughout the week). These won't always be the most recent publications - just the ones I'm paying attention to this week. Without further ado, here are interesting readings/viewings for 6/21/2023. If you missed last week's readings, you can find them here.
AI Papers/Writeups
The False Promise of Imitating Proprietary LLMs
An emerging method to cheaply improve a weaker language model is to finetune it on outputs from a stronger model, such as a proprietary system like ChatGPT (e.g., Alpaca, Self-Instruct, and others). This approach looks to cheaply imitate the proprietary model's capabilities using a weaker open-source model. In this work, we critically analyze this approach. We first finetune a series of LMs that imitate ChatGPT using varying base model sizes (1.5B--13B), data sources, and imitation data amounts (0.3M--150M tokens). We then evaluate the models using crowd raters and canonical NLP benchmarks. Initially, we were surprised by the output quality of our imitation models -- they appear far better at following instructions, and crowd workers rate their outputs as competitive with ChatGPT. However, when conducting more targeted automatic evaluations, we find that imitation models close little to none of the gap from the base LM to ChatGPT on tasks that are not heavily supported in the imitation data. We show that these performance discrepancies may slip past human raters because imitation models are adept at mimicking ChatGPT's style but not its factuality. Overall, we conclude that model imitation is a false promise: there exists a substantial capabilities gap between open and closed LMs that, with current methods, can only be bridged using an unwieldy amount of imitation data or by using more capable base LMs. In turn, we argue that the highest leverage action for improving open-source models is to tackle the difficult challenge of developing better base LMs, rather than taking the shortcut of imitating proprietary systems.
Found this through Sebastian Raschka, PhD's great Twitter here: https://twitter.com/rasbt/status/1670956682409816064
Scaling Laws for Neural Language Models
Abstract- We study empirical scaling laws for language model performance on the cross-entropy loss. The loss scales as a power-law with model size, dataset size, and the amount of compute used for training, with some trends spanning more than seven orders of magnitude. Other architectural details such as network width or depth have minimal effects within a wide range. Simple equations govern the dependence of overfitting on model/dataset size and the dependence of training speed on model size. These relationships allow us to determine the optimal allocation of a fixed compute budget. Larger models are significantly more sample-efficient, such that optimally compute-efficient training involves training very large models on a relatively modest amount of data and stopping significantly before convergence.
By Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Ben Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, Dario Amodei
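The power-law form the abstract describes is simple enough to sketch: loss falls as L(N) = (Nc/N)^alpha in model size N, which is a straight line in log-log space and can be fit by ordinary linear regression. A minimal illustration below, with stdlib Python only; the constants `true_alpha` and `true_nc` are illustrative placeholders for this sketch, not the paper's fitted values.

```python
import math

def fit_power_law(sizes, losses):
    """Fit L(N) = (Nc / N)**alpha by linear regression in log-log space.

    log L = alpha*log Nc - alpha*log N, so the slope gives -alpha and the
    intercept gives alpha*log Nc. Returns (alpha, Nc).
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(l) for l in losses]
    k = len(xs)
    mx, my = sum(xs) / k, sum(ys) / k
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    alpha = -slope                      # loss decreases as N grows
    nc = math.exp(intercept / alpha)
    return alpha, nc

# Synthetic, noise-free data that follows the power law exactly
# (constants are made up for illustration, not taken from the paper).
true_alpha, true_nc = 0.076, 8.8e13
sizes = [10 ** k for k in range(6, 12)]
losses = [(true_nc / n) ** true_alpha for n in sizes]

alpha, nc = fit_power_law(sizes, losses)
print(f"alpha ~ {alpha:.3f}")
```

With noise-free synthetic data the regression recovers the exponent exactly; on real loss curves you would fit the same line to noisy measurements spanning several orders of magnitude in N.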
Real-time detection of robotic traffic in online advertising
https://www.amazon.science/publications/real-time-detection-of-robotic-traffic-in-online-advertising
Detecting robotic traffic at scale on online ads needs an approach that is scalable, comprehensive, precise, and can rapidly respond to changing traffic patterns. In this paper we describe SLIDR, or SLIce-Level Detection of Robots, a real-time deep neural network model trained with weak supervision to identify invalid clicks on online ads. We ensure fairness across different traffic slices by formulating a convex optimization problem that allows SLIDR to achieve optimal performance on individual traffic slices with a budget on overall false positives. SLIDR has been deployed since 2021 and safeguards advertiser campaigns on Amazon against robots clicking on ads on the e-commerce site. We describe some of the important lessons learned by deploying SLIDR, including guardrails that prevent updates of anomalous models and disaster recovery mechanisms to mitigate or correct decisions made by a faulty model.
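The core trade-off in the abstract - tune each traffic slice separately while respecting one global false-positive budget - can be shown with a toy example. The paper solves a convex program; the brute-force grid search below is only an illustration of the same idea, and the slice data and threshold values are entirely hypothetical.

```python
from itertools import product

def pick_thresholds(slices, fp_budget):
    """Toy per-slice threshold selection under a global false-positive budget.

    slices: list of dicts mapping candidate threshold ->
            (false_positives, bots_caught) counts for that slice.
    Exhaustively tries every combination of per-slice thresholds and keeps
    the one that catches the most bots without exceeding the budget.
    """
    best_combo, best_caught = None, -1
    candidates = [list(s.keys()) for s in slices]
    for combo in product(*candidates):
        fps = sum(s[t][0] for s, t in zip(slices, combo))
        caught = sum(s[t][1] for s, t in zip(slices, combo))
        if fps <= fp_budget and caught > best_caught:
            best_combo, best_caught = combo, caught
    return best_combo, best_caught

# Two hypothetical traffic slices: a lower threshold catches more bots but
# also flags more legitimate clicks (false positives).
slice_a = {0.3: (50, 900), 0.5: (20, 700), 0.7: (5, 400)}
slice_b = {0.3: (80, 600), 0.5: (30, 450), 0.7: (10, 200)}

combo, caught = pick_thresholds([slice_a, slice_b], fp_budget=60)
print(combo, caught)  # → (0.5, 0.5) 1150
```

Note how the best global answer is not to be maximally aggressive on either slice: spending the budget evenly across slices catches more bots than exhausting it on one, which is exactly why a per-slice formulation matters.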
The Curse of Recursion: Training on Generated Data Makes Models Forget
Abstract- Stable Diffusion revolutionised image creation from descriptive text. GPT-2, GPT-3(.5) and GPT-4 demonstrated astonishing performance across a variety of language tasks. ChatGPT introduced such language models to the general public. It is now clear that large language models (LLMs) are here to stay, and will bring about drastic change in the whole ecosystem of online text and images. In this paper we consider what the future might hold. What will happen to GPT-{n} once LLMs contribute much of the language found online? We find that use of model-generated content in training causes irreversible defects in the resulting models, where tails of the original content distribution disappear. We refer to this effect as Model Collapse and show that it can occur in Variational Autoencoders, Gaussian Mixture Models and LLMs. We build theoretical intuition behind the phenomenon and portray its ubiquity amongst all learned generative models. We demonstrate that it has to be taken seriously if we are to sustain the benefits of training from large-scale data scraped from the web. Indeed, the value of data collected about genuine human interactions with systems will be increasingly valuable in the presence of content generated by LLMs in data crawled from the Internet.
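The tail-loss mechanism behind Model Collapse can be demonstrated in a few lines: fit a Gaussian to a finite sample, generate the next "training set" from the fit, and repeat. This is my own toy analogue of the paper's Gaussian experiments, not their actual setup; finite samples underestimate spread on average, so the fitted distribution's tails steadily vanish.

```python
import random
import statistics

def collapse_chain(generations=150, sample_size=10, seed=None):
    """Repeatedly fit a Gaussian to samples drawn from the previous fit.

    Each generation 'trains' on data generated by the last model. Finite
    samples underestimate the spread on average, so the fitted standard
    deviation tends to shrink and the tails of the original N(0, 1)
    disappear - a toy analogue of Model Collapse.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0
    for _ in range(generations):
        data = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        mu = statistics.mean(data)
        sigma = statistics.stdev(data)
    return sigma

# Average the end-of-chain spread over many independent runs; it ends up
# well below the original sigma of 1.0.
final = [collapse_chain(seed=s) for s in range(100)]
print(statistics.mean(final))
```

The small per-generation bias compounds over iterations, which mirrors the paper's point: once model-generated data dominates the training mix, the defects are cumulative and effectively irreversible.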
Found this gem in the exceptional Davis Summarizes Papers newsletter by Davis Blalock. Worth subscribing to if you're interested in Machine Learning. Check the edition I'm referring to here.
Reader Spotlight - Mohnish Jagwani
If you're looking to hire a very talented sales representative, you should reach out to Mohnish. He has great experience with fundraising and B2B sales and is a real go-getter. He's based in India, so the cost to hire him will also be relatively low, especially given his skills. Find his resume over here.
If you're doing interesting work and would like to be featured in the spotlight section, just drop your introduction in the comments or reach out to me directly. There are no rules - you could talk about a paper you've written, an interesting project you've worked on, a personal challenge you're tackling, your content platform, or anything else you consider important. The goal is to get to know you better, and possibly connect you with interesting people in the community. No costs/obligations attached.
Cool Vids-
Building high-performing teams | Melissa Tan (Webflow, Dropbox, Canva) - Lenny Rachitsky
Why do some artists become famous? Albert-Laszlo Barabasi at the Big Think
Attention for Neural Networks, Clearly Explained!!! Joshua Starmer PhD
Why Do Neural Networks Love the Softmax? DJ Rich
SVD Visualized, Singular Value Decomposition explained | SEE Matrix, Chapter 3
I'll catch y'all with more of these next week. In the meantime, if you'd like to find me, here are my social links:
Reach out to me
Use the links below to check out my other content, learn more about tutoring, reach out to me about projects, or just to say hi.
Check out my other articles on Medium: https://rb.gy/zn1aiu
My YouTube: https://rb.gy/88iwdd
Reach out to me on LinkedIn. Let's connect: https://rb.gy/m5ok2y
My Instagram: https://rb.gy/gmvuy9
My Twitter: https://twitter.com/Machine01776819