Interesting Content in AI, Software, Business, and Tech - 6/21/2023
Devansh
Chocolate Milk Cult Leader | Machine Learning Engineer | Writer | AI Researcher | Computational Math, Data Science, Software Engineering, Computer Science
A lot of people reach out to me for reading recommendations. I figured I'd start sharing whatever AI papers/publications, interesting books, videos, etc. I came across each week. Some will be technical, others not really. I will add whatever content I found really informative (and remembered throughout the week). These won't always be the most recent publications - just the ones I'm paying attention to this week. Without further ado, here are interesting readings/viewings for 6/21/2023. If you missed last week's readings, you can find them here.
AI Papers/Writeups
The False Promise of Imitating Proprietary LLMs
An emerging method to cheaply improve a weaker language model is to finetune it on outputs from a stronger model, such as a proprietary system like ChatGPT (e.g., Alpaca, Self-Instruct, and others). This approach looks to cheaply imitate the proprietary model's capabilities using a weaker open-source model. In this work, we critically analyze this approach. We first finetune a series of LMs that imitate ChatGPT using varying base model sizes (1.5B--13B), data sources, and imitation data amounts (0.3M--150M tokens). We then evaluate the models using crowd raters and canonical NLP benchmarks. Initially, we were surprised by the output quality of our imitation models -- they appear far better at following instructions, and crowd workers rate their outputs as competitive with ChatGPT. However, when conducting more targeted automatic evaluations, we find that imitation models close little to none of the gap from the base LM to ChatGPT on tasks that are not heavily supported in the imitation data. We show that these performance discrepancies may slip past human raters because imitation models are adept at mimicking ChatGPT's style but not its factuality. Overall, we conclude that model imitation is a false promise: there exists a substantial capabilities gap between open and closed LMs that, with current methods, can only be bridged using an unwieldy amount of imitation data or by using more capable base LMs. In turn, we argue that the highest leverage action for improving open-source models is to tackle the difficult challenge of developing better base LMs, rather than taking the shortcut of imitating proprietary systems.
Found this through Sebastian Raschka, PhD's great Twitter here: https://twitter.com/rasbt/status/1670956682409816064
Scaling Laws for Neural Language Models
Abstract- We study empirical scaling laws for language model performance on the cross-entropy loss. The loss scales as a power-law with model size, dataset size, and the amount of compute used for training, with some trends spanning more than seven orders of magnitude. Other architectural details such as network width or depth have minimal effects within a wide range. Simple equations govern the dependence of overfitting on model/dataset size and the dependence of training speed on model size. These relationships allow us to determine the optimal allocation of a fixed compute budget. Larger models are significantly more sample-efficient, such that optimally compute-efficient training involves training very large models on a relatively modest amount of data and stopping significantly before convergence.
By Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Ben Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, Dario Amodei
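The power-law form the abstract describes is simple enough to sketch: loss falls as L(N) = (Nc/N)^alpha in model size N, which is a straight line in log-log space and can be fit by ordinary linear regression. A minimal illustration below, with stdlib Python only; the constants `true_alpha` and `true_nc` are illustrative placeholders for this sketch, not the paper's fitted values.

```python
import math

def fit_power_law(sizes, losses):
    """Fit L(N) = (Nc / N)**alpha by linear regression in log-log space.

    log L = alpha*log Nc - alpha*log N, so the slope gives -alpha and the
    intercept gives alpha*log Nc. Returns (alpha, Nc).
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(l) for l in losses]
    k = len(xs)
    mx, my = sum(xs) / k, sum(ys) / k
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    alpha = -slope                      # loss decreases as N grows
    nc = math.exp(intercept / alpha)
    return alpha, nc

# Synthetic, noise-free data that follows the power law exactly
# (constants are made up for illustration, not taken from the paper).
true_alpha, true_nc = 0.076, 8.8e13
sizes = [10 ** k for k in range(6, 12)]
losses = [(true_nc / n) ** true_alpha for n in sizes]

alpha, nc = fit_power_law(sizes, losses)
print(f"alpha ~ {alpha:.3f}")
```

With noise-free synthetic data the regression recovers the exponent exactly; on real loss curves you would fit the same line to noisy measurements spanning several orders of magnitude in N.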
Real-time detection of robotic traffic in online advertising
https://www.amazon.science/publications/real-time-detection-of-robotic-traffic-in-online-advertising
Detecting robotic traffic at scale on online ads needs an approach that is scalable, comprehensive, precise, and can rapidly respond to changing traffic patterns. In this paper we describe SLIDR, or SLIce-Level Detection of Robots, a real-time deep neural network model trained with weak supervision to identify invalid clicks on online ads. We ensure fairness across different traffic slices by formulating a convex optimization problem that allows SLIDR to achieve optimal performance on individual traffic slices with a budget on overall false positives. SLIDR has been deployed since 2021 and safeguards advertiser campaigns on Amazon against robots clicking on ads on the e-commerce site. We describe some of the important lessons learned by deploying SLIDR, including guardrails that prevent updates of anomalous models and disaster recovery mechanisms to mitigate or correct decisions made by a faulty model.
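The core trade-off in the abstract - tune each traffic slice separately while respecting one global false-positive budget - can be shown with a toy example. The paper solves a convex program; the brute-force grid search below is only an illustration of the same idea, and the slice data and threshold values are entirely hypothetical.

```python
from itertools import product

def pick_thresholds(slices, fp_budget):
    """Toy per-slice threshold selection under a global false-positive budget.

    slices: list of dicts mapping candidate threshold ->
            (false_positives, bots_caught) counts for that slice.
    Exhaustively tries every combination of per-slice thresholds and keeps
    the one that catches the most bots without exceeding the budget.
    """
    best_combo, best_caught = None, -1
    candidates = [list(s.keys()) for s in slices]
    for combo in product(*candidates):
        fps = sum(s[t][0] for s, t in zip(slices, combo))
        caught = sum(s[t][1] for s, t in zip(slices, combo))
        if fps <= fp_budget and caught > best_caught:
            best_combo, best_caught = combo, caught
    return best_combo, best_caught

# Two hypothetical traffic slices: a lower threshold catches more bots but
# also flags more legitimate clicks (false positives).
slice_a = {0.3: (50, 900), 0.5: (20, 700), 0.7: (5, 400)}
slice_b = {0.3: (80, 600), 0.5: (30, 450), 0.7: (10, 200)}

combo, caught = pick_thresholds([slice_a, slice_b], fp_budget=60)
print(combo, caught)  # → (0.5, 0.5) 1150
```

Note how the best global answer is not to be maximally aggressive on either slice: spending the budget evenly across slices catches more bots than exhausting it on one, which is exactly why a per-slice formulation matters.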
The Curse of Recursion: Training on Generated Data Makes Models Forget
Abstract- Stable Diffusion revolutionised image creation from descriptive text. GPT-2, GPT-3(.5) and GPT-4 demonstrated astonishing performance across a variety of language tasks. ChatGPT introduced such language models to the general public. It is now clear that large language models (LLMs) are here to stay, and will bring about drastic change in the whole ecosystem of online text and images. In this paper we consider what the future might hold. What will happen to GPT-{n} once LLMs contribute much of the language found online? We find that use of model-generated content in training causes irreversible defects in the resulting models, where tails of the original content distribution disappear. We refer to this effect as Model Collapse and show that it can occur in Variational Autoencoders, Gaussian Mixture Models and LLMs. We build theoretical intuition behind the phenomenon and portray its ubiquity amongst all learned generative models. We demonstrate that it has to be taken seriously if we are to sustain the benefits of training from large-scale data scraped from the web. Indeed, the value of data collected about genuine human interactions with systems will be increasingly valuable in the presence of content generated by LLMs in data crawled from the Internet.
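The tail-loss mechanism behind Model Collapse can be demonstrated in a few lines: fit a Gaussian to a finite sample, generate the next "training set" from the fit, and repeat. This is my own toy analogue of the paper's Gaussian experiments, not their actual setup; finite samples underestimate spread on average, so the fitted distribution's tails steadily vanish.

```python
import random
import statistics

def collapse_chain(generations=150, sample_size=10, seed=None):
    """Repeatedly fit a Gaussian to samples drawn from the previous fit.

    Each generation 'trains' on data generated by the last model. Finite
    samples underestimate the spread on average, so the fitted standard
    deviation tends to shrink and the tails of the original N(0, 1)
    disappear - a toy analogue of Model Collapse.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0
    for _ in range(generations):
        data = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        mu = statistics.mean(data)
        sigma = statistics.stdev(data)
    return sigma

# Average the end-of-chain spread over many independent runs; it ends up
# well below the original sigma of 1.0.
final = [collapse_chain(seed=s) for s in range(100)]
print(statistics.mean(final))
```

The small per-generation bias compounds over iterations, which mirrors the paper's point: once model-generated data dominates the training mix, the defects are cumulative and effectively irreversible.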
Found this gem in the exceptional Davis Summarizes Papers newsletter by Davis Blalock. Worth subscribing to if you're interested in Machine Learning. Check the edition I'm referring to here.
Reader Spotlight - Mohnish Jagwani
If you're looking to hire a very talented sales representative, you should reach out to Mohnish. He has great experience with fundraising and B2B sales and is a real go-getter. He's based in India, so the cost to hire him will also be relatively low, especially given his skills. Find his resume over here.
If you're doing interesting work and would like to be featured in the spotlight section, just drop your introduction in the comments or reach out to me directly. There are no rules - you could talk about a paper you've written, an interesting project you've worked on, a personal challenge you're tackling, your content platform, or anything else you consider important. The goal is to get to know you better, and possibly connect you with interesting people in the community. No costs/obligations attached.
Cool Vids-
Building high-performing teams | Melissa Tan (Webflow, Dropbox, Canva) - Lenny Rachitsky
Why do some artists become famous? Albert-Laszlo Barabasi at the Big Think
Attention for Neural Networks, Clearly Explained!!! Joshua Starmer PhD
Why Do Neural Networks Love the Softmax? DJ Rich
SVD Visualized, Singular Value Decomposition explained | SEE Matrix, Chapter 3
I'll catch y'all with more of these next week. In the meantime, if you'd like to find me, here are my social links:
Reach out to me
Use the links below to check out my other content, learn more about tutoring, reach out to me about projects, or just to say hi.
Check out my other articles on Medium: https://rb.gy/zn1aiu
My YouTube: https://rb.gy/88iwdd
Reach out to me on LinkedIn. Let's connect: https://rb.gy/m5ok2y
My Instagram: https://rb.gy/gmvuy9
My Twitter: https://twitter.com/Machine01776819