Interesting Content in AI, Software, Business, and Tech- 04/03/2024
Devansh
Chocolate Milk Cult Leader | Machine Learning Engineer | Writer | AI Researcher | Computational Math, Data Science, Software Engineering, Computer Science
A lot of people reach out to me for reading recommendations. I figured I’d start sharing whatever AI papers/publications, interesting books, videos, etc. I come across each week. Some will be technical, others not really. I will add whatever content I found really informative (and remembered) throughout the week. These won’t always be the most recent publications - just the ones I’m paying attention to this week. Without further ado, here are interesting readings/viewings for 04/03/2024. If you missed last week’s readings, you can find them here.
Reminder- We started an AI Made Simple Subreddit. Come join us over here- https://www.reddit.com/r/AIMadeSimple/. If you’d like to stay on top of community events and updates, join the discord for our cult here: https://discord.com/invite/EgrVtXSjYf.
Community Spotlight: Tasting History with Max Miller
"Tasting History with Max Miller" is a super interesting YouTube channel that digs through history by going through recipes in old manuscripts. It's always super interesting to see Max go into how those recipes teach us things about the that particular time period and geography. Personally, I don't even care about the food aspects: the historical deep dives into how the culture has evolved is what keeps me subbed. If you're a history nerd, check it out. I'll share a video from them in this reading list.
If you’re doing interesting work and would like to be featured in the spotlight section, just drop your introduction in the comments or reach out to me. There are no rules- you could talk about a paper you’ve written, an interesting project you’ve worked on, some personal challenge you’re working on, ask me to promote your company/product, or anything else you consider important. The goal is to get to know you better, and possibly connect you with interesting people in our chocolate milk cult. No costs/obligations are attached.
Previews
Curious about what articles I’m working on? Here are the previews for the next planned articles-
How do you generate the following:
Flooding x AI
Highly Recommended
These are pieces that I feel are particularly well done. If you don’t have much time, make sure you at least catch these works.
I'll have to study this in more detail, but the idea is definitely very interesting. In the meantime, I'd love to hear from time series forecasting (TSF) experts like Valeriy Manokhin, PhD, MBA, CQF.
Time series problems are ubiquitous, from forecasting weather and traffic patterns to understanding economic trends. Bayesian approaches start with an assumption about the data's patterns (prior probability), collect evidence (e.g., new time series data), and continuously update that assumption to form a posterior probability distribution. Traditional Bayesian approaches like Gaussian processes (GPs) and Structural Time Series are extensively used for modeling time series data, e.g., the commonly used Mauna Loa CO2 dataset. However, they often rely on domain experts to painstakingly select appropriate model components and may be computationally expensive. Alternatives such as neural networks lack interpretability, making it difficult to understand how they generate forecasts, and don't produce reliable confidence intervals.
To that end, we introduce AutoBNN, a new open-source package written in JAX. AutoBNN automates the discovery of interpretable time series forecasting models, provides high-quality uncertainty estimates, and scales effectively for use on large datasets. We describe how AutoBNN combines the interpretability of traditional probabilistic approaches with the scalability and flexibility of neural networks.
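To make the prior-evidence-posterior loop above concrete, here's a minimal sketch I put together with scikit-learn's generic Gaussian process tools. To be clear, this is not the AutoBNN API - just the plain Bayesian time-series workflow that AutoBNN automates, on a toy dataset:

```python
# Minimal illustration of the Bayesian time-series workflow described above:
# encode assumptions about the data in a prior (the kernel), condition on
# observations, and read off a posterior mean plus uncertainty bands.
# NOTE: this uses scikit-learn's generic GP regressor, NOT AutoBNN.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ExpSineSquared, WhiteKernel

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 120)[:, None]                      # time index
y = 0.3 * t.ravel() + np.sin(2 * np.pi * t.ravel()) + rng.normal(0, 0.2, 120)

# Prior assumption: a smooth trend + a periodic component + observation noise.
kernel = (RBF(length_scale=5.0)
          + ExpSineSquared(length_scale=1.0, periodicity=1.0)
          + WhiteKernel(noise_level=0.05))
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(t, y)                                              # condition on the evidence

t_future = np.linspace(10, 12, 40)[:, None]
mean, std = gp.predict(t_future, return_std=True)         # posterior forecast
lower, upper = mean - 1.96 * std, mean + 1.96 * std       # ~95% uncertainty band
```

The point AutoBNN tackles is exactly the painful part of this sketch: choosing and composing the kernel/model components by hand.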
A very in-depth explanation of the Mamba architecture, which might replace Transformers. Another great writeup by the people at The Gradient.
Mamba, however, is one of an alternative class of models called State Space Models (SSMs). Importantly, for the first time, Mamba promises similar performance (and crucially similar scaling laws) as the Transformer whilst being feasible at long sequence lengths (say 1 million tokens). To achieve this long context, the Mamba authors remove the “quadratic bottleneck” in the Attention Mechanism. Mamba also runs fast - like “up to 5x faster than Transformer” fast.
...
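If the SSM framing is new to you, the core object is just a linear recurrence over a fixed-size hidden state, which is why the cost grows linearly with sequence length instead of quadratically. Here's a toy, non-selective sketch I wrote for intuition (Mamba's actual contribution, input-dependent parameters and a hardware-aware scan, is not shown here):

```python
# Toy linear state space model: h_t = A @ h_{t-1} + B @ x_t, y_t = C @ h_t.
# Each step only touches the fixed-size state h, so a sequence of length L costs
# O(L), versus the O(L^2) of full attention.
import numpy as np

def ssm_forward(x, A, B, C):
    """x: (seq_len, d_in); returns y: (seq_len, d_out)."""
    h = np.zeros(A.shape[0])
    outputs = []
    for x_t in x:                      # recurrent scan, linear in seq_len
        h = A @ h + B @ x_t            # update hidden state
        outputs.append(C @ h)          # read out
    return np.stack(outputs)

rng = np.random.default_rng(0)
d_in, d_state, d_out, seq_len = 4, 8, 4, 1000
A = 0.9 * np.eye(d_state)              # stable state transition
B = rng.normal(size=(d_state, d_in))
C = rng.normal(size=(d_out, d_state))
y = ssm_forward(rng.normal(size=(seq_len, d_in)), A, B, C)
print(y.shape)                         # (1000, 4)
```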
The aforementioned recommendation from Max's channel.
Given how many people are exploring efficient LLM training, this is worth reading.
We empirically study a simple layer-pruning strategy for popular families of open-weight pretrained LLMs, finding minimal degradation of performance on different question-answering benchmarks until after a large fraction (up to half) of the layers are removed. To prune these models, we identify the optimal block of layers to prune by considering similarity across layers; then, to "heal" the damage, we perform a small amount of finetuning. In particular, we use parameter-efficient finetuning (PEFT) methods, specifically quantization and Low Rank Adapters (QLoRA), such that each of our experiments can be performed on a single A100 GPU. From a practical perspective, these results suggest that layer pruning methods can complement other PEFT strategies to further reduce computational resources of finetuning on the one hand, and can improve the memory and latency of inference on the other hand. From a scientific perspective, the robustness of these LLMs to the deletion of layers implies either that current pretraining methods are not properly leveraging the parameters in the deeper layers of the network or that the shallow layers play a critical role in storing knowledge.
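Here is my rough sketch of the layer-selection step as I read the abstract (the helper names are mine, and the actual paper pairs this with a short QLoRA finetune to "heal" the pruned model):

```python
# Sketch of similarity-based layer pruning (my reading of the idea, not the
# authors' code): find the contiguous block of n layers whose removal changes
# the hidden representation the least, drop it, then lightly finetune to heal.
import numpy as np

def angular_distance(a, b, eps=1e-8):
    """Mean angular distance between two batches of hidden states, shape (batch, dim)."""
    cos = np.sum(a * b, axis=-1) / (
        np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + eps)
    return np.mean(np.arccos(np.clip(cos, -1.0, 1.0)) / np.pi)

def pick_block_to_prune(hidden_states, n):
    """hidden_states: list of (batch, dim) arrays, one per layer boundary, so
    hidden_states[l] is the input to layer l. Returns the start index of the
    n-layer block whose input and output are most similar (cheapest to remove)."""
    num_layers = len(hidden_states) - 1
    distances = [angular_distance(hidden_states[l], hidden_states[l + n])
                 for l in range(num_layers - n + 1)]
    return int(np.argmin(distances))

# Hypothetical usage: collect hidden states on a small calibration set
# (e.g. with output_hidden_states=True), choose n, drop layers
# [start, start + n), then heal with a short QLoRA finetune.
```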
Scientists have discovered a group of three closely related flowers that seem to break the laws of genetics. These mountain beardtongues are pollinated by either bees or butterflies, but not both, and that's the key to an incredibly weird quirk of natural selection.
As a tree supremacist, I find this is exactly the kind of development that gets me hot and bothered. Very cool research into improving the inference of LLMs by cutting down how many words are considered at each step. This will be pretty interesting to combine with fine-tuning and possibly RAG to nudge LLMs in certain directions. Great writeup by Dhruvil Karani.
When computing the full softmax, the resulting probability distribution is usually skewed. This means that out of thousands of possible words, only a handful are plausible choices, which is logical. Most English words don’t fit in the blank - I love to play ____. Yet, we compute the probabilities for the entire vocabulary. This is suboptimal.
Can we avoid computing the probability of obviously unlikely words? The answer is yes, and this is what hierarchical softmax achieves.
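For intuition, here's a tiny two-level version of the idea: group the vocabulary into clusters, predict a cluster first, then only normalize over that cluster's words. This is a generic sketch rather than the exact construction in the article (real implementations typically use a binary tree, e.g. a Huffman tree):

```python
# Two-level "hierarchical" softmax sketch: P(word) = P(cluster) * P(word | cluster).
# With V words split into sqrt(V)-sized clusters, each prediction normalizes over
# roughly 2*sqrt(V) scores instead of V.
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def two_level_word_prob(h, cluster_W, word_W, word_to_cluster, word_id):
    """h: hidden state (dim,). cluster_W: (n_clusters, dim).
    word_W: dict mapping cluster_id -> (cluster_size, dim) weights.
    word_to_cluster: dict mapping word_id -> (cluster_id, index_within_cluster)."""
    cluster_id, within_idx = word_to_cluster[word_id]
    p_cluster = softmax(cluster_W @ h)[cluster_id]                       # over clusters only
    p_word_given_cluster = softmax(word_W[cluster_id] @ h)[within_idx]   # over one cluster's words
    return p_cluster * p_word_given_cluster

# Toy usage: a 6-word vocab split into 2 clusters of 3.
rng = np.random.default_rng(0)
dim = 8
cluster_W = rng.normal(size=(2, dim))
word_W = {0: rng.normal(size=(3, dim)), 1: rng.normal(size=(3, dim))}
word_to_cluster = {w: (w // 3, w % 3) for w in range(6)}
print(two_level_word_prob(rng.normal(size=dim), cluster_W, word_W, word_to_cluster, word_id=4))
```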
A great compilation of research done by a member of our cult. Aziz writes excellent research summaries, so check him out if you're looking for more technical/research-focused resources.
Position embeddings have been used a lot in recent LLMs. In this article, I explore the concept behind them and discuss the different types of position embeddings and how they differ.
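If you want a quick baseline before reading the article, the classic sinusoidal scheme from the original Transformer paper fits in a few lines (this is just one of the variants covered; learned and rotary embeddings work differently):

```python
# Sinusoidal position embeddings from "Attention Is All You Need":
# PE[pos, 2i]   = sin(pos / 10000^(2i/d_model))
# PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))
import numpy as np

def sinusoidal_position_embeddings(max_len, d_model):
    positions = np.arange(max_len)[:, None]               # (max_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]               # (1, d_model/2)
    angles = positions / np.power(10000, dims / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe                                              # added to token embeddings

print(sinusoidal_position_embeddings(128, 512).shape)      # (128, 512)
```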
Welcome to the March edition of AI Tidbits Monthly, where we uncover the latest and greatest in AI. This month has been filled with groundbreaking announcements from industry leaders and exciting progress in open-source AI, showcasing the rapid advancements in the field.
Luca Rossi writes some of my favorite productivity, Software Engineering, and Leadership content out there. You should check him out.
An amazing case study of how GoPro squandered its first-mover advantage. The most important lesson is at minute 34: GoPro ignored its core market (adventure sports people) to instead chase a group that didn't really need it (a mass market that already had cell phones). Misunderstanding your customer/market can wipe out any technical edge, moat, or resource advantage.
In the 2010s, there was one startup that, by the measures of Silicon Valley and Wall Street, seemed destined to be the next big billion-dollar consumer brand. That company was GoPro. GoPro took the world by storm with its game-changing cameras. With a radically compact design, tiny form factor, high portability, rugged waterproof exteriors, and reasonable picture quality, GoPro cameras were able to capture never-before-seen action and perspectives. GoPro was category-leading and category-defining - the company had effectively created and owned an entire category of cameras. It was the pioneer, gold standard, and household name, as GoPro was not just the name of the product and company, but also became the unofficial label for any small, portable action camera on the market.
Yet fast forward to the present day in 2023, less than a decade later, and GoPro’s stock has dropped 95%. How could a company that had all the right ingredients by the measures of Silicon Valley and Wall Street squander it all in such a short period of time? How could having a market-defining, category-leading product be worth so little? In this episode, we’ll cover the rise and fall of GoPro across 3 eras, their failures in strategy, and how the company’s collapse serves as a crucial lesson on the importance of knowing your market.
Other Content
Temu is everywhere, promising that you can shop like a billionaire: $10 wireless speakers, $12 sneakers, $20 drones, and other cheap gadgets and clothes, backed by free shipping, 90-day returns, 30-day price adjustments, and deliveries within 2 weeks. But Temu isn’t the first to sell generic, unbranded, mass-produced Chinese products online at radically low prices. Before Temu, there were AliExpress and Wish, both of which went to market over a decade earlier with the exact same value prop, unbelievably low prices, and wacky advertising.
Wish was the earliest entrant into this space and the SF-based startup was once one of Silicon Valley’s darling unicorns. It all begs the question - how exactly do these companies stay alive selling $5-10 items online? In this episode, we’ll cover the business of selling cheap Chinese-made junk online through the rise and fall of Wish, the persistence of AliExpress, and the sudden emergence of Temu - and how all of this ties back to greed, growth and Silicon Valley.
Algebraic geometry is often presented as the study of zeroes of polynomial equations. But it's really about something much deeper: the duality between abstract algebra and geometry.
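To give one concrete instance of that duality (my example, not the video's): the unit circle, the set of points satisfying x² + y² = 1, corresponds to the ideal generated by x² + y² − 1 in the polynomial ring R[x, y], and geometric questions about the curve translate into algebraic questions about the quotient ring R[x, y]/(x² + y² − 1), and vice versa.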
Meant to share this earlier.
Reach out to me
Use the links below to check out my other content, learn more about tutoring, reach out to me about projects, or just to say hi.
Check out my other articles on Medium: https://rb.gy/zn1aiu
My YouTube: https://rb.gy/88iwdd
Reach out to me on LinkedIn. Let’s connect: https://rb.gy/m5ok2y
My Instagram: https://rb.gy/gmvuy9
My Twitter: https://twitter.com/Machine01776819