Interesting Content in AI, Software, Business, and Tech- 04/03/2024
Devansh
Chocolate Milk Cult Leader | Machine Learning Engineer | Writer | AI Researcher | Computational Math, Data Science, Software Engineering, Computer Science
A lot of people reach out to me for reading recommendations. I figured I’d start sharing whatever AI papers/publications, interesting books, videos, etc. I come across each week. Some will be technical, others not really. I will add whatever content I found really informative (and remembered) throughout the week. These won’t always be the most recent publications - just the ones I’m paying attention to this week. Without further ado, here are interesting readings/viewings for 04/03/2024. If you missed last week’s readings, you can find them here.
Reminder- We started an AI Made Simple Subreddit. Come join us over here- https://www.reddit.com/r/AIMadeSimple/. If you’d like to stay on top of community events and updates, join the discord for our cult here: https://discord.com/invite/EgrVtXSjYf.
Community Spotlight: Tasting History with Max Miller
"Tasting History with Max Miller" is a super interesting YouTube channel that digs through history by going through recipes in old manuscripts. It's always super interesting to see Max go into how those recipes teach us things about the that particular time period and geography. Personally, I don't even care about the food aspects: the historical deep dives into how the culture has evolved is what keeps me subbed. If you're a history nerd, check it out. I'll share a video from them in this reading list.
If you’re doing interesting work and would like to be featured in the spotlight section, just drop your introduction in the comments or reach out to me. There are no rules- you could talk about a paper you’ve written, an interesting project you’ve worked on, some personal challenge you’re working on, ask me to promote your company/product, or anything else you consider important. The goal is to get to know you better, and possibly connect you with interesting people in our chocolate milk cult. No costs/obligations are attached.
Previews
Curious about what articles I’m working on? Here are the previews for the next planned articles-
How do you generate the following:
Flooding x AI
Highly Recommended
These are pieces that I feel are particularly well done. If you don’t have much time, make sure you at least catch these works.
I'll have to study this in more detail, but the idea is definitely very interesting. In the meantime, I'd love to hear from time series forecasting (TSF) experts like Valeriy Manokhin, PhD, MBA, CQF.
Time series problems are ubiquitous, from forecasting weather and traffic patterns to understanding economic trends. Bayesian approaches start with an assumption about the data's patterns (prior probability), collect evidence (e.g., new time series data), and continuously update that assumption to form a posterior probability distribution. Traditional Bayesian approaches like Gaussian processes (GPs) and Structural Time Series are extensively used for modeling time series data, e.g., the commonly used Mauna Loa CO2 dataset. However, they often rely on domain experts to painstakingly select appropriate model components and may be computationally expensive. Alternatives such as neural networks lack interpretability, making it difficult to understand how they generate forecasts, and don't produce reliable confidence intervals.
To that end, we introduce AutoBNN, a new open-source package written in JAX. AutoBNN automates the discovery of interpretable time series forecasting models, provides high-quality uncertainty estimates, and scales effectively for use on large datasets. We describe how AutoBNN combines the interpretability of traditional probabilistic approaches with the scalability and flexibility of neural networks.
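To make the prior-evidence-posterior loop above concrete, here's a minimal sketch I put together with scikit-learn's generic Gaussian process tools. To be clear, this is not the AutoBNN API - just the plain Bayesian time-series workflow that AutoBNN automates, on a toy dataset:

```python
# Minimal illustration of the Bayesian time-series workflow described above:
# encode assumptions about the data in a prior (the kernel), condition on
# observations, and read off a posterior mean plus uncertainty bands.
# NOTE: this uses scikit-learn's generic GP regressor, NOT AutoBNN.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ExpSineSquared, WhiteKernel

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 120)[:, None]                      # time index
y = 0.3 * t.ravel() + np.sin(2 * np.pi * t.ravel()) + rng.normal(0, 0.2, 120)

# Prior assumption: a smooth trend + a periodic component + observation noise.
kernel = (RBF(length_scale=5.0)
          + ExpSineSquared(length_scale=1.0, periodicity=1.0)
          + WhiteKernel(noise_level=0.05))
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(t, y)                                              # condition on the evidence

t_future = np.linspace(10, 12, 40)[:, None]
mean, std = gp.predict(t_future, return_std=True)         # posterior forecast
lower, upper = mean - 1.96 * std, mean + 1.96 * std       # ~95% uncertainty band
```

The point AutoBNN tackles is exactly the painful part of this sketch: choosing and composing the kernel/model components by hand.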
A very in-depth explanation of the Mamba architecture, which might replace Transformers. Another great writeup by the people at The Gradient.
Mamba, however, is one of an alternative class of models called State Space Models (SSMs). Importantly, for the first time, Mamba promises similar performance (and crucially similar scaling laws) as the Transformer whilst being feasible at long sequence lengths (say 1 million tokens). To achieve this long context, the Mamba authors remove the “quadratic bottleneck” in the Attention Mechanism. Mamba also runs fast - like “up to 5x faster than Transformer” fast.
...
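If the SSM framing is new to you, the core object is just a linear recurrence over a fixed-size hidden state, which is why the cost grows linearly with sequence length instead of quadratically. Here's a toy, non-selective sketch I wrote for intuition (Mamba's actual contribution, input-dependent parameters and a hardware-aware scan, is not shown here):

```python
# Toy linear state space model: h_t = A @ h_{t-1} + B @ x_t, y_t = C @ h_t.
# Each step only touches the fixed-size state h, so a sequence of length L costs
# O(L), versus the O(L^2) of full attention.
import numpy as np

def ssm_forward(x, A, B, C):
    """x: (seq_len, d_in); returns y: (seq_len, d_out)."""
    h = np.zeros(A.shape[0])
    outputs = []
    for x_t in x:                      # recurrent scan, linear in seq_len
        h = A @ h + B @ x_t            # update hidden state
        outputs.append(C @ h)          # read out
    return np.stack(outputs)

rng = np.random.default_rng(0)
d_in, d_state, d_out, seq_len = 4, 8, 4, 1000
A = 0.9 * np.eye(d_state)              # stable state transition
B = rng.normal(size=(d_state, d_in))
C = rng.normal(size=(d_out, d_state))
y = ssm_forward(rng.normal(size=(seq_len, d_in)), A, B, C)
print(y.shape)                         # (1000, 4)
```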
The aforementioned recommendation from Max's channel.
Given how many people are exploring efficient LLM training, this is worth reading.
We empirically study a simple layer-pruning strategy for popular families of open-weight pretrained LLMs, finding minimal degradation of performance on different question-answering benchmarks until after a large fraction (up to half) of the layers are removed. To prune these models, we identify the optimal block of layers to prune by considering similarity across layers; then, to "heal" the damage, we perform a small amount of finetuning. In particular, we use parameter-efficient finetuning (PEFT) methods, specifically quantization and Low Rank Adapters (QLoRA), such that each of our experiments can be performed on a single A100 GPU. From a practical perspective, these results suggest that layer pruning methods can complement other PEFT strategies to further reduce computational resources of finetuning on the one hand, and can improve the memory and latency of inference on the other hand. From a scientific perspective, the robustness of these LLMs to the deletion of layers implies either that current pretraining methods are not properly leveraging the parameters in the deeper layers of the network or that the shallow layers play a critical role in storing knowledge.
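Here is my rough sketch of the layer-selection step as I read the abstract (the helper names are mine, and the actual paper pairs this with a short QLoRA finetune to "heal" the pruned model):

```python
# Sketch of similarity-based layer pruning (my reading of the idea, not the
# authors' code): find the contiguous block of n layers whose removal changes
# the hidden representation the least, drop it, then lightly finetune to heal.
import numpy as np

def angular_distance(a, b, eps=1e-8):
    """Mean angular distance between two batches of hidden states, shape (batch, dim)."""
    cos = np.sum(a * b, axis=-1) / (
        np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + eps)
    return np.mean(np.arccos(np.clip(cos, -1.0, 1.0)) / np.pi)

def pick_block_to_prune(hidden_states, n):
    """hidden_states: list of (batch, dim) arrays, one per layer boundary, so
    hidden_states[l] is the input to layer l. Returns the start index of the
    n-layer block whose input and output are most similar (cheapest to remove)."""
    num_layers = len(hidden_states) - 1
    distances = [angular_distance(hidden_states[l], hidden_states[l + n])
                 for l in range(num_layers - n + 1)]
    return int(np.argmin(distances))

# Hypothetical usage: collect hidden states on a small calibration set
# (e.g. with output_hidden_states=True), choose n, drop layers
# [start, start + n), then heal with a short QLoRA finetune.
```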
Scientists have discovered a group of three closely related flowers that seem to break the laws of genetics. These mountain beardtongues are pollinated by either bees or butterflies, but not both, and that's the key to an incredibly weird quirk of natural selection.
As a tree supremacist, I find this is exactly the kind of development that gets me hot and bothered. Very cool research into improving the inference of LLMs by cutting down how many words are considered at each step. This will be pretty interesting to combine with fine-tuning and possibly RAG to nudge LLMs in certain directions. Great writeup by Dhruvil Karani.
When computing the full softmax, the resulting probability distribution is usually skewed. This means that out of thousands of possible words, only a handful are plausible choices, which is logical. Most English words don’t fit in the blank - I love to play ____. Yet, we compute the probabilities for the entire vocabulary. This is suboptimal.
Can we avoid computing the probability of obviously unlikely words? The answer is yes, and this is what hierarchical softmax achieves.
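For intuition, here's a tiny two-level version of the idea: group the vocabulary into clusters, predict a cluster first, then only normalize over that cluster's words. This is a generic sketch rather than the exact construction in the article (real implementations typically use a binary tree, e.g. a Huffman tree):

```python
# Two-level "hierarchical" softmax sketch: P(word) = P(cluster) * P(word | cluster).
# With V words split into sqrt(V)-sized clusters, each prediction normalizes over
# roughly 2*sqrt(V) scores instead of V.
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def two_level_word_prob(h, cluster_W, word_W, word_to_cluster, word_id):
    """h: hidden state (dim,). cluster_W: (n_clusters, dim).
    word_W: dict mapping cluster_id -> (cluster_size, dim) weights.
    word_to_cluster: dict mapping word_id -> (cluster_id, index_within_cluster)."""
    cluster_id, within_idx = word_to_cluster[word_id]
    p_cluster = softmax(cluster_W @ h)[cluster_id]                       # over clusters only
    p_word_given_cluster = softmax(word_W[cluster_id] @ h)[within_idx]   # over one cluster's words
    return p_cluster * p_word_given_cluster

# Toy usage: a 6-word vocab split into 2 clusters of 3.
rng = np.random.default_rng(0)
dim = 8
cluster_W = rng.normal(size=(2, dim))
word_W = {0: rng.normal(size=(3, dim)), 1: rng.normal(size=(3, dim))}
word_to_cluster = {w: (w // 3, w % 3) for w in range(6)}
print(two_level_word_prob(rng.normal(size=dim), cluster_W, word_W, word_to_cluster, word_id=4))
```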
A great compilation of research done by a member of our cult. Aziz writes excellent research summaries, so check him out if you're looking for more technical/research-focused resources.
Position embeddings have been used a lot in recent LLMs. In this article, I explore the concept behind them and discuss the different types of position embeddings and how they differ.
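If you want a quick baseline before reading the article, the classic sinusoidal scheme from the original Transformer paper fits in a few lines (this is just one of the variants covered; learned and rotary embeddings work differently):

```python
# Sinusoidal position embeddings from "Attention Is All You Need":
# PE[pos, 2i]   = sin(pos / 10000^(2i/d_model))
# PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))
import numpy as np

def sinusoidal_position_embeddings(max_len, d_model):
    positions = np.arange(max_len)[:, None]               # (max_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]               # (1, d_model/2)
    angles = positions / np.power(10000, dims / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe                                              # added to token embeddings

print(sinusoidal_position_embeddings(128, 512).shape)      # (128, 512)
```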
Welcome to the March edition of AI Tidbits Monthly, where we uncover the latest and greatest in AI. This month has been filled with groundbreaking announcements from industry leaders and exciting progress in open-source AI, showcasing the rapid advancements in the field.
Luca Rossi writes some of my favorite productivity, Software Engineering, and Leadership content out there. You should check him out.
An amazing case study of how GoPro squandered its first-mover advantage. The most important lesson is at minute 34: GoPro ignored its core market (adventure sports people) to instead chase a group that didn't really need it (a mass market that already had cell phones). Misunderstanding your customer/market can wipe out any technical edge, moat, or resource advantage.
In the 2010s, there was one startup that, by the measures of Silicon Valley and Wall Street, seemed destined to be the next big billion-dollar consumer brand. That company was GoPro. GoPro took the world by storm with its game-changing cameras. With a radically compact design, tiny form factor, high portability, rugged waterproof exteriors, and reasonable picture quality, GoPro cameras were able to capture never-before-seen action and perspectives. GoPro was category-leading and category-defining - the company had effectively created and owned an entire category of cameras. It was the pioneer, gold standard, and household name, as GoPro was not just the name of the product and company, but also became the unofficial label for any small, portable action camera on the market.
Yet fast forward to the present day in 2023, less than a decade later, and GoPro’s stock has dropped 95%. How could a company that had all the right ingredients by the measures of Silicon Valley and Wall Street squander it all in such a short period of time? How could having a market-defining, category-leading product be worth so little? In this episode, we’ll cover the rise and fall of GoPro across 3 eras, their failures in strategy, and how the company’s collapse serves as a crucial lesson on the importance of knowing your market.
Other Content
Temu is everywhere, promising that you can shop like a billionaire: $10 wireless speakers, $12 sneakers, $20 drones, and other cheap gadgets and clothes, backed by free shipping, 90-day returns, 30-day price adjustments, and deliveries within 2 weeks. But Temu isn’t the first to sell generic, unbranded, mass-produced Chinese products online at radically low prices. Before Temu, there were AliExpress and Wish, both of which went to market over a decade earlier with the exact same value prop, unbelievably low prices, and wacky advertising.
Wish was the earliest entrant into this space and the SF-based startup was once one of Silicon Valley’s darling unicorns. It all begs the question - how exactly do these companies stay alive selling $5-10 items online? In this episode, we’ll cover the business of selling cheap Chinese-made junk online through the rise and fall of Wish, the persistence of AliExpress, and the sudden emergence of Temu - and how all of this ties back to greed, growth and Silicon Valley.
Algebraic geometry is often presented as the study of zeroes of polynomial equations. But it's really about something much deeper: the duality between abstract algebra and geometry.
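To give one concrete instance of that duality (my example, not the video's): the unit circle, the set of points satisfying x² + y² = 1, corresponds to the ideal generated by x² + y² − 1 in the polynomial ring R[x, y], and geometric questions about the curve translate into algebraic questions about the quotient ring R[x, y]/(x² + y² − 1), and vice versa.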
Meant to share this earlier.
Reach out to me
Use the links below to check out my other content, learn more about tutoring, reach out to me about projects, or just to say hi.
Check out my other articles on Medium: https://rb.gy/zn1aiu
My YouTube: https://rb.gy/88iwdd
Reach out to me on LinkedIn. Let’s connect: https://rb.gy/m5ok2y
My Instagram: https://rb.gy/gmvuy9
My Twitter: https://twitter.com/Machine01776819