DEEP SEEK -- DOWN THE WHALE'S GULLET
One word: Don't
Let's leave aside that it's a Chinese model, and that you have to agree to terms that hand your personal information and data to an adversarial nation. Apparently, that isn't enough to scare off some corporate entities and a whole lot of people. It's like TikTok: You have to be insane to be on that platform, but it's your life...
Anyway, Deep Seek does several things from a technical perspective that make it possible to run (in its smaller incarnations) on your average laptop or desktop computer with no special hardware, meaning no GPUs. The two biggies are MoE and mixed/low precision. MoE stands for "Mixture of Experts" and is a technique where what amounts to a bunch of small neural networks (the "experts") share the work, with a router picking just a few of them for each piece of input. These take the place of one huge NN that takes a lot of computing horsepower to run; only a handful of small experts are running at any moment while the rest of the landscape "sleeps".
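To make the MoE idea a little more concrete, here is a toy sketch in plain Python. It is not Deep Seek's code, and every name in it (`make_expert`, `moe_forward`, `router_weights`, the sizes, the top-2 choice) is my own assumption for illustration. Each "expert" is just a tiny random linear layer; a router scores all of them, and only the top few actually run.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(rng, dim):
    """Hypothetical 'expert': one small random linear layer, dim -> dim."""
    weights = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]

    def expert(x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

    return expert


def moe_forward(x, experts, router_weights, top_k=2):
    """Run only the top_k highest-scoring experts and blend their outputs."""
    # The router gives every expert a score for this input.
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in router_weights]
    # Keep the top_k experts; all the others stay idle ("asleep").
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    gates = softmax([scores[i] for i in chosen])
    output = [0.0] * len(x)
    for gate, idx in zip(gates, chosen):
        y = experts[idx](x)
        output = [o + gate * yi for o, yi in zip(output, y)]
    return output, chosen


rng = random.Random(0)  # fixed seed so the run is repeatable
DIM, NUM_EXPERTS = 4, 8  # toy sizes, assumed for illustration
experts = [make_expert(rng, DIM) for _ in range(NUM_EXPERTS)]
router_weights = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]

x = [0.5, -1.0, 0.25, 2.0]
output, chosen = moe_forward(x, experts, router_weights, top_k=2)
print("experts used:", chosen, "out of", NUM_EXPERTS)
```

The point of the sketch: eight experts exist, but only two do any arithmetic for a given input, which is where the compute savings come from. Note that all eight still have to be stored somewhere, so MoE saves computation rather than memory.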
There is also "Mixed Precision" or "Low Precision". In machine learning, and NNs in particular, a model's weights can be "quantized", meaning stored with fewer bits per number. Think of data word lengths by way of analogy: You have 64 bit, 32 bit... all the way down to 4 bit. You save a LOT of memory and computing resources when you use the lower bit widths, the "Low Precision", but the trade-off is that your AI model is less accurate. It's actually a common technique, most notably used by model providers like Meta and Google to make small models that will run on conventional PCs; I have used this technique in my own work with a research team that needs to reduce its GPU memory footprint. But again, what you gain in resource conservation you lose in accuracy. So although the Internet is effing gaga over the wonders of Deep Seek, it completely neglects this little caveat.
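The precision trade-off is easy to see for yourself. Below is a minimal sketch, assuming the simplest scheme I know of (symmetric, uniform quantization with one scale for the whole list); real toolkits use fancier variants, and the function names and the random "weights" here are made up for illustration. It squeezes the same numbers into 8 bits and then 4 bits and measures how far the restored values drift from the originals.

```python
import random


def quantize(weights, bits):
    """Map floats to signed integers that fit in `bits` bits (symmetric, one scale)."""
    qmax = 2 ** (bits - 1) - 1  # e.g. 127 for 8-bit, 7 for 4-bit
    scale = (max(abs(w) for w in weights) / qmax) or 1.0  # guard against all zeros
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


rng = random.Random(0)  # fixed seed so the run is repeatable
weights = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # stand-in for model weights

errors = {}
for bits in (8, 4):
    codes, scale = quantize(weights, bits)
    restored = dequantize(codes, scale)
    errors[bits] = mean_abs_error(weights, restored)
    print(f"{bits}-bit: {32 // bits}x smaller than 32-bit, "
          f"mean abs error {errors[bits]:.4f}")
```

Fewer bits means fewer distinct values to round to, so the 4-bit error comes out larger than the 8-bit error. How much that rounding error hurts a real model's answers depends on the model and the method, which is exactly the caveat.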
Speaking of the Internet, it's gone nuts over the Big Splash Deep Seek has made and touts it as the latest wonder. Yeah, well, Deep Seek has been available for nearly a year on a website called "ollama.com", along with dozens of other models that do natural language processing, computer vision, coding, web-scraping... just about anything you can think of. And nearly all of them will run on your very own laptop. Did I mention the models Deep Seek was built on? Deep Seek's foundation was laid with Meta's Llama and Alibaba's Qwen models (also available on ollama.com). There were a few others, but those were the primary bricklayers. So yes, most of Deep Seek's code is open-sourced because it's, well, somebody else's.
The upshot is this: Deep Seek may be wonderful at siphoning your personal data and shipping it off to Beijing. But would you trust it with a cancer diagnosis, an analysis of the global economy, or even a recipe for baking bread?
Caveat Emptor.