Mitko Vasilev's post

I’m about to embark on a high-stakes negotiation with my wife: our household urgently requires an Apple M3 Studio Ultra with 512GB of Unified RAM (yes, you read that right — 512 gigs of URAM).

What’s in the cube:
- DeepSeek-R1 (the real one, 671B) at home, on my desk, would mean our future AGI overlord might respect me.
- 819GB/s RAM bandwidth and an 80-core GPU, because if I’m going to lose an argument about “why we need this,” I might as well simulate the entire conversation in a real-time s2s LLM.
- A 32-core CPU, perfect for compiling my excuses about “why the credit card bill is high” in parallel threads.

Now, the negotiation strategy:
- Frame the ROI: “Honey, think of it as a space-saving cube! It’s smaller than a breadbox but delivers petaflops of ‘honey-do list’ efficiency.”
- “Remember how you optimized our closet with modular shelving? This is just… thermal-efficient computational shelving.”

Potential counterarguments:
- “Why not just use the cloud?” Make sure you own your AI. AI in the cloud is not aligned with you; it’s aligned with the company that owns it.
- “What’s wrong with your current setup?” Response: [Loads PowerPoint on “The Moral Imperative of 510GB GPU VRAM”].

P.S. If you need me, I’ll be rehearsing my pitch in front of the mirror… and benchmarking how many LLMs and SLMs it takes to crash my current rig with 192 GB VRAM. Spoiler: it’s 14.

Make sure you own your AI. AI in the cloud is not aligned with you; it’s aligned with the company that owns it.
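As a rough sanity check on the 819GB/s figure in the post: single-stream LLM decode is usually memory-bandwidth-bound, so a throughput ceiling is roughly bandwidth divided by the bytes read per token. The numbers below (≈37B active MoE parameters per token for DeepSeek-R1, ~4.5 bits per weight for a Q4-class quant) are my own assumptions for illustration, not benchmarks from the post.

```python
# Bandwidth-bound ceiling for DeepSeek-R1 671B decode on an M3 Ultra.
# Assumptions (illustrative, not measured): MoE with ~37B active params
# per token, ~4.5 bits/weight average quantization, decode limited purely
# by unified-memory bandwidth (819 GB/s as advertised).

ACTIVE_PARAMS = 37e9      # active parameters read per decoded token (MoE)
BITS_PER_WEIGHT = 4.5     # rough Q4-class average
BANDWIDTH = 819e9         # bytes/s

bytes_per_token = ACTIVE_PARAMS * BITS_PER_WEIGHT / 8
tokens_per_sec = BANDWIDTH / bytes_per_token
print(f"~{bytes_per_token / 1e9:.0f} GB read per token "
      f"-> ~{tokens_per_sec:.0f} tok/s theoretical ceiling")
```

Real throughput will land well under this ceiling (compute, KV cache reads, and scheduling overhead all eat into it), but it explains why a MoE model this large is even plausible on a desktop.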

Mark Steffen

Senior Security Engineer @ Precision Castparts | CISSP, CCSP

6 days ago

I'm excited to find out what you achieve for t/s on this box with a large model like Deepseek R1 671B. I have a couple of Nvidia Tesla P40 GPUs in a Dell R720 to play around with (because cheap); I can run a Llama 3 70B model Q4 with slightly-faster-than-reading speed output, but I'd like to run larger models with higher precision (at least Q6) in the near future. Agree about "owning" your AI.
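The Q4-runs-but-Q6-is-tight experience above checks out on paper: weight size scales linearly with bits per weight. A quick estimate, using rough average bits-per-weight figures I'm assuming for GGUF-style quants (not exact format numbers), against two 24GB Tesla P40s:

```python
# Estimate Llama 3 70B weight memory at different quantization levels
# versus 2x Tesla P40 (2 x 24 GB = 48 GB VRAM). Bits-per-weight values
# are rough averages I'm assuming, not exact GGUF figures, and KV cache
# and activations need headroom on top of the weights.

PARAMS = 70e9
GPU_VRAM_GB = 2 * 24

quants = {"Q4_K_M": 4.8, "Q6_K": 6.6, "Q8_0": 8.5}
for name, bpw in quants.items():
    size_gb = PARAMS * bpw / 8 / 1e9
    verdict = "fits" if size_gb <= GPU_VRAM_GB else "does not fit"
    print(f"{name}: ~{size_gb:.0f} GB of weights "
          f"({verdict} in {GPU_VRAM_GB} GB, before KV cache)")
```

Q4 squeaks in at roughly 42 GB of weights; Q6 lands near 58 GB, which is why stepping up precision on that box means offloading layers to system RAM (and losing speed).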

Auro Tripathy

Servicing the AI last mile; Inference, SFT, RL. Investor, KubeCost. Advisor, Agentic App Startups

6 days ago

The WAF (wife acceptance factor) on Project DIGITS is going to be low, so it may be good to have a plan B during negotiations (you can decide which one is plan A).

Dr. Tristan Behrens

AI Engineer | Deep Learning | Large Language Models | Agents | Computational Music | Art | PhD in Computer Science

6 days ago

Awesome! Tell your wife that "Tristan from the internet" supports your idea!

Matt W.

AI Infra. for People | AI Chips | Edge AI | Level-4 | Less Over-Regulation | Live Off the Land

6 days ago

Mitko Vasilev At this price tag, why not give the NVIDIA-based platform a try? Its MGX is actually quite compact.

Mariusz Kurman

AI for Health | MD

6 days ago

I'm going to be nice this year and ask Santa to bring me one

David Stevens

Founder, architect, accelerant, shaper, scrappy servant leader, love 0-1, scale up 10-100, intersection of product-eng-business, exponential thinker, autodidact, challenger of the status quo, a crazy one, beach taker.

6 days ago

I'm gonna need 4 of them.

Mark Kondor

Software QA Manager & Leader | Software & Quality Engineering | Automation & Strategy

6 days ago

Google says $4K-$14K; your $12K tag is top-tier. Another shamelessly overpriced Apple toy. I personally wouldn't buy something so fragile and fast-depreciating unless I could replace it instantly if it broke. (Yes, this means I drive a cheap car.)

Eric Curtin

Principal Software Engineer working on AI at Red Hat

6 days ago

I'd love to see someone run "ramalama run deepseek-r1:671b" or "ramalama run r1-1776:671b" (the new uncensored version from Perplexity) on one of these!

I recently built a 768GB workstation and have gone through a similar negotiation :) But I think the unified-memory approach is better. Another plausible alternative is an integrated GPU such as the AMD Ryzen AI Max+, though I'm not sure how its performance compares with Apple silicon, or whether the CPU can support enough memory channels. The age of large memory is dawning on us!

Tom Richardson

Digital Insurance and Product Leader

5 days ago

Have you played around with distributed inference over RPC in llama.cpp, Mitko Vasilev? I found it reasonably easy to set up (though not as easy as Exo, which sadly doesn't yet support multiple GPU nodes), and it scales pretty well even over gigabit ethernet. Regardless, I also really want an M3 Ultra, so let me know your playbook when you strike a winner.
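The "scales pretty well even over gigabit" observation has a simple explanation: with layer-split inference, roughly one hidden-state tensor crosses each node boundary per decoded token, which is tiny. A back-of-envelope check, with all figures (hidden size 8192, fp16 activations, one split point) being illustrative assumptions rather than llama.cpp internals:

```python
# Why a gigabit link is rarely the bottleneck for layer-split distributed
# decode: per token, roughly one hidden-state vector crosses each node
# boundary. Assumed figures (illustrative): hidden size 8192, fp16
# activations, two nodes (one boundary), 1 Gbit/s ethernet.

HIDDEN = 8192
BYTES_PER_ACT = 2              # fp16
BOUNDARIES = 1                 # two nodes -> one split point
LINK_BYTES_PER_SEC = 1e9 / 8   # gigabit ethernet

payload = HIDDEN * BYTES_PER_ACT * BOUNDARIES   # bytes on the wire per token
transfer_ms = payload / LINK_BYTES_PER_SEC * 1e3
print(f"~{payload / 1024:.0f} KiB per token -> ~{transfer_ms:.2f} ms on the wire")
```

A fraction of a millisecond per token is negligible next to the tens of milliseconds of compute per token on big models, so single-stream decode tolerates slow links; prompt processing and tensor-parallel splits are much more bandwidth-hungry.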
