I’m about to embark on a high-stakes negotiation with my wife: our household urgently requires an Apple Mac Studio with the M3 Ultra and 512GB of Unified RAM (yes, you read that right: 512 gigs of URAM).

What’s in the cube:
- DeepSeek-R1 (the real one, 671B) at home, on my desk. It would mean our future AGI overlord might respect me.
- 819GB/s RAM bandwidth and an 80-core GPU, because if I’m going to lose an argument about “why we need this,” I might as well simulate the entire conversation in a real-time s2s LLM.
- A 32-core CPU, perfect for compiling my excuses about “why the credit card bill is high” in parallel threads.

Now, the negotiation strategy. Frame the ROI: “Honey, think of it as a space-saving cube! It’s smaller than a breadbox but delivers petaflops of ‘honey-do list’ efficiency.” Or: “Remember how you optimized our closet with modular shelving? This is just… thermal-efficient computational shelving.”

Potential counterarguments:
- “Why not just use the cloud?” Make sure you own your AI. AI in the cloud is not aligned with you; it’s aligned with the company that owns it.
- “What’s wrong with your current setup?” Response: [loads PowerPoint on “The Moral Imperative of 510GB GPU VRAM”]

P.S. If you need me, I’ll be rehearsing my pitch in front of the mirror… and benchmarking how many LLMs and SLMs it takes to crash my current 192GB VRAM rig. Spoiler: it’s 14.
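P.P.S. For the spreadsheet-minded spouse, a back-of-envelope sketch (my assumptions, not a benchmark): single-stream decode is roughly memory-bandwidth-bound, so the ceiling is bandwidth divided by bytes read per token. DeepSeek-R1 is MoE with ~37B active parameters per token, so a ~4.5-bit quant touches on the order of 21GB per token:

# hypothetical ceiling: bandwidth / bytes-per-token
# assumes ~37B active params at ~4.5 bits/weight ≈ 21 GB read per token
echo "scale=1; 819 / 21" | bc    # ≈ 39 t/s upper bound; real-world lands well below this

Anything past reading speed and the ROI slide writes itself.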
The WAF (Wife Acceptance Factor) on Project DIGITS is going to be low too; it may be good to have a plan B during negotiations (you can decide which one is plan A).
Awesome! Tell your wife that "Tristan from the internet" supports your idea!
Mitko Vasilev At this price tag, why not give the NVIDIA-based platform a try? Their MGX form factor is actually quite compact.
I'm going to be nice this year and ask Santa to bring me one
I'm gonna need 4 of them.
Google says $4K–$14K; your $12K tag is top-tier. Another shamelessly overpriced Apple toy. I personally wouldn't buy something so fragile and fast-depreciating unless I could replace it instantly if it broke. (Yes, this means I drive a cheap car.)
I'd love to see someone run "ramalama run deepseek-r1:671b" or "ramalama run r1-1776:671b" (the new uncensored version from Perplexity) on one of these!
I recently built a 768GB workstation and have gone through a similar negotiation :) I think the unified memory approach is better, though. Another plausible alternative is an integrated GPU such as the AMD Ryzen AI Max+, but I'm not sure how its performance compares with Apple silicon, or whether the CPU can support enough memory channels. The age of large memory is dawning on us!
Have you played around with distributed inference over RPC in llama.cpp, Mitko Vasilev? I found it reasonably easy to set up (though not as easy as Exo, which sadly doesn't yet support multiple GPU nodes), and it scales pretty well even over gigabit Ethernet. Regardless, I also really want an M3 Ultra, so let me know your playbook when you strike a winner.
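For anyone who wants to try the RPC route, the rough shape of it (commands from my memory of the llama.cpp RPC docs, so double-check against your build; hostnames, ports, and the model path are placeholders):

# build llama.cpp with the RPC backend enabled
cmake -B build -DGGML_RPC=ON && cmake --build build --config Release
# on each worker node, expose it over the network
./build/bin/rpc-server -H 0.0.0.0 -p 50052
# on the head node, shard layers across the workers
./build/bin/llama-cli -m ./models/some-model.gguf \
  --rpc 192.168.1.10:50052,192.168.1.11:50052 -ngl 99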
I'm excited to find out what t/s you achieve on this box with a large model like DeepSeek-R1 671B. I have a couple of Nvidia Tesla P40 GPUs in a Dell R720 to play around with (because cheap); I can run a Llama 3 70B model at Q4 with slightly-faster-than-reading-speed output, but I'd like to run larger models at higher precision (at least Q6) in the near future. Agree about "owning" your AI.