Transitions
Once upon a time in the land of traditional compute, I would have been excited about the latest announcement coming from Cupertino, California. The M(n) series chips have made Apple laptops quite desirable for working remotely. The caveat is that most of the "power" of the M(n) series goes unused, because it simply doesn't run ML workloads efficiently. Sure, I can run Ollama locally and work with quantized models, but efficient is not the word I'd use.
For example, I've been teaching students how to write a RAG agent locally, generating the embeddings for the Chroma database using Gemma and Llama, both fine models with great context windows. But, and yes, there's a but, the amount of time it takes to translate chunks into embeddings locally is painful on the M(n) chips (tens of minutes for 8 documents) and dramatically worse on older machines without Metal or other GPU support.
The point is that very few machines are capable of executing these models effectively and efficiently without cloud computing or heavy-metal resources.
This inflection point sent me comparing eras, and made me realize that until these resource constraints are commoditized, the Cloud is indeed the new Mainframe and/or Supercomputer. Aside from the academic world, only pay-to-play enterprises will be able to effectively run the complex compute tasks that will shape their industries.
In the 1970s, companies like Walmart, The Home Depot, and First National Bank (among many other financial institutions) began adopting the computational power of Mainframes and large compute, with a roughly ten-to-twenty-year ROI. Those investments arguably pushed these companies into the top brands they are today.
How does this apply? Well, to put it simply, the Mainframe / Tandem / High Performance computer was the only heavy metal capable of running the computations required to balance books, inventory, supply chains, and transactional boundaries, and that ushered in the digital age. Now, with GPUs not being manufactured fast enough to meet demand and large-scale commercial contracts, the cloud is the only way to realize the future of the next ten to twenty years, because no other companies will have the capabilities to execute in this space. I say that confidently because the technical debt paid by all of these founders has led to an understanding of how to distribute payloads effectively and efficiently.
This is further underlined by the example above: using Google Gemini with the exact same code I teach students, we can generate embeddings for thousands of PDFs in the same time frame. That's the game-changing advantage.
So, where does this bring us? Instead of running out and buying a new computer (unless you actually need one), consider whether that investment would be better placed in a cloud subscription, using those resources to run complex compute and take advantage of things like Model Garden on the Vertex AI platform, so you don't have to hire a team of engineers just to experiment.
I am excited about the world of Generative AI, and more importantly the fields of AI such as voice, vision, and text. But the realization that these capabilities are only available to large-scale computational powerhouses does have me rethinking my investments on both personal and professional levels. I find myself encouraging students to find hardware that can save them time, and if they can't, to find subscriptions that make their goals attainable. Because if you're out there denouncing the possible outcomes, then you're most likely misunderstanding the distance to the horizon.
** Note - running on local heavy metal has a similar effect as the cloud, but would not match the ROI of simply paying for execution costs once maintenance over time is factored in.
If you'd like to try these examples:
Local Execution (Using Ollama):
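Here's a minimal sketch of the local workflow, assuming the `ollama` and `chromadb` Python packages, a running Ollama server, and an embedding-capable model already pulled (the model name below is a placeholder, not the exact one from my class):

```python
# Local RAG embedding sketch: chunk text, embed each chunk through a locally
# served model via Ollama, and store the vectors in a Chroma collection.
# Assumes `pip install ollama chromadb` and a running Ollama server.

def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character chunks."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed_documents(texts: list[str], model: str = "llama3") -> None:
    """Embed chunks locally and store them in Chroma."""
    import ollama    # local inference client
    import chromadb  # vector store

    collection = chromadb.Client().create_collection("docs")
    for doc_id, text in enumerate(texts):
        for n, chunk in enumerate(chunk_text(text)):
            # One model round trip per chunk -- this is the step that takes
            # tens of minutes on M(n) laptops for even a handful of documents.
            vector = ollama.embeddings(model=model, prompt=chunk)["embedding"]
            collection.add(ids=[f"{doc_id}-{n}"],
                           embeddings=[vector],
                           documents=[chunk])

# embed_documents(["...your document text..."])  # needs a local Ollama server
```

The chunk-by-chunk loop is exactly where the local bottleneck lives: every chunk is a full inference pass on your own hardware.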
Cloud Execution (Same workflow but using Gemini):
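And the same workflow pointed at Gemini, a sketch assuming `pip install google-generativeai` and an API key; the model name is a placeholder for whichever embedding model is current on the platform:

```python
# Cloud RAG embedding sketch: identical chunking, but embedding calls go to
# Google's hosted models, which accept whole batches per request.
# Assumes `pip install google-generativeai` and a valid API key.

def batched(items: list[str], n: int = 100) -> list[list[str]]:
    """Group chunks into batches so each API call embeds many at once."""
    return [items[i:i + n] for i in range(0, len(items), n)]

def embed_in_cloud(chunks: list[str]) -> list[list[float]]:
    """Embed chunks with a hosted Gemini embedding model."""
    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")  # placeholder
    vectors: list[list[float]] = []
    for batch in batched(chunks):
        # One request embeds the whole batch -- the per-chunk round trips
        # that dominate locally collapse into a handful of calls here.
        result = genai.embed_content(
            model="models/text-embedding-004",  # placeholder model name
            content=batch,
            task_type="retrieval_document",
        )
        vectors.extend(result["embedding"])
    return vectors

# embed_in_cloud(["...chunked document text..."])  # needs an API key
```

Batching is the design difference that matters: the structure of the code barely changes, but thousands of PDFs become a few hundred requests instead of hundreds of thousands of local inference passes.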