Exploring Google Gemini and Vertex AI Platform for a Swift iOS App
Tapadyuti Chatterjee
Staff Software Engineer at Walmart Labs | GCP PCA Expert | IEEE & IEEE Computer Society Member | Ex Morgan Stanley, SAP | I Build Mobile Apps for Fun (iOS & Android)
If you've been tracking my past articles, you're already aware that I've recently picked up Android app development again, finding joy in it as a weekend project. (If not, no worries; check it out here: "Rediscovering Android App Development after six years")
TL;DR:
Took a leap of faith and resumed iOS app development. Identified a feature in my app that could benefit from LLM integration. Researched the available options. Liked Google's Gemini and Vertex AI Platform. Sharing my findings in this article.
I had always thought of giving iOS app development a try, but getting started felt challenging. A few things kept me away: you need a Mac to develop iOS apps, which can be a significant investment. Simulators aside, a physical Apple device is necessary to test the app as you build it. And on top of everything, publishing on the App Store requires an annual membership in the Apple Developer Program (currently $99 a year). (I tried cross-platform app development frameworks (Link 1, Link 2), but trust me, those still require a physical Apple device and a developer account to get an app over the finish line!)
I was randomly reading the Swift documentation on the Apple Developer website (Swift is the language for iOS app development; gone are the days of Objective-C), and was pleasantly surprised at how far it has come from its early days.
Apple now has the declarative syntax of SwiftUI. It also introduced SwiftData at WWDC 2023, which streamlines data storage and retrieval for an app. Xcode (Apple's IDE for iOS development) seems to have picked up some new tricks as well. A bunch of other things caught my eye, but I'll save those for another post.
For a while, an app concept had been on my mind. I decided to make a bold move and began developing it in Swift for iOS. I'll hold off on sharing specifics for now since the app is still a work in progress. However, there's one feature that seemed likely to benefit from "ChatGPT-like functionality".
And let me tell you, now is possibly both the best and the worst time to start exploring Large Language Models. There are seemingly too many options out there, and each has its own nuances. Every other company presents its own model as "the best" and puts up stats in support.
I have been a core backend developer for most of my career, building massively distributed, fault-tolerant applications. However, I was lucky to have worked with machine learning models a couple of times as well.
While I try to stay updated on the latest advancements, I find it challenging to experiment with them all.
This app, however, gave me the opportunity to dive back in and get my hands dirty.
So let's take a step back and understand a few things!
Artificial Intelligence, Machine Learning and LLM Family of Models
I assume by now you have heard about ChatGPT, Microsoft Copilot, and Google Gemini. As of this writing, I believe these are the most popular LLM-based software products out there. ChatGPT and Google Gemini are themselves LLM-based products, while Microsoft Copilot integrates with applications and uses an LLM under the hood to increase productivity.
So how do all of these fit into the world of Artificial Intelligence? Let me explain with the diagram below:
Artificial Intelligence (AI) is the broader term, generally used where a computer attempts to mimic human intelligence. There are multiple fields within Artificial Intelligence, one of which is Machine Learning.
Machine Learning (ML) is a specialized area of AI in which computer algorithms utilize data to recognize patterns, learn from them, and attempt to solve problems, gradually improving their performance over time.
Large Language Models (LLMs) are specialized Machine Learning (ML) models trained on massive amounts of textual data. They learn from this data and generate human-like responses. That's why chatting with any of them feels as if you are talking to a human: they learned from us.
LLMs have been around for some time, so why the craze now?
In my opinion, there's been an exponential increase in the volume of information available lately. When someone searches for an answer, they're likely to encounter hundreds of websites offering various versions of the same information. Many of these sites are eager to track your activity for "personalization purposes". For the average user, navigating through this seemingly endless stream of information, along with unwanted personalization, can be overwhelming. Personally, I'd prefer to access this information from a single source without distractions, rather than having it scattered across numerous sites. Moreover, as humans, we often want to ask follow-up questions within the same context. This aspect has greatly influenced the impact of LLM models on productivity. They enable us to obtain answers without switching contexts or splitting our focus across hundreds of search results.
Returning to the app I'm creating, my aim is to assist the user within their specific context.
As of the time of my research, Apple does not provide direct integration with any of the LLM providers, but rumor has it they are in talks with OpenAI as well as Google to bring similar capabilities to the iPhone (Link). It's not yet clear, though, what will be available as part of the Apple Developer Program.
Apple does have the Core ML libraries (read more about them here), which help with object/text detection and classification. But they didn't offer the features and flexibility I was looking for.
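For context, here is a minimal sketch of the kind of on-device task Vision and Core ML handle well: recognizing text in an image (assuming you already have a CGImage in hand). Useful, but not the conversational, generative experience I was after:

import Vision

// A minimal sketch of on-device text recognition with Vision,
// assuming `cgImage` is a CGImage you obtained elsewhere.
func recognizeText(in cgImage: CGImage) {
    let request = VNRecognizeTextRequest { request, error in
        guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
        // Print the best candidate string for each detected text region.
        for observation in observations {
            if let candidate = observation.topCandidates(1).first {
                print(candidate.string)
            }
        }
    }
    request.recognitionLevel = .accurate

    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    do {
        try handler.perform([request])
    } catch {
        print("Text recognition failed: \(error)")
    }
}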
I have previous experience working with Google's Cloud Platform, so naturally I started exploring Google's AI offerings.
Google's AI Offerings
Google had been previewing "Bard" for some time. They recently rebranded it as Gemini and expanded the suite. The Gemini family of models has the following notable offerings in place (some of them in early developer preview at the time of writing, with probably more announcements coming on May 14th):
There are other offerings, but for the sake of simplicity, we'll consider only the above.
Pricing Structures for Google's Pro AI Offerings
I'll take a moment to discuss how Google's offerings are priced, using a screen grab of the Gemini 1.0 Pro pricing.
Broadly there are two tiers: a free tier with tight rate limits, and a pay-as-you-go tier.
Let's take a moment to grasp how these pay-as-you-go pricing plans play out in real-world scenarios. For the sake of discussion, let's assume we're developing an application that sends textual input to these models and expects textual output in return.
For example:
Input:
"Tell me a joke in few words"
Output:
"Why don't scientists trust atoms? Because they make up everything!"
Rate Limits
*These limits are not guaranteed, and overall traffic on Google's infrastructure will impact your available throughput.
Pricing per token
These costs cover only the use of the Gemini model itself, not the associated infrastructure when accessed through the Vertex AI Platform (more on this later).
Perhaps I've oversimplified the costing, but it should provide a rough idea of the expenses for running a hobby app. These figures are approximate, since Google doesn't reveal precise details about internal workings and token usage. Moreover, Google plans to unveil the complete pay-as-you-go plans on May 14th.
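As a back-of-the-envelope illustration, here is a tiny Swift sketch of how such an estimate might be computed. Every rate below is a hypothetical placeholder, not Google's actual price; substitute the numbers from the official pricing page:

// Back-of-the-envelope cost estimate for a hobby app.
// All rates are HYPOTHETICAL placeholders; substitute the real
// numbers from Google's pricing page before trusting the output.
let inputRatePer1kChars = 0.000125   // assumed $ per 1,000 input characters
let outputRatePer1kChars = 0.000375  // assumed $ per 1,000 output characters

let avgInputChars = 30.0    // e.g. "Tell me a joke in a few words"
let avgOutputChars = 70.0   // e.g. the atom joke above
let requestsPerMonth = 10_000.0

let inputCost = (avgInputChars / 1_000) * inputRatePer1kChars
let outputCost = (avgOutputChars / 1_000) * outputRatePer1kChars
let monthlyCost = (inputCost + outputCost) * requestsPerMonth
print(String(format: "Estimated monthly cost: $%.2f", monthlyCost))
// Prints roughly $0.30 for these assumed numbers.

Even with generous assumptions, a low-traffic hobby app stays in pocket-change territory; the bigger financial risk is a leaked API key, which we'll get to shortly.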
Accessing the Gemini Models
Google provides two platforms to deploy and access the Gemini models: Google AI Studio and the Vertex AI Platform. We'll look into these in detail in the next sections.
Google AI Studio
Google AI Studio gives you a quick and easy way to get started with the Gemini models. It provides an intuitive UI to provision and access pre-trained LLMs. It also used to support the legacy PaLM models, but those have since been deprecated.
You can access Google AI Studio using the link: https://aistudio.google.com/
There are three main prompt types that can be created in Google AI Studio (freeform, structured, and chat). Think of creating a prompt as equivalent to provisioning a machine learning model.
Training and tuning models on Google AI Studio
Google AI Studio does not give you the flexibility to train a model from the ground up, but it does allow you to tune pre-trained models using JSON or CSV data. After tuning, you will have a fine-tuned Gemini 1.0 Pro model specific to your use case.
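To make that concrete, here is a sketch of what such tuning data might look like, modeled as Swift types and encoded to JSON. The field names are my own illustration; follow whatever schema the AI Studio import dialog actually asks for:

import Foundation

// Illustrative input/output pairs for tuning. The field names here
// are assumptions for illustration; match the schema Google AI Studio
// expects when you import your data.
struct TuningExample: Codable {
    let textInput: String
    let output: String
}

let examples = [
    TuningExample(textInput: "Summarize: Swift is Apple's modern language for building apps...",
                  output: "Swift is Apple's modern app-development language."),
    TuningExample(textInput: "Summarize: SwiftData streamlines persistence in SwiftUI apps...",
                  output: "SwiftData simplifies data storage for SwiftUI apps."),
]

let encoder = JSONEncoder()
encoder.outputFormatting = .prettyPrinted
if let data = try? encoder.encode(examples), let json = String(data: data, encoding: .utf8) {
    print(json) // Upload this JSON when tuning in AI Studio.
}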
Once you have your model, you can create any of the three prompt types mentioned above and test its inputs and outputs. If you are satisfied with the performance, AI Studio also gives you pre-generated code to access your model through the Gemini API. But this is NOT production-ready code.
Also, to access these models over an API, you need to generate an API key, which is unique to your account and a cloud project. This key is a secret and should not be exposed to the outside world: once you enable billing, a malicious actor who gets hold of it can run up usage that you end up paying for.
Vertex AI on Google Cloud Platform
Vertex AI, on the other hand, is a more fleshed-out solution offered by Google Cloud Platform (GCP) for end-to-end ML model lifecycle management. If you are planning to go beyond prototyping and deploy a production-ready app, this is the right platform to integrate with.
Though I believe I have just skimmed the surface of Vertex AI during my exploration, let's take the next few sections to go through what it offers.
Model Garden
Vertex AI offers a collection of models beyond the Gemini family, in what it calls the Model Garden.
You can choose from various foundation models as well as fine-tuned models, well beyond what the Gemini family alone has to offer. Pricing depends on the specific model you use.
Model Life Cycle Management
The Vertex AI platform gives you an entire suite of tools to build, train, deploy, and manage machine learning models: for example, training pipelines, a model registry, endpoints for online serving, and model monitoring.
Generative AI on Vertex AI
Vertex AI gives access to generative AI models like Gemma, Gemini, etc. As of today, this experience is a bit fragmented between Google AI Studio and Vertex AI. You can read more about this here: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/overview
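At the time of writing there was no first-party Swift SDK for Vertex AI, so reaching a Vertex-hosted Gemini model from Swift means calling its REST endpoint directly. Below is a rough sketch; PROJECT_ID, the region, the model name, and the OAuth access token are placeholders you would supply:

import Foundation

// Sketch of calling a Vertex-hosted Gemini model over REST.
// PROJECT_ID, the region, and the OAuth bearer token are placeholders;
// minting that token client-side is impractical, which is one more
// reason the backend-intermediary design (later section) is cleaner.
func askVertexGemini(prompt: String, accessToken: String) async throws -> String? {
    let endpoint = "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/gemini-1.0-pro:generateContent"
    guard let url = URL(string: endpoint) else { return nil }

    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    request.setValue("Bearer \(accessToken)", forHTTPHeaderField: "Authorization")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")

    let contents: [[String: Any]] = [
        ["role": "user", "parts": [["text": prompt]]]
    ]
    let body: [String: Any] = ["contents": contents]
    request.httpBody = try JSONSerialization.data(withJSONObject: body)

    let (data, _) = try await URLSession.shared.data(for: request)
    // Returning raw JSON for inspection; a real app would decode the
    // model's candidates from the response body.
    return String(data: data, encoding: .utf8)
}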
Integrating Google AI in a Swift iOS App
In my view, integrating Google's LLMs deployed in Google Cloud with a mobile app is currently a somewhat fragmented experience.
There are two approaches to integrating an app with these models:
Approach Suitable for Experimentation and Proof-of-Concept Purposes
This method is primarily suitable for rapidly testing proof-of-concept applications and should not be used in production.
Google AI SDK for Swift: Google provides an AI SDK for Swift developers.
To utilize it:
import GoogleGenerativeAI

// For text-only input, use the gemini-pro model.
// The API key is read from a .plist resource (see the helper below).
func tellStory() async {
    let model = GenerativeModel(name: "gemini-pro", apiKey: APIKey.default)
    let prompt = "Write a story about a magic backpack."
    do {
        let response = try await model.generateContent(prompt)
        if let text = response.text {
            print(text)
        }
    } catch {
        print("Generation failed: \(error)")
    }
}
Refer to the google documentation available here: https://ai.google.dev/gemini-api/docs/get-started/swift
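For completeness, the APIKey.default used above comes from a small helper along the lines of the one in Google's quickstart. A sketch, assuming the key lives in a GenerativeAI-Info.plist that you keep out of source control (note that the file still ships inside the app bundle):

import Foundation

// Reads the Gemini API key from GenerativeAI-Info.plist, which is
// kept out of source control. Caution: the plist still ships inside
// the app bundle, so this keeps the key out of your repo, not out of
// the hands of anyone who unpacks the binary.
enum APIKey {
    static var `default`: String {
        guard let filePath = Bundle.main.path(forResource: "GenerativeAI-Info", ofType: "plist"),
              let plist = NSDictionary(contentsOfFile: filePath),
              let value = plist["API_KEY"] as? String else {
            fatalError("Missing API_KEY in GenerativeAI-Info.plist")
        }
        return value
    }
}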
This approach is not recommended. Google itself has issued a warning in its documentation regarding its usage.
The reason for this caution is that embedding the API key within the app's package poses a risk of exposure through reverse engineering.
For further insights, you may find this article interesting: Secure Secrets in iOS App.
These risks escalate when billing is enabled on your account. If the API key falls into the wrong hands, significant financial losses can occur, potentially amounting to thousands of dollars.
Framework for Integrating Gemini Machine Learning Models into an iOS App
A cleaner approach would involve routing requests through an intermediary backend service, so the Gemini API key never ships inside the app.
A zero-trust policy, coupled with communication over HTTPS channels, helps keep the whole path secure.
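As a sketch of that design, the app sends only the user's prompt to your own backend over HTTPS; the backend, which alone holds the Gemini credentials, relays the request to Google and returns the generated text. The endpoint URL, auth scheme, and response shape below are all hypothetical:

import Foundation

// Hypothetical response shape from your own backend proxy.
struct GenerateResponse: Decodable {
    let text: String
}

// The app never sees a Google API key: it authenticates to YOUR
// backend (e.g. with the user's session token), and the backend
// calls Gemini server-side. URL and auth scheme are illustrative.
func generate(prompt: String, sessionToken: String) async throws -> String {
    var request = URLRequest(url: URL(string: "https://api.example.com/v1/generate")!)
    request.httpMethod = "POST"
    request.setValue("Bearer \(sessionToken)", forHTTPHeaderField: "Authorization")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONEncoder().encode(["prompt": prompt])

    let (data, _) = try await URLSession.shared.data(for: request)
    return try JSONDecoder().decode(GenerateResponse.self, from: data).text
}

A nice side effect of this design is that you can swap the underlying model on the server without shipping an app update.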
Bringing it all together
Looking ahead
If you've made it this far, I hope you found this article insightful. Please pardon any errors I may have made; constructive feedback is always welcome. Over the next few weekends, I'll continue developing my application whenever time allows. As it's my first submission to the Apple App Store, I hope it meets all the required standards.
As always, I'd love to hear your comments, suggestions, or thoughts.
I'm enthusiastic about contributing to free, open-source projects that benefit our community! Let's collaborate!
Feel free to reach out with any questions or ideas. Let's embark on this exciting journey of discovery and creation together!
PS
*Opinions expressed are solely my own and do not express the views or opinions of my employer or any other entity.
*All trademarks belong to their respective owners; I have used them here for illustrative purposes.