Exploring Google Gemini and Vertex AI Platform for a Swift iOS App
Image credit: Google


If you've been tracking my past articles, you'll already know that I recently picked up Android App development again, finding joy in it as a weekend project. (If not, no worries, check it out here: "Rediscovering Android App Development after six years")


TL;DR

Took a leap of faith and resumed iOS App development. Identified a feature in my App that could benefit from LLM integration. Researched the available options and liked Google Gemini and the Vertex AI Platform from Google. Sharing my findings in this article.


I have always thought of giving iOS app development a try, but getting started felt challenging. A few things kept me away: I need a Mac computer to develop iOS apps, which can be a significant investment; simulators aside, an Apple device is necessary to test my app as I build it; and on top of everything, publishing an app on the App Store requires an annual membership in the Apple Developer Program (currently $99 a year). (I tried out cross-platform app development frameworks (Link 1, Link 2), but trust me, those still require a physical Apple device and a developer account to complete development!)

I was casually reading the Swift documentation on the Apple Developer website (Swift is The language for iOS app development; gone are the days of Objective-C), and was pleasantly surprised by how far it has come from its early days.

Apple now has the declarative syntax of SwiftUI. Apple also introduced SwiftData at WWDC 2023, which streamlines data storage and retrieval for an App. Xcode (Apple's IDE for iOS development) seems to have picked up some new tricks under its belt as well. There are a bunch of other things that caught my eye, but I will save those for another post.

For a while, an App concept had been on my mind. I decided to make a bold move and began developing it in Swift for iOS. I'll hold off on sharing specifics for now since the App is still a work in progress. However, there's one feature that seemed likely to benefit from "ChatGPT-like functionality".

And let me tell you, now is possibly the best and the worst time to start exploring Large Language Models. There are a great many options out there, and each has its own nuances. Every other company presents its own model as "the best" and puts up stats in support.

I have been a core backend developer for most of my career, building massively distributed, fault-tolerant applications. However, I was lucky to have worked with machine learning models a couple of times as well.

While I try to stay updated on the latest advancements, I find it challenging to experiment with them all.

This App, however, gave me the opportunity to dive back in and get my hands dirty.

So let's take a step back and understand a few things!

Artificial Intelligence, Machine Learning and LLM Family of Models

I assume by now you have heard about ChatGPT, Microsoft Copilot and Google Gemini. As of writing this article, I believe these are the most popular LLM-based software products out there. While ChatGPT and Google Gemini are LLM-based products in their own right, Microsoft Copilot integrates with applications and uses LLMs under the hood to increase productivity.

So how do all of these fit into the world of Artificial Intelligence? Let me explain with the diagram below:


AI, Machine Learning and LLMs

Artificial Intelligence (AI) is a broad term, generally used where a computer attempts to mimic human intelligence. There are multiple fields within Artificial Intelligence, one of them being Machine Learning.

Machine Learning (ML) is a specialized area of AI in which computer algorithms utilize data to recognize patterns, learn from them, and attempt to solve problems, gradually improving their performance over time.

Large Language Models (LLMs) are specialized Machine Learning (ML) models that are trained on massive amounts of textual data. They learn from this data and generate human-like responses. That's why chatting with any of them feels as if you are talking to a human: they learned from us.

LLMs have been around for some time, so why the craze now?

In my opinion, there's been an exponential increase in the volume of information available lately. When someone searches for an answer, they're likely to encounter hundreds of websites offering various versions of the same information. Many of these sites are eager to track your activity for "personalization purposes". For the average user, navigating through this seemingly endless stream of information, along with unwanted personalization, can be overwhelming. Personally, I'd prefer to access this information from a single source without distractions, rather than having it scattered across numerous sites. Moreover, as humans, we often want to ask follow-up questions within the same context. This aspect has greatly influenced the impact of LLM models on productivity. They enable us to obtain answers without switching contexts or splitting our focus across hundreds of search results.

Returning to the App I'm creating, my aim is to assist the user within their specific context.

As of the time of my research, Apple does not provide direct integrations with any of the LLM providers, but the rumor is that they are in talks with OpenAI as well as Google to bring similar capabilities to the iPhone (Link). It's not yet clear what will be available as part of the Apple Developer Program, though.

Apple does have its Core ML libraries (read more about them here), which help with object/text detection and classification. But to me they didn't offer the features and flexibility I was looking for.

I have previous experience working with Google's Cloud Platform, so naturally I started exploring Google's AI offerings.

Google's AI Offerings

Google had been previewing "Bard" for some time. They recently rebranded it as Gemini and expanded the suite. The Gemini family of models has the following notable offerings (some of them are in early developer previews at the time of writing, with probably more announcements coming on May 14th):

  • Gemini Nano on Android: Gemini Nano is the smallest LLM of the Gemini family. It can be executed "on-device" using the Google AI Edge SDK. Currently this is limited to the Google Pixel 8 Pro and Samsung Galaxy S24. Being on-device, it provides better security and cost savings, as well as offline access.
  • Gemini Pro Models (1.0 and 1.5): Though 1.5 Pro is still in preview, 1.0 is generally available and can process both text and image (1.0 Pro Vision) data.
  • Gemma: A notable mention here. Even though this is not a Gemini model, what makes it stand out is that it is a lightweight open model. While not exactly "open source", Gemma allows anyone to use the model for commercial as well as personal purposes under an open license. The internal workings remain partially hidden to keep them a secret.

There are other offerings, but for the sake of simplicity, we'll consider only the above.

Pricing Structures for Google's Pro AI Offerings

Let's take a moment to discuss how Google's offerings are priced, using a screen grab of the Gemini 1.0 Pro pricing.

Gemini 1.0 Pro model pricing Sheet


Broadly there are two tiers:

  • Free: Lets you quickly prototype; however, the prompts and responses could be reviewed by a human reviewer and could also be used to train and improve the model. This might be a privacy concern in certain situations. It also imposes significant rate limits on requests, effectively rendering it unusable for actual production purposes.
  • Pay as you go: Google also plans to release a pay-as-you-go tier on May 14th, 2024, where pricing depends on the amount of usage your model goes through. This tier has better privacy features, as the prompts and responses are not reviewed by a human or used to train the model.

Let's take a moment to grasp how these pay-as-you-go pricing plans play out in real-world scenarios. For the sake of discussion, let's also assume we're developing an application that sends textual input to these models and expects textual output in return.

For example:

Input:
"Tell me a joke in few words"

Output:
"Why don't scientists trust atoms? Because they make up everything!"        

Rate Limits

  • 360 RPM (requests per minute): You can ask Gemini to tell you a joke up to 360 times a minute.
  • 120,000 TPM (tokens per minute): This one is a little tricky to explain. Think of it this way: the longer and more complex the query, the more tokens it takes to process. For simple queries like the one above, you can assume one word is roughly one token (spaces excluded), so the request and response above consume well below 100 tokens. This is a guesstimate, since Google does not publish exact tokenization details. Gemini 1.0 Pro limits you to a maximum of 120,000 tokens per minute.
  • 30,000 RPD (requests per day): The model limits you to a maximum of 30,000 requests in a span of 24 hours, so you can ask Gemini for a joke at most 30,000 times a day.
  • In addition to the above rate limits, Gemini 1.0 has an input token limit of 30,720 and an output token limit of 2,048 per request, which under our simplified one-word-per-token estimate roughly translates to an input limit of 30,720 words and an output limit of 2,048 words per request.

*These limits are not guaranteed, and overall traffic on Google's infrastructure will impact your available throughput.
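
To make these numbers concrete, here is a minimal, purely illustrative sketch of a client-side budget guard built around the published limits. The one-word-per-token estimate and the hard-coded limits are assumptions carried over from the discussion above, not an official accounting, and the per-minute/per-day counter resets are omitted for brevity.

// Illustrative only: tracks usage against the published Gemini 1.0 Pro
// limits. Resetting the counters every minute/day is left out for brevity.
struct RateBudget {
    let requestsPerMinute = 360
    let tokensPerMinute = 120_000
    let requestsPerDay = 30_000

    private(set) var minuteRequests = 0
    private(set) var minuteTokens = 0
    private(set) var dayRequests = 0

    // Rough token estimate: one token per whitespace-separated word.
    static func estimatedTokens(for text: String) -> Int {
        text.split(whereSeparator: \.isWhitespace).count
    }

    mutating func canSend(_ prompt: String) -> Bool {
        let tokens = Self.estimatedTokens(for: prompt)
        guard minuteRequests < requestsPerMinute,
              dayRequests < requestsPerDay,
              minuteTokens + tokens <= tokensPerMinute else { return false }
        minuteRequests += 1
        dayRequests += 1
        minuteTokens += tokens
        return true
    }
}

var budget = RateBudget()
if budget.canSend("Tell me a joke in few words") {
    // safe to fire the request
}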

Pricing per token

  • $0.50 / 1 million tokens: For our simplified use case, assume Google will charge you $0.50 for every 1 million words you send in requests.
  • $1.50 / 1 million tokens: For our simplified use case, assume Google will charge you $1.50 for every 1 million words it sends back in responses.

These costs are just for using the Gemini model and do not include the associated infrastructure when it is used through the Vertex AI Platform (more about this later).

Perhaps I've oversimplified the costing, but it should provide a rough idea of the expense of running a hobby app. These figures are approximate, since Google doesn't reveal precise details about internal workings and token usage. Moreover, Google plans to unveil the complete pay-as-you-go plans on May 14th.
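
To put the simplified cost model into code, here is a quick back-of-the-envelope sketch, again assuming one token per word; real billing counts actual tokens, so treat the output as a rough estimate only.

// Rates mirror the Gemini 1.0 Pro sheet above: $0.50 per million input
// tokens and $1.50 per million output tokens, with one word ≈ one token.
func estimatedCostUSD(inputWords: Int, outputWords: Int) -> Double {
    let inputRate = 0.50 / 1_000_000
    let outputRate = 1.50 / 1_000_000
    return Double(inputWords) * inputRate + Double(outputWords) * outputRate
}

// 10,000 requests of ~100 words in and ~200 words out:
// 1M input tokens ($0.50) + 2M output tokens ($3.00) ≈ $3.50.
print(estimatedCostUSD(inputWords: 10_000 * 100, outputWords: 10_000 * 200))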

Accessing the Gemini Models

Google provides two platforms to deploy and access the Gemini models: Google AI Studio and the Vertex AI Platform. We'll look into these in detail in the next sections.

Google AI Studio

Google AI Studio offers a quick and easy way to get started with the Gemini models. It provides an intuitive UI to provision and access pre-trained LLM models. It also used to support the legacy PaLM models, but those have since been deprecated.

You can access Google AI Studio using the link: https://aistudio.google.com/

Google AI Studio


There are three main kinds of prompts that can be created in Google AI Studio. Think of creating a prompt as the equivalent of provisioning a machine learning model.

  • Chat Prompt: Chat prompts are open-ended prompts that simulate human-like conversation. You can ask follow-up questions, and it will answer keeping the context in mind.
  • Structured Prompt: These prompts give you more control over the model's output. For example, they let you specify keywords as well as additional parameters, such as tone, for generating jokes.
  • Free Form Prompt: This is the most versatile prompt, where the model lets the user type instructions in plain English, as we did earlier when we asked the model to "Tell me a joke in few words". Generally these prompts do not maintain context well; it's a new task every time a request is made (see the sketch after this list).
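
Jumping ahead to the Swift SDK covered later in this article, here is a minimal sketch of the practical difference between a chat prompt and a free-form prompt. It assumes the gemini-pro model and the APIKey helper from Google's quickstart, and that it runs in an async context:

import GoogleGenerativeAI

let model = GenerativeModel(name: "gemini-pro", apiKey: APIKey.default)

// Chat prompt: the Chat object carries the conversation history,
// so follow-up questions are answered in context.
let chat = model.startChat()
let joke = try await chat.sendMessage("Tell me a joke in few words")
print(joke.text ?? "")
let followUp = try await chat.sendMessage("Now explain why that joke is funny")
print(followUp.text ?? "")

// Free-form prompt: each call is an independent task with no memory
// of any previous request.
let oneShot = try await model.generateContent("Tell me a joke in few words")
print(oneShot.text ?? "")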

Training and tuning models on Google AI Studio

Though Google AI Studio does not give you the flexibility to train a model from the ground up, it does give you some flexibility to tune pre-trained models using JSON or CSV data. Imagine: after tuning, you will have a fine-tuned Gemini 1.0 Pro model tailored to your use case.

Tuning a pre-trained Gemini 1.0 model

Once you have your model, you can create any of the three prompt types mentioned above and test your model's inputs and outputs. If you are satisfied with the performance, AI Studio also gives you pre-generated code to access your model through the Gemini API. But this is NOT production-ready code.

Code Snippets from Google AI Studio


Also, to access these models over an API, you need to generate an API key, which is unique to your account and cloud project. This key is a secret and should not be exposed to the outside world: once you enable billing, a malicious actor who gets hold of your API key can run up usage that you end up paying for.

Vertex AI on Google Cloud Platform

Vertex AI, on the other hand, is a more fleshed-out solution offered by Google Cloud Platform (GCP) for end-to-end ML model lifecycle management. If you are planning to go beyond prototyping and deploy a production-ready app, this is the right platform to integrate with.

Though I believe I have just skimmed the surface of Vertex AI during my exploration, let's take the next few sections to go through what it offers.

Model Garden

It offers a collection of models beyond the Gemini family, which it calls the Model Garden.

Model Garden in Vertex AI

You can choose from various foundational models as well as fine-tuned models, which goes well beyond what the Gemini family alone has to offer. Pricing also depends on the specific model you use.

Model Life Cycle Management

The Vertex AI platform gives you an entire suite of tools to build, train, deploy, and manage machine learning models. For example:

  • It allows you to clean, prepare and store the data used to train your model.
  • Vertex AI offers AutoML and Custom Model Selection Options. It analyzes your training data and recommends the best model using AutoML, eliminating the need for extensive machine learning knowledge. This means you can get started without delving too deeply into the intricacies of the model, which is ideal for simpler use cases like mine.
  • Training and Optimization: Vertex AI offers intuitive tools to train and optimize your model. Though for my simple use case I might end up using a pre-trained, fine-tuned model, it's worthwhile to explore the options provided: https://cloud.google.com/vertex-ai/docs/training/overview. Understanding all the options might take me a lifetime, but it's interesting to explore.
  • Model Deployment: Similar to other Google Cloud Platform products, Vertex AI enables you to deploy a model within a container. You can configure resources and set limits as needed. Once deployed, the model is accessible via an API endpoint, with access controlled through Google Secrets credentials. Additionally, Vertex AI provides monitoring tools for tracking the endpoint's performance and allows you to set billing alerts. (A sketch of calling such an endpoint follows this list.)
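
As a rough illustration of that last point, here is a minimal server-side Swift sketch of calling the Vertex AI generateContent REST endpoint. The project ID, region and access token are placeholders; in practice the short-lived OAuth token would come from a Google Cloud service-account flow, and this call belongs on a backend, not in the app itself:

import Foundation

// Hypothetical placeholders: substitute your own project, region and a
// short-lived OAuth 2.0 access token obtained via a service account.
let project = "my-gcp-project"
let region = "us-central1"
let accessToken = "<short-lived-oauth-token>"

var request = URLRequest(url: URL(string:
    "https://\(region)-aiplatform.googleapis.com/v1/projects/\(project)" +
    "/locations/\(region)/publishers/google/models/gemini-1.0-pro:generateContent")!)
request.httpMethod = "POST"
request.setValue("Bearer \(accessToken)", forHTTPHeaderField: "Authorization")
request.setValue("application/json", forHTTPHeaderField: "Content-Type")
request.httpBody = try JSONSerialization.data(withJSONObject: [
    "contents": [["role": "user", "parts": [["text": "Tell me a joke in few words"]]]]
])

let (data, _) = try await URLSession.shared.data(for: request)
print(String(decoding: data, as: UTF8.self))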

Generative AI on Vertex AI

Vertex AI gives access to generative AI models like Gemma, Gemini, etc. As of today, this experience is a bit fragmented between Google AI Studio and Vertex AI. You can read more about this here: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/overview

Integrating Google AI in a Swift iOS App

In my view, integrating Google's LLMs deployed in Google Cloud with a mobile app is currently somewhat fragmented.

There are two approaches to integrating an App with these models:

  • An approach suitable for experimentation or proof-of-concept purposes.
  • A more comprehensive approach where the App accesses the models via an intermediary backend application, connecting to that backend through authentication.


Approach Suitable for Experimentation and Proof-of-concept purposes

This method is primarily suitable for rapidly testing proof-of-concept applications and should not be used in a production application.

Google AI SDK for Swift: Google provides an AI SDK for Swift Developers.

To utilize it:

  • Include "generative-ai-swift" as a dependency in your Swift App via Xcode package manager.
  • Add your Google Cloud API key to the info.plist of the Swift app.
  • Access the Generative model through Swift code.

import GoogleGenerativeAI

// For text-only input, use the gemini-pro model.
// APIKey.default reads your API key from an on-demand resource .plist file
// (see "Set up your API key" in Google's quickstart, and the sketch below).
let model = GenerativeModel(name: "gemini-pro", apiKey: APIKey.default)

// generateContent is async and throwing, so this must run inside an
// async context (for example, a Task or an async function).
let prompt = "Write a story about a magic backpack."
let response = try await model.generateContent(prompt)
if let text = response.text {
  print(text)
}
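
For completeness, here is a minimal sketch of what the APIKey helper referenced above might look like, assuming the key lives in a GenerativeAI-Info.plist resource bundled with the app, as in Google's quickstart. Again, this is for prototyping only:

import Foundation

enum APIKey {
  // Reads the key from GenerativeAI-Info.plist in the app bundle.
  // Keep that file out of source control, and remember that anything
  // bundled with the app can still be extracted by reverse engineering.
  static var `default`: String {
    guard
      let filePath = Bundle.main.path(forResource: "GenerativeAI-Info", ofType: "plist"),
      let plist = NSDictionary(contentsOfFile: filePath),
      let value = plist.object(forKey: "API_KEY") as? String
    else {
      fatalError("Couldn't read API_KEY from GenerativeAI-Info.plist.")
    }
    return value
  }
}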

Refer to the google documentation available here: https://ai.google.dev/gemini-api/docs/get-started/swift

This approach is not recommended. Google itself has issued a warning in its documentation regarding its usage.

The reason for this caution is that encoding the API key within the App's package poses a risk of exposure through reverse engineering.

For further insights, you may find this article interesting: Secure Secrets in iOS App.

These risks escalate when billing is enabled on your account. If the API key falls into the wrong hands, significant financial losses can occur, potentially amounting to thousands of dollars.

Framework for Integrating Gemini Machine Learning Models into an iOS App

A cleaner approach would involve utilizing intermediary backend services.

Mobile App and Google Gemini Integration


  • The Google Gemini API key and Vertex AI client secret are securely stored on an encrypted cloud server behind a firewall.
  • The App authenticates with this intermediary cloud application over HTTPS.
  • The App's requests for the Gen AI feature are routed through the encrypted cloud application to the Vertex AI endpoint using short-lived tokens.
  • Even if the App is compromised or reverse engineered, the API key remains secure behind the cloud server's firewall.
  • Multiple layers of firewalls and rate limiters can be placed in front to prevent DDoS attacks, ensuring enhanced security.

This zero-trust posture, coupled with communication over HTTPS channels, helps keep the exchange secure.
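
Here is a minimal sketch of the App side of this setup. The backend host, endpoint path, token handling and response shape are hypothetical stand-ins for whatever intermediary service you build:

import Foundation

// Hypothetical backend at api.example.com that authenticates the user,
// holds the Gemini API key server-side, and proxies prompts to Vertex AI.
struct GenAIClient {
    let endpoint = URL(string: "https://api.example.com/v1/generate")!

    func generate(prompt: String, sessionToken: String) async throws -> String {
        var request = URLRequest(url: endpoint)
        request.httpMethod = "POST"
        // Short-lived token issued by the backend after the user signs in;
        // the actual Gemini API key never leaves the server.
        request.setValue("Bearer \(sessionToken)", forHTTPHeaderField: "Authorization")
        request.setValue("application/json", forHTTPHeaderField: "Content-Type")
        request.httpBody = try JSONEncoder().encode(["prompt": prompt])

        let (data, _) = try await URLSession.shared.data(for: request)
        struct Reply: Decodable { let text: String }
        return try JSONDecoder().decode(Reply.self, from: data).text
    }
}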

Bringing it all together

  • Choose between Google AI Studio or Vertex AI Platform depending on the complexity of your use case.
  • Opt for a pre-trained model with minor adjustments for simpler use cases.
  • For a more tailored solution, utilize Vertex AI with AutoML and custom training.
  • Avoid using Google AI SDK for Swift in production applications.
  • Refrain from storing your Google API key in the App's Info.plist. There's a risk of exploitation through reverse engineering, especially if billing is enabled for your Google AI model.
  • Build an intermediary backend application for encrypted communication between the App and Google AI API endpoint.
  • If your app handles sensitive data, avoid using Google's Free LLM APIs. The prompts and responses might undergo human review and be utilized for AI model training.

Looking ahead

If you've made it this far, I hope you found this article insightful. Please pardon any errors I may have made; constructive feedback is always welcome. Over the next few weekends, I'll continue developing my application whenever time allows. As it's my first submission to the Apple App Store, I hope it meets all the required standards.

As always, I'd love to hear your comments, suggestions, or thoughts.

I'm enthusiastic about contributing to free, open-source projects that benefit our community! Let's collaborate!

Feel free to reach out with any questions or ideas. Let's embark on this exciting journey of discovery and creation together!


PS

*Opinions expressed are solely my own and do not express the views or opinions of my employer or any other entity.

*All trademarks belong to their respective owners; I have used them here for illustrative purposes.
