
How to Think About Generative AI?

Eight Long-Term Trends Amidst the Short-Term Hype Cycle — Part 1

This is part one of a four-part series, with the following parts releasing every week here on LinkedIn and Twitter. Follow us to stay updated or share any feedback!

Introduction

Over the past year, the world has seen a massive increase in the adoption of Artificial Intelligence and the associated noise around it. ChatGPT launched on November 30th, 2022, quickly gained popularity online, and became a well-known name in just a month. By February, more than 100 million people were using ChatGPT, even though OpenAI spent no money on marketing the product. This incredible organic growth has been widely seen as the “iPhone moment of AI”.

Apart from ChatGPT, there has been an explosion of other innovations in AI, such as the release of Midjourney’s V4 and V5, Stable Diffusion 2, and numerous apps using the APIs of these models or open-source models found on GitHub. Suddenly everyone online seemed to have an AI-generated profile picture!

GPT itself has gone through several groundbreaking developments at incredible speed. Just three months after ChatGPT’s launch, Microsoft (OpenAI’s main investor) integrated GPT with its Bing search engine. At the same time, OpenAI released the powerful GPT-4, with multi-modal capabilities and plug-ins, allowing it to easily work with various apps and live internet data. Indie developers continue to innovate alongside the large companies, building tools like AutoGPT, BabyAGI, and many open-source Large Language Models (LLMs). It’s hard to ignore what’s happening all around us.

With so many new developments happening every week, it’s hard to pick the signal from the noise. How should founders looking to start companies be thinking about AI? How will it affect early-stage companies? What will the world be like in the years to come, and what can entrepreneurs do to stay on the right side of AI?

This series is our attempt to take a stab at some of these answers. We look at overall progress in AI at a macro level, look for clues in the past to try and make sense of the present, and humbly make some predictions for the future. Our goal is to think long-term amid the short-term hype and help founders do the same.

Understanding Generative AI

Generative AI, a key area within the broad field of Artificial Intelligence, focuses on algorithms and methods designed to create entirely new data — such as text, images, and audio. Unlike traditional AI, which mainly uses models to find patterns in and classify existing data (like your Gmail spam filter or YouTube video recommender), generative models learn the probability distribution behind the data and create entirely new instances that resemble real-world examples — for example, drafting the email itself or generating an entire video.

Generative AI is built on complex machine learning techniques, including Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), and autoregressive models, which form the basis of the well-known Transformer architecture. These techniques generate data that matches the statistical properties of the training set while maintaining a balance between accuracy and variety.
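To make the “autoregressive” part concrete, here is a deliberately tiny, illustrative sketch — our own toy example, not any real model’s code — of how such a model generates new text one token at a time by sampling from the probability distribution it has learned over its training data:

```python
# Toy autoregressive "language model": given the previous token, it returns a
# probability distribution over the next token, and we sample from it repeatedly.
import random

TOY_PROBS = {
    "<start>": {"the": 1.0},
    "the": {"cat": 0.6, "mat": 0.4},
    "cat": {"sat": 1.0},
    "sat": {"on": 1.0},
    "on": {"the": 1.0},
    "mat": {"<end>": 1.0},
}

def generate(max_tokens: int = 10) -> str:
    tokens, prev = [], "<start>"
    for _ in range(max_tokens):
        dist = TOY_PROBS[prev]
        # Sample the next token from the model's learned probability distribution.
        next_token = random.choices(list(dist), weights=list(dist.values()))[0]
        if next_token == "<end>":
            break
        tokens.append(next_token)
        prev = next_token
    return " ".join(tokens)

print(generate())  # e.g. "the cat sat on the mat"
```

A real LLM does essentially this, except the lookup table is replaced by a Transformer with billions of parameters that conditions on the entire context so far.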

We like to view Generative AI through the four layers below:

  1. Foundation Layer, L1: This bedrock layer consists of Graphics Processing Units (GPUs) and other specialised hardware on which the complex algorithms that drive AI innovation actually run. This layer is all about efficient computation and parallel processing, and currently NVIDIA is the undisputed king here.
  2. Model Layer, L2: This layer consists of advanced foundational models, such as Large Language Models (LLMs), visual models like Stable Diffusion, and other state-of-the-art generative architectures that enable the synthesis of novel data instances. OpenAI is currently the leader here.
  3. Tooling Layer, L3: This layer comprises essential tools and services, including Extract-Transform-Load (ETL) processes, vector search, Reinforcement Learning from Human Feedback (RLHF), model optimisation for machine learning training, and fine-tuning services for building vertical and private enterprise-wide LLMs, among others. This tooling streamlines and optimises the deployment and utilisation of generative AI models and makes life easier for developers. We have made an investment here — Segmind. Other interesting companies here from India include Portkey and Branch.ai.
  4. Application Layer, L4: This uppermost tier encompasses apps developed using the APIs from the Model Layer and the services offered by the Tooling Layer. This layer represents the tangible manifestation of generative AI technology — it’s the stuff that end-users actually interact with. Some examples here are Notion’s writing assistant, Jasper.ai, etc. (A rough sketch of how an L4 app calls an L2 model follows this list.)
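
To make the layering concrete, here is a minimal, hypothetical sketch of an L4 application calling an L2 model over an API. The endpoint URL, request fields, and response shape below are placeholders we made up for illustration, not any specific provider’s actual interface:

```python
# Minimal sketch of an Application Layer (L4) feature built on a Model Layer (L2) API.
# The URL, credential, request fields, and response field are hypothetical placeholders.
import os
import requests

API_URL = "https://api.example-model-provider.com/v1/generate"  # placeholder endpoint
API_KEY = os.environ.get("MODEL_API_KEY", "")                   # placeholder credential

def draft_reply(email_text: str) -> str:
    """Ask a hosted foundation model to draft a short, polite reply to an email."""
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "prompt": f"Write a short, polite reply to this email:\n\n{email_text}",
            "max_tokens": 200,
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["text"]  # hypothetical response field

if __name__ == "__main__":
    print(draft_reply("Hi, can we move our call to Friday at 3pm?"))
```

Everything the end user sees lives in the application; the heavy lifting happens in the Model Layer, which in turn runs on the GPUs of the Foundation Layer.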



The fast growth of Generative AI has led to a new age of innovation, impacting industries like art, entertainment, advertising, and research. This new technology will change how humans create and gain knowledge, and will open up endless possibilities.

Historically, economic growth has rested on people’s ability to be productive and create value — a limit that economists describe with the Production Possibility Curve (PPC). As the world adopts Generative AI, the PPC expands and shifts rightward, increasing the value produced globally and raising overall GDP. In simple words, humans aided by AI will be able to do a lot more in the same amount of time with the same resources.

A joint study by Stanford and MIT suggests that Generative AI can boost worker productivity by 13.8% (source). These estimates will likely go up as AI capability grows and new models launch with broader understanding and improved reasoning skills.

Every human will be able to do more with less, more than ever before.
Human history in one graph — In the longer run, technological progress is our best shot at economic, social and human prosperity. Source: Maya Labs


Trends that will define Generative AI through this decade:

Trend 1: This is the Industrial Revolution for Services

Why We Call It The Next Industrial Revolution


To understand why, let’s first look at the original Industrial Revolution, back in the 18th and 19th centuries. Technological progress, coupled with urbanisation, transformed the world’s economy from one based on agriculture and handicrafts to one dominated by industry and machine manufacturing.

The pivotal event that marked the advent of this transition was the 1746 discovery of the lead chamber process for large-scale production of sulphuric acid by John Roebuck. Before the lead chamber process, it used to take weeks or months to produce one batch of H2SO4, with three times the energy requirement, and you would only get a 60% concentration. The lead chamber process made sulphuric acid production faster, cheaper, and better (98% concentration).

Examples of lead chamber reaction vessels, in ‘The Manufacture of Sulphuric Acid (Chamber Process)’ by Wilfred Wyld [A661.2]


This faster, cheaper, better H2SO4 was the core ingredient behind industrial-scale production of fertilisers (more food!), explosives (more mining, construction and war!), and dyes (more clothes and more paper!). In addition, it was also used to produce batteries, paints, and plastics. All of this became possible because sulphuric acid was now an easy-to-produce commodity.

The Lead Chamber Process Was a Zero to One Technology

What Roebuck had accomplished was truly innovative. While the technology itself was zero to one, the fertiliser, explosives, and dye industries that scaled on top of it took the technology from one to n.

Similarly, Foundational LLMs (0–1) Now Make Services Scalable (1–n) Like Never Before

If AI can programmatically write text, speak our language, and create photos and videos — that is basically the output of several services that humans already consume on a daily basis. Making these cheaper, faster, and better would likely mean we will consume a lot more of them.

Businesses can now offer personalised services to an exponentially larger audience without the need for additional human resources. While we will elaborate on this further, you are likely already familiar with applications in customer service, content creation, etc.

We predict that today’s applications of Generative AI are only a humble start toward what is to come. Between 2023 and 2028, we will see its impact on services (and on freeing up your time) evolve in four major phases:

Phase 1: Ubiquity of Virtual Assistants

The first phase of generative AI’s impact on the scaling of services will see virtual assistants becoming universally accessible, much like a copilot for web-based apps. Think of Bing AI, Bard, or ChatGPT as a browser plug-in that can read the tab(s) that are open, summarise the content, reframe it, write new related content, and help with search. For example, we looked things up on Bard and ChatGPT extensively while writing this post. A year ago, we would have googled for information and then sifted through multiple links to find the right one with the info we needed.

As this industrial revolution of services unfolds, most of us will gain access to these AI-driven companions, which will streamline daily online activities. They will redefine the way users interact with the digital world, making it more user-friendly and intuitive.

This layer will largely be led by Google and Microsoft/OpenAI.

Phase 2: Specialisation and Omnipresence of Virtual Assistants

In the second phase, virtual assistants will diverge into two categories: vertical and horizontal.

Vertical Assistants: Tailored to perform complex tasks, catering to specific industries or domains and augmenting the efficiency and capabilities of services. Think of law, data analysis, complex coding, video-editing copilots, and other high-intelligence applications being supported by a fine-tuned virtual assistant that helps professionals with their domain-specific tasks.

We predict that startups or companies with special access to large expanses of historical data that can be used for training will likely win here. For example, a PwC might be really well placed to build a copilot for accounting/compliance.

Horizontal Assistants: Readily available to integrate themselves across platforms like calendar, mail, office, media, and other aspects of the digital stack. Think of one LLM having access to all these simple apps, as well as being accessible through those apps via text and voice input. This omnipresence will ensure a fluid digital experience, as AI will be adept at managing multiple parts of your online life.

Here we feel there is a lot of opportunity for startups to build AI layers that take data from multiple SaaS tools or internal repositories and build a ChatGPT for companies. Glean has a great product out that does this for technology companies, and we feel there is opportunity to do this for many more industries and different types of companies.
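As a rough illustration of what such a “ChatGPT for your company” layer involves, here is a minimal sketch: pull documents from internal tools, retrieve the ones most relevant to a question, and hand them to an LLM as context. The retrieval here is naive keyword overlap purely for readability — real systems use vector embeddings and a vector database — and complete() is a placeholder for whichever LLM provider a team picks:

```python
# Sketch of an internal-knowledge assistant: retrieve relevant company documents,
# then let an LLM answer using them as context. complete() is a placeholder.
def complete(prompt: str) -> str:
    """Placeholder for a call to an LLM provider."""
    raise NotImplementedError("wire this up to an LLM API")

def retrieve(question: str, documents: list[str], top_k: int = 3) -> list[str]:
    # Naive keyword-overlap ranking, for illustration only.
    q_words = set(question.lower().split())
    ranked = sorted(documents, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)
    return ranked[:top_k]

def answer(question: str, documents: list[str]) -> str:
    context = "\n---\n".join(retrieve(question, documents))
    return complete(
        f"Answer the question using only this internal context:\n{context}\n\nQuestion: {question}"
    )

# Usage: documents could be exported from a wiki, CRM, ticketing tool, shared drive, etc.
docs = [
    "Leave policy: employees get 24 paid leaves a year.",
    "Expense claims are filed through the finance portal.",
]
# print(answer("How many paid leaves do I get?", docs))  # needs complete() wired up
```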

Phase 3: Digital Clones for Standardised Tasks

As generative AI evolves, the third phase will bring in the age of digital clones, tailored to execute standardised tasks. These AI-driven replicas will have the capacity to produce creative content, such as generating human videos from scripts or creating visually appealing photo-realistic images of humans. By automating standardised tasks, digital clones will change the way content is created, distributed, and consumed.

Think of a photo-realistic virtual TV news anchor reading the news from a script fed into it, or an AI-replicated avatar of Leonardo DiCaprio being fed a movie script, followed by the generation of the entire movie — with acting quality as good as his own — within seconds.

Wherever consumption happens at mass scale and creation requires recurring effort (hosting the news, reality shows, acting in new sitcom episodes, recording new classes, etc.), we will see digital clones of humans beginning to take over.

Startups like Synthesia or Rephrase.ai have an early lead here in text-to-video creation for any use case. We, however, expect a plethora of tools here solving for multiple use cases — for example, auto-creation of a short highlights reel from a game streamer’s hours-long footage or a two-hour-long podcast, or auto-generation and filing of GST returns based on invoicing and bank statement data.

Phase 4: Custom Live Events Facilitated by Digital Clones

The fourth and final phase of this journey will see digital clones’ capabilities expand to offer custom live events — for example, video calls for consulting purposes. These AI-driven replicas will possess the intelligence and adaptability required to engage in real-time interactions, good enough to show up instead of the humans they cloned. This new ability will redefine the landscape of professional services, offering the potential for more personalised and deeper client engagement.

Digital clones will revolutionise not only the delivery of services but also the very fabric of human interaction. Think of any and all forms of personalised consulting — let’s say the best IPR lawyer offering one-on-one consulting sessions to a thousand clients simultaneously, all via video calling, where it’s actually an LLM trained on the lawyer’s knowledge and expertise combined with a video avatar of his face, while the actual lawyer (the human being) is not present in any of these calls. You could extend the IPR lawyer example to several other fields such as medical consulting, educational counselling, coaching, management consulting, and much more.

Wherever the need for customisation is high and time is a limiting factor that artificially raises the cost (as in any type of consulting), we will see the impact of these Phase-4 digital clones. These clones are science fiction right now, but we think they will be very much possible within the coming decade.




Trend 2: Beginning of the End of the GUI Era

Humans first started interacting with computers via punch cards. They were inserted into a computer, read by a card reader, and used to store data or program instructions.

The largest punch card program was from the 1950s SAGE air defense system, which used 62,500 punched cards (around 5 MB of data). In the picture above, a woman stands next to the punch cards used in this program.


Next came Teletypewriters. They were used to input data and program instructions to a computer. They worked like a typewriter, but the text was sent to the computer instead of a printer.

Then came monitors and keyboards. They became the standard way to interact with computers in the 1970s. Monitors displayed text and images, and keyboards were used to input data and program instructions. Up to this point, fewer than a million people used computers.

Then came point-and-click interfaces in the 1980s, popularised by the Macintosh. They allowed users to interact with a GUI (Graphical User Interface) by pointing and clicking on icons and menus with a mouse. GUIs continue to be the dominant form of human-computer interaction to this day. GUIs, plus the affordability of personal computers pioneered by the WinTel duopoly, propelled computing to millions of users.

The next big leap came with touch screens — iOS and Android made it possible to touch and interact with a computing device, and this led to smartphones reaching four billion+ users. In recent years, voice assistants like Amazon’s Alexa and Google Assistant have allowed users to go a step further and interact with devices through spoken commands.

Table created with Bard, verified with public sources.


Every new leap in the ease of using computers (a field called Human-Computer Interaction) resulted in an exponential increase in the number of people who used computers (aided also by a decrease in the cost of computing, thanks to Moore’s law).

The last leap of mobile and voice search/voice notes has been particularly revolutionary for countries like India, as even people who can’t read or write in any language now use tools like WhatsApp (with voice notes) and YouTube (with voice search). However, about 400M+ smartphone users in India basically only use their phones for WhatsApp, Facebook, and YouTube — mostly because the UI/UX of everything else is just too complicated for them.

The Future Is Now Here

Starting in 2023, Human-Computer Interaction (HCI) is undergoing a paradigm shift. The traditional Graphical User Interface (GUI) is going to give way to voice- and text-based Conversational User Interfaces (CUIs).

Basically, we can now just ask the computer to do something, in a language we are comfortable in, and it will just do it. There will be little to no learning curve left.

This will happen on both PC and mobile, but it will be orders of magnitude more powerful for users who are mobile-first and have never used a PC. Powered by Generative AI and LLMs, this new HCI built on voice and text interactions will simulate human-like conversation, epitomised by the CUI.
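
As a toy illustration of what a CUI layer does, here is a minimal sketch in which a user’s plain-language request is mapped by an LLM to a structured action that the software then executes. ask_llm() is a placeholder for a real model call, and the two actions are made-up examples:

```python
# Toy sketch of a Conversational User Interface: the user types (or speaks) what
# they want, an LLM maps it to a structured command, and the app executes it.
import json

def ask_llm(prompt: str) -> str:
    """Placeholder: returns the model's reply, expected to be JSON such as
    {"action": "create_pivot_table", "args": {"rows": "Region", "values": "Sales"}}."""
    raise NotImplementedError("wire this up to an LLM API")

# Illustrative actions the application knows how to perform.
ACTIONS = {
    "create_pivot_table": lambda args: print(f"Creating pivot table: {args}"),
    "send_email": lambda args: print(f"Sending email: {args}"),
}

def handle_request(user_text: str) -> None:
    # The LLM translates free-form language into a structured command.
    reply = ask_llm(
        f"Map the user's request to one of these actions: {list(ACTIONS)}. "
        f"Respond with JSON only.\nUser: {user_text}"
    )
    command = json.loads(reply)
    ACTIONS[command["action"]](command.get("args", {}))

# handle_request("make a pivot of sales by region")  # needs ask_llm() wired up
```

The user never learns menus or shortcuts; they simply say what they want, and the CUI translates it into whatever the underlying software can do.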

Software adoption will witness a spike as the learning curve diminishes. For example, Microsoft Copilot or BloombergGPT will make Microsoft Office or the Bloomberg Terminal 10X easier to use.

Ameerpet, Hyderabad, India. Image source: The Economist.


Most of us in the tech world don’t realise it, but coaching classes like the ones in the image above are fairly common in the developing world. India has an entire industry that coaches students in computer skills — how to send an email, how to use Microsoft Office, how to use Tally — all the way to more advanced topics like .NET, Java, and Hadoop training courses, and even how to trade in the stock market. A large chunk of this might become irrelevant as software becomes 10X easier to use. These classes won’t need to exist anymore because users can just ask Microsoft Office (or any other software) how to do something, and the AI copilot will either tell them how to do it or just do it itself.



In our view, the transition to Conversational User Interfaces (CUIs) across all applications will have a far-reaching impact on the developing world. Steve Jobs used to describe computers as a “bicycle for the mind”. Think of how prescient this was in 1990 — think of all that we do now using computers that was unimaginable then. GUIs made these mind-bicycles usable for millions of users for the first time ever.

There is similarly a massive opportunity now for startups to build AI-first tools, using CUIs, that enable millions more users to ride these mind-bicycles — to learn, create, code, share, and thrive in their lives using them.

Everything that first-world citizens have had the privilege of doing with computers, third-world users will soon be able to do with CUIs and mobile. Some early examples — the Replit mobile app, KissanAI, etc.

This is an area where we believe founders from the developing world also have an edge in understanding the needs and wants of these users, and we are excited to meet more founders who could bring the power of computing to more humans.

This is part 1 of a 4-part series on long-term trends in Generative AI. Stay tuned for the next post! If you are building in this space or need to bounce off ideas, we are happy to chat. Thanks to Sibesh from Maya and Rohit and Harish from Segmind for helping proofread this!

Connect with the authors:

Kushal Bhagia (LinkedIn, Twitter, kb[at]allincapital.vc)

Sparsh Sehgal (LinkedIn, Twitter, sparsh[at]allincapital.vc)
