Personal AI — A 5-Layer Grounding Framework for Personal AI Models
Original post on Medium: "Personal AI — A 5-Layer Grounding Framework for Personal AI Models," Sam Bobo, June 2024.
Stepping into the Client Experience Center at IBM Watson’s HQ in NYC, I was presented with a long flatscreen spanning at least 30 feet in width. Suddenly, high-quality 2-dimensional human figures appeared, cascading down the screen, each representing a different career: doctor, lawyer, human resources, financial advisor, etc. One by one, we stepped into the shoes of these individuals, understanding the pain points of today and the futuristic, problem-solved universe of “tomorrow.” From a customer perspective, the empathy built for each of these personas resonated with at least someone in the audience enough to proceed with sales discussions.
The Watson Client Experience Center was an impressive AI-forward tour and a qualitative demonstration of the power of AI. These avatars represented real human beings with real stories, in a real world where AI could make a positive difference. During my time at IBM Watson, I had the opportunity to work with innovators tackling many of these problems, from streamlining cumbersome prior-art research for patents to responding to both positive and negative reviews in the travel industry. I have long shared that the power of AI is real, citing the classic example of “solving” breast cancer detection by feeding an image model hundreds of thousands of pictures annotated with indications of breast cancer and nearly the same number without. With Large Language Models now powering new modalities of conversation and a more conversational interface to backend systems, AI democratizes access to information and opens the aperture to new innovations.
In a recent conversation with a medical practitioner, I discovered a truly unbearable pain point: paperwork! The gentleman with whom I conversed was a highly regarded primary care doctor. He shared his passion for treating patients, making that human connection, and providing answers and remedies (as applicable). His primary complaint: paperwork, 2–3 hours a night! Two examples stood out. The first was patient visit summaries, typically handled by a nurse or a physician assistant, for which there are already highly regarded AI products in market, namely Nuance Dragon Ambient eXperience (DAX), now owned and managed by Microsoft. The second was renewing prescriptions, which, he continued, requires checking with the pharmacy to see if the prescription was picked up, reviewing prior history, and confirming eligibility for a refill. Immediately the notion of an Autonomous Agent with a Human-in-the-Loop (HITL) came to mind: in a futuristic example, the doctor could initiate the request for a prescription refill, the agent would perform the sourcing and checking of information, and the results would come back to the doctor to approve or reject.
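To make the human-in-the-loop pattern concrete, here is a minimal sketch in Python. The helper functions (check_pharmacy, fetch_history, check_eligibility) and the IDs are hypothetical stand-ins for the pharmacy and EHR systems an agent would actually query; the point is simply that the agent gathers the evidence while the doctor keeps the final approve-or-reject decision.

```python
# Minimal human-in-the-loop (HITL) sketch: the agent collects evidence,
# the doctor makes the call. All functions below are hypothetical stand-ins.
from dataclasses import dataclass

@dataclass
class RefillCheck:
    picked_up: bool      # was the last fill picked up at the pharmacy?
    prior_history: str   # relevant prescription history from the EHR
    eligible: bool       # is the patient eligible for a refill?

def check_pharmacy(prescription_id: str) -> bool:
    """Hypothetical pharmacy lookup."""
    return True

def fetch_history(patient_id: str) -> str:
    """Hypothetical EHR query for prior prescription history."""
    return "No adverse events; last refill 30 days ago."

def check_eligibility(patient_id: str, prescription_id: str) -> bool:
    """Hypothetical eligibility/insurance check."""
    return True

def refill_agent(patient_id: str, prescription_id: str) -> RefillCheck:
    """The agent only gathers information; it never submits the refill itself."""
    return RefillCheck(
        picked_up=check_pharmacy(prescription_id),
        prior_history=fetch_history(patient_id),
        eligible=check_eligibility(patient_id, prescription_id),
    )

if __name__ == "__main__":
    result = refill_agent("patient-123", "rx-456")
    print(result)
    decision = input("Approve refill? [y/n] ")  # the human-in-the-loop gate
    print("Refill submitted." if decision.lower() == "y" else "Refill rejected.")
```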
The second anecdote: education. Teachers, primarily in K-12 education, face a number of challenges including student absenteeism, funding, cheating, skill gaps, and more! Creating lesson plans tailored to a cohort of students of varying skills, learning modalities, and motivation is a pedagogical art, and the sheer quantity of work required to craft syllabi, detailed lesson plans, and assessments while abiding by school, local, state, and federal policies is not an easy task. Navigating these hurdles while maintaining a hyper-focus on the end goal of student learning outcomes can be quite daunting. New tools such as Khanmigo are available to teachers and students to aid in the learning process, as I referenced in “Education Updates at Major Developer Conferences.”
Lastly, for computer scientists and software engineers, breaking the flow of programming to write README files, set up code scaffolding, fix bugs, and manage code is time consuming. Furthermore, those entering the field of computer science, or looking to program to solve a problem, could find the modern app directory and file structure intimidating. Millions of developers and associated organizations trust Microsoft GitHub Copilot and Copilot Workspace to streamline development efforts and act as an expert pair programmer.
These three examples hit on the essential marketing claims of AI, namely that AI systems:
As my readers know, I am immensely passionate about the field of Artificial Intelligence and the impact the technology can have on society. The foundation exists to achieve such a vision, with autonomous agents and visual recognition (healthcare), content translation and generation (education), and coding assistance (engineering), to name a few, but society as a whole needs to shift its locus of obsession from Artificial General Intelligence (AGI) towards solving industry-specific needs. I will start to build a framework for optimizing AI practices to realize such a dream.
Traditionally, there were three layers to Artificial Intelligence Systems:
- Foundational layer: a basic understanding of the world (the general-purpose foundation model)
- Industry layer: institutional training on the knowledge of a given field
- Use case layer: occupation-, company-, or task-specific information
Today, I am proposing two additional layers to the model:
- Likeness layer: grounding in oneself
- Autonomy layer: the ability to take action on one's behalf as needed
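As a rough illustration, and only that, the sketch below stacks each layer's grounding context ahead of a user's task before it would be handed to a model. The layer descriptions and the build_grounded_prompt helper are assumptions for the sake of example, not a prescribed implementation.

```python
# Illustrative sketch: each layer contributes grounding context that is
# stacked on top of the general-purpose foundation model.
LAYERS = {
    "foundational": "General world knowledge (the base model itself).",
    "industry": "Institutional training for the field, e.g., primary care.",
    "use_case": "Task-specific knowledge, e.g., the prescription refill workflow.",
    "likeness": "The individual's own preferences, phrasing, and history.",
    "autonomy": "The actions the model may take, each gated by human approval.",
}

def build_grounded_prompt(task: str) -> str:
    """Stack layer context, most general first, ahead of the user's task."""
    context = "\n".join(f"[{name}] {detail}" for name, detail in LAYERS.items())
    return f"{context}\n\nTask: {task}"

if __name__ == "__main__":
    print(build_grounded_prompt("Draft tonight's patient visit summaries."))
```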
These five layers build the model expert that extends oneself into the “limitless” ether and helps us as humans achieve more, fulfilling that promise. The stack provides a basic understanding of the world (foundational layer), institutional training (industry layer), occupation-, company-, or task-specific information (use case layer), grounding in oneself (likeness layer), and the ability to take action as needed (autonomy layer). The true question is, how is this achieved? TRUST.
Trust is by far the largest battle to be won within AI. I recently referenced the massive security credentials required to participate in Artificial Intelligence solutions:
Where both companies thrive is in a new architectural pattern emerging to combat the trust obstacle and optimize for latency: the hybrid cloud. For both Microsoft and Apple, sensitive computing occurs on-device with a portfolio of machine learning models (for example, Microsoft's Phi models and Apple's proprietary models) to reduce latency (a critical factor in user experience) and maintain privacy (well… so long as the information is encrypted). Workloads that are much larger in nature, such as long document summaries and non-personal open-ended questions such as a search query, are outsourced to cloud-hosted and third-party models, OpenAI for both companies.
Both Microsoft and Apple had to earn the right to make AI more intrusive in one's personal life through security and trust, which is quite an undertaking given the rhetoric in society around data privacy and security and how AI plays into both. The article also mentions the hybrid cloud infrastructure, using small language models on device and large language models in the cloud: personal context (SLMs) is employed on device where needed, and general knowledge (LLMs) fills the gaps. (I would be remiss if I did not reference the Expert Leader Agent Model as my plea to focus on industry models.) In this framework, I think only the former, Small Language Models, are the correct model modality given the specific, targeted use case one would be tackling with a complete 5-layer AI solution. Small language models perform well, can be locally integrated into personal computing devices, and can handle more robust encryption with integrated players (e.g., Apple Silicon).
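A toy sketch of that routing decision might look like the following; the sensitivity heuristic and the model tiers are assumptions for illustration, not how Microsoft or Apple actually implement the split.

```python
# Illustrative on-device SLM vs. cloud LLM routing. The heuristic below is an
# assumption for illustration only.
SENSITIVE_HINTS = ("my calendar", "my messages", "my health", "my photos")

def is_sensitive(prompt: str) -> bool:
    """Very rough check for personal context that should stay on device."""
    return any(hint in prompt.lower() for hint in SENSITIVE_HINTS)

def route(prompt: str) -> str:
    """Pick a model tier: local SLM for personal/small work, cloud LLM otherwise."""
    if is_sensitive(prompt) or len(prompt) < 500:
        return "on-device SLM (personal context stays local, low latency)"
    return "cloud LLM (large summaries, open-ended general questions)"

if __name__ == "__main__":
    print(route("Summarize my messages from this morning."))
    print(route("Summarize this long report: " + "lorem ipsum " * 100))
```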
Finally, capitalizing on vertical integration, understanding one's actions and the tasks required to complete them is essential for the autonomy layer. Microsoft with Windows and Apple with iOS own the operating system layer and thus the GUI and underlying APIs that developers build upon. This is a tremendous opportunity to mimic Large Action Models (LAMs), coined by Rabbit and its R1 device, to understand how to invoke the systems, tasks, etc. required by the 5-layer AI solution to achieve that level of autonomy and replicate the steps the human would otherwise perform.
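To sketch what that could look like in practice, the snippet below replays a learned sequence of application-level actions on the user's behalf, with each step gated by human approval. The action names and the execute function are invented for illustration and do not correspond to any real operating system API.

```python
# Illustrative Large Action Model-style plan: a sequence of application-level
# steps replayed for the user. Action names and execute() are invented; they
# are not real OS APIs.
from typing import Callable

ActionPlan = list[tuple[str, dict]]

# A plan the model might derive from observing how the task is normally done.
FOLLOW_UP_PLAN: ActionPlan = [
    ("open_app", {"name": "Calendar"}),
    ("find_slot", {"duration_minutes": 30}),
    ("create_event", {"title": "Follow-up appointment"}),
]

def execute(action: str, args: dict) -> None:
    print(f"executing {action} with {args}")  # stand-in for a real GUI/API call

def run_plan(plan: ActionPlan, approve: Callable[[str], bool]) -> None:
    """Replay each step, but only after the human approves it."""
    for action, args in plan:
        if approve(f"{action} {args}"):
            execute(action, args)
        else:
            print(f"skipped {action}")

if __name__ == "__main__":
    run_plan(FOLLOW_UP_PLAN, approve=lambda step: input(f"Run '{step}'? [y/n] ") == "y")
```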
Imagine a futuristic world in which all 5 layers of the AI solution evolve in harmony:
Some day, we might be able to walk into the Watson Experience Center and see ourselves on that wall, this time with the problems solved and new challenges of progress to resolve!