Grounding the World Around Us — Inference on the Edge
Generated using Microsoft Designer


Original Post on Medium: Grounding the World Around Us — Inference on the Edge | by Sam Bobo | Jan, 2025 | Medium

Take a moment and look around the room you are in; notice the colors, sounds, people, and objects. They say a picture is worth a thousand words, correct? Well, try to compute the number of thoughts, stimuli, and pieces of information passing through the neurons of your brain during this exercise.

The Artificial Intelligence community is clamoring about the lack of information and data available to train the newest multi-billion-parameter models. On a completely orthogonal path resides an effort dedicated to improving model response "intelligence" by applying reinforcement learning or sophisticated prompting techniques to improve reasoning at the tradeoff of response time, call it "thinking time" if you will. After listening to a podcast recently about Meta's AR plans, the realization finally occurred to me and prompted this article: the physical world around us will serve as model grounding unique to our individual needs. Let me explain.

Go back to my earliest example of Augmented Reality and Artificial Intelligence: the language-learning application I described in Augmenting Reality with the Overlay of AI.

My concept fused the augmented reality-powered Google Glass with the software functionality of Google Translate. Note: this was a silver-level project, so the concept manifested as a clickthrough demonstration. The concept entailed a user wearing Google Glass at the top of a ski slope, looking out into the horizon. The wearer utters the phrase "Hey Google, quiz me in Spanish." Instantly, rectangular bubbles with question marks overlay ordinary concepts such as "snow," "tree," "mountain," "skis," and so on. The user points to an object within his or her view and utters "nieve" (in Spanish) to guess the Spanish term, getting instant feedback on whether the uttered word was correct or incorrect. Effectively, this was an early concept using machine translation and augmented reality to build a language-learning application.

Simply put, the visual input from the glasses' camera (ergo, your eyes) acted as grounding data for any inquiry posed to the underlying model. Simply uttering "how do you say this in Spanish?" could trigger visual recognition of "snow" and return "nieve." In the vision video produced for Microsoft's Build conference, the engineer working on a machine could ask questions generically and get responses grounded in the world around them.
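To make that loop concrete, here is a minimal sketch of the quiz interaction, assuming the object label arrives from the glasses' recognition pipeline and using a small hardcoded dictionary in place of a translation service; none of this reflects an actual Google Glass or Google Translate API.

```python
# Toy sketch of the language-learning loop described above. The visual
# recognition step and translation service are stubbed out for illustration.

# Hypothetical output of the glasses' object-recognition pipeline.
RECOGNIZED_LABEL = "snow"

# Minimal English -> Spanish lookup standing in for a translation service.
TRANSLATIONS = {
    "snow": "nieve",
    "tree": "árbol",
    "mountain": "montaña",
    "skis": "esquís",
}

def quiz(recognized_label: str, spoken_guess: str) -> str:
    """Compare the learner's spoken guess against the translation of the
    object they are pointing at, returning instant feedback."""
    expected = TRANSLATIONS.get(recognized_label)
    if expected is None:
        return "I don't know that object yet."
    if spoken_guess.strip().lower() == expected:
        return f"Correct! '{recognized_label}' is '{expected}'."
    return f"Not quite. '{recognized_label}' is '{expected}'."

print(quiz(RECOGNIZED_LABEL, "nieve"))  # Correct! 'snow' is 'nieve'.
```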

Let's get technical for a moment. For reference, grounding in the AI world refers to information passed to the model as truth. Historically, this was done during the supervised machine learning task of building a model; in the era of foundation models, however, the concept of grounding has shifted to supportive information injected at runtime. Commonly, within Retrieval Augmented Generation ("RAG") patterns, the model will reference grounding information, perform the task at hand, and return the result. One common example is asking a question of a Generative AI bot, which references an FAQ webpage and returns a response summarizing the results. The same goes for asking questions against, say, a contract.
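As a rough illustration of that RAG pattern, the sketch below retrieves the most relevant FAQ passages with a naive keyword overlap and injects them into a prompt as grounding. The passages, the scoring, and the stubbed model call are assumptions made for illustration, not any particular provider's SDK.

```python
# Minimal sketch of a RAG pattern: retrieve relevant grounding passages,
# inject them into the prompt, and hand the assembled prompt to a model.
# The model call itself is left as a stub since no provider is specified.

FAQ_PASSAGES = [
    "Refunds are processed within 5 business days of approval.",
    "Support is available Monday through Friday, 9am to 5pm ET.",
    "Enterprise plans include a dedicated account manager.",
]

def retrieve(question: str, passages: list[str], top_k: int = 2) -> list[str]:
    """Rank passages by naive keyword overlap with the question."""
    q_words = set(question.lower().split())
    scored = sorted(
        passages,
        key=lambda p: len(q_words & set(p.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_prompt(question: str, grounding: list[str]) -> str:
    """Assemble a prompt that treats the retrieved passages as truth."""
    context = "\n".join(f"- {g}" for g in grounding)
    return (
        "Answer using only the grounding information below.\n"
        f"Grounding:\n{context}\n\n"
        f"Question: {question}"
    )

question = "How long do refunds take?"
prompt = build_prompt(question, retrieve(question, FAQ_PASSAGES))
print(prompt)  # In practice this prompt would be sent to a foundation model.
```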

This trend of real-time information capture for natural language-based models started recently, and only now have I started to piece the information together. Let's explore a few scenarios:

· Microsoft Recall — Recall is a feature available on Copilot+ PCs whereby one can ask the native Copilot model on the machine about anything (emphasis on anything) that happened on the computer and get a response. For example, asking questions about a tech strategy article from a known analyst visited in the morning, or the marketecture diagram discussed with your team during an online meeting. This is done by taking frequent snapshots of your screen and stitching together a "timeline" of events to pass as grounding into the model when a question is asked.

· Pins — A new market is emerging for AI-backed pins. These pins are basic hardware that performs one basic function: recording audio, either constantly or between wake and sleep words, and streaming it to (hopefully secure) cloud storage. Thereafter, within the corresponding application, one can ask questions against anything that transpired during the day and was captured by the pin. For example, if you want to recall a specific point from a conversation you and your brother had earlier in the day, you can effectively pull that information back, using the recordings as grounding (a generic sketch of this capture-then-ground pattern follows below).
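To illustrate the pattern shared by both scenarios, here is a generic sketch: timestamped captures from an edge device (OCR'd screenshots or audio transcript segments) are stored in a timeline, and the most relevant ones are pulled back as grounding when a question is asked. The data model and matching logic are invented for illustration and do not reflect how Recall or any pin product is actually implemented.

```python
# Generic capture-then-ground pattern: store timestamped captures from an
# edge device, then retrieve the relevant ones as grounding for a question.

from dataclasses import dataclass
from datetime import datetime

@dataclass
class Capture:
    timestamp: datetime
    text: str  # OCR of a screenshot, or an audio transcript segment

TIMELINE: list[Capture] = [
    Capture(datetime(2025, 1, 15, 9, 5), "Analyst article: 2025 tech strategy shifts toward edge AI."),
    Capture(datetime(2025, 1, 15, 11, 30), "Team meeting: reviewed marketecture diagram for the platform."),
    Capture(datetime(2025, 1, 15, 13, 0), "Call with brother: agreed to meet Saturday at noon."),
]

def ground_question(question: str, timeline: list[Capture], top_k: int = 2) -> list[Capture]:
    """Return the captures that best match the question, to be injected
    into the model prompt as grounding."""
    q_words = set(question.lower().split())
    return sorted(
        timeline,
        key=lambda c: len(q_words & set(c.text.lower().split())),
        reverse=True,
    )[:top_k]

for capture in ground_question("What did the analyst say about tech strategy?", TIMELINE):
    print(capture.timestamp, capture.text)
```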

Created using Napkin.AI

Now layer on smart glasses that visually record the world around you constantly. Do you see the trend? The world is becoming grounding data layered on top of foundation models (or, presumably, specialty models in the case of the Microsoft video). Effectively, information captured on the edge is augmenting foundation models for personal use. One might even take this a step further and claim that the information being captured can serve as a tap into the constant flow of new information for training newer models. Specifically, as adoption of these IoT edge devices scales, so does the data.

That brings me to the final point in this post: hesitations around the adoption of AR and other edge computing devices, namely data privacy. While I have not read through all of the terms and conditions of these features and devices, one can only be skeptical that, in a world of scarcity (of data, at this point), tradeoffs might be made to acquire additional resources. This parallels scraping YouTube videos for video-generation models, or other information generally available on the web, and certainly reinforces the points made by data privacy activists. I simply encourage that (1) solution providers do not skirt the fundamentals of data privacy when faced with an ever-abundant flow of new data and (2) they are transparent about how that information is treated. Furthermore, customers should exercise caution when adopting these devices in the short term, until trust is built among the general populace.


Overall, marrying AI with augmented reality is a massive opportunity (I am personally experimenting with start-up ideas that combine these two technologies), and grounding models in the world around us makes the value they provide that much more immense. We should also ensure that this information is used responsibly and kept secure, to foster adoption and help bring our collective visions to reality.
