When Generative AI Meets Notebook Apps
Jason Fung
Senior Director, Offensive Security Research & Academic Research Engagement, Intel Corporation
Introduction
Google Labs recently introduced a new type of Generative AI (GAI) tool in the form of a notebook. NotebookLM is marketed as an AI notebook for everyone. It enables users to leverage Large Language Models (LLMs) to extract insights from their notebook contents without first feeding notebook data to an AI Chatbot like ChatGPT. Similar to AI Code Assistants, the marriage of GAI and notebook applications is expected to be a welcome addition to the GAI family. By eliminating the need for prompt engineering, AI Notebooks offer simplicity and ease of use, attracting more users to experience the power of LLMs on their own data.
Yet, the rise of AI Notebooks could also open up new privacy and information disclosure concerns for prospective users and organizations. This article explores the emerging risks associated with this new GAI usage model.
Improving LLM Capabilities
LLMs are trained using vast amounts of data from various sources such as articles, books, Wikipedia, websites, blogs, and more. However, not all accessible data can be used for LLM training due to factors such as legal, privacy, and ethical considerations. In fact, these considerations have been the focus of media debates and regulatory discussions regarding the acceptable source data for AI service providers.
Furthermore, LLM capabilities continue to improve with user consumption. For example, AI service providers may offer free AI Chatbots to the public, gathering user prompts and feedback to enhance their offerings over time.
An Illustrative Example
Let's consider Google Labs' NotebookLM as an example to understand how an AI Notebook service could be rolled out by a service provider. NotebookLM aims to provide users with the power of LLMs to gain insights from their data quickly. Users can obtain summaries, ask questions, and generate new ideas based on the data shared with LLMs.
Sharing notebook contents with LLMs is made simple by "grounding" the model to specific Google Docs as data sources. Users are given full control over which documents to share, without worrying about context window size limitations.
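Conceptually, grounding means the service sends the model only the documents a user has explicitly selected, rather than letting it roam over an entire account. The Python sketch below illustrates the idea; the document store and the call_llm() helper are hypothetical stand-ins, not NotebookLM's actual interface, and a production service would presumably chunk and retrieve from large sources behind the scenes to stay within context limits.

```python
# A minimal illustration of "grounding": the model sees only the
# documents the user has explicitly selected. SOURCE_DOCS and the
# call_llm() helper are hypothetical, not NotebookLM's actual API.

SOURCE_DOCS = {
    "meeting-notes": "Q3 roadmap discussion; action items for the team...",
    "conference-notes": "Keynote takeaways on LLM security research...",
}

def build_grounded_prompt(question: str, selected: list[str]) -> str:
    """Concatenate only the user-selected documents into the prompt."""
    context = "\n\n".join(SOURCE_DOCS[name] for name in selected)
    return (
        "Answer using only the source material below.\n\n"
        f"--- SOURCES ---\n{context}\n--- END SOURCES ---\n\n"
        f"Question: {question}"
    )

# Documents not listed in `selected` are never sent to the model.
prompt = build_grounded_prompt(
    "Summarize my conference notes.",
    selected=["conference-notes"],
)
# response = call_llm(prompt)  # hypothetical call to the hosted LLM
```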
Emerging Security and Privacy Concerns with AI Notebooks
The rise of AI Chatbots has raised privacy and information disclosure concerns that impact end users and organizations. A similar trend is expected with the rise of AI Notebooks, but potentially in a more subtle and dangerous manner.
The presence and severity of these risks depend on various factors, including service offerings, implementations, and usage terms and conditions. It is important to note that the risks discussed below pertain to the broad category of AI Notebooks, not to specific service offerings. Google Labs, for instance, clearly states its AI Principles and Responsibility and emphasizes:
We’ve built NotebookLM such that the model only has access to the source material that you’ve chosen to upload, and your files and dialogue with the AI are not visible to other users. We do not use any of the data collected to train new AI models.
Personal Data Concern
While user content within Google Docs is already stored in the cloud, LLMs are not expected to scrape or consume it unless the documents are made publicly accessible. However, when users "ground" their documents to AI Notebooks, the LLMs are authorized to access and consume the contents to provide the advertised services.
Many users, including myself, organize their notes into various folders and tabs. Notes on related topics may be spread across multiple files or pages created at different times or in different situations (e.g., conferences, work meetings, classes). The benefit of using AI to extract insights from notes is that it relieves users of the burden of curating a list of files containing relevant information. However, it can also tempt users to feed all of their data to the AI without filtering out sensitive information that should not be shared.
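One simple mitigation is to screen notes for obviously sensitive patterns before they are ever grounded. The sketch below shows the idea with a few illustrative regular expressions; the patterns and placeholders are examples only, nowhere near a complete PII or secret scanner.

```python
import re

# A minimal sketch of pre-filtering notes before grounding them.
# The patterns are illustrative; a real scanner would need far
# broader coverage (names, addresses, credentials, etc.).

SENSITIVE_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "API-KEY": re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"),
}

def redact(text: str) -> str:
    """Replace anything matching a sensitive pattern with a placeholder."""
    for label, pattern in SENSITIVE_PATTERNS.items():
        text = pattern.sub(f"[REDACTED-{label}]", text)
    return text

note = "Ping alice@example.com; staging key is sk-1a2b3c4d5e6f7g8h9i0j."
print(redact(note))
# Ping [REDACTED-EMAIL]; staging key is [REDACTED-API-KEY].
```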
Enterprise Information Security Concern
AI Notebooks may also pose new information disclosure threats to organizations, particularly when enterprise users are not security-minded and cautious.
Engineers often keep notes and learnings in online notebooks hosted within enterprise cloud environments. Some of these notes may contain sensitive, proprietary materials extracted from internal presentation slides and product specifications. These notes receive the same level of enterprise-class IT protections as other sensitive documents, emails, source code, and more.
While users generally understand the security implications of copying and pasting internal content into an AI Chatbot, adding internal documents to an AI Notebook may take only a few clicks. This ease of use can lead enterprise users to unintentionally leak proprietary information when trying out a new AI Notebook service.
This situation can be exacerbated when service providers offer cloud notebook solutions that include both enterprise and personal versions, causing confusion even among discerning users.
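Until enterprise usage policies catch up, even a crude guardrail can help, for example checking a document for classification markers before allowing it to be grounded. The sketch below is illustrative only; the marker list is an assumption, and real data loss prevention belongs in IT-enforced, server-side controls rather than ad hoc client code.

```python
# A minimal client-side guardrail: refuse to ground documents that carry
# enterprise classification markers. The marker list and the check are
# assumptions for illustration, not a substitute for enterprise DLP.

CLASSIFICATION_MARKERS = ("CONFIDENTIAL", "INTERNAL ONLY", "PROPRIETARY")

def safe_to_ground(document_text: str) -> bool:
    """Return False if the document appears to carry a classification marker."""
    upper = document_text.upper()
    return not any(marker in upper for marker in CLASSIFICATION_MARKERS)

doc = "ACME Proprietary - Product X architecture specification ..."
if not safe_to_ground(doc):
    print("Blocked: document carries an enterprise classification marker.")
```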
Conclusion
In summary, the rise of AI Notebooks as a new GAI usage model brings potential privacy and information disclosure risks that can impact personal data belonging to individuals and proprietary information belonging to organizations. User education to raise awareness and the timely definition of enterprise usage policies are crucial in minimizing these risks.