Is your AI assistant a spy and a thief?
Mark Montgomery
Founder & CEO of KYield. Pioneer in Artificial Intelligence, Data Physics and Knowledge Engineering.
Millions of workers are disclosing sensitive information through LLM chatbots
According to a recent survey by the US National Cybersecurity Alliance (NCA), 38% of employees share sensitive work information with AI tools. The NCA, which polled 7,000 people globally, found that 46% of Gen Z and 43% of millennial workers admitted to doing so, whereas ‘only’ 26% of Gen X and 14% of baby boomers disclosed doing so. (See this article by Tara Seals at Dark Reading.)
"A financial services firm integrated a GenAI chatbot to assist with customer inquiries," Lisa Plaggemier tells Dark Reading . "Employees inadvertently input client financial information for context, which the chatbot then stored in an unsecured manner. This not only led to a significant data breach, but also enabled attackers to access sensitive client information, demonstrating how easily confidential data can be compromised through the improper use of these tools."
This NCA survey confirms what many of us have been warning about since ChatGPT was unleashed to the public nearly two years ago. Although OpenAI and others warn users not to disclose sensitive information, consumer LLM bots have been widely promoted as productivity tools for work. Employees are generally incentivized to use them in various ways, but few understand the risks.
The problem is that it’s essentially impossible to use consumer LLM bots for knowledge worker productivity without disclosing highly sensitive information. Everything pasted or typed into the bots becomes training data, which can then be reconstructed in myriad ways for other users, including legal teams, competitive intelligence professionals, and agents conducting industrial espionage and state intelligence.
“These types of bots are dream spies, representing IP theft on a scale beyond anything anyone imagined would ever be attempted, conducted and/or allowed by governments, especially the U.S. Government.” – Mark Montgomery
Training is necessary but insufficient, as only a very small number of top-tier experts understand how LLMs connect the dots well enough to engage with them safely. The only way to use LLM bots safely is to control the data and access to it.
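To make “control the data” concrete, here is a minimal sketch of pre-submission redaction: scrubbing sensitive fields before a prompt ever leaves the organization. The patterns, placeholder labels, and example prompt are hypothetical, and a production deployment would rely on a vetted DLP or NER service rather than ad-hoc regexes.

```python
import re

# Hypothetical patterns for illustration only; a real deployment would use
# a vetted DLP/NER service, not ad-hoc regexes.
REDACTION_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "ACCOUNT": re.compile(r"\b(?:acct|account)\s*#?\s*\d{6,}\b", re.IGNORECASE),
}

def redact(text: str) -> str:
    """Replace sensitive substrings with typed placeholders before any
    prompt is sent to an external LLM endpoint."""
    for label, pattern in REDACTION_PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text

prompt = "Summarize the dispute: client jane.doe@example.com, account #90211347."
print(redact(prompt))
# Summarize the dispute: client [EMAIL REDACTED], [ACCOUNT REDACTED].
```

Even a filter this crude flips the default from “everything leaves” to “nothing sensitive leaves unless it slips past policy,” which is the direction of control argued for above.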
Although some LLM bots allow users to opt out of training on their data, as Niloofar Mireshghallah, an AI privacy researcher at the University of Washington, puts it: “In general, everything is very black box.”
Moreover, there isn’t much chance of getting caught leaking sensitive information. For example, as of April 2024, Microsoft Copilot spokesperson Donny Turnbaugh said, “Microsoft takes steps to deidentify data before it is used, helping to protect consumer identity.” Unfortunately, and perhaps intentionally, that also makes it nearly impossible to establish any kind of accountability for employees handing over confidential and sensitive data to LLM firms and Big Techs.
Business software training on customer data
Some business software records meetings and automatically posts the transcripts to public websites, disclosing sensitive information such as which employees were identified to be laid off. The topics discussed could include internal investigations, potential or ongoing litigation, mergers and acquisitions, research and development, decisions on which patents to pursue, or anything else.
"As new capabilities are added, even to very reputable SaaS applications, the terms and conditions of those applications is often updated, and 99% of users don't pay attention to those terms," Galit Lubetzky explains. "It is not unusual for applications to set as the default that they can use data to train their AI models.” ?
Transference of knowledge capital
A much broader financial and economic problem is the transference of knowledge capital from individuals and organizations to the companies that own the bots, which automatically recompile that knowledge and sell it to almost anyone. By going directly to workers, consumer LLM chatbots and even business software vendors are essentially attempting to transfer the knowledge economy to their own companies without permission.
Peter Drucker popularized the term knowledge-based economy in 1969 in his book The Age of Discontinuity. However, the concept of the knowledge economy is attributed to Fritz Machlup, whose 1962 study analyzed the production and distribution of knowledge in the United States.
“The production of new knowledge (measured by research and development activity) and increased educational attainment accounted for more than 80 percent of U.S. economic growth in the post-World War II era.” – Timothy Hogan, Ph.D., Professor Emeritus, ASU
Unauthorized transfer of knowledge capital is a far more severe injustice, and a far more severe form of theft, than copyright infringement. The LLM bots are attempting to glean and sell the underlying knowledge represented by copyrighted material across all data types, knowledge that reflects tens of trillions of dollars of investment over decades and today represents anywhere from 43% to 80% of the U.S. economy, depending on the source.
This is a more sophisticated form of industrial espionage than China’s decades-long efforts targeting the U.S. and other Western democracies, and a much more rapid one. Although consumer LLM chatbots are limited in their ability to accelerate discovery today, the vast amount of sustained investment will undoubtedly result in rapid improvements. Moreover, the knowledge base already scraped from the web enables other methods, not just LLM bots. The knowledge capital is just waiting to be unlocked in massive data stores controlled by a very few Big Techs.
It’s unconscionable that the U.S. Federal Government hasn’t put a stop to this unprecedented theft of intellectual property for nearly two years now. I believe it represents the greatest theft in dollar terms in history, even if it will require many years to manifest, by which time it may be too late for countless businesses and industries to recover.
KYield’s approach
We take the opposite approach. Our KOS (the first EAI OS) protects and helps expand knowledge for individual workers, teams, and organizations. The KOS and DANA (its digital assistant) boost productivity by focusing on high-quality data rather than quantity.
Data quality is improved in the natural work process as eight functions produce results across the enterprise. Security wasn’t an afterthought; it was a priority from inception, and today the KOS offers four types of security. Data remains under the ownership and control of customers at all times. Some GenAI functions, such as writing extensive reports automatically, take longer to perform, particularly for smaller organizations with limited data, but the quality is much higher and tailored to each individual and organization. Data for GenAI can be augmented by legally licensing data for the relevant domains.
Bottom line:
As I said in the interview with Brian Jackson at Info-Tech Research Group, featured in their Tech Trends 2025 report, "knowledge creation and protection are essential for success in enterprise AI. Sovereignty in the modern economy requires maintaining control over your knowledge capital."
Master Builder / Senior Architectural Designer / Architectural Technician / Turnkey Projects / Artist / Life Coach / Mentor
I guess the answer is never to share information with AI.
CISO | Protecting Sensitive Data for the Midmarket | Passionate about Cybersecurity | Artificial Intelligence Pioneer | ZeroTrust Advocate
Large Leakage Models have to be controlled and contained or they deliver more risk than benefit!
Agreed. This is why Stickley On Security is working with developers of next-gen AI firewall solutions to put up guardrails around working with AI solutions. It's a separate/additional firewall solution, distinct from your standard firewall, that specifically addresses the issue of preventing data from getting outside of your network.
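For readers wondering what such an egress guardrail looks like at its simplest, below is a minimal sketch of the idea, not the vendor's actual product; the endpoint list, markers, and policy are hypothetical.

```python
from urllib.parse import urlparse

# Illustrative lists only; a real AI firewall would classify content with
# far richer methods than marker strings.
AI_ENDPOINTS = {"api.openai.com", "api.anthropic.com"}
BLOCKED_MARKERS = ("CONFIDENTIAL", "ATTORNEY-CLIENT", "SSN:")

def allow_request(url: str, body: str) -> bool:
    """Block payloads containing sensitive markers when they are bound for
    a known AI endpoint; all other traffic passes through."""
    host = urlparse(url).hostname or ""
    if host in AI_ENDPOINTS and any(m in body.upper() for m in BLOCKED_MARKERS):
        return False
    return True

print(allow_request("https://api.openai.com/v1/chat/completions",
                    "CONFIDENTIAL: Q3 layoff list attached"))  # False
```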
We totally agree, which is exactly why we're beginning to implement federated learning for marketing, so nothing actually ever leaves the user's device. It's such a difficult area to navigate when the law and legislation don't know where the limits are. More education is needed so that people can make educated decisions!
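A toy illustration of the federated-learning idea mentioned here: each device fits a model on its own private data and shares only weights, which a server averages. The model, data, and hyperparameters are all made up for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_update(weights, X, y, lr=0.1):
    """One gradient-descent step on a least-squares loss, run on-device."""
    grad = 2 * X.T @ (X @ weights - y) / len(y)
    return weights - lr * grad

# Five "devices", each holding private data that never leaves the device.
devices = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(5)]
global_w = np.zeros(3)

for _ in range(10):  # federated rounds
    # Each device computes an update from its own data...
    local_ws = [local_update(global_w, X, y) for X, y in devices]
    # ...and the server sees and averages only the resulting weights.
    global_w = np.mean(local_ws, axis=0)

print(global_w)
```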
AI-Driven Operations Leader: Specialising in Cloud IT, Advanced AI Systems at Scale, for Business Operations & Project Portfolio Delivery with compliance, milestones, risk and supplier performance management.
Great points Mark Montgomery, highlighting the potential security risks of consumer-grade Gen-AI, especially when sensitive information is involved. Many enterprises are rushing to adopt Gen-AI without fully considering the data privacy and security challenges, which can lead to costly breaches or misuse of intellectual property. It's important to realise that relying on Gen-AI alone isn't the answer for business operations in enterprise environments. We advocate a "Compound AI" approach: use the correct AI tools for specific business challenges. We combine various AI technologies such as classic AI for structured KPI monitoring (avoiding hallucinations), machine learning for performance predictions based on real-time data, natural language processing (NLP) for real-time multilingual communication, and AI-driven automation for screen navigation, all while ensuring data privacy with strict role-based access control. Wrapping AI tools in a secure cloud infrastructure with user role permissions, and focusing on data sovereignty, will avoid many of the pitfalls seen with consumer-facing AI models. Before diving in, the question is: how do you use AI intelligently to ensure the right tools are applied safely and securely to solve business challenges?
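A minimal sketch of the role-based gating this comment describes, with hypothetical roles, tool names, and permission sets:

```python
# Hypothetical role-to-tool permissions; a real system would back this
# with the identity provider and audit logging.
ROLE_PERMISSIONS = {
    "analyst": {"kpi_monitor", "ml_forecast"},
    "support": {"nlp_translate"},
    "admin": {"kpi_monitor", "ml_forecast", "nlp_translate", "automation"},
}

def invoke_tool(role: str, tool: str) -> str:
    """Refuse any AI tool call the caller's role is not entitled to."""
    if tool not in ROLE_PERMISSIONS.get(role, set()):
        raise PermissionError(f"role '{role}' may not use '{tool}'")
    return f"{tool} invoked for {role}"

print(invoke_tool("analyst", "ml_forecast"))  # allowed
# invoke_tool("support", "automation")        # would raise PermissionError
```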