How to Leverage Your Own Data to Improve genAI Trust & Confidence
With the growing relevance of AI platforms such as ChatGPT in the enterprise, data privacy is always a priority. At the same time, ChatGPT can provide new answers and insights when built on suitable large language models (LLMs).
Artificial intelligence, like many cutting-edge technologies, will always raise questions, and companies deploying it in their tech stacks will face challenges.
One primary concern is using private company data in tandem with public-facing and internal AI platforms. These use cases are ubiquitous, ranging from a healthcare organization retaining patient data, to a large corporation managing employee payroll information, to research on private pharmaceutical data.
Creating effective AI requires a large sample size of high-quality public and/or private data. Companies with access to confidential data, such as healthcare companies with medical records, have a competitive advantage when building AI-based solutions.
However, there is an enormous responsibility when using sensitive data or leveraging your proprietary data, such as business operations data. The same enterprises and organizations must consider ethical and regulatory requirements surrounding data privacy, fairness, explainability, transparency, robustness and access.
IT and security professionals must prioritize protecting sensitive data while leveraging it to improve outcomes for their organization and its customers. And this has never been more crucial when trying to utilize AI tools responsibly.
What Are Large Language Models and Why Are They Important for AI?
Large language models (LLMs) are powerful AI models trained on text (and, increasingly, audio and visual data) to perform various natural language processing tasks, including language translation, question answering, summarization and sentiment analysis. These models are designed to analyze language in a way that mimics human intelligence, allowing them to process, understand and generate human language.
The models are trained on vast amounts of data, which allows them to detect patterns and make predictions that would be difficult or impossible for a human to do manually. This has many potential applications in healthcare, finance and customer service.
However, the complexity of these models comes with ethical challenges for your business as well as technical challenges. These can include data bias, copyright infringement and potential libel cases.
What Are Some Examples of AI Platforms Used in Business?
OpenAI ChatGPT
Introduced to the world in November 2022 by OpenAI, ChatGPT took the world by storm. It is an AI chatbot designed to mimic human conversation, provide business information, create pitches and marketing copy and write stories. ChatGPT was trained on data from the Internet, private data repositories and source code in many programming languages.
Google Bard
Google Bard received its initial release in March 2023 and works as a conversational AI chat service. Bard was initially powered by Google’s Language Model for Dialogue Applications (LaMDA) and later by Google’s proprietary LLM, PaLM 2. Much like other chatbots, Bard is capable of coding and solving complex math problems.
Anthropic Claude
Anthropic Claude is a chatbot with capabilities similar to Bard and ChatGPT, but its one major distinguishing feature is a 100,000-token context window, significantly larger than what other chatbots support. Anthropic has also spent a lot of time focused on the dangers of AI and trains its models to be “helpful, harmless and honest,” thereby improving their trustworthiness. Claude also lets users delete conversations and supports VPN browsing. As a piece of trivia, Claude is named after the “father of information theory,” Claude Shannon.
What Are the Risks and Challenges Involved with Private Data and AI?
While chatbot AIs have many effective use cases, they also face their fair share of difficulties.
Hallucinations
A hallucination, by the literal dictionary definition, is "an experience involving the apparent perception of something not present." In AI, a hallucination is when the model confidently reports an error-filled answer to the user. Because LLMs predict the next word, these answers sound plausible, but the information may be incomplete or false. For example, if a user asks a chatbot for a competitor's average revenue, chances are those numbers will be way off. These errors are a regular occurrence: by some estimates they happen 15% to 20% of the time, and that needs to be kept in mind when querying your AI.
Fairness/Biases
LLMs, like any other AI system, have limitations and potential issues, including exhibiting biases: they may produce results that reflect the biases in their training data rather than objective reality. For example, a language model trained on a predominantly male dataset might produce biased output on gendered topics. In fact, Progress conducted a research study on data bias and found, among other statistics, that:
78% believe data will become a bigger concern as AI/ML use increases
LLMs have the potential to produce biased outcomes, and they require a lot of computation power and data to train, leading to concerns around data privacy and energy consumption. While the potential benefits of these models are immense, users should carefully examine the ethical and practical considerations.
Reasoning/Understanding
LLMs may struggle with tasks that require deeper reasoning or understanding of complex concepts. An LLM can be trained to answer questions that require a nuanced understanding of culture or history, but these models can perpetuate stereotypes or provide misinformation if not carefully monitored and trained. As with any AI system, it is crucial to be aware of these limitations and potential issues, and to use these models responsibly with a keen eye toward ethical considerations.
Data Cutoffs
Given the resources it takes to train these models, their knowledge tends to be out of date: each model only knows about events up to its training cutoff date.
Explainability
Often, it is quite difficult to understand how the LLM generated its response. LLMs should be trained or prompted to show their reasoning and reference the data they used to construct their response.
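As a hedged illustration, one way to encourage this is to prompt the model to cite the passages it relied on. The `build_explainable_prompt` helper below is a sketch; its name and wording are assumptions for illustration, not a feature of any particular product:

```python
# Illustrative only: a prompt template that asks the model to show its
# reasoning and cite the source passages it used. The function name and
# prompt wording are assumptions for this sketch, not a product API.

def build_explainable_prompt(question: str, sources: list[str]) -> str:
    numbered = "\n".join(f"[{i + 1}] {s}" for i, s in enumerate(sources))
    return (
        "Answer the question using ONLY the numbered sources below.\n"
        "After the answer, list the source numbers you relied on and\n"
        "briefly explain your reasoning.\n\n"
        f"Sources:\n{numbered}\n\nQuestion: {question}\nAnswer:"
    )

print(build_explainable_prompt(
    "What was Q2 revenue?",
    ["Q2 revenue was $4.2M, up 8% year over year."],
))
```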
Robustness
Guarding against unexpected inputs and adversarial situations matters for all technologies, and it is even more important with LLMs.
If we can successfully address these issues, the trustworthiness of our solutions will increase along with user satisfaction, ultimately leading to the solution’s success.
How Can MarkLogic and Semaphore Assist with Using Private Data for ChatGPT?
Now, with all the above information, how can someone receive more accurate answers when using ChatGPT? Can a user influence a language model with private data to obtain correct answers?
One of the standout capabilities of Progress MarkLogic is its ability to store and query structured and unstructured data. Additionally, Progress Semaphore can capture subject matter expert (SME) knowledge via its intuitive GUI in the form of semantic graphs. The resulting knowledge graphs can extract facts found within the data and tag the private data with semantic knowledge. In turn, Semaphore can use that same knowledge to tag user questions and specific genAI answers. Users can then use MarkLogic and Semaphore to fetch semantically relevant private data for genAI.
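To make semantic tagging concrete, here is a hypothetical sketch of what a tagged document section might look like. The field names and structure are illustrative assumptions, not Semaphore's actual output schema:

```python
# Hypothetical example of a semantically tagged document section.
# Field names and structure are illustrative assumptions; they do not
# reflect Semaphore's actual output schema.
tagged_section = {
    "doc_id": "clinical-note-0042",
    "text": "Patient reported improvement after 10mg of DrugX daily.",
    # concept tags drawn from a curated knowledge model
    "concepts": ["Patient Outcome", "Dosage", "DrugX"],
    "facts": [
        # simple subject-predicate-object triples extracted from the text
        ("DrugX", "hasDosage", "10mg daily"),
        ("DrugX", "associatedWith", "improvement"),
    ],
}
```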
Combining these products creates a strong, secure, transparent and governed AI knowledge management solution.
The best way to understand how MarkLogic and Semaphore work in conjunction with LLMs is to think about closed book and open book exams.
The closed book exam model can be described in two steps:
1. The user asks ChatGPT the question or requests information.
2. ChatGPT provides answers to these inquiries based solely on the knowledge embedded within the language model currently in use.
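In code, the closed book flow is simply a direct call to the model with the raw question. Here is a minimal sketch, assuming a placeholder `call_llm` function that stands in for whatever chat completion API you use:

```python
# Closed book: the question goes straight to the model, which answers
# from its training data alone. call_llm is a placeholder stand-in for
# a real chat completion client, not an actual library function.

def call_llm(prompt: str) -> str:
    # Substitute a real chat completion call here.
    return "(model answer)"

question = "What is our average deal size in the EMEA region?"
# The model can only guess: it has never seen your private data.
answer = call_llm(question)
print(answer)
```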
Without semantic and tagged data, users may not get entirely accurate information. In fact, they may not even be aware of a response’s inaccuracies.
When using MarkLogic’s capabilities alongside ChatGPT, here is what users can experience:
1. We ingest the content into MarkLogic and create smaller sections of the documents so that search results can fit into the ChatGPT prompt window.
2. MarkLogic searches for the most relevant private document sections based on the user’s question.
3. The middle tier or MarkLogic can then generate a customized prompt for ChatGPT using relevant private data from search along with the user query.
4. ChatGPT provides the final answer.
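Here is a minimal sketch of that open book flow in Python. The `search_sections` and `call_llm` helpers are hypothetical stand-ins for a MarkLogic relevance search and a chat completion API; the naive keyword scoring is purely for illustration:

```python
# A minimal retrieval-augmented sketch of the "open book" flow.
# search_sections and call_llm are hypothetical stand-ins for a
# MarkLogic search call and a chat completion API, respectively.

def chunk_document(text: str, max_chars: int = 1500) -> list[str]:
    # Step 1: split documents into sections small enough to fit
    # into the prompt window.
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def search_sections(question: str, sections: list[str], k: int = 3) -> list[str]:
    # Step 2: stand-in for a relevance search; here, naive keyword overlap.
    words = set(question.lower().split())
    scored = sorted(sections, key=lambda s: -len(words & set(s.lower().split())))
    return scored[:k]

def call_llm(prompt: str) -> str:
    # Substitute a real chat completion call here.
    return "(model answer)"

def answer_question(question: str, sections: list[str]) -> str:
    relevant = search_sections(question, sections)
    # Step 3: build a customized prompt from the user query plus
    # the relevant private data.
    context = "\n---\n".join(relevant)
    prompt = f"Using only this context:\n{context}\n\nAnswer: {question}"
    # Step 4: ChatGPT (or any LLM) produces the final answer.
    return call_llm(prompt)
```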
With MarkLogic, users can increase accuracy and efficiency while introducing governance. Even so, LLMs, not to mention their users, can still need help with answers, especially since the data can lack context and meaning.
But when using both products together, users can obtain knowledge and information more easily and gain expert-level insights. The combined flow adds Semaphore’s semantic layer:
1. We ingest the content into MarkLogic and create smaller sections of the documents so that search results can fit into the ChatGPT prompt window.
2. Semaphore semantically tags, categorizes and fact extracts private data from the document sections.
3. Semaphore tags key concepts in user queries too.
4. The middle tier or MarkLogic can then generate a customized prompt for ChatGPT using only semantically relevant private data from semantic search with the user query. The semantically relevant private data can be even further filtered to retain only the semantically relevant sentences.
5. The final answer is a combination of ChatGPT’s answer validated with semantic knowledge from Semaphore.
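Below is a hedged sketch of how the semantic layer narrows the context. The `tag_concepts` function stands in for Semaphore's concept tagging, and the overlap-based filtering is an illustrative assumption, not the product's actual algorithm:

```python
# Hypothetical sketch of the semantically filtered flow. tag_concepts
# stands in for Semaphore's concept tagging; the overlap-based filtering
# is an illustrative assumption, not the product's actual algorithm.

def tag_concepts(text: str) -> set[str]:
    # Stand-in: map text to knowledge-model concepts. A real system
    # would use a curated taxonomy, not keyword matching.
    taxonomy = {"revenue": "Revenue", "emea": "EMEA", "dosage": "Dosage"}
    return {concept for word, concept in taxonomy.items() if word in text.lower()}

def filter_sections(question: str, sections: list[str]) -> list[str]:
    # Keep only sections sharing at least one tagged concept with the query.
    q_concepts = tag_concepts(question)
    return [s for s in sections if tag_concepts(s) & q_concepts]

def filter_sentences(question: str, section: str) -> list[str]:
    # Further narrow a section to its semantically relevant sentences.
    q_concepts = tag_concepts(question)
    return [sent for sent in section.split(". ") if tag_concepts(sent) & q_concepts]
```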
The combined results will allow LLMs and users to easily access and fact check the results against the source content and the captured SME knowledge graphs.
MarkLogic and Semaphore enhance the overall user experience with ChatGPT, thereby improving AI trustworthiness.
There are ongoing conversations about the worry of AI taking over (the AI alignment problem), but what corporate users should be worried about right now is who is using AI to gain a competitive advantage. Using the most effective tools built on growing technology platforms has always been prevalent in business. Whether they are members of the IT team or help create marketing campaigns, users are always seeking the best tools to gain a competitive advantage. After all, no one wants to miss out on an opportunity to grow their business.
Generative AI tools offer this opportunity. However, businesses and organizations are responsible for implementing trustworthy AI in the best way possible, not just acting out of the fear of missing out.
If you want to see more in-depth examples of LLMs and private data in action, register for the upcoming Convergence of Private Data and AI webinar.