How Does ChatGPT Answer Our Questions? - With Knowledge Graph and Graph Database
ChatGPT, a free chatbot with artificial intelligence, went viral around the world in January 2023. According to a UBS research report, ChatGPT reached 100 million active users in January, and the figure keeps climbing, making it the fastest-growing application in history. OpenAI, the developer of ChatGPT, will soon release a Plus version at $20 per user per month, following the $42 Pro version.
As ChatGPT rapidly attracted such a huge user base, so much public attention, and so much commercial value, people began to wonder about the technologies behind it. In this article, let's look into how ChatGPT processes and queries massive amounts of data from the perspective of knowledge graphs and graph computing technology.
ChatGPT as an “AI” Application
Those who have tried ChatGPT may notice that it seems "smarter" than Siri: not only can it answer our questions with impressive conversational skill and interact with us based on a given context, but it can also reason, create, and even reject questions or requests it "thinks" are inappropriate. More than just replying in a human-like way, it displays some degree of "artificial intelligence".
According to the well-known Turing Test, proposed in Turing's "Computing Machinery and Intelligence" (1950): if a person cannot distinguish the responses of a machine from those of a human, the machine can be regarded as having artificial intelligence.
Suppose we were completely unaware that ChatGPT is a chatbot. Based on what it "knows" and how it "chats", almost showing a personality, could we be certain whether we were conversing with another human being or an AI application? That is where we hit the tricky part: ChatGPT essentially belongs to the deep learning family, full of black boxes and unexplainability. From the perspective of technological development alone, though, it can be regarded as equipped to pass the Turing test and as having "artificial intelligence" to some extent.
Why is ChatGPT Able to Answer?
How does ChatGPT manage to give high-quality answers (in the sense that not all of us could...) within such a short response time, after processing a corpus of 300 billion words through 175 billion parameters along with the information our question brings? The answer is that ChatGPT also has a "brain", and it can "learn" just like human beings, as shown in image 1:
Through NLP (Natural Language Processing), object recognition, multi-modal recognition, and other techniques, ChatGPT organizes unstructured files such as texts and images into a knowledge graph according to their semantic structure. This knowledge graph is the "brain" of ChatGPT. In a healthcare scenario, for example, ChatGPT transforms multi-sourced data into knowledge graphs for medication-related question answering, search, and generating answers on pharmaceutical research and development.
How, Exactly, Does ChatGPT Answer Our Questions?
To answer this question, we need to look into what a knowledge graph consists of: knowledge graphs are composed of nodes (entities) and edges (relations), which integrate relevant information such as people, events, and things into a comprehensive graph, as shown in the image below.
When we ask ChatGPT "Who is the founder of OpenAI?", ChatGPT's "brain" quickly searches and queries its database. It first locates the target word "OpenAI" in our question as a starting node, then traverses to another node: the founder, "Sam Altman".
In fact, ChatGPT does more than that: it finds all neighbors of the target node in its database. Therefore, when a relevant question pops up, it has, in a sense, predicted the answer (and perhaps the question itself) beforehand. For example, when we asked "Is Musk a member of the founding team of OpenAI?", it had already queried all the other members, as shown in the image below.
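The lookup pattern described above, starting from a target node and walking its relation edges, can be sketched in a few lines. This is a toy illustration only; the triples, relation names, and indexing scheme are made up for the example and are not ChatGPT's actual internals.

```python
# Toy knowledge graph stored as (subject, relation, object) triples.
# All data and relation names here are illustrative assumptions.
from collections import defaultdict

triples = [
    ("OpenAI", "founded_by", "Sam Altman"),
    ("OpenAI", "founded_by", "Elon Musk"),
    ("OpenAI", "founded_by", "Greg Brockman"),
    ("OpenAI", "created", "ChatGPT"),
]

# Index edges by (subject, relation) so a lookup starts directly at the node.
index = defaultdict(list)
for s, r, o in triples:
    index[(s, r)].append(o)

def answer(subject: str, relation: str) -> list:
    """Walk from the subject node along the given relation edge."""
    return index.get((subject, relation), [])

def neighbors(subject: str) -> list:
    """All (relation, object) pairs one hop away: the pre-fetched context."""
    return [(r, o) for (s, r), objs in index.items() if s == subject
            for o in objs]

print(answer("OpenAI", "founded_by"))  # → ['Sam Altman', 'Elon Musk', 'Greg Brockman']
```

Because `neighbors("OpenAI")` already holds every one-hop fact, a follow-up question about any co-founder can be answered without re-scanning the data, which is the "predicted beforehand" behavior the article describes.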
In addition, if previous "learning materials" exist in its database, its "brain" will also associate relevant graphs and nodes, such as "What other products have artificial intelligence?", as shown in the image below.
Of course, just like human beings, ChatGPT can also NOT KNOW: its capability to answer questions is limited by its own knowledge store, i.e., the data in its database. Moreover, the quality and speed of ChatGPT's answers rely on the computing power behind knowledge graph construction, powered by a GDBMS (graph database management system).
GDBMS? Why not RDBMS?
Some may ask: why must ChatGPT be built on graphs and a GDBMS rather than tables and an RDBMS? Can the latter not do the same? Unfortunately (or fortunately...), the answer is no, they can hardly manage it. Constructing general-purpose knowledge graphs on an RDBMS has conventionally centered on NLP and (limited) data visualization, while issues such as real-time performance, flexibility of data modeling, and explainability of queries and computation are treated as secondary, if not neglected outright. Especially in this era, as the world moves from big data to deep data, traditional SQL- or NoSQL-based graphs can no longer process sea-volume, complex, and dynamic data efficiently, let alone traverse, penetrate, and analyze it for valuable insights. To summarize, the challenges faced by RDBMS-based knowledge graphs mainly include:
Slowness: the underlying architecture of knowledge graphs built on SQL or NoSQL simply does not allow high efficiency on sea-volume data. The Cartesian product generated by joining multiple tables causes a drastic drop in SQL query efficiency, where T+1 or T+N latency becomes a problem so common that we yield to it without even acknowledging it. Moreover, a knowledge graph without sufficient underlying computing power cannot drill deeply into data either.
Poor flexibility: knowledge graphs based on relational databases, document databases, or low-performance graph databases are usually limited by the underlying architecture and cannot model real-world relations between entities to a satisfying degree. For example, some only support simple graphs: when importing multi-edge graph data, either information is lost or composing the graph comes at a high cost.
Black box: once data lands in a data warehouse or data lake, time-to-value is lengthy and data visualization is hardly possible, which rules out fine-grained analytics afterwards. For complex queries, SQL and stored procedures are hard to develop, explain, and maintain, which makes them a black box.
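The slowness point above can be made concrete. In a relational store, every extra hop is another self-join over the whole edge table, while a graph engine follows direct adjacency pointers. The sketch below, on a tiny made-up edge list, contrasts the two access patterns; both return the same answer, but the join-style version rescans the full table at every hop, which is where the Cartesian-product blow-up comes from at scale.

```python
# Hypothetical edge table; the node names are made up for illustration.
from collections import defaultdict

edges = [("a", "b"), ("b", "c"), ("c", "d"), ("b", "d")]

def khop_join(table, start, k):
    """Relational style: each hop is a self-join, i.e. a full table scan."""
    frontier = {start}
    for _ in range(k):
        frontier = {dst for src, dst in table if src in frontier}
    return frontier

# Graph style: build index-free adjacency once, then each hop only touches
# the edges actually reachable from the current frontier.
adj = defaultdict(set)
for src, dst in edges:
    adj[src].add(dst)

def khop_adjacency(start, k):
    frontier = {start}
    for _ in range(k):
        frontier = {n for v in frontier for n in adj[v]}
    return frontier

print(khop_join(edges, "a", 2))       # → {'c', 'd'}
print(khop_adjacency("a", 2))         # → {'c', 'd'}
```

On four edges the difference is invisible; on billions of rows, the per-hop full scan is what turns a 10-hop query into a T+1 batch job.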
If RDBMS Is a Habit, We Should Quit It For GDBMS.
A graph database is a practice of graph theory (where a graph is a data structure composed of nodes [2] and edges [3]) in computing technology: users can store the property data of entities as well as the relationship data between entities.
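A minimal sketch of that property-graph model: both nodes and edges carry their own key/value properties. The classes and sample data below are illustrative assumptions, not any particular vendor's schema.

```python
# Minimal property-graph model: nodes and edges each carry properties.
from dataclasses import dataclass, field

@dataclass
class Node:
    id: str
    props: dict = field(default_factory=dict)

@dataclass
class Edge:
    src: str          # id of the source node
    dst: str          # id of the destination node
    label: str        # relation type, e.g. "WORKS_AT"
    props: dict = field(default_factory=dict)

# Example (hypothetical) data: entity properties and a relationship property.
alice = Node("alice", {"age": 30})
acme  = Node("acme", {"industry": "finance"})
works = Edge("alice", "acme", "WORKS_AT", {"since": 2019})

print(works.label, works.props["since"])  # → WORKS_AT 2019
```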
"Graph" is the foundation of knowledge graph storage and application services. It is a natural high-dimensional architecture that allows for fast data connection and analytics capabilities, and it is greatly valued by both science and business.
The image above shows that, with the help of a strong graph database and graph computing engine, we can find deep data connections in real time across almost all industries, and even find optimal ones too complex for human intelligence to dig out. Its high dimensionality lies in the fact that a graph not only conforms to how the human brain works and intuitively models the real world, but can also generate profound insights through deep traversal.
Risk management is one of the most typical use cases. Looking back at the 2008 financial crisis, we may notice a surprising fact: the trigger was the collapse of a single bank, Lehman Brothers, the fourth-largest investment bank in the United States. No one expected that its collapse would spread into a series of bankruptcies across the international banking industry. With real-time graph database (graph computing) technology, however, it is possible to locate all the key nodes (entities), factors, and propagation paths of risk, and even to warn of a coming financial catastrophe.
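The propagation idea can be sketched as a breadth-first sweep over an exposure graph: when one institution fails, everyone exposed to it is at risk, then everyone exposed to them, and so on. The institutions and exposures below are invented for illustration.

```python
# Hypothetical exposure data: creditor -> institutions it is exposed to.
from collections import defaultdict, deque

exposures = {
    "Fund A":    ["Lehman"],
    "Bank B":    ["Fund A", "Lehman"],
    "Insurer C": ["Bank B"],
}

# Invert the graph so a failure can be pushed forward to those exposed to it.
at_risk_of = defaultdict(list)
for creditor, targets in exposures.items():
    for t in targets:
        at_risk_of[t].append(creditor)

def contagion(failed: str) -> set:
    """Breadth-first traversal: every institution reachable from the failure."""
    seen, queue = set(), deque([failed])
    while queue:
        node = queue.popleft()
        for creditor in at_risk_of[node]:
            if creditor not in seen:
                seen.add(creditor)
                queue.append(creditor)
    return seen

print(contagion("Lehman"))  # all three counterparties end up at risk
```

A graph engine runs the same traversal natively, which is what makes locating propagation paths feasible in real time rather than via repeated table joins.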
Although many database providers claim to construct highly available knowledge graphs, fewer than 5% of database companies offer high-performance databases or satisfactory computing power (the same holds for the graph database vertical). We would like to introduce Ultipa Graph, the world's only 4th-generation real-time graph database, to readers. Leveraging the parallel computing capabilities of x86 CPUs (note: in terms of general computing capabilities, GPUs are still far from CPUs...), Ultipa Graph achieves ultra-deep drilling into data sets of any magnitude in real time, with Ultipa's award-winning technologies including high-density parallel computing, dynamic pruning, and multi-level storage-computing acceleration. Ultipa Graph features three key advantages over other vendors:
Powerfulness
When looking for the UBO (Ultimate Beneficial Owner, also known as the actual controller) of an enterprise, the real-world trick is this: there are often multiple hops (shell-company entities) between the UBO and the inspected corporation, or numerous investment and equity-participation paths through various natural persons or corporate entities that implement control over target companies. Uncovering them calls for deep data penetration that can only be achieved with high computing power. Traditional RDBMS or document databases, and even most graph databases, cannot complete the task in real time.
Ultipa Graph addresses this pain point with its innovative high-concurrency data architecture and high-performance computing and storage engine, which support deep data drilling 100+ times faster than other graph systems, locating UBOs or discovering sprawling investment chains in real time (within microseconds). Note that microsecond latency also means higher concurrency and system throughput: a 1000x performance improvement over systems that claim millisecond latency.
Take a real-world case: Bernard L. Madoff, the notorious Ponzi-scheme fraudster and financier. With white-box penetration, Ultipa Manager (a smart 2D/3D data visualization tool) digs out and visualizes the complex relationships between him and all the affected companies, and pinpoints the ultimate behind-the-scenes bosses in real time.
Flexibility
The flexibility of graph databases is a broad topic: it covers data modeling, query and computing logic, query visualization, interface support, scalability, etc.
Data modeling is the basis of all knowledge graphs, and it is closely related to the underlying computing capacity of a graph database. A column-based database system like ClickHouse cannot run a huge financial transaction graph at all, because one of the most typical features of a transaction network is that multiple transactions can occur between two accounts, and ClickHouse would merge them into one, distorting the data. Some graph databases based on simple graphs (where two nodes are connected by at most one edge) are inclined to store transaction data in nodes instead of edges. This architectural deficiency not only inflates the data volume (leading to unnecessary storage cost), but also inevitably increases query complexity (incurring huge time cost), dragging computing power down. Ultipa Graph supports complex graphs with multiple edges between two nodes, circular paths, and even more sophisticated data structures, unleashing computing speed and freeing the system from storage redundancy.
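The simple-graph limitation above is easy to demonstrate. With one edge allowed per account pair, repeated transfers overwrite each other; a multigraph keeps every transfer as its own edge. The accounts and amounts below are invented for the example.

```python
# Hypothetical transfers: three separate payments on the same account pair.
transactions = [
    ("acct1", "acct2", 500.0),
    ("acct1", "acct2", 120.0),
    ("acct1", "acct2", 75.0),
]

# Simple graph: one edge per (src, dst) pair, so later transfers overwrite
# earlier ones and the individual amounts are lost.
simple = {}
for src, dst, amount in transactions:
    simple[(src, dst)] = amount

# Multigraph: one edge per transfer, so the full payment history survives.
multi = {}
for src, dst, amount in transactions:
    multi.setdefault((src, dst), []).append(amount)

print(simple[("acct1", "acct2")])  # → 75.0 (two transfers silently lost)
print(multi[("acct1", "acct2")])   # → [500.0, 120.0, 75.0]
```

For fraud or AML analysis, the pattern of transfers (amounts, count, timing) is precisely the signal, which is why collapsing parallel edges amounts to data distortion.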
In terms of query and computing logic, finding a causal (strongly correlated) relationship between any two people or things, as in the butterfly effect, is too difficult to compute and trace for any data processing architecture not scaffolded by sufficient graph computing technology. Mind that we are not talking about the superficial connections any traditional computing engine, big data or NoSQL framework, or even relational database can find, but the deeply hidden or least-connected ones: for instance, a relation between Newton and Genghis Khan, who seem in no way related (in a typical BFSI scenario such as asset and liability management, sea-volume banking data makes things even harder). With Ultipa Graph, several approaches are available: in-depth (10+ hop) path queries between two nodes, auto-net queries among multiple nodes, template queries with flexibly customizable conditions, and graph-native text path search resembling a Web search engine. With this flexibility and power, we can do far more than find nodes, edges, and paths with flexible property filters: schema recognition, community detection, customer discovery, and k-hop neighbor queries, all in real time.
A graph database's user-interface friendliness is closely tied to user experience. Simply put, if a graph system in a production environment only supports importing CSV files, then data stored in all other formats must be converted to CSV before being imported into the graph. However inefficient, this remains the elephant in the room for many graph databases. With a series of tools built on top of Ultipa Graph, such as Ultipa Transporter and Ultipa Maker, users enjoy both cross-platform and cross-language convenience with their data: data from SQL databases, Neo4j, and data files can be imported to and exported from Ultipa Graph with designed ease of use.
Low code, highly visualized
Ultipa Graph features white-box interpretability as well as low-code (even no-code) operation, making it highly friendly to business personnel. It empowers smart enterprises to unlock more value from their data assets with 5x faster time-to-market: developers can operate on their data in Ultipa Manager using the declarative and intuitive Ultipa GQL (UQL), while business personnel can fully operate on their data through UQL shortcuts and plug-ins in a code-free manner. This not only improves work efficiency but also offers a unified platform for synergy across organizations and departments. These designs remove the technological threshold that used to bar business users across industries: as the users most closely involved with, and benefiting from, business data, they previously had little or no hands-on access to it.
The collaboration of knowledge graphs and graph databases is key to realizing business blueprints across industries, especially BFSI (Banking, Financial Services, and Insurance), which demands high consistency, security, stability, real-time performance, and accuracy. On these workloads, RDBMS cannot deliver satisfactory computing performance even at T+N, and some cannot complete the computing tasks at all.
All in all, graph computing technology is not only for smart enterprises in BFSI; it is also the essential approach for AI infrastructure and industries to be digitally revolutionized and leveled up. Simply put: Graph Technology = Graph-Augmented Intelligence + eXplainable AI (XAI), both of which derive from the power, flexibility, and white-box explainability that only real-time graph databases can achieve. With graph computing technology, we can expect more AI products and applications like ChatGPT, or far more futuristic ones, to appear on the market.
About Ultipa:
Ultipa is a Silicon Valley-based next-generation graph XAI and database company with operations in EMEA and APAC. The team at Ultipa believes that graph-augmented intelligence and XAI empower enterprises in their digital transformation, a process that requires the convergence of data intelligence and infrastructure revolution. In other words, the Ultipa graph database augments and accelerates advanced and smart data analytics, as well as machine learning and AI, while delivering the benefits of white-box explainability, flexibility, and faster time-to-market and time-to-value. Ultipa builds next-generation graph XAI and database products and killer applications in vertical domains such as Asset Liability Management (ALM), Liquidity Risk Management (LRM), Low-code Graph-augmented BI Platform, and Data Governance (RDA). Ultipa is backed by prestigious sovereign wealth funds and venture capital firms.
References
[2] What is a node.
[3] What is an edge.