Data Visualization with ChatGPT: How far along is AI in effectively communicating data insights?
Chadi Abi Fadel
Senior Data Scientist @ Student World | Python, Process Optimization, Automation
Executive Summary
Data Visualization
Using AI
Assessing the Results of My Experiment
Further Considerations
Introduction
Data is now more abundant than ever—a fact that no longer surprises us. However, what continues to evolve are the tools we use to access and communicate this data. Among these, data visualization plays a pivotal role. With the rise of artificial intelligence (AI), integrating AI into data visualization and communication seems promising. But how effective is AI in enhancing data visualization? Specifically, what are the key considerations when employing a large language model for data visualization?
In addressing these questions, I draw on insights from a recent data communication course led by Dr. Khimji Vaghjiani, where I learned about the underlying philosophies that drive effective data analysis and storytelling. This article will evaluate a visualization generated using GPT-4, guided by standards from my coursework at UNSW. This comparison aims to establish a solid benchmark for assessing the quality of outputs produced by GPT-4.
Throughout this edition of "Digital Da Vincis," I will also focus on user experience and the intuitiveness of chat interfaces, exploring how modern tools might evolve to meet future needs.
Data Visualization
Data Visualization for Data Communication
To evaluate the quality of AI-driven data visualization, we must first understand the essence and purpose of data visualization itself.
For the last fifty years, the volume of data has grown exponentially. Today, more data is collected than can possibly be analyzed thoroughly (sources: The Economist, Wikipedia). Consequently, teams often lack the time to review all available data. For instance, business leaders might focus on aggregated values like total sales for May rather than analyzing every individual transaction. This aggregation simplifies data interpretation but also involves trade-offs, such as losing specific details like the exact sales figures on, say, May 12th. Without raw datasets, obtaining metrics like a monthly average would require additional computation, using tools like Excel, Python, SQL, or specialized Business Intelligence (BI) software. BI software in particular provides intuitive data interfaces for more efficient data communication, integrating complex statistical operations and UX design principles to enhance accessibility and understanding for users. It's no wonder that the global business intelligence market was valued at $29.42 billion in 2023 and is projected to grow from $31.98 billion in 2024 to $63.76 billion by 2032 (read more at Fortune Business Insights).
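To make that aggregation trade-off concrete, here is a minimal pandas sketch; the transactions, dates, and column names are hypothetical, purely for illustration:

```python
import pandas as pd

# Hypothetical raw transaction data; in practice this would come from
# a database export or a CSV file.
sales = pd.DataFrame({
    "date": pd.to_datetime(["2024-05-01", "2024-05-12", "2024-05-12", "2024-06-03"]),
    "amount": [120.0, 75.5, 230.0, 310.0],
})

# Aggregate to the monthly level: total and average sales per month.
# The per-transaction detail (e.g., what sold on May 12th) is lost here.
monthly = sales.resample("MS", on="date")["amount"].agg(["sum", "mean"])
print(monthly)
```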
When communicating data, visualizations are a powerful storytelling tool, transforming abstract numbers into compelling narratives. Graphs and bar charts, for example, are easier to understand than extensive spreadsheets. As humans, we find stories more memorable than collections of numbers. Choosing the right visualization for the data at hand is essential for effectively conveying a message (source: Perceptual Edge, https://perceptualedge.com/articles/misc/Graph_Selection_Matrix.pdf). For instance, when exploring geospatial data, plotting points on a map is often more practical than employing box plots, which can complicate the presentation.
Data is communicated to an audience with specific needs, making it vital to understand who we are addressing. Our ultimate goal is to provide an integrated experience that caters to our audience. This is where customer personas, a concept borrowed from UX and marketing, help us: by mapping out the key characteristics of our target audience, they shape the final presentation (source: UX Design Institute, https://www.uxdesigninstitute.com/blog/what-are-ux-personas/). For example, in a recent project on the economic visions of the GCC (see more: Chadi Abi Fadel - Essential Skills in a Tech-Driven Economy), I targeted technologists, data scientists, and management consultants who are either starting their careers in the GCC or considering moving there. This audience shares common traits: they are busy and tech-savvy. This understanding led me to choose a voice-over PowerPoint format with recorded automation for the slides.
This format is particularly effective because it minimizes the time required to digest the information, while the PowerPoint slides allow for the addition of detailed data as needed. Furthermore, given the audience's tech proficiency, there's no concern about their ability to engage with the medium. This approach also helps bring all the elements together into a coherent narrative that resonates.
Audience-Centric Visualizations
Correctly visualizing our data is important, but enhancing these visualizations to be more engaging and understandable is equally vital. After all, data points are just numbers unless they're given meaning. For instance, merely visualizing the GDP of the UAE and Saudi Arabia is less informative than contextualizing it within their economic diversification efforts, highlighting that GDP growth is expected to stabilize as these countries continue their diversification. In addition to presenting the data, this explains its significance, enhancing the viewer's understanding.
My Coding Adventure With GPT-4
The Setup for Using AI for Data Visualization
I tried using AI for visualizations by replicating the steps outlined in a Medium article (see more: CSV to PDF: Prompting GPT-4 for Automatic Data Viz Creation), using GPT-4 to create visual representations of data. The process, while promising, had its challenges. Initially, I encountered several errors related to the file management system rather than the AI's logic, underscoring that these capabilities, while potent, are still in their early days.
To begin, I uploaded a CSV file into ChatGPT, using its recently added ability to read and query rich text files (see: OpenAI File Upload FAQ). Through its code interpreter capabilities, GPT-4 imported the data using the pandas library. This step set the stage for the subsequent data manipulation and visualization.
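The interpreter's exact code isn't reproduced here, but the first pass typically looks something like this minimal sketch; the filename is a placeholder for the uploaded file:

```python
import pandas as pd

# Placeholder filename standing in for the CSV uploaded to the chat.
df = pd.read_csv("uploaded_data.csv")

# Typical first-look steps before any plotting:
print(df.head())      # preview the first rows
df.info()             # column types and missing-value counts
print(df.describe())  # summary statistics for numeric columns
```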
Creating Visuals with AI
The following screenshot shows the prompt I used:
The visualization process revealed the robustness of GPT-4's coding capabilities.
Visuals
In the first run of the prompt, GPT-4 produced code that generated convincing visuals.
Global map
For the global map, GPT-4 employed libraries such as geopandas, matplotlib, and seaborn. These tools were instrumental in reading Geographic Information System (GIS) data and producing visually appealing maps.
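A choropleth along these lines can be sketched with geopandas as below; the shapefile path and the "value" column are assumptions standing in for the actual dataset, not GPT-4's exact code:

```python
import geopandas as gpd
import matplotlib.pyplot as plt

# Country boundaries from a Natural Earth shapefile; the path is a
# placeholder for wherever the GIS data lives.
world = gpd.read_file("ne_110m_admin_0_countries.shp")

# Assume the CSV metric has already been merged onto the GeoDataFrame
# as a numeric column called "value".
fig, ax = plt.subplots(figsize=(12, 6))
world.plot(column="value", cmap="viridis", legend=True, ax=ax,
           missing_kwds={"color": "lightgrey"})  # grey out countries with no data
ax.set_axis_off()
ax.set_title("Metric by country")
plt.show()
```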
Bar chart
For the bar chart, GPT-4 made multiple attempts to generate it, but the code interpreter kept running into errors, mainly due to libraries not being set up in its environment.
After retrying different versions of the code multiple times, a process that once took human analysts hours or days, it generated the following visual using pandas and matplotlib:
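A bar chart of this kind boils down to a groupby followed by a plot call, as in this sketch; the data and column names are hypothetical stand-ins for the real CSV schema:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder data standing in for the uploaded CSV.
df = pd.DataFrame({
    "country": ["A", "B", "C", "A", "B"],
    "value": [10, 40, 25, 15, 5],
})

# Aggregate per country, then sort for a readable ranking.
totals = df.groupby("country")["value"].sum().sort_values(ascending=False)

fig, ax = plt.subplots(figsize=(8, 4))
totals.plot.bar(ax=ax, color="steelblue")
ax.set_ylabel("Total value")
ax.set_title("Totals by country")
plt.tight_layout()
plt.show()
```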
Box plot
For the box plot, numpy was used in addition to pandas and seaborn to generate this image:
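For illustration, here is a minimal seaborn box plot in the same spirit; the synthetic groups and distributions are made up to exercise the plot, not drawn from my dataset:

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic data: three groups with different spreads.
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "group": np.repeat(["A", "B", "C"], 100),
    "value": np.concatenate([
        rng.normal(10, 2, 100),
        rng.normal(12, 4, 100),
        rng.normal(8, 1, 100),
    ]),
})

fig, ax = plt.subplots(figsize=(8, 4))
sns.boxplot(data=df, x="group", y="value", ax=ax)
ax.set_title("Distribution by group")
plt.tight_layout()
plt.show()
```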
Creating the Dashboard
To construct the dashboard, I used ChatGPT along with the PIL library to organize the data into an image. Regrettably, I no longer have the visual available as I did this analysis a month prior to writing this article. However, I can say that the dashboard didn't provide a compelling data story and lacked the user experience elements needed to make it truly engaging.
I attempted this prompt three times, encountering different errors on each trial, which reinforced the value of keeping system design simple: if something can go wrong, it likely will. Eventually, I managed to produce a visual. Although aesthetically pleasing (it reminds me of something one might see at an art exhibition), it lacked the precision required for effective data visualization.
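The compositing approach itself is straightforward; here is a sketch of how PIL can stitch saved chart images into one dashboard canvas. The filenames are placeholders for the PNGs produced in the earlier steps, not files from my actual session:

```python
from PIL import Image

# Placeholder filenames for the charts saved earlier.
panels = ["global_map.png", "bar_chart.png", "box_plot.png"]
images = [Image.open(p) for p in panels]

# Normalise widths so the panels stack cleanly.
width = min(img.width for img in images)
images = [img.resize((width, int(img.height * width / img.width)))
          for img in images]

# Create one tall canvas and paste the panels vertically.
canvas = Image.new("RGB", (width, sum(img.height for img in images)), "white")
y = 0
for img in images:
    canvas.paste(img, (0, y))
    y += img.height

canvas.save("dashboard.png")
```

Mechanically this works, but as noted above, pasting images together is layout, not storytelling; it produces a collage rather than a guided narrative.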
Assessing the Visualizations
Evaluating ChatGPT's performance requires using established benchmarks. For this purpose, I applied the marking rubric from my data visualization course at UNSW as my benchmark. The results were promising: although ChatGPT struggled to meet some of the more advanced criteria that demanded precise and engaging data storytelling, it excelled in two fundamental areas. Firstly, it effectively described the data, and secondly, it was capable of generating complex visuals. These tasks, which traditionally require extensive coding skills and considerable time spent consulting resources like Stack Overflow and various library documentation sites, were accomplished easily by ChatGPT.
Further Considerations
Looking ahead, it's evident that GPT-4, with its new capabilities such as the code interpreter and file reader, can already assist with the initial stages of data visualization. However, there is much room for improvement. Given the ongoing developments in large language models (LLMs) and a history of rapid innovation, I anticipate significant enhancements in these areas.
Here are steps we can take now:

- Refine our prompts: iterate on the wording until the generated visuals match the story we want to tell.
- Ensure the robustness of our systems: verify generated code, confirm required libraries are available, and save outputs outside the chat session so they aren't lost.
- Prepare for the unexpected: keep a human in the loop to review every visual before it reaches an audience.
Conclusion
As we delve deeper into the era of data abundance, the significance of evolving our tools for data access and communication cannot be overstated. This exploration into the integration of AI, specifically through the use of large language models like GPT-4 for data visualization, has revealed both promising potentials and notable challenges. Throughout this journey, from the insightful teachings of Dr. Khimji Vaghjiani to the practical application of AI in visual storytelling, we have seen how AI can transform complex data into accessible and engaging narratives.
The experiences detailed in this edition of "Digital Da Vincis" emphasize the critical role of user experience and the intuitiveness of chat interfaces in the evolution of data visualization tools. The technology is moving rapidly, but as we have seen, it is not without its pitfalls. Errors in file management, difficulties with software libraries, and the need for better data storytelling are all reminders that while AI can significantly enhance our capabilities, it does not replace the need for careful planning, understanding of the audience, and thoughtful design.
Our journey with GPT-4, from generating global maps to attempting intricate dashboards, illustrates the vast capabilities of AI in data visualization. Yet, it also underscores the necessity for ongoing enhancements and the development of more intuitive, user-friendly tools that align with the needs of diverse audiences.
As we look forward, refining our prompts, ensuring the robustness of our systems, and preparing for the unexpected are imperative. By embracing these practices, we can harness the full potential of AI in our quest to turn data into meaningful stories that resonate with and inform our audiences. The future of data visualization is bright, and with AI, we are just scratching the surface of what's possible. Let's continue to innovate, learn, and grow as we navigate this exciting landscape together.
Let's Stay Connected
I trust you enjoyed this deep dive into the convergence of technology and its profound influence on our modern world. For those keen on further exploring the nuances of digital innovation and its implications, I invite you to stay connected:
I look forward to continuing the conversation and exploring the future of AI together.
#DataVisualization #ArtificialIntelligence #GPT4 #DataScience #BusinessIntelligence #DecisionMaking #DataStorytelling #BigData #AIinBusiness #DigitalTransformation #DataDrivenDecisionMaking #AIforDataViz #TechnologyInnovation #DataCommunication #ExecutiveSummary #LargeLanguageModels #DataAnalysis