PandasAI: The future of data science analysis
PandasAI is a Python library that extends Pandas, the widely used data analysis and manipulation tool, with generative artificial intelligence capabilities.
As technology advances, so do the needs and expectations placed on data science. Keeping up with fast-changing data analysis tools is essential for producing insightful findings and making informed decisions.
In Python, one library stands out for data analysis: Pandas. For more than a decade, it has been the preferred tool for manipulating and analyzing structured data. But as datasets grow larger and more complex, analysts need tools that make this work easier. This is where PandasAI comes in, a fascinating recent development in data analysis.
Image from Pandas-AI
PandasAI combines the strengths of Pandas with artificial intelligence to deliver a seamless, intuitive data analysis experience. Instead of writing every manipulation by hand, you describe what you want in plain English, which cuts down the time and effort needed for complex data manipulations. It can also help you spot trends, outliers, and missing values, so you can make data-driven decisions with more confidence. Whether you are a data scientist, analyst, or researcher, PandasAI can simplify and streamline your data analysis and visualization tasks.
Pandas AI is beginner-friendly: even someone with limited technical knowledge can use it to carry out challenging data analytics tasks, examining data and drawing insightful conclusions quickly. It can make your data analysis work faster, better, and even more enjoyable. It is not a replacement for Pandas; it is meant to be used alongside it.
Recommendation: When using PandasAI, take advantage of its automatic data cleaning tools. Functions like clean_data() and impute_missing_values() can save you a lot of time and effort when preparing your data. Before beginning an analysis, it is usually a good idea to explore the data and assess its quality. Taking this simple step will help you avoid problems later. We promise!
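A quick quality check like the one recommended above can also be done in plain pandas. The sketch below is not PandasAI's clean_data() or impute_missing_values(); quality_report and impute_simple are hypothetical helpers that report missing values per column, count duplicate rows, and fill gaps with the median (numeric columns) or the most frequent value (other columns). The small DataFrame is made-up sample data.

```python
import numpy as np
import pandas as pd


def quality_report(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: summarize dtype, missing values, and distinct values per column."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "missing_pct": (df.isna().mean() * 100).round(1),
        "unique": df.nunique(),
    })


def impute_simple(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: fill numeric gaps with the median, other gaps with the mode."""
    out = df.copy()
    for col in out.columns:
        if out[col].isna().any():
            if pd.api.types.is_numeric_dtype(out[col]):
                out[col] = out[col].fillna(out[col].median())
            else:
                out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out


# Made-up sample with one gap in each column.
df = pd.DataFrame({
    "rating": ["TV-MA", None, "PG", "TV-MA"],
    "release_year": [2019, 2020, np.nan, 2021],
})
report = quality_report(df)
print(report)
print("duplicate rows:", df.duplicated().sum())
clean = impute_simple(df)
print(clean)
```

Median and mode are only simple defaults; whether they are appropriate depends on what the missing values mean in your data.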
Basic Uses of Pandas AI
In this basic use case, we load the Netflix movie data with the Pandas library. The dataset contains more than 8,500 Netflix movies and TV shows.
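import pandas as pd
from pandasai import PandasAI  # PandasAI 0.x API, matching the pandas_ai.run() calls below
from pandasai.llm.openai import OpenAI
pandas_ai = PandasAI(OpenAI(api_token="YOUR_OPENAI_API_KEY"))  # replace with your own key
df = pd.read_csv("netflix_dataset.csv", index_col=0)
df.head(3)  # preview the first three rows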
We can get Pandas AI to perform analyses and manipulate the dataset by passing it a data frame and a prompt. In this scenario, we will prompt Pandas AI to display the records of the five longest-duration movies.
pandas_ai.run(df, prompt='What are 5 longest duration movies?')
In the output, the longest-duration movie is Black Mirror: Bandersnatch, which runs for 312 minutes.
Let’s ask it to only display the names of the five longest-duration movies.
pandas_ai.run(df, prompt='List the names of the 5 longest duration movies.')
['Black Mirror: Bandersnatch', 'Headspace: Unwind Your Mind', 'The School of Mischief', 'No Longer kids', 'Lock Your Girls In']
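For comparison, here is roughly the kind of pandas code these prompts translate to. It assumes the dataset has columns named type, title, and duration, with movie durations stored as text such as "312 min"; those names are an assumption, so adjust them if your file differs. The sample rows are made up.

```python
import pandas as pd


def longest_movies(df: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    # Assumed schema: "type" is "Movie" or "TV Show"; movie "duration" is text like "90 min".
    movies = df[df["type"] == "Movie"].copy()
    # Extract the number before "min"; values that don't match become NaN.
    movies["minutes"] = pd.to_numeric(
        movies["duration"].str.extract(r"(\d+)\s*min", expand=False), errors="coerce"
    )
    # Keep the n rows with the largest runtime, showing only title and minutes.
    return movies.nlargest(n, "minutes")[["title", "minutes"]]


sample = pd.DataFrame({
    "type": ["Movie", "TV Show", "Movie", "Movie"],
    "title": ["Short Film", "Long Series", "Epic Film", "Mid Film"],
    "duration": ["45 min", "9 Seasons", "312 min", "120 min"],
})
top = longest_movies(sample, n=2)
print(top["title"].tolist())  # ['Epic Film', 'Mid Film']
```

Filtering on type first matters: TV show durations are counted in seasons, so mixing them with minutes would make the ranking meaningless.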
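To further protect your privacy, instantiate PandasAI with enforce_privacy=True (for example, PandasAI(llm, enforce_privacy=True), where llm is your language model object). PandasAI then sends only the column names to the LLM, not the sample rows from the dataframe's head that it would otherwise include.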
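One way to picture the effect of enforce_privacy is as a choice about which metadata goes into the prompt. The sketch below is a rough illustration, not PandasAI's internal code: build_llm_context is a hypothetical helper, and the exact fields PandasAI sends may differ. With privacy off it includes a few real rows as examples; with privacy on, only the column names remain. The sample data is made up.

```python
import pandas as pd


def build_llm_context(df: pd.DataFrame, enforce_privacy: bool = False, n_sample: int = 3) -> dict:
    """Hypothetical helper (not part of PandasAI): collect the DataFrame
    metadata that would be placed in an LLM prompt."""
    context = {"columns": df.columns.tolist()}
    if not enforce_privacy:
        # Without privacy enforcement, a few real rows are shared as examples.
        context["sample_rows"] = df.head(n_sample).to_dict(orient="records")
    return context


titles = pd.DataFrame({
    "title": ["Title A", "Title B"],
    "duration": ["90 min", "120 min"],
})
shared = build_llm_context(titles)
private = build_llm_context(titles, enforce_privacy=True)
print(sorted(shared))   # ['columns', 'sample_rows']
print(sorted(private))  # ['columns']
```

Column names alone can still reveal something about a dataset, so privacy mode reduces what is shared rather than eliminating it.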
In a second example, we'll create three DataFrames and ask Pandas AI a question that spans all of them.
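To answer, Pandas AI has to connect the tables: df2 maps each store to a location, and df3 maps each location to its number of employees. Joining df2 and df3 on "location" and adding up the employees for the Walmart rows gives 20 + 30 = 50. (df1 is not needed for this question; joining it on "store" would duplicate the Walmart rows and double the count.) Pandas AI returns the result in a few seconds, which is faster than working out the joins and writing the code by hand.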
# DataFrame 1
df1 = pd.DataFrame({
    'sales': [100, 200, 300],
    'store': ['Walmart', 'Target', 'Walmart'],
})
# DataFrame 2
df2 = pd.DataFrame({
    'revenue': [400, 500, 600],
    'store': ['Walmart', 'Target', 'Walmart'],
    'location': ['North', 'South', 'West'],
})
# DataFrame 3
df3 = pd.DataFrame({
    'profit': [700, 800, 900],
    'location': ['North', 'South', 'West'],
    'employees': [20, 25, 30],
})
pandas_ai.run([df1, df2, df3], prompt='How many employees work at Walmart?')
50
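To check an answer like this by hand, the same question can be answered with plain pandas merges. The sketch below recreates the three example DataFrames, links stores to employee counts through the shared "location" column, and sums the Walmart rows. It also shows a pitfall worth watching for when reviewing generated answers: because "Walmart" appears twice in both df1 and df2, joining df1 on "store" first duplicates each Walmart row and doubles the total.

```python
import pandas as pd

# The three example tables from above.
df1 = pd.DataFrame({"sales": [100, 200, 300], "store": ["Walmart", "Target", "Walmart"]})
df2 = pd.DataFrame({
    "revenue": [400, 500, 600],
    "store": ["Walmart", "Target", "Walmart"],
    "location": ["North", "South", "West"],
})
df3 = pd.DataFrame({
    "profit": [700, 800, 900],
    "location": ["North", "South", "West"],
    "employees": [20, 25, 30],
})

# df2 links each store row to a location; df3 links each location to a head count.
store_staff = df2.merge(df3, on="location", how="inner")
walmart_employees = store_staff.loc[store_staff["store"] == "Walmart", "employees"].sum()
print(walmart_employees)  # 50 (20 + 30)

# Pitfall: merging df1 in on "store" pairs each of df1's two Walmart rows with
# each of df2's two Walmart rows (2 x 2 = 4 rows), so the total doubles.
naive = df1.merge(df2, on="store").merge(df3, on="location")
naive_total = naive.loc[naive["store"] == "Walmart", "employees"].sum()
print(naive_total)  # 100
```

Passing validate="many_to_one" to merge() is one way to make pandas raise an error when a join key you expected to be unique on the right-hand side is duplicated.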
Pandas AI Command Line Interface (CLI)
Pandas AI CLI is an experimental tool that you can install by cloning the repository and changing into the project directory.
!git clone https://github.com/gventuri/pandas-ai.git
%cd pandas-ai
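Next, create and activate a virtual environment with Poetry. Note that in a notebook each ! command runs in a separate shell, so the activation does not carry over to later cells; running these steps in a terminal avoids this.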
!poetry shell
If Poetry is not already installed on your system, you can install it with curl:
curl -sSL https://install.python-poetry.org | python3 -
Install the dependencies inside the activated environment using the code below.
!poetry install
Finally, run the Pandas AI CLI tool, pai, inside the activated environment (in a terminal, omit the leading !). You must provide a dataset, a model name, and a prompt; if no token is given, pai reads it from the .env file.
!pai -d "netflix_dataset.csv" -m "openai" -p "What are 5 longest duration movies?"
• -d, --dataset: The dataset's file path.
• -t, --token: Your HuggingFace or OpenAI API token.
• -m, --model: The LLM model to use. Options are: openai, open-assistant, starcoder, falcon, azure-openai, or google-palm.
• -p, --prompt: The prompt for PandasAI to execute.
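The option list above can be summarized as a small argument parser: dataset, model, and prompt are required, the token is optional, and the model must be one of the listed names. The sketch below is a hypothetical re-creation of that interface with Python's argparse, not the actual source of pai; it is only meant to make the required and optional flags explicit.

```python
import argparse

# Model names as listed in the options above.
MODELS = ["openai", "open-assistant", "starcoder", "falcon", "azure-openai", "google-palm"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical mirror of the pai options; not pai's real implementation."""
    parser = argparse.ArgumentParser(prog="pai")
    parser.add_argument("-d", "--dataset", required=True, help="Path to the dataset file")
    parser.add_argument("-t", "--token", default=None,
                        help="HuggingFace or OpenAI API token; if omitted, pai reads it from .env")
    parser.add_argument("-m", "--model", required=True, choices=MODELS, help="LLM to use")
    parser.add_argument("-p", "--prompt", required=True, help="Prompt for PandasAI to execute")
    return parser


args = build_parser().parse_args(
    ["-d", "netflix_dataset.csv", "-m", "openai", "-p", "What are 5 longest duration movies?"]
)
print(args.model, args.token)  # openai None
```

Leaving out -t yields token None, which corresponds to the fallback to the .env file described above.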
More functions and features that can help you streamline your workflow can be found in the Pandas AI documentation.
To conclude
Pandas AI has the ability to transform data analysis by leveraging large language models to derive insights from datasets. While data scientists often spend a substantial amount of time cleaning, exploring, and visualizing data, Pandas AI automates many of these tedious activities.
Pandas AI, like any AI tool, has limits and cannot replace human judgment. Its results often need to be checked by a person to confirm accuracy and catch edge cases.
In this post, we used Pandas AI for data analysis tasks, asking questions in plain English to query a Netflix dataset and to answer a business question spanning three tables.
Contact Synchronous Services for data analysis with PandasAI solutions. We will derive meaningful insights for you from large, complex datasets, saving you time and preparing you for informed decision-making.