Trying Google Gemini for Data & Code Analysis

Elias Dabbas

Digital Marketing meets Data Science ––> advertools

Published: May 15, 2024

This is a quick overview and my first attempt to really see how well this works. This is not a proper test, and does not provide a final verdict. It can, I hope, provide some hints and insights.

Approach

I used Gemini for two tasks, both involving coding. One is analyzing data using very well-known and popular libraries, and the other is writing code to crawl a website to a set of specifications, using a lesser-known library. The idea is to see how well it can figure things out without tons of examples to learn from.

First task: Data analysis with two of Python's most popular libraries (pandas and plotly)

Prompt:

The following is a table called "data" of a Basketball player's career stats. Can you please perform some exploratory data analysis on it? Please use pandas and plotly for data processing and visualization. Please provide the code you used to run the analysis

This is the actual response, which I just copied and pasted, without any changes, and ran the code:

The code is very clear and systematic, it knows when to use which type of chart, and it took the liberty of combining two metrics (assists and steals) in one chart. It also numbered the steps, and provided clarifications before each step as comments.

Running the code produced these charts:

Not only did it provide the explained code, it also provided its own analysis. Here is a sample of what it said:

Data Overview:
Points Trend: The player had his highest scoring output in his early career, with a gradual decline over time. However, there are notable fluctuations throughout the years.
Field Goal Percentage: Overall, the player has maintained a consistently high field goal percentage throughout his career, with a slight dip in the middle and then an improvement later.
Rebounds: The player's rebounding numbers have been relatively stable throughout his career, with some variation year to year.

The "Field Goal Percentage" was the only metric it got wrong. The rest it described very nicely. Note how well it described the "Points Trend" chart.

Second task: Crawling a website with advertools, a much less known library than the previous ones

Prompt:

please write code to crawl a large website with advertools:

stop crawling after 50,000 pages
only follow links matching the pattern "/shoes/"
don't overwhelm the server, speed is not an issue

It was extremely surprising how well it knows the code, the functions:

All function parameters are correct, the setting names are exactly right.

Only, this code doesn't work.

For some reason it felt that it had to unnecessarily complicate things with a strange loop that achieves nothing. If you don't know how to use advertools, or you don't know enough Python, this code is not useful. Actually, it would be really confusing for a beginner.
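For comparison, the three requirements in the prompt can probably be covered by a single `adv.crawl` call with no loop at all. The following is a minimal sketch, not the code Gemini produced. The start URL, the output file name, and the helper names `build_crawl_kwargs` and `run_crawl` are hypothetical, and the specific delay and concurrency values are just one conservative choice.

```python
def build_crawl_kwargs(start_url, output_file="shoes_crawl.jl"):
    """Collect the arguments for advertools.crawl in one place.

    start_url and output_file are hypothetical placeholders.
    """
    return {
        "url_list": start_url,
        "output_file": output_file,       # advertools writes a .jl (jsonlines) file
        "follow_links": True,             # crawl beyond the start URL
        "include_url_regex": "/shoes/",   # only follow links matching this pattern
        "custom_settings": {              # passed through to Scrapy
            "CLOSESPIDER_PAGECOUNT": 50_000,      # stop after 50,000 pages
            "DOWNLOAD_DELAY": 2,                  # wait between requests (assumed value)
            "AUTOTHROTTLE_ENABLED": True,         # slow down further if the server lags
            "CONCURRENT_REQUESTS_PER_DOMAIN": 1,  # one request at a time (assumed value)
        },
    }


def run_crawl(start_url):
    # Imported here so the settings above can be inspected without starting a crawl
    import advertools as adv

    adv.crawl(**build_crawl_kwargs(start_url))


# Usage (starts a real crawl, so point it at a site you are allowed to crawl):
# run_crawl("https://example.com")
```

The page limit and the politeness settings are plain Scrapy settings, which is why they go in `custom_settings` rather than being separate arguments.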

This is not a comprehensive study, just two random examples.

So the observation so far is that it handles topics with tons of data and millions of examples quite well. The data analysis was very smart, both the code and the descriptions (I asked for exploratory analysis only). At the same time, for things that don't have many examples, it can make stupid mistakes, even while writing Python code (for which there are many millions of examples).

My opinion remains: LLMs are great text and natural language processing tools. They are just not "intelligent".
