ChatGPT - A master at data comprehension, a pretty good analyst and an entry-level Data Scientist

Chilling on a Sunday, I thought I'd take ChatGPT for a spin to see how well it can understand structure in unstructured data, find insights, maybe do some analysis, and even push it into ML. It blew away my expectations. The conversation became so interesting that I kept going until it hit me with a usage cap. I am sharing the actual conversations here with commentary.

I started by prompting GPT to act like a crawler and analyst focused on real estate (the largest banking product category in the US).

It came back with a standard answer, which I think is linked to the work OpenAI is doing to build safety around the model.

I realized that I would have to pivot my approach. I gave it a link to a house listing on Zillow and asked if it understood the page.

The answer made me more curious. It was able to extract the exact address and various other fields of the house from the web page. This gave me confidence in my hypothesis that GPT can find structure in unstructured data. To confirm this, I asked about all the other details of the house, and it came back with accurate answers.

I wanted to take it a step further and understand whether the underlying embeddings capture the relationships between words and are contextual. The power of self-attention (as outlined in the paper "Attention Is All You Need") is that embeddings are contextual: a word like "big" means different things depending on its neighboring words. This also worked like a charm.
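
To make the idea of contextual embeddings concrete, here is a minimal sketch of single-head scaled dot-product self-attention in NumPy. It is a deliberate simplification, not how GPT is implemented: the word vectors are random toy values, and the learned query/key/value projections, multiple heads, and positional encodings of a real Transformer are left out (queries, keys, and values are all the raw embeddings). Even so, it shows that the same input vector for "big" comes out different depending on its neighbors.

```python
import numpy as np

# Toy, randomly initialized word vectors (hypothetical; real models learn these).
rng = np.random.default_rng(0)
vocab = ["a", "big", "house", "mistake"]
embeddings = {word: rng.normal(size=4) for word in vocab}


def self_attention(tokens):
    """Single-head scaled dot-product self-attention with Q = K = V = embeddings."""
    x = np.stack([embeddings[t] for t in tokens])   # shape (n_tokens, d)
    scores = x @ x.T / np.sqrt(x.shape[1])           # pairwise similarity, (n, n)
    scores -= scores.max(axis=1, keepdims=True)      # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)    # each row sums to 1
    return weights @ x                               # context-mixed vectors, (n, d)


# The static vector for "big" is identical in both phrases...
big_in_house = self_attention(["a", "big", "house"])[1]
big_in_mistake = self_attention(["a", "big", "mistake"])[1]
# ...but its contextual vector is not.
print(np.allclose(big_in_house, big_in_mistake))  # False
```

The output for each token is a weighted average of every token in the phrase, so swapping "house" for "mistake" changes what "big" turns into. In a real Transformer the learned projections decide which neighbors matter, but the mixing step has this same shape.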

I wanted to look under the hood at everything GPT understands, so I gave it another Zillow link and asked it to tell me everything it understood about the page.

The answer confirmed my belief that search has changed forever. When the underlying model can understand all the structure in the data, why return links when it can answer the actual question? The link-based model was limited by the lack of technology that could build a knowledge graph of the entire web, and ChatGPT shows that this is now possible. I recall that Google bought a company years ago to build a small knowledge graph out of Wikipedia alone. This is a knowledge graph of the entire web: you can ask questions and summarize anything.

Happy with this, I wanted to see if GPT could also work like an analyst, so I fed it another link and asked it to create a merged table. Again, it worked like a charm.

It missed the price of the house, but I was able to prompt it to read that as well.
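
For readers who want to reproduce the merged-table step outside ChatGPT, a minimal pandas sketch might look like the following. The addresses and numbers are hypothetical placeholders, not the listings from the conversation. It also shows how a field the extraction missed, like the price here, surfaces as a missing value (NaN) that can be filled in after a follow-up prompt.

```python
import pandas as pd

# Hypothetical fields extracted from two listing pages (placeholder values).
listing_1 = {"address": "123 Example St", "beds": 3, "baths": 2, "sqft": 1_800, "price": 650_000}
listing_2 = {"address": "456 Sample Ave", "beds": 4, "baths": 3, "sqft": 2_400}  # price missed

# One row per house; a field missing from a listing becomes NaN.
merged = pd.DataFrame([listing_1, listing_2]).set_index("address")
print(merged)

# After a follow-up prompt recovers the missing value, fill it in.
merged.loc["456 Sample Ave", "price"] = 820_000
print(merged)
```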

This gave me confidence that GPT can learn structure using self-attention-based embeddings. To work like an analyst, though, it needs to understand numbers and perform operations on them. To validate this, I put that to the test.

Then I built my dataset: I asked GPT to create a pivot table on the features of the houses, and the dataset was ready.

For fun, I tried transposing this matrix, and it came back with something cool.
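
Outside ChatGPT, the pivot and transpose steps correspond to two one-liners in pandas. This is a minimal sketch, assuming the scraped fields were first collected in a long format of (house, feature, value) records; the houses and values are hypothetical.

```python
import pandas as pd

# Hypothetical long-format records: one row per (house, feature, value).
records = pd.DataFrame({
    "house": ["A", "A", "A", "B", "B", "B"],
    "feature": ["beds", "baths", "sqft"] * 2,
    "value": [3, 2, 1_800, 4, 3, 2_400],
})

# Pivot: one row per house, one column per feature.
wide = records.pivot(index="house", columns="feature", values="value")
print(wide)

# Transpose: features become rows, houses become columns.
print(wide.T)
```

`pivot` is enough here because each (house, feature) pair appears once; with duplicates, `pivot_table` with an aggregation function would be needed instead.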

Time to see if it could work like an analyst.

I realized that maybe it was running out of context window, so I re-prompted it to look for the tax data in the table again. That did work. The part I don't understand is how it's able to look at local context and also retain global context.

This was very cool. As a language model, it doesn't know how to sort data, but this is where the plugin architecture introduced by OpenAI makes it feel like magic: it wrote Python code and ran it on this data to give me the right answer. This was WOW! What more can it do?
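
The sorting step is a good example of why running code beats asking a language model to reason about order. The snippet below is a minimal sketch of the kind of pandas code such a tool might run, not the code ChatGPT actually produced; the `annual_tax` column name and all values are hypothetical.

```python
import pandas as pd

# Hypothetical table; "annual_tax" stands in for the tax column from the article.
houses = pd.DataFrame({
    "address": ["A", "B", "C"],
    "price": [650_000, 820_000, 540_000],
    "annual_tax": [7_800, 9_900, 6_100],
})

# Deterministic sort by tax, highest first, with a fresh 0..n-1 index.
by_tax = houses.sort_values("annual_tax", ascending=False).reset_index(drop=True)
print(by_tax)
```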

Then I asked it to summarize this data for me, something Microsoft showed in its Copilot for Excel demo. I was stunned by its ability to aggregate the data and summarize it by each attribute of the house.
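
An "aggregate and summarize by attribute" request maps naturally onto a pandas group-by. A minimal sketch, with a hypothetical placeholder dataset and bedroom count chosen as the grouping attribute purely for illustration:

```python
import pandas as pd

# Hypothetical dataset (placeholder values).
houses = pd.DataFrame({
    "beds": [3, 3, 4, 4, 5],
    "price": [650_000, 600_000, 820_000, 790_000, 1_050_000],
    "sqft": [1_800, 1_650, 2_400, 2_300, 3_100],
})

# Summarize by one attribute: count and averages per bedroom count.
summary = houses.groupby("beds").agg(
    n_houses=("price", "count"),
    avg_price=("price", "mean"),
    avg_sqft=("sqft", "mean"),
)
print(summary)
```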

I wanted to take it a step further and see if it could give any insights on this data, which is the holy grail of data analysis for executives. I was quite impressed by how it was able to apply various mathematical functions to each attribute of the house and even write a narrative around them.

Could it also run some basic statistics on this data to give better answers?
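
The "basic stats" in a request like this usually mean per-column descriptive statistics plus correlations with the target. A minimal pandas sketch on a hypothetical placeholder dataset (not the author's table):

```python
import pandas as pd

# Hypothetical placeholder dataset standing in for the author's table.
houses = pd.DataFrame({
    "beds": [3, 3, 4, 4, 5],
    "price": [650_000, 600_000, 820_000, 790_000, 1_050_000],
    "sqft": [1_800, 1_650, 2_400, 2_300, 3_100],
})

# Per-column count, mean, std, min, quartiles, and max.
stats = houses.describe()
print(stats)

# Pearson correlation of every other column with price, strongest first.
price_corr = houses.corr()["price"].drop("price").sort_values(ascending=False)
print(price_corr)
```

Correlation only measures linear association between two columns at a time, so it is a starting point for the narrative rather than a substitute for a model.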

Next, let's enter the feature engineering arena and find the best features for predicting house prices.
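
One common way to rank candidate features is a univariate F-test against the target. The sketch below uses scikit-learn's `SelectKBest` with `f_regression` on synthetic data; it is one technique among several, not necessarily what ChatGPT recommended. The columns, the engineered `age` feature, and its reference year of 2023 are all assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_regression

# Synthetic stand-in for the author's dataset (not real listings): price depends
# mostly on sqft, somewhat on beds and age, and not at all on "noise".
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "sqft": rng.uniform(800, 4_000, n),
    "beds": rng.integers(1, 6, n),
    "year_built": rng.integers(1950, 2020, n),
    "noise": rng.normal(size=n),
})
df["age"] = 2023 - df["year_built"]  # engineered feature (reference year is an assumption)
df["price"] = (150 * df["sqft"] + 20_000 * df["beds"] - 500 * df["age"]
               + rng.normal(0, 40_000, n))

X = df[["sqft", "beds", "age", "noise"]]
y = df["price"]

# Univariate F-test of each feature against price; keep the top k.
selector = SelectKBest(score_func=f_regression, k=2).fit(X, y)
scores = pd.Series(selector.scores_, index=X.columns).sort_values(ascending=False)
selected = list(X.columns[selector.get_support()])
print(scores)
print("selected:", selected)
```

Because each feature is scored on its own, this misses interactions; model-based methods such as Lasso coefficients or tree feature importances complement it.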

Then let's enter the modeling arena by asking it which model would be a good fit for this dataset.

Can it understand which dimensions of the data are categorical and which are numeric and would need scaling? Oh yes!

It also gave a simple explanation of the code, which was quite useful. I knew it had access to a Python runtime, so I asked it to run the code and fit the model. It returned the output along with a simple, human-friendly explanation of what it had learned.
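
A minimal scikit-learn sketch of the "run and fit the model" step might look like this. It is an assumption about the general shape of such code, not the code from the conversation: the data is synthetic, `home_type` is a hypothetical categorical column, and the preprocessing follows the categorical-versus-numeric split the author asked about. The article does not say which error metric it reports, so this prints both R² and RMSE on a held-out split.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data (not the author's Zillow table).
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "sqft": rng.uniform(800, 4_000, n),
    "beds": rng.integers(1, 6, n),
    "home_type": rng.choice(["Condo", "SingleFamily", "Townhouse"], n),
})
type_premium = {"Condo": 0, "SingleFamily": 60_000, "Townhouse": 30_000}
df["price"] = (150 * df["sqft"] + 20_000 * df["beds"]
               + df["home_type"].map(type_premium)
               + rng.normal(0, 40_000, n))

X, y = df.drop(columns="price"), df["price"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Scale numeric columns, one-hot encode the rest, then fit ordinary least squares.
model = make_pipeline(
    ColumnTransformer([
        ("num", StandardScaler(), make_column_selector(dtype_include="number")),
        ("cat", OneHotEncoder(handle_unknown="ignore"),
         make_column_selector(dtype_exclude="number")),
    ]),
    LinearRegression(),
)
model.fit(X_train, y_train)

r2 = model.score(X_test, y_test)
rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
print(f"held-out R^2 = {r2:.3f}, RMSE = {rmse:,.0f}")
```

Keeping preprocessing inside the pipeline means the scaler and encoder are fitted on the training split only, which avoids leaking test-set statistics into the model.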

I was curious whether it could help me in the ML arena, which is a very iterative and experimental field.

I was further curious whether it COULD recommend the right direction for me. This is where a data scientist's experience comes into play, pruning out the other possibilities. I don't know how it does this.

Can it run this technique?

It did it!

Does this (Lasso regression) improve the final prediction?

The error did go down, from 0.78 to 0.73. Yay! What does GPT think?
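
Whether Lasso helps depends on the data: its L1 penalty shrinks coefficients and can set some exactly to zero, which tends to pay off when there are many weak or irrelevant features relative to the number of rows. The sketch below compares plain linear regression and `LassoCV` with 5-fold cross-validated RMSE on synthetic data built to have that property; it does not reproduce the author's 0.78 and 0.73 figures, and the data shape is an assumption.

```python
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (not the author's): 3 informative features, 20 pure-noise ones.
rng = np.random.default_rng(1)
n, n_noise = 80, 20
X_informative = rng.normal(size=(n, 3))
X = np.hstack([X_informative, rng.normal(size=(n, n_noise))])
y = X_informative @ np.array([3.0, -2.0, 1.5]) + rng.normal(0, 1.0, n)

candidates = {
    "linear": make_pipeline(StandardScaler(), LinearRegression()),
    # LassoCV picks the regularization strength alpha by internal cross-validation.
    "lasso": make_pipeline(StandardScaler(), LassoCV(cv=5)),
}
cv_rmse = {}
for name, estimator in candidates.items():
    # sklearn scorers are "higher is better", so RMSE comes back negated.
    scores = cross_val_score(estimator, X, y, cv=5, scoring="neg_root_mean_squared_error")
    cv_rmse[name] = -scores.mean()
    print(f"{name}: CV RMSE = {cv_rmse[name]:.3f}")

# Refit on all data to see how many coefficients Lasso kept (nonzero).
lasso = candidates["lasso"].fit(X, y)[-1]
print("nonzero coefficients:", int(np.sum(lasso.coef_ != 0)), "of", X.shape[1])
```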

Can we push it even further? Can it suggest how to reduce the error further?

Can it choose one technique based on my dataset and current model performance? This is what a lot of data scientists spend their time on. I don't know how it does this, and I am very curious to find out.

Let's go! Let's write the code, run it, and see the improvement in error.

So did the random forest improve over multiple linear regression with Lasso regularization? The answer is YES!
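
A random forest can beat a regularized linear model when the target depends on interactions or thresholds that a linear model cannot represent. The sketch below compares the two with 5-fold cross-validated RMSE on synthetic data constructed to contain such effects, so the gap here is an illustration, not a prediction for the author's housing data. The forest needs no feature scaling, so it is used without a scaler.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LassoCV
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data with an interaction (x0 * x1) and a threshold effect on x2,
# which a linear model cannot represent but a tree ensemble can approximate.
rng = np.random.default_rng(2)
n = 300
X = rng.uniform(-2, 2, size=(n, 4))
y = X[:, 0] * X[:, 1] + np.where(X[:, 2] > 0, 2.0, 0.0) + rng.normal(0, 0.3, n)

models = {
    "lasso": make_pipeline(StandardScaler(), LassoCV(cv=5)),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
}
cv_rmse = {}
for name, estimator in models.items():
    scores = cross_val_score(estimator, X, y, cv=5, scoring="neg_root_mean_squared_error")
    cv_rmse[name] = -scores.mean()
    print(f"{name}: CV RMSE = {cv_rmse[name]:.3f}")
```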

Can it suggest more ways to make it better?

Let's use what it recommended and have it write the code.

This is a 1000X better developer experience, much better than Copilot in IntelliJ as well. I think the future is one where we have Copilot to edit the code and ChatGPT to help us with data collection, data analysis, data cleanup, feature engineering, and actually training, tweaking, and iterating on models.

I want a world where ChatGPT is connected to the cloud and I can access CPUs, GPUs, and all ML frameworks with Copilot integrated. This will be the new way to improve developer productivity. I was able to do all of this in one hour, net!

Himadri Sarkar

Improving the world one line of code at a time.

1 year ago

It seems like support for reading from the internet was completely removed. It is no longer able to access content from links, even if they are older than 2021. Did you have to copy and paste the content for the Zillow property links?

Nitin Gopi

Architect | Serverless | Data Engineering | BI | Microservices | Devops | AWS | Mobile | Web | ETL | JLPT N5

1 year ago

Nice

Krishnaprasad Shivdasan

Engineering Leader | Architect | Entrepreneur | Cloud | SaaS | Distributed Systems | AI/ML

1 year ago

A fun read, like an actual discussion with a data scientist

Chris Pease

CEO at Bid Vector, Urgent VOICE, & VeloCam Services

1 year ago

Great to see the iterative interaction, reflection, and copiloting boost to productivity. Thanks for posting Inder Singh

Ram Ganapathy

VP Engineering / Group Director leading Search, Personalization and Omni Services at Walmart Global Tech India, learning diverse challenges.

1 year ago

This is a real world prototype of Chat GPT as a copilot IMO.

More articles by Inder Singh
