ChatGPT - A master at data comprehension, a pretty good analyst and an entry-level Data Scientist
Chilling on a Sunday, I thought I'd take ChatGPT for a spin to see how well it can extract structure from unstructured data, find insights, do some analysis, and maybe even handle some ML. It blew away my expectations. The conversation became so interesting that I kept going until it hit me with a usage cap. I am sharing the actual conversations here with commentary.
I started by prompting GPT to act like a crawler and analyst focused on real estate (the largest banking product category in the US).
It came back with a standard answer. I think this is linked to the work OpenAI is doing to build safety around it.
I realized that I would have to change my approach. I gave it a link to a house listing on Zillow and asked whether it understood the page.
The answer made me more curious. It was able to extract the exact address and various other fields of the house from the web page. This gave me confidence in my hypothesis that GPT can find structure in unstructured data. To confirm it, I went further and asked for all the other details about the house, and it came back with accurate answers.
I wanted to take it a step further to understand whether the underlying embeddings capture the relationships between words and are contextual. The power of self-attention (as outlined in the paper "Attention Is All You Need") is that embeddings are contextual. This means a word like "big" can mean different things depending on its neighboring words. This also worked like a charm in the example I tried.
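To make "contextual" concrete, here is a minimal NumPy sketch of scaled dot-product self-attention, the core operation from that paper. It is an illustration under loud assumptions: the three-dimensional toy embeddings are made up, and the learned query, key and value projections of a real Transformer are replaced by the identity. It is not how ChatGPT itself is implemented.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys and values are all X itself
    (identity projections) to keep the sketch short.
    """
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)       # similarity of every token pair
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ X                  # context-weighted mix of tokens

# Hypothetical toy embeddings, invented for illustration only.
emb = {
    "big": np.array([1.0, 0.0, 0.0]),
    "house": np.array([0.0, 1.0, 0.0]),
    "mortgage": np.array([0.0, 0.0, 1.0]),
}

def contextual(word, sentence):
    # Return the attention output for `word` inside `sentence`.
    X = np.stack([emb[w] for w in sentence])
    return self_attention(X)[sentence.index(word)]

big_in_house = contextual("big", ["big", "house"])
big_in_mortgage = contextual("big", ["big", "mortgage"])
print(big_in_house, big_in_mortgage)
```

The static vector for "big" is the same in both sentences, but its attention output picks up some of the "house" direction in one case and some of the "mortgage" direction in the other.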
I wanted to look under the hood at everything GPT understands. I gave it another Zillow link and asked it to tell me everything it understood about the page.
The answer confirmed my belief that the future of search has changed forever. When the underlying model can understand all the structure in the data, why return links when you can answer the actual question? The link-based model was limited by the lack of technology that could build a knowledge graph of the entire web. ChatGPT shows that this is now possible. I recall Google buying a company ages ago to build a small knowledge graph just out of Wikipedia. This is the knowledge graph of the entire web. You can ask questions and summarize anything.
Happy with this, I wanted to see if GPT could also work like an analyst, so I fed it another link and asked it to create a merged table from both listings. Again it worked like a charm.
It left out the price of the house, but I was able to prompt it to read that as well, as shown below:
This gave me confidence that GPT can learn structure using self-attention based embeddings. To work like an analyst, it also needs to understand numbers and perform operations on them. To validate this I tried the following:
Then I built my dataset and asked GPT to create a pivot table on the features of the house, and my dataset was ready.
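For readers who want to reproduce the pivot step outside the chat, a rough pandas equivalent might look like the sketch below. The house labels, feature names and values are all invented for illustration; the real table from the conversation is not reproduced in this text.

```python
import pandas as pd

# Hypothetical long-format rows: one (house, feature, value) triple per line.
rows = [
    ("House A", "bedrooms", 3),
    ("House A", "bathrooms", 2),
    ("House A", "sqft", 1800),
    ("House B", "bedrooms", 4),
    ("House B", "bathrooms", 3),
    ("House B", "sqft", 2400),
]
long_df = pd.DataFrame(rows, columns=["house", "feature", "value"])

# Pivot to one row per house and one column per feature.
wide_df = long_df.pivot(index="house", columns="feature", values="value")
print(wide_df)
```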
For fun, I tried transposing this matrix, and it came back with something cool.
Time to see if it could work like an analyst.
I realized that maybe it was running out of context window, so I re-prompted it to look for the tax data in the table again. It worked. The part I don't understand is how it can look up local context while also retaining global context.
This was very cool. As a language model, it doesn't know how to sort data, but this is where the plugin architecture introduced by OpenAI makes it feel like magic. It wrote the Python code and ran it on this data to give me the right answer. This was WOW! What more can it do?
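The generated code itself is not preserved in this text. As a hedged guess at the kind of snippet involved, sorting a small table in pandas could look like this; the column name `annual_tax` and all values are hypothetical.

```python
import pandas as pd

# Hypothetical data; the real listings and tax figures are not in this text.
houses = pd.DataFrame({
    "house": ["House A", "House B", "House C"],
    "annual_tax": [5200, 7100, 4300],
})

# Sort from highest to lowest tax and renumber the rows.
by_tax = houses.sort_values("annual_tax", ascending=False).reset_index(drop=True)
print(by_tax)
```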
Then I asked it to summarize this data for me. Microsoft showed something similar in its Copilot for Excel demo. I was stunned by its ability to aggregate the data and summarize it by each attribute of the house.
I wanted to take it a step further to see if it could give any insights on this data. This is the holy grail of data analysis for executives. I was quite impressed by how it applied various mathematical functions to each attribute of the house and even wrote a narrative around the results.
Can it do some basic stats on this data as well to give better answers?
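"Basic stats" here can be as simple as per-column summaries. A minimal pandas sketch, using invented prices and sizes rather than the real listings:

```python
import pandas as pd

# Hypothetical values for illustration only.
houses = pd.DataFrame({
    "price": [450000, 620000, 380000, 510000],
    "sqft": [1800, 2400, 1500, 2000],
    "bedrooms": [3, 4, 2, 3],
})

# count, mean, std, min, quartiles and max for every numeric column.
summary = houses.describe()
median_price = houses["price"].median()

print(summary)
print("median price:", median_price)
```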
Let's enter the feature engineering arena to find which features best predict house prices.
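The post does not say which method ChatGPT used to rank features. One common first pass, shown here purely as an assumption, is to rank numeric features by the absolute value of their correlation with price. The data below is made up.

```python
import pandas as pd

# Hypothetical dataset; column names and values are invented.
houses = pd.DataFrame({
    "price": [450000, 620000, 380000, 510000, 700000],
    "sqft": [1800, 2400, 1500, 2000, 2800],
    "bedrooms": [3, 4, 2, 3, 4],
    "year_built": [1995, 1990, 2005, 1985, 2000],
})

# Pearson correlation of every feature with price, strongest first.
corr_with_price = (
    houses.corr()["price"]
    .drop("price")
    .abs()
    .sort_values(ascending=False)
)
print(corr_with_price)
```

Correlation only captures linear, one-feature-at-a-time relationships, so it is a starting point rather than a final answer.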
Let's enter the modeling arena by asking it which model would be a good fit for this dataset.
Can it understand which dimensions of the data are categorical and which are numeric and would need scaling? Oh yes!
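As a sketch of what separating categorical from numeric columns usually means in practice, here is one way to do it with scikit-learn's `ColumnTransformer`. The column names and values are hypothetical, and this is not the code from the conversation.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical feature table for illustration.
houses = pd.DataFrame({
    "sqft": [1800, 2400, 1500, 2000],
    "bedrooms": [3, 4, 2, 3],
    "home_type": ["single_family", "condo", "condo", "townhouse"],
})

# Split columns by dtype: numbers get scaled, everything else gets one-hot encoded.
numeric_cols = houses.select_dtypes(include="number").columns.tolist()
categorical_cols = houses.select_dtypes(exclude="number").columns.tolist()

preprocess = ColumnTransformer(
    transformers=[
        ("num", StandardScaler(), numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    sparse_threshold=0,  # always return a dense array
)

X = preprocess.fit_transform(houses)
print(X.shape)  # 2 scaled numeric columns + 3 one-hot columns
```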
It also gave a simple explanation of the code, which was quite useful. I knew that it had access to a Python runtime, so I asked it to run the code and fit the model, and here came the output:
It also gave a simple, human-friendly explanation of what it had learned.
I was curious whether it could help me get into the ML arena, which is a very iterative and experimental field.
I was further curious whether it could recommend the right direction for me. This is where the experience of a data scientist comes into play to prune the other possibilities. I don't know how it does this.
Can it run this technique?
It did it!
Does this (Lasso regression) improve the eventual prediction?
The error did go down from 0.78 above to 0.73. Yay!! What does GPT think?
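To try this kind of comparison yourself, the sketch below fits plain linear regression and Lasso on the same train/test split. Everything here is an assumption: the data is synthetic (`make_regression`), `alpha=1.0` is arbitrary, and RMSE is used as the error because the post does not say which metric produced 0.78 and 0.73. Whether Lasso wins depends on the data.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 10 features, only 3 of which carry signal.
X, y = make_regression(
    n_samples=200, n_features=10, n_informative=3, noise=10.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

def rmse(model):
    # Fit on the training split and score on the held-out split.
    model.fit(X_train, y_train)
    return float(np.sqrt(mean_squared_error(y_test, model.predict(X_test))))

rmse_linear = rmse(LinearRegression())
lasso = Lasso(alpha=1.0)  # alpha is an arbitrary choice, tune it in practice
rmse_lasso = rmse(lasso)

# Lasso's L1 penalty can shrink some coefficients exactly to zero.
n_dropped = int(np.sum(lasso.coef_ == 0))
print(rmse_linear, rmse_lasso, n_dropped)
```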
Can we push it even further? Can it suggest how to improve the error further?
Can it choose one technique based on my dataset and current model performance? This is what a lot of data scientists spend time on. I don't know how it does this, and I am very curious to find out.
Let's go! Let's code, run and see the improvement in error.
So did random forest improve over multiple linear regression with Lasso? The answer is YES!
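A comparable experiment can be sketched as follows. The data is synthetic (`make_friedman1`, which has nonlinear structure), the hyperparameters are arbitrary, and RMSE is an assumed metric, so the numbers will not match the ones in this post.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression problem with nonlinear feature interactions.
X, y = make_friedman1(n_samples=300, n_features=10, noise=1.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

def rmse(model):
    # Fit on the training split and score on the held-out split.
    model.fit(X_train, y_train)
    return float(np.sqrt(mean_squared_error(y_test, model.predict(X_test))))

rmse_lasso = rmse(Lasso(alpha=0.1))
forest = RandomForestRegressor(n_estimators=200, random_state=0)
rmse_forest = rmse(forest)

print(f"Lasso RMSE:         {rmse_lasso:.3f}")
print(f"Random forest RMSE: {rmse_forest:.3f}")
```

Tree ensembles can model nonlinear relationships that a linear model cannot, which is one plausible reason for an improvement like the one reported here.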
Can it suggest more ways to make it better?
Let's use what it's recommending. Here is the code.
This is a 1000X better developer experience, and much better than Copilot in IntelliJ as well. I think the future is one where we have Copilot to edit the code and ChatGPT to help us with data collection, data analysis, data cleanup, feature engineering, and actually training, tweaking, and iterating on the model.
I want a world where ChatGPT is connected to the cloud and I can access CPUs, GPUs, and all the ML frameworks with Copilot integrated. This will be the new way to improve developer productivity. I did all of this in about one hour in total!
Improving the world one line of code at a time.
1 year ago: It seems like support for reading from the internet was completely removed. It is no longer able to access the content from links even if they are older than 2021. Did you have to copy-paste the content for the Zillow property links?
Architect | Serverless | Data Engineering | BI | Microservices | Devops | AWS | Mobile | Web | ETL | JLPT N5
1 year ago: Nice
Engineering Leader | Architect | Entrepreneur | Cloud | SaaS | Distributed Systems | AI/ML
1 year ago: A fun read, like an actual discussion with a data scientist
CEO at Bid Vector, Urgent VOICE, & VeloCam Services
1 year ago: Great to see the iterative interaction, reflection, and copiloting boost to productivity. Thanks for posting, Inder Singh
VP Engineering / Group Director leading Search, Personalization and Omni Services at Walmart Global Tech India, learning diverse challenges.
1 year ago: This is a real-world prototype of ChatGPT as a copilot, IMO.