Sampling when modeling large data sets
Keith McCormick
Teaching over a million learners about machine learning, statistics, and Artificial Intelligence (AI) | Data Science Principal at Further
I thought it would be fun to share a video from my new course that is not one of the preview videos. I find that folks will do almost anything to avoid sampling data during modeling. Frankly, computers are so powerful and fast these days that I rarely have to sample, but it is crazy how much effort goes into avoiding it. Sampling is not difficult to do. It does not ruin the model. It can be both simple and effective.
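To give a sense of how little is involved, here is a minimal sketch of simple random sampling without replacement in Python. The function name `simple_random_sample`, the 1% fraction, and the list of row ids standing in for a large table are all illustrative assumptions, not material taken from the course.

```python
import random


def simple_random_sample(rows, fraction, seed=0):
    """Return a simple random sample (without replacement) of `rows`.

    `fraction` is the share of rows to keep; `seed` makes the draw repeatable.
    Both names are hypothetical and chosen only for this illustration.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not rows:
        return []
    rng = random.Random(seed)  # private generator so the sample is reproducible
    k = min(len(rows), max(1, round(len(rows) * fraction)))
    return rng.sample(rows, k)


# Hypothetical data: one million row ids standing in for a large table.
rows = list(range(1_000_000))
sample = simple_random_sample(rows, fraction=0.01)
print(len(sample))
```

A model could then be fit on `sample` rather than on every row. Fixing the seed is a design choice that keeps the experiment repeatable; whether a 1% sample is adequate depends on the data and the model, so treat that figure as a placeholder.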
The course as a whole tries to introduce just a bit of skepticism about large data sets. We all have large data sets, but do we worry too much about that? I think we might. Most of my colleagues feel the same way, but I felt that I couldn't simply say "Don't worry"; I had to explain the whole process step by step. I'm very pleased that LinkedIn Learning gave me the opportunity to do exactly that in this unique course. If, while taking the course, you conclude that you do have to upgrade your infrastructure to tackle increasing data volume, I have some good recommendations for other courses in the library that cover that well.