Course: Predictive Analytics Essential Training: Data Mining

Selecting relevant data

- [Instructor] This next topic is one of my favorites because there's so much confusion around it. Folks often think that it's easier, better, or both to use all of your data. When big data first became a popular phrase, a widely read book came out suggesting that drawing a sample from a population was old-fashioned, and that the only reason we sampled was that computers at the time couldn't handle large datasets. That creates the image that we just throw the data in and let the algorithm figure it out. But we still sample, for lots of reasons. One good one: you wouldn't drain the whole river to test the water. And there are other reasons you can't or shouldn't use all the data. So our next element is that you have to be thoughtful about the data you select. Here, we're focused on selecting the cases, or instances; in other words, the rows of the dataset. The most important reason that we…
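Drawing a sample of cases, rather than using every row, can be as simple as a random draw without replacement. Here is a minimal sketch in Python, assuming the rows fit in memory as a list; the `sample_rows` helper and the toy `population` data are hypothetical illustrations, not part of the course material.

```python
import random


def sample_rows(rows, n, seed=None):
    """Draw a simple random sample of n rows (cases) without replacement.

    Hypothetical helper for illustration; assumes `rows` is an in-memory list.
    """
    if n > len(rows):
        raise ValueError("sample size cannot exceed the number of rows")
    rng = random.Random(seed)  # seeded generator so the same sample can be reproduced
    return rng.sample(rows, n)  # each row can be picked at most once


# Hypothetical dataset: 10,000 rows, each a (customer_id, spend) pair
population = [(i, (i * 37) % 500) for i in range(10_000)]

# Test the water without draining the river: keep 500 of the 10,000 rows
sample = sample_rows(population, 500, seed=42)
print(len(sample))  # 500
```

Fixing `seed` makes the draw repeatable, which helps when several models need to be built and compared on the same subset of cases.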