Course: Python: Working with Predictive Analytics

Divide the data into test and train

- [Instructor] We are still in the data preparation step of the predictive analytics roadmap. At this stage, we need to divide the data into train and test datasets. The train dataset contains known outputs and is used to train the prediction model. The test dataset, however, is used to evaluate how well the model performs on unseen new data. Imagine our data now as separate wooden blocks where each column is an individual data frame. Stacking them together gives us the final data frame. Sometimes we might reduce dimensions to make processing faster, but we won't cover that here. Well, why do we need to split the data? The train dataset is used to train the model, while the test dataset lets us check that the model generalizes well to unseen data. In other words, the model doesn't just memorize the training data. Think of it like a fast food line. You've seen kids ordering from the kids menu many, many times before. Consider that as training data. When you see a new kid in the line, that represents unseen new…
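
As a rough illustration of the split described in the transcript, here is a minimal sketch using only the Python standard library. The helper name `split_train_test`, the 80/20 ratio, the seed, and the toy data are all assumptions made for this example; they are not taken from the course, which may well rely on a library helper such as scikit-learn's `train_test_split` instead.

```python
import random


def split_train_test(rows, test_fraction=0.2, seed=0):
    """Shuffle rows and split them into (train, test) lists.

    Hypothetical helper for illustration only; not from the course.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    n_test = max(1, round(len(shuffled) * test_fraction))
    # The first n_test shuffled rows are held out; the rest are used for training.
    return shuffled[n_test:], shuffled[:n_test]


# Assumed toy data: (feature, known output) pairs.
data = [(x, 2 * x + 1) for x in range(10)]
train, test = split_train_test(data, test_fraction=0.2, seed=42)
print(len(train), len(test))  # 8 2
```

The model would be fit on `train` only and then scored on `test`, so the score reflects rows it has never seen. Shuffling before the split is a common default, though it is not appropriate for every dataset (time-ordered data, for instance, is usually split by date instead).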
