The Titanic Route and the Data Set
Albert Anthony D. Gavino, MBA
Book Writer | Data Science | Cloud Solutions
If you have worked through the introductory material on Kaggle, the Titanic data set is the default one to start with. It has a Port of Embarkation variable whose values I had to find on the map: Southampton, Cherbourg and Queenstown, the stops on the voyage to New York.
We found the following details about the ports on the Articles Factory website:
Southampton, located on the south coast of England, was the Titanic’s port of departure. In 1907 White Star began to use Southampton as a major port.
Cherbourg in France was the first of the stops to pick up passengers. Compared with Southampton, Cherbourg was a smaller port without the same facilities. At 6.30 on the evening of 10 April the ship anchored off the shore of Cherbourg, and two small boats then serviced it and brought passengers on board.
Queenstown, on the south coast of Ireland, was the last passenger pick-up of the Titanic. The ship anchored off Roche's Point in Cork Harbour the day after leaving Cherbourg. As in Cherbourg, it was serviced by other boats due to the restrictions of the port. Eight people left the ship and 123 joined for the onward journey.
New York Harbour was due to be the final destination of the Titanic but it never made it. Instead survivors reached New York via another ship, Carpathia. Carpathia responded to distress calls from the Titanic and arrived a few hours after the ship had sunk. The 710 survivors were taken on board and then reached New York.
The Titanic Data Set consists of the following variables:
Passenger ID, Survived (the outcome, given only in the Training Data), Passenger Name, Sex, Age, Passenger Class (1,2,3), Port of Embarkation (C,Q,S), Parents and Children, Siblings and Spouse, Ticket Number, Fare and Cabin.
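The single-letter embarkation codes line up with the three ports described earlier. As a minimal sketch, they can be decoded with a small lookup table. The names `PORT_NAMES` and `port_name` are made up for this illustration, and returning "Unknown" for a missing or unrecognised code is an assumption, not something the data set prescribes.

```python
# Embarkation codes from the variable list, mapped to the ports on the route.
PORT_NAMES = {
    "S": "Southampton",
    "C": "Cherbourg",
    "Q": "Queenstown",
}


def port_name(code):
    """Return the full port name for an embarkation code.

    Missing or unrecognised codes fall back to "Unknown" (an assumption
    made for this sketch).
    """
    if code is None:
        return "Unknown"
    return PORT_NAMES.get(code.strip().upper(), "Unknown")


for code in ["S", "C", "Q", None]:
    print(code, "->", port_name(code))
```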
You will mostly model this data with Logistic Regression, since the outcome (survived or did not survive) has only two values.
You also need to evaluate the model in terms of True Positives, True Negatives, False Positives and False Negatives, and even more so in terms of Recall, Precision and F1 Score.
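To make those terms concrete, here is a small sketch that counts the four outcomes and derives Precision, Recall and F1 from them. The `actual` and `predicted` lists are invented labels for illustration only (1 = survived, 0 = did not survive), not real Titanic results, and the function names are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for 0/1 labels."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1  # predicted survived, did survive
        elif a == 0 and p == 0:
            tn += 1  # predicted not survived, did not survive
        elif a == 0 and p == 1:
            fp += 1  # predicted survived, did not survive
        else:
            fn += 1  # predicted not survived, did survive
    return tp, tn, fp, fn


def precision_recall_f1(tp, fp, fn):
    """Precision = TP/(TP+FP), Recall = TP/(TP+FN), F1 = harmonic mean."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented labels, for illustration only.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

tp, tn, fp, fn = confusion_counts(actual, predicted)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print("TP, TN, FP, FN:", tp, tn, fp, fn)
print("precision, recall, F1:", precision, recall, f1)
```

Note that True Negatives do not appear in any of the three scores; they only matter for measures such as accuracy.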
More or less, you work with the Training Data to fit a model and then use the Test Set to predict who survived and who did not. This is the Default Model you can build on the Titanic Data Set.
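The train-then-predict workflow can be sketched in plain Python with a hand-rolled Logistic Regression fitted by batch gradient descent. This is an illustration under loud assumptions: the rows below are invented toy values, not real passenger records; the two features (`is_female`, `pclass`) and the learning rate and epoch count are arbitrary choices made for this sketch. In practice you would load the Kaggle files and most likely use a library implementation instead.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(rows, labels, lr=0.3, epochs=5000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log loss w.r.t. the linear score
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(rows, weights, bias, threshold=0.5):
    """Return 1 (survived) when the predicted probability reaches the threshold."""
    scores = (bias + sum(w * xi for w, xi in zip(weights, x)) for x in rows)
    return [1 if sigmoid(s) >= threshold else 0 for s in scores]


# Invented toy rows, NOT real Titanic data: [is_female, pclass]
train_X = [[1, 1], [1, 2], [1, 3], [0, 1], [0, 2], [0, 3], [1, 1], [0, 3]]
train_y = [1, 1, 1, 0, 0, 0, 1, 0]  # 1 = survived, 0 = did not survive
test_X = [[1, 2], [0, 2]]  # unseen rows to predict

weights, bias = fit_logistic(train_X, train_y)
print("predictions for the test rows:", predict(test_X, weights, bias))
```

The toy labels are perfectly separable on the first feature, so the fit is deliberately easy. The real data is noisier, which is why the evaluation measures matter.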
That is our Quick Default Intro to the Titanic Data Set.
#######################################################################
Bert loves to work on Data Science Projects and the Data Insights that can be derived from the Data. I know that sounds redundant, but Granular Data gives us so much more to discover. Aside from doing Data Science Work, Bert works out on his mountain bike or takes long walks to think about what to write next. It could be about Behavioural Psychology, or about Stochastic Gradient Descent, or the Chain Rule and Power Rule applied to Theta values.