How to get started with Google Colab and Kaggle
Akash Bhiwgade
Accomplished Deep Learning Engineer, Algorithms, Modelling, Web Development, Machine Learning, Natural Language Processing, Large Language Models, Computer Vision.
Contents
1. Introduction
2. Pre-requisites
3. Connections
4. Loading Dataset
5. Conclusion
Introduction
As I began my data science journey, I ran into hardware and processing constraints. I am sure many of us have faced the same issue. There were two options in front of me:
1) Buy an expensive laptop with a powerful GPU
2) Get access to a cloud platform
I chose the cloud route. After some research, I stumbled across Google Colab, and it was exactly what I needed. Google Colab offers free GPU access to cut down processing time. My favorite part is that it is easy to connect it with Kaggle to join a competition or load a large dataset.
This blog is about how you can achieve the same. Let's jump right into it.
Pre-requisites
1) A Google account
2) A Kaggle account
Connections
Before we begin, we must ensure that Google Colab can access the Kaggle datasets and competitions lists. Once it can, we can start building our ML pipeline.
Step 1: Setup Notebook on Google Colab
Visit the Google Colab home page (make sure you are logged in to your Google account). On the welcome screen, click the NEW PYTHON 3 NOTEBOOK button.
This will fire up a Jupyter notebook ready to use. You can rename the notebook as you like.
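Since free GPU access is the main draw, it is worth confirming that a GPU runtime is actually attached before going further. Below is a minimal check, assuming you have switched the runtime type via Runtime > Change runtime type > GPU; nvidia-smi reports the attached GPU and fails if the notebook is still on a CPU-only runtime.
# check whether a GPU runtime is attached (fails on a CPU-only runtime)
!nvidia-smi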
Step 2: Get Kaggle Access API token
Visit the Kaggle home page and log in to your account. In the top-right corner, click your profile image and open your profile. Click the More tab and choose the Account option. Scroll down to the API section and click the Create New API Token button.
This will download a JSON file named kaggle.json containing your Kaggle username and an API key. This file is what connects your Google Colab session to your Kaggle account.
Loading Dataset
Navigate back to the Google Colab notebook you created earlier and use the code below to upload the kaggle.json file you just downloaded.
# open a file picker and upload kaggle.json from your local machine
from google.colab import files
files.upload()
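As a quick sanity check, you can confirm that the file actually landed in the Colab working directory before moving it into place. This is a minimal sketch and assumes the uploaded file was indeed named kaggle.json.
# confirm kaggle.json is present in the Colab working directory
import os
print(os.path.exists('kaggle.json'))   # should print True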
Once the upload finishes, run the commands below to place the token where the Kaggle CLI expects it and restrict its permissions:
# create the Kaggle config folder, copy the token into it,
# and make it readable only by the current user
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
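As an aside, the chmod 600 step keeps the token readable only by you, which stops the Kaggle CLI from warning about world-readable credentials. If you prefer not to copy the file around, here is a hedged alternative sketch: the Kaggle client also picks up credentials from the KAGGLE_USERNAME and KAGGLE_KEY environment variables.
# alternative: export the credentials as environment variables
# instead of copying kaggle.json into ~/.kaggle
import json, os
with open('kaggle.json') as f:
    creds = json.load(f)
os.environ['KAGGLE_USERNAME'] = creds['username']
os.environ['KAGGLE_KEY'] = creds['key']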
You can now verify the connection by listing the datasets available on Kaggle:
!kaggle datasets list
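If the full listing is too long to scroll through, the CLI can also filter it by keyword; a small hedged example using the -s search option:
# filter the dataset listing by a search term
!kaggle datasets list -s fifa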
Since the datasets list is visible, you can load any dataset of your choice. Here I am loading the FIFA 19 dataset:
# download the dataset
!kaggle datasets download -d karangadiya/fifa19
# view the list of files
!ls
# unzip and view the list of files
!unzip fifa19.zip
!ls
# load dataset
import pandas as pd
data = pd.read_csv('data.csv')
data.head()
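If you would rather stay in Python than mix in shell commands, here is a hedged sketch of the same unzip-and-load step using only the standard library and pandas; it assumes the archive is still named fifa19.zip and contains data.csv as above.
# unzip and load the dataset without shell commands
import zipfile
import pandas as pd

with zipfile.ZipFile('fifa19.zip') as zf:
    print(zf.namelist())   # files inside the archive
    zf.extractall('.')     # extract into the working directory

data = pd.read_csv('data.csv')
print(data.shape)          # quick sanity check: rows x columns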
Now that the data is loaded, we can start building the ML pipeline.
Conclusion
At this point, your Google Colab notebook is connected to Kaggle datasets. If you want to load the dataset of an active competition instead, all you need to do is change the commands as below:
!kaggle competitions list
!kaggle competitions download -c LANL-Earthquake-Prediction
The rest of the code remains the same.
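For completeness, here is a hedged sketch of how the competition flow might look end to end. The competition slug is the one from the example above, the submission file name is hypothetical, and you must accept the competition rules on kaggle.com before the download will succeed.
# download the competition data (an archive, in most cases)
!kaggle competitions download -c LANL-Earthquake-Prediction
!ls   # inspect what the download produced

# a finished submission file can be pushed back from the notebook, for example:
# !kaggle competitions submit -c LANL-Earthquake-Prediction -f submission.csv -m "first attempt"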
Hope this was helpful. Do let me know your thoughts in the comments.
Thanks..!!