How to get started with Google Colab and Kaggle

Contents

 1. Introduction

 2. Pre-requisites

 3. Connections

 4. Loading Dataset

Introduction

As I began my data science journey, I ran into hardware and processing power constraints. I am sure lots of us have faced these issues. There were two options in front of me:

1) Buy an expensive laptop with a powerful GPU

2) Get access to the cloud

I chose to go with a cloud platform. After some research, I stumbled across Google Colab. It was exactly what I needed. Google Colab offers free GPU access to cut processing time. Also, my favorite part is that it is easy to connect to Kaggle, so you can join any competition or load any large dataset.

This blog is about how you can achieve the same. Let's jump right into it.

Pre-requisites

 1) Google account

 2) Kaggle account

Connections

Before we begin, we must ensure that Google Colab can access Kaggle's datasets list and competitions list. Once it can, we can start building our ML pipeline.

Step 1: Set up a notebook on Google Colab

Visit the Google Colab home page (P.S. make sure you are logged into your Google account). On the welcome screen, click the NEW PYTHON 3 NOTEBOOK button.

This will fire up a Jupyter notebook that is ready to use. You can rename the notebook as you like.

Step 2: Get a Kaggle API access token

Visit the Kaggle home page and log in to your account. In the top right corner, click your profile image and open your profile. Click the More tab and choose the Account option. Scroll down to the API section and click the Create New API Token button.

This will download a JSON file named kaggle.json. This file will connect your Google Colab notebook to your Kaggle account.

Loading Dataset

Navigate back to the Google Colab notebook you created earlier. Use the code below to upload the JSON file you just downloaded.

# upload kaggle.json from your machine
from google.colab import files
files.upload()
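
Before moving on, it can help to confirm that the uploaded file really is a Kaggle token. Below is a minimal sketch, assuming the token is a JSON object with "username" and "key" fields and that files.upload() returns a dictionary mapping file names to their bytes. The helper name check_kaggle_token is hypothetical and not part of any library.

```python
import json


def check_kaggle_token(raw_bytes):
    """Return the username if raw_bytes looks like a Kaggle API token.

    Assumption: the token is a JSON object with "username" and "key" fields.
    """
    token = json.loads(raw_bytes.decode("utf-8"))
    if not isinstance(token, dict):
        raise ValueError("kaggle.json should contain a JSON object")
    missing = {"username", "key"} - set(token)
    if missing:
        raise ValueError(f"kaggle.json is missing fields: {sorted(missing)}")
    return token["username"]
```

In the notebook, this could be called as check_kaggle_token(uploaded["kaggle.json"]), where uploaded is the value returned by files.upload().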

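Once the cell has finished running, run the commands below. They create the ~/.kaggle folder, copy the token into it, and restrict the file's permissions so that only you can read and write it.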

!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
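
If you prefer to stay in Python rather than use shell commands, the same three steps can be sketched with the standard library. This is only an illustration: the function name install_kaggle_token and its home_dir parameter are hypothetical, added so the steps can be tried against a scratch folder. The owner-only permission matters because the Kaggle CLI warns when the token is readable by other users.

```python
import os
import shutil
import stat
from pathlib import Path


def install_kaggle_token(token_path, home_dir=None):
    """Copy the token to <home>/.kaggle/kaggle.json with owner-only permissions."""
    home = Path(home_dir) if home_dir else Path.home()
    kaggle_dir = home / ".kaggle"
    kaggle_dir.mkdir(parents=True, exist_ok=True)   # same as: mkdir -p ~/.kaggle
    target = kaggle_dir / "kaggle.json"
    shutil.copyfile(token_path, target)             # same as: cp kaggle.json ~/.kaggle/
    os.chmod(target, stat.S_IRUSR | stat.S_IWUSR)   # same as: chmod 600
    return target
```

Calling install_kaggle_token("kaggle.json") in the notebook would stand in for the three commands above.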

Then list the available datasets to confirm that the connection works:

!kaggle datasets list

Now that the datasets list is visible, you can load any dataset of your choice. Here I am loading the FIFA19 dataset.

# download the dataset
!kaggle datasets download -d karangadiya/fifa19
# view the list of files
!ls
# unzip and view the list of files
!unzip fifa19.zip
!ls
# load dataset
import pandas as pd
data = pd.read_csv('data.csv')
data.head()
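
When you do not know in advance which files an archive contains, the unzip step can also be done in Python so that the CSV paths come back as a list. Below is a minimal sketch using only the standard library; the helper name extract_csvs is hypothetical.

```python
import zipfile
from pathlib import Path


def extract_csvs(zip_path, dest_dir="."):
    """Extract zip_path into dest_dir and return the paths of the CSV files it contained."""
    dest = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        names = archive.namelist()
    # Keep only CSV entries, matched case-insensitively on the extension.
    return sorted(dest / name for name in names if name.lower().endswith(".csv"))
```

In the notebook, extract_csvs("fifa19.zip") would stand in for the unzip command, and each returned path can be passed to pd.read_csv.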

With the data loaded, we can start building the ML pipeline.

Conclusion

At this point, your Google Colab notebook is connected to Kaggle datasets. To load the dataset of an active competition instead, all you need to change are the commands, as below:

!kaggle competitions list
!kaggle competitions download -c LANL-Earthquake-Prediction

The rest of the code remains the same, apart from the names of the downloaded files.

Hope this was helpful. Do let me know your comments.

Thanks..!!

Akash Bhiwgade
