Smartsheet Integration BigQuery
Smartsheet to Big Query

Smartsheet Integration BigQuery

The goal of this post is to go from the basics of data preprocessing or scraping data from Smartsheet, consolidate the data and store result in Big Query.

Here is the table of contents of this tutorial:

  1. Load Packages: In the first part, we will install Smartsheet in python environment and load package in environment.
  2. Create Smartsheet Developer Key: In the second part, we will create Smartsheet Developer Key.
  3. Find Smartsheet ID's for each sheet: In the third part, we will find Smartsheet ID's for each sheet and extract data for identified sheet.
  4. Data Preprocessing: In the fourth part, we will extract content of identified page.
  5. Load data frame to Big Query: In the last part, load final data frame to Big Query.

1. Install and Load Packages 


Install command:
pip install smartsheet-python-sdk
Load packages :
import json
import os
import requests
import smartsheet, csv
import pandas.io.gbq as pd_gbq
import pandas as pd
import warnings
warnings.filterwarnings('ignore')


2. Create Smartsheet Developer Key

  • Go to Smartsheet web portal > Profile > Apps & Integration
  • Click API Access > Generate new access token
  • Finally, make sure to save your API token.

Note : You cannot save your token later. In case missed, re-create a new one.


3. Find Smartsheet ID’s for each sheet

# Create base client object and set the access token
ss_client = smartsheet.Smartsheet('<INSERT KEY HERE>')

# Include all information 
response = ss_client.Sheets.list_sheets(include="attachments,source", include_all=True)
sheets = response.data

# print each sheet IDs and Name for further use (i.e. data extraction)
for single_sheet in response.data:
    print (single_sheet.id, single_sheet.name)
123456789012  smartsheet_name_1
123456789013  smartsheet_name_2


4. Data Preprocessing

Load data for identified Smartsheet. For example, if we would like to scrape data for Smartsheet name : smartsheet_name_2.

Add Smartsheet ID below

# Extract the conetent of Programs; JSON object
# Header details as described on SmartSheet Developer Page. 

smartsheet = 'https://api.smartsheet.com/2.0/sheets/'
sheetid = <SMARTSHEET ID>
uri = smartsheet + sheetid
header = {'Authorization': "Bearer <INSERT KEY HERE>" ,
          'Content-Type': 'application/json'}

req = requests.get(uri, headers=header)
data  ?= json.loads(req.text)

cols = []
for col in data['columns']:
    cols.append(col['title'])
 
df = pd.DataFrame(columns=cols)
 
for row in data['rows']:
    values = [] #re-initilise the values list for each row
    for cell in row['cells']:
        if cell.get('value'): #handle the empty values
            values.append( cell['value'])
        else:
            values.append('')
    df = df.append(dict(zip(cols, values)), ignore_index=True)


5. Load data frame to Big Query

#Need to rename columns for BQ to capture (with allowed BQ specification)
df.columns = ['A', 'B']

#Project Name 
project_id = 'INSERT-PROJECT-NAME'
pd_gbq.to_gbq(df, 'DATSET-NAME', project_id, if_exists='replace')

It concludes the blog post, feel free to reach out for any questions.

Follow my blog here: https://yashkarwa.github.io/posts/Smartsheet/

要查看或添加评论,请登录

Yash Karwa的更多文章

  • Overview of ML Ops: A must-have

    Overview of ML Ops: A must-have

    Alright, Let me turn from AI/ML in business one of my previous blogs to showing ML Ops value and running ML in…

  • AI/ML: A Starter Guide for Business

    AI/ML: A Starter Guide for Business

    Let me start my second blog for AI for business. If you have n't read my first blog - Data Science in Reality - Please…

    3 条评论
  • Data Science Skills in Real World

    Data Science Skills in Real World

    I see a couple of Challenges, I see in the courses vs. Real-time.

  • Data Science Lifecycle

    Data Science Lifecycle

    Well, I don’t want to go over another post on Data Science framework that you can find on the Internet by just googling…

社区洞察

其他会员也浏览了