Smartsheet Integration BigQuery
The goal of this post is to go from the basics of data preprocessing or scraping data from Smartsheet, consolidate the data and store result in Big Query.
Here is the table of contents of this tutorial:
- Load Packages: In the first part, we will install Smartsheet in python environment and load package in environment.
- Create Smartsheet Developer Key: In the second part, we will create Smartsheet Developer Key.
- Find Smartsheet ID's for each sheet: In the third part, we will find Smartsheet ID's for each sheet and extract data for identified sheet.
- Data Preprocessing: In the fourth part, we will extract content of identified page.
- Load data frame to Big Query: In the last part, load final data frame to Big Query.
1. Install and Load Packages
Install command:
pip install smartsheet-python-sdk
Load packages :
import json import os import requests import smartsheet, csv import pandas.io.gbq as pd_gbq import pandas as pd import warnings warnings.filterwarnings('ignore')
2. Create Smartsheet Developer Key
- Go to Smartsheet web portal > Profile > Apps & Integration
- Click API Access > Generate new access token
- Finally, make sure to save your API token.
Note : You cannot save your token later. In case missed, re-create a new one.
3. Find Smartsheet ID’s for each sheet
# Create base client object and set the access token ss_client = smartsheet.Smartsheet('<INSERT KEY HERE>') # Include all information response = ss_client.Sheets.list_sheets(include="attachments,source", include_all=True) sheets = response.data # print each sheet IDs and Name for further use (i.e. data extraction) for single_sheet in response.data: print (single_sheet.id, single_sheet.name) 123456789012 smartsheet_name_1 123456789013 smartsheet_name_2
4. Data Preprocessing
Load data for identified Smartsheet. For example, if we would like to scrape data for Smartsheet name : smartsheet_name_2.
Add Smartsheet ID below
# Extract the conetent of Programs; JSON object # Header details as described on SmartSheet Developer Page. smartsheet = 'https://api.smartsheet.com/2.0/sheets/' sheetid = <SMARTSHEET ID> uri = smartsheet + sheetid header = {'Authorization': "Bearer <INSERT KEY HERE>" , 'Content-Type': 'application/json'} req = requests.get(uri, headers=header) data ?= json.loads(req.text) cols = [] for col in data['columns']: cols.append(col['title']) df = pd.DataFrame(columns=cols) for row in data['rows']: values = [] #re-initilise the values list for each row for cell in row['cells']: if cell.get('value'): #handle the empty values values.append( cell['value']) else: values.append('') df = df.append(dict(zip(cols, values)), ignore_index=True)
5. Load data frame to Big Query
#Need to rename columns for BQ to capture (with allowed BQ specification) df.columns = ['A', 'B'] #Project Name project_id = 'INSERT-PROJECT-NAME' pd_gbq.to_gbq(df, 'DATSET-NAME', project_id, if_exists='replace')
It concludes the blog post, feel free to reach out for any questions.
Follow my blog here: https://yashkarwa.github.io/posts/Smartsheet/