Snowpark Python On Jupyter Notebook
- Snowpark provides a data programmability interface for Snowflake. It offers a developer experience that brings deeply integrated, DataFrame-style programming to Snowflake in the languages developers like to use, including Scala, Java and Python.
- Snowpark is a developer library in Snowflake that provides an API to process data using programming languages like Scala, Java and Python instead of SQL.
- All our data remains inside the Snowflake Data Cloud while it is processed, which avoids the cost of moving data out of the cloud to other services. Snowpark uses Snowflake's own processing engine, so it eliminates the cost of external infrastructure such as a Spark cluster.
Below are the prerequisites to run snowpark-python using Anaconda.
- Download/install Jupyter Notebook on your local machine.
- Create a Snowflake account and the required objects (database, schema, warehouse and tables).
- Python 3.8.
Step 1: We are using Anaconda on our local machine, so we need to create a conda environment. Run the commands below in your command prompt:
1. conda env create -f jupyter_conf_env.yml
2. conda activate getting_started_snowpark_python
Create the jupyter_conf_env.yml file with the following contents:
name: getting_started_snowpark_python
channels:
  - https://repo.anaconda.com/pkgs/snowflake
dependencies:
  - python=3.8
  - scikit-learn==1.0.2
  - jupyter==1.0.0
  - numpy
  - matplotlib
  - seaborn
  - snowflake-snowpark-python[pandas]==0.7.0
Conda will automatically install snowflake-snowpark-python==0.7.0 and all other dependencies for you.
Step 2: Once Snowpark is installed, create a separate kernel for Jupyter:
python -m ipykernel install --user --name=getting-started-snowpark-python
Step 3: Now launch Jupyter Notebook with the command below on your local machine:
jupyter notebook
Step 4: Open the config.py file in Jupyter and update it with your account, username, and password information:
snowflake_conn_prop = {
? "account": "ACCOUNT",
? "user": "USER",
? "password": "PASSWORD",
? "role": "ACCOUNTADMIN",
? "database": "snowpark_quickstart",
? "schema": "TELCO",
?"warehouse": "sp_qs_wh",
}
Step 5: Now you are ready to get started with the notebooks. For each notebook, make sure that you select the getting-started-snowpark-python kernel when running. You can do this by navigating to:
Kernel => Change Kernel => select getting-started-snowpark-python after launching each notebook. A quick way to confirm the notebook is running in the right environment is shown below.
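As a small optional check (my addition, not part of the original quickstart), you can confirm from a notebook cell that the kernel really points at the new conda environment:

import sys

# The interpreter path should point inside the getting_started_snowpark_python conda env,
# e.g. .../envs/getting_started_snowpark_python/bin/python (exact path depends on your machine).
print(sys.executable)
print(sys.version)   # should report Python 3.8.x for this setup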
Step 6 : Write a Python program to import necessary dependencies in jupyter notebook.
from snowflake.snowpark.session import Session
from snowflake.snowpark import functions as F
from snowflake.snowpark.types import *
import pandas as pd
from sklearn import linear_model
import matplotlib.pyplot as plt
# Snowflake connection info is saved in config.py
from config import snowflake_conn_prop
# let's import some transformation functions
from snowflake.snowpark.functions import udf, col, lit, translate, is_null, iff
from snowflake.snowpark import version
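As a quick sanity check (an optional addition, not one of the original steps), the version module imported above exposes the client version as a VERSION tuple in recent snowflake-snowpark-python releases, so you can confirm which Snowpark client the kernel picked up:

# Print the Snowpark client version; it should match the 0.7.0 pinned in the conda env.
print(version.VERSION)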
Step 7: Write the Python code that joins the employee and salary tables on id and empId using DataFrames:
# create a session using the connection properties from config.py
session = Session.builder.configs(snowflake_conn_prop).create()
dfLoc = session.table("employee")
dfServ = session.table("salary")
dfJoin = dfLoc.join(dfServ,dfLoc.col("id") == dfServ.col("empId"))
dfResult = dfJoin.select(col("id"), col("name"), col("salary"))
dfResult.show()
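Optionally, since Step 6 already imports pandas and matplotlib, you can pull the joined result into a local pandas DataFrame for further analysis. This is only a sketch: the column names below ("NAME", "SALARY") are assumptions, because Snowflake normally upper-cases unquoted identifiers, so check pdf.columns first.

# Pull the joined result into pandas (works because the [pandas] extra was installed above).
pdf = dfResult.to_pandas()
print(pdf.columns)   # Snowflake typically upper-cases unquoted identifiers, e.g. ID, NAME, SALARY
print(pdf.head())

# A quick salary-by-employee bar chart; "NAME" and "SALARY" are assumed column names.
pdf.plot.bar(x="NAME", y="SALARY")
plt.tight_layout()
plt.show()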
Output of the above Python code: when the Step 7 code runs, dfResult.show() prints the joined rows, and we can see how the DataFrame is processed by Snowflake. In the background, Snowflake converts the DataFrame operations into a SQL query and runs it in its own environment, so the data never leaves the Data Cloud.
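If you want to look at the SQL that Snowpark generates for this DataFrame, the client exposes it directly. This is a minimal sketch, assuming your snowflake-snowpark-python version provides these members (they exist in recent releases; availability in 0.7.0 may differ):

# Inspect the SQL Snowpark generates for the DataFrame instead of guessing.
dfResult.explain()         # prints the query plan / generated SQL
print(dfResult.queries)    # the SQL statements Snowpark will send to Snowflake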