Unlocking High-Performance Snowflake Integrations with OpenResty

Unlocking High-Performance Snowflake Integrations with OpenResty

INTRODUCING RESTY-SNOWFLAKE

Efficiency and performance are paramount in the ever-evolving landscape of web applications and data management. Today, I am thrilled to introduce resty-snowflake, a Lua library that bridges the powerful capabilities of Snowflake with the high-performance web platform OpenResty. It leverages the Snowflake SQL REST API and supports fast JSON processing using libraries like rsjson, dkjson, or cjson. Users can configure the library via environment variables or directly in Lua scripts. The provided Taskfile simplifies key setup, JWT generation, and connectivity testing. Ideal for boosting performance, simplifying architecture, and ensuring scalability in modern web applications.

This article will delve into this integration's motivations, benefits, and technical intricacies, showcasing how resty-snowflake can revolutionize your data handling workflows.

Why resty-snowflake?

Snowflake has emerged as a leading cloud data platform, renowned for handling vast amounts of data with speed and scalability. However, integrating Snowflake into web applications often involves additional backend layers, which can introduce latency and complexity. This is where resty-snowflake shines.

The Motivation

The primary motivation behind resty-snowflake is to harness the power of Snowflake directly within OpenResty, eliminating the need for a separate backend. This approach offers several compelling advantages:

  1. Performance Boost: By embedding Snowflake interactions within OpenResty, you remove the overhead of an additional backend layer, resulting in faster response times and reduced latency.
  2. Simplified Architecture: Reducing the number of moving parts in your architecture makes your system easier to manage and maintain.
  3. Scalability: OpenResty's non-blocking I/O and LuaJIT's performance capabilities allow it to handle high traffic efficiently, making it an ideal choice for scalable applications.

The Power of OpenResty and LuaJIT

OpenResty is a robust web platform based on Nginx and LuaJIT, designed to deliver high-performance web applications. LuaJIT is a Just-In-Time Compiler for Lua, known for its speed and efficiency. Combining these with Snowflake's data management prowess creates a potent solution for modern web applications.

Key Features of resty-snowflake

  • Fast JSON Processing: resty-snowflake utilizes rsjson, a high-performance JSON library written in Rust, ensuring rapid JSON parsing and serialization.
  • Snowflake SQL API provides operations that you can use to:
  • Integration with CortexAI Services: The library supports advanced AI services through Snowflake SQL REST API, allowing seamless integration of AI-powered functionalities within your Snowflake queries.

GETTING STARTED

Setting Up Your Snowflake User

To make resty-snowflake work, you must generate public and private keys using the generate-keypair task. Then, assign the public key to the user in Snowflake (for the sake of the demo, we will imagine DEVUSER):

ALTER USER DEVUSER SET RSA_PUBLIC_KEY='MII...';
DESC USER DEVUSER;
SELECT TRIM((SELECT "value" FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
 ?WHERE "property" = 'RSA_PUBLIC_KEY_FP'), 'SHA256:');        

The result of this query will provide you with the Public Key Fingerprint for the key-pair authorization. You can read about it more at https://docs.snowflake.com/en/developer-guide/sql-api/authenticating

Taskfile for Automation

The following Taskfile simplifies the setup and testing of resty-snowflake:

version: 3
silent: true

vars:
  DOCKER_TAG: resty-snowflake-demo
  PRIVATE_KEY_PATH: _keys/rsa_key.p8
  SNOWFLAKE_ACCOUNT_NAME: YOUR_ACCOUNT_NAME
  SNOWFLAKE_USERNAME: YOUR_USERNAME
  SNOWFLAKE_PRIVATE_KEY:
 ?  sh: cat {{.PRIVATE_KEY_PATH}}
  SNOWFLAKE_PUBLIC_KEY_FINGERPRINT: "YOUR_PUBLIC_KEY_FINGERPRINT"

tasks:
  generate-keypair:
 ?  cmds:
 ? ?  - mkdir -p _keys
 ? ?  - openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out _keys/rsa_key.p8 -nocrypt
 ? ?  - openssl rsa -in _keys/rsa_key.p8 -pubout -out _keys/rsa_key.pub

  test-keypair:
 ?  cmds:
 ? ?  - snowsql -a {{.SNOWFLAKE_ACCOUNT_NAME}} -u {{.SNOWFLAKE_USERNAME}} --private-key-path _keys/rsa_key.p8 -q "SELECT CURRENT_TIMESTAMP() AS now"

  generate-jwt:
 ?  cmds:
 ? ?  - |
 ? ? ?  HEADER='{
 ? ? ? ?  "alg": "RS256",
 ? ? ? ?  "typ": "JWT"
 ? ? ? ?}'

 ? ? ?  ACCOUNT="{{.SNOWFLAKE_ACCOUNT_NAME}}"
 ? ? ?  USER="{{.SNOWFLAKE_USERNAME}}"
 ? ? ?  PUBLIC_KEY_FP="{{.SNOWFLAKE_PUBLIC_KEY_FINGERPRINT}}"
 ? ? ?  NOW=$(date +%s)
 ? ? ?  EXP=$(($NOW + 3540))

 ? ? ?  QUALIFIED_USERNAME=$(echo "$ACCOUNT.$USER" | tr '[:lower:]' '[:upper:]')

 ? ? ?  PAYLOAD=$(jq -n --arg iss "$QUALIFIED_USERNAME.SHA256:$PUBLIC_KEY_FP" \
 ? ? ? ? ? ? ? ? --arg sub "$QUALIFIED_USERNAME" \
 ? ? ? ? ? ? ? ? --argjson iat $NOW \
 ? ? ? ? ? ? ? ? --argjson exp $EXP \
 ? ? ? ? ? ? ? ? '{
 ? ? ? ? ? ? ? ? ? ? iss: $iss,
 ? ? ? ? ? ? ? ? ? ? sub: $sub,
 ? ? ? ? ? ? ? ? ? ? iat: $iat,
 ? ? ? ? ? ? ? ? ? ? exp: $exp
 ? ? ? ? ? ? ? ? ? }')

 ? ? ?  HEADER_BASE64=$(echo -n "$HEADER" | openssl base64 -A | tr -d '=' | tr '/+' '_-')
 ? ? ?  PAYLOAD_BASE64=$(echo -n "$PAYLOAD" | openssl base64 -A | tr -d '=' | tr '/+' '_-')
 ? ? ?  SIGNATURE=$(echo -n "$HEADER_BASE64.$PAYLOAD_BASE64" | openssl dgst -sha256 -sign {{.PRIVATE_KEY_PATH}} | openssl base64 -A | tr -d '=' | tr '/+' '_-')
 ? ? ?  GENERATED_JWT="$HEADER_BASE64.$PAYLOAD_BASE64.$SIGNATURE"
 ? ? ?  echo $GENERATED_JWT

  test-sql:
 ?  cmds:
 ? ?  - |
 ? ? ?  JWT="$(task generate-jwt)"
 ? ? ?  echo "Testing SQL..."
 ? ? ? ?
 ? ? ?  curl -iv -X POST \
 ? ? ?  -H "Content-Type: application/json" \
 ? ? ?  -H "Authorization: Bearer $JWT" \
 ? ? ?  -H "X-Snowflake-Authorization-Token-Type: KEYPAIR_JWT" \
 ? ? ?  -d '{"statement": "SELECT CURRENT_TIMESTAMP()"}' \
 ? ? ? ? ?"https://{{.SNOWFLAKE_ACCOUNT_NAME}}.snowflakecomputing.com/api/v2/statements"

  docker-build:
 ?  cmds:
 ? ?  - docker build --progress plain -t {{.DOCKER_TAG}} .

  docker-run:
 ?  cmds:
 ? ?  - |
 ? ? ?  docker run -it -d \
 ? ? ? ?  -p "8000:80" \
 ? ? ? ?  -v "$(pwd)/app:/app" \
 ? ? ? ?  -v "$(pwd)/lib:/app-libs" \
 ? ? ? ?  -v "$(pwd)/nginx/conf/nginx.conf:/usr/local/openresty/nginx/conf/nginx.conf" \
 ? ? ? ?  -e SNOWFLAKE_ACCOUNT_NAME="{{.SNOWFLAKE_ACCOUNT_NAME}}" \
 ? ? ? ?  -e SNOWFLAKE_USERNAME="{{.SNOWFLAKE_USERNAME}}" \
 ? ? ? ?  -e SNOWFLAKE_PRIVATE_KEY="{{.SNOWFLAKE_PRIVATE_KEY}}" \
 ? ? ? ?  -e SNOWFLAKE_PUBLIC_KEY_FINGERPRINT="{{.SNOWFLAKE_PUBLIC_KEY_FINGERPRINT}}" \
 ? ? ? ? ?{{.DOCKER_TAG}}        

  • Test Keypair: The test-keypair task allows you to test connectivity using the snowsql CLI.
  • Generate JWT: The generate-jwt task generates a JWT required for authentication, a functionality also present in the snowflake.lua module.
  • Test SQL: The test-sql task allows you to test the JWT and connectivity to the Snowflake SQL REST API.

SETTING UP RESTY-SNOWFLAKE

Installation

To start with resty-snowflake, you can try it with Docker, so you need Docker installed on your machine. The following Dockerfile builds your Docker image:

FROM openresty/openresty:alpine

RUN mkdir -p /app/lib && mkdir -p /vendor/lib \
 ?  && apk add --no-cache curl jq libc6-compat perl ca-certificates \
 ?  && opm get bsiara/dkjson \
 ?  && opm get fffonion/lua-resty-openssl \
 ?  && opm get pintsized/lua-resty-http \
 ?  && opm get SkyLothar/lua-resty-jwt \
 ?  && opm get fffonion/lua-resty-openssl \
 ?  && apk del curl

EXPOSE 80

CMD ["/usr/local/openresty/bin/openresty", "-g", "daemon off;"]        

Configuration

Ensure that your nginx.conf is set up to include the necessary Lua scripts. Here's an example configuration. Let's do a bare minimum (the environment variables used in his config are described below):

worker_processes auto;
error_log /dev/stdout info;

env SNOWFLAKE_ACCOUNT_NAME;
env SNOWFLAKE_USERNAME;
env SNOWFLAKE_PRIVATE_KEY;
env SNOWFLAKE_PUBLIC_KEY_FINGERPRINT;

events {
 ?  worker_connections 1024;
 ?  multi_accept on;
 ?  use epoll;
}

http {
 ?  default_type application/octet-stream;

 ?  resolver 1.1.1.1 1.0.0.1 valid=300s;
 ?  resolver_timeout 5s;

 ?  lua_package_path "/usr/local/openresty/lualib/?.lua;/app-libs/?.lua;/vendor/lib/?.lua;;";
 ?  lua_package_cpath "/usr/local/openresty/lualib/?.so;/app-libs/lib?.so;;";

 ?  lua_shared_dict my_cache 10m;

 ?  lua_code_cache off; #comment for production

 ?  init_by_lua_file /app/nginx_init.lua;

 ?  sendfile on;
 ?  tcp_nopush on;
 ?  tcp_nodelay on;
 ?  keepalive_timeout 65;
 ?  types_hash_max_size 2048;

 ?  log_format main '$remote_addr - $remote_user [$time_local] "$request" '
 ? ? ? ? ? ? ? ? ?  '$status $body_bytes_sent "$http_referer" '
 ? ? ? ? ? ? ? ? ?  '"$http_user_agent" "$http_x_forwarded_for"';
 ?  gzip on;
 ?  gzip_disable "msie6";
 ?  gzip_vary on;
 ?  gzip_proxied any;
 ?  gzip_comp_level 6;
 ?  gzip_buffers 16 8k;
 ?  gzip_http_version 1.1;
 ?  gzip_types text/plain text/css application/json application/x-javascript text/xml application/xml application/xml+rss text/javascript;

 ?  ssl_protocols TLSv1.2 TLSv1.3;
 ?  ssl_prefer_server_ciphers on;

 ?  server {
 ? ? ?  listen 80;

 ? ? ?  location / {
 ? ? ? ? ?  default_type 'text/plain';
 ? ? ? ? ?  content_by_lua_file /app/demo.lua;
 ? ? ?  }
 ?  }
}        

Initialization

In nginx_init.lua, the resty-snowflake library is initialized and configured to use rsjson for JSON processing. However, you can choose any JSON library that fits your requirements, such as dkjson or cjson. The key is to ensure that the chosen library is globally mapped as json.

json = require ("rsjson")        

Mapping the JSON Library Globally

The essential requirement is that whichever JSON library you choose, it must be globally accessible as json. This allows resty-snowflake to use it seamlessly for JSON parsing and serialization. Here’s a detailed step-by-step guide:

  1. Install the JSON Library: Use OPM or LuaRocks to install the library of your choice.
  2. Configure nginx_init.lua: Ensure the selected library is required and assigned to json.

Here’s a consolidated example for nginx_init.lua showing the flexibility:

-- Uncomment the JSON library you want to use

-- json = require("rsjson")  -- High-performance Rust-based library
-- json = require("dkjson")  -- Pure Lua library
-- json = require("cjson") ? -- Fast C-based library

-- Initialize the snowflake library
local snowflake = require("snowflake")

snowflake.init()        

Benefits of Custom JSON Libraries

  • Performance: Choose rsjson or cjson for high-performance JSON processing.
  • Compatibility: Use dkjson if you prefer a pure Lua implementation.
  • Flexibility: Easily switch between libraries based on your application’s needs and the environment.

By allowing users to choose their preferred JSON library, resty-snowflake provides the flexibility to optimize performance and compatibility according to specific requirements. As long as the selected library is globally mapped as json, resty-snowflake will function seamlessly, making it adaptable to various scenarios and user preferences.

CONFIGURATION OPTIONS: ENVIRONMENT VARIABLES VS. LUA CONFIGURATION

When using resty-snowflake, you have the flexibility to configure the library either through environment variables or directly within your Lua scripts. This dual approach ensures that you can choose the configuration method that best suits your deployment environment and personal preferences. Let’s explore both options in detail.

Using Environment Variables

Environment variables offer a convenient way to configure resty-snowflake, especially in dynamic and containerized environments like Docker. By setting environment variables, you can decouple your configuration from your code, making it easier to manage different environments (e.g., development, staging, production) without altering your Lua scripts.

Setting Environment Variables

In your Docker configuration or shell environment, you can set the necessary variables as follows:

docker run -it -d \
  -p "8000:80" \
  -v "$(pwd)/app:/app" \
  -v "$(pwd)/lib:/app-libs" \
  -v "$(pwd)/nginx/conf/nginx.conf:/usr/local/openresty/nginx/conf/nginx.conf" \
  -e SNOWFLAKE_ACCOUNT_NAME="your_account_name" \
  -e SNOWFLAKE_USERNAME="your_username" \
  -e SNOWFLAKE_PRIVATE_KEY="your_private_key" \
  -e SNOWFLAKE_PUBLIC_KEY_FINGERPRINT="your_public_key_fingerprint" \
  resty-snowflake-demo        

In your nginx.conf, ensure you declare the environment variables to pass them to the worker processes:

env SNOWFLAKE_ACCOUNT_NAME;
env SNOWFLAKE_USERNAME;
env SNOWFLAKE_PRIVATE_KEY;
env SNOWFLAKE_PUBLIC_KEY_FINGERPRINT;        

The resty-snowflake library will automatically pick up these environment variables without any additional code changes.

Configuring Directly in Lua

For more fine-grained control, or if you prefer to keep configuration within your Lua scripts, you can set the necessary configurations directly in the Lua code. This method can be beneficial in environments where modifying environment variables is less convenient or if you need to configure multiple instances with different settings programmatically.

Setting Configuration in Lua

To configure resty-snowflake directly in Lua, you need to pass a configuration table to the init function of the library. Here’s an example of how to do this:

local snowflake = require("snowflake")

-- Initialize with Lua configuration
snowflake.init({
 ?  account_name = "your_account_name",
 ?  username = "your_username",
 ?  private_key = "your_private_key",
 ?  private_key_password = "your_private_key_password", -- optional
 ?  public_key_fp = "your_public_key_fingerprint"
})

ngx.header.content_type = "text/plain"
ngx.log(ngx.INFO, "Accessing Snowflake...")

-- Execute a sample SQL query
local result, err_exec_sql = snowflake.execute_sql("SELECT CURRENT_TIMESTAMP()")
if err_exec_sql then
 ?  ngx.say("Error: " .. err_exec_sql)
 ?  return
end

ngx.say("Result: ", result.data[1])        

Benefits of Each Approach

Environment Variables:

  • Decoupling Configuration: Keeps configuration separate from code, making it easier to manage different environments.
  • Security: Environment variables can be more secure, especially for sensitive data like private keys.
  • Simplicity: Easy to set up in containerized environments like Docker.

Lua Configuration:

  • Flexibility: Allows dynamic configuration changes and supports complex setup logic.
  • Control: Useful when you need to programmatically adjust settings or configure multiple instances with different settings.
  • Self-Contained: Keeps everything within the Lua script, which can be easier to manage in certain deployment scenarios.

Choosing the Right Method

The choice between environment variables and Lua configuration depends on your specific use case and deployment environment:

  • Use Environment Variables if you are deploying in a containerized environment or need to manage configurations across multiple environments dynamically.
  • Use Lua Configuration if you need more control within your code, require dynamic adjustments, or prefer a self-contained setup.

More advanced example

Here is a sample demo of how to use the resty-snowflake library.lua script:

local snowflake = require("snowflake")

ngx.header.content_type = "text/plain"

snowflake.init()

ngx.log(ngx.INFO, "Accessing snowflake...")

-- Get current timestamp
local result, err_exec_sql = snowflake.execute_sql("SELECT CURRENT_TIMESTAMP()")

if err_exec_sql then
 ?  ngx.say("Error: " .. err_exec_sql)
 ?  return
end

ngx.say("Result: ", result.data[1])

-- Testing bindings
local result_bindings, err_exec_sql_with_bindings = snowflake.execute_sql("SELECT 1=?", {
 ?  ["1"] = {
 ? ? ?  type = "FIXED",
 ? ? ?  value = "123"
 ?  }
})

if err_exec_sql_with_bindings then
 ?  ngx.say("Error: " .. err_exec_sql_with_bindings)
 ?  return
end

ngx.say("Binding Result: ", result_bindings.data[1])

-- Testing CortexAI Translation services
ngx.log(ngx.INFO, "Testing CortexAI Translation services...")
local result_translation, err_translation = snowflake.execute_sql("SELECT SNOWFLAKE.CORTEX.TRANSLATE(?, '', ?) AS TRANSLATION_RESULT", {
 ?  ["1"] = {
 ? ? ?  type = "TEXT",
 ? ? ?  value = "Skitsanos does pretty awesome things!"
 ?  },
 ?  ["2"] = {
 ? ? ?  type = "TEXT",
 ? ? ?  value = "de"
 ?  }
})

if err_translation then
 ?  ngx.say("Error: " .. err_translation)
 ?  return
end

ngx.say("Translation Result: ", result_translation.data[1])        

Final Thoughts

The resty-snowflake library offers a streamlined solution for integrating Snowflake directly with OpenResty, significantly enhancing performance and simplifying web application architectures. It reduces latency, boosts response times, and simplifies system maintenance by eliminating the need for additional backend layers. This integration leverages OpenResty's non-blocking I/O and LuaJIT's performance capabilities, ensuring scalability even under high-traffic conditions.

With resty-snowflake, developers can efficiently manage configurations, dynamically adjust setups, and harness Snowflake's powerful data handling capabilities directly within their web applications. This approach optimizes performance and supports advanced functionalities like fast JSON processing and secure JWT authentication.

Embracing resty-snowflake means fully utilizing Snowflake's potential while maintaining a lean, efficient, and scalable web infrastructure, making it a valuable addition to any modern web application's toolkit.

Sources: https://github.com/skitsanos/resty-snowflake

Sabine VanderLinden

Activate Innovation Ecosystems | Tech Ambassador | Founder of Alchemy Crew Ventures + Scouting for Growth Podcast | Chair, Board Member, Advisor | Honorary Senior Visiting Fellow-Bayes Business School (formerly CASS)

3 个月

Innovative approach streamlining data access and web delivery.

要查看或添加评论,请登录

Evgenios Skitsanos的更多文章

社区洞察

其他会员也浏览了