How to use Vertica Stored Procedures
Unlock Vertica Potential

How to use Vertica Stored Procedures

Overview

This article covers a range of key features of Vertica Stored Procedures, where we share essential tips, best practices, and practical examples. By exploring the capabilities of Vertica SP, you can gain valuable insights into how they can effectively address common SQL syntax challenges and enhance your data management workflows. Join us on this journey to unlock the full power of Vertica Stored Procedures. All input data provided here is a sample dataset created for the purposes of the examples demonstration and does not reflect real-world data.

The Pros and Cons of Stored Procedures

Stored procedures offer several benefits due to its focus on online analytical processing (OLAP) and optimized architecture. This is particularly advantageous for analytical workloads, such as information lifecycle management (ILM) activities and data preparation for machine learning tasks. Vertica's stored procedures excel at tasks like swapping partitions based on age, exporting data at end-of-life and dropping partitions, and capturing inputs, outputs, and metadata from machine learning models. Additionally, stored procedures in Vertica can operate on objects with higher privileges than the caller, enabling controlled execution of sensitive operations.?

However, it's important to note that frequent small transactions should be avoided, as they can potentially impact performance.?

Vertica Stored procedure Cheat Sheet

The following cheat sheet provides an overview of the syntax and commands commonly used in Vertica stored procedures. Remember to refer to the Vertica documentation for more detailed information and additional options.

No alt text provided for this image
Vertica Stored procedure Cheat Sheet

Stored Procedures Monitoring

Stored procedures run in an internal session that you can view with the following statement: select * from vs_sessions;

Stored procedures code is stored in the Vertica catalog but with no access via v_vatalog.*?

To see the SP code use EXPORT_OBJECTS()?

To export a particular SP, specify either the types or both the names and types of its formal parameters.?

The following example specifies the types:???SELECT EXPORT_OBJECTS('','MyStoredProcedureName(int, int)');

Stored procedures - use case examples

1. How to use a loop and a cursor in a Vertica SP to process rows from a result set one at a time.

The provided example below demonstrates the use of Vertica Stored Procedures to retrieve rows from a table that have a value greater than 1 using a loop and a cursor.

CREATE TABLE t1(f1 int, f2 varchar(20));		-- New table named "t1" is created with two columns
COPY t1 FROM STDIN DELIMITER ',' ABORT ON ERROR;	-- Populate the "t1" table with values from STDIN
1,one
2,two
3,three
4,four
\.

DO $$		-- Define a SP using a DO statement and $$ delimiter
DECLARE							
? ? c CURSOR (key int) FOR SELECT * FROM t1 WHERE f1 > key ORDER BY 1 DESC;
-- A cursor named "c" is declared, which takes an input parameter called "key" of type integer.
-- The cursor is defined to select all rows from table "t1"?
-- Where the value in column "f1" is greater than the provided "key" parameter.?
-- The result set is ordered in descending order by column "f1".
-- Two variables, "f1_" and "f2_" are declared to hold the values of columns "f1" and "f2" from the cursor

? ? f1_ int;
? ? f2_ varchar(20);
BEGIN
? ? OPEN c(1); -- only retrieve rows where f1 > 1
-- The cursor is opened using the OPEN statement, passing the parameter value of 1.?
-- This means that only rows with "f1" greater than 1 will be retrieved.
? ? f1_,f2_ := FETCH c;	-- Retrieve the first row from the cursor into the variables "f1_" and "f2_"
? ? WHILE FOUND LOOP		-- This loop will iterate as long as rows are found in the cursor
? ? ? RAISE NOTICE 'cursor returned %, %, FOUND=%', f1_, f2_, FOUND;
? ? ? f1_,f2_ := FETCH c;	-- After displaying the values, another FETCH statement is used?
? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?-- to retrieve the next row from the cursor, updating the values of "f1_" and "f2_"
? ? END LOOP;
END;
$$;

-- Following is the output of the RAISE NOTICE statements
-- Each output line represents a row fetched from the cursor, with values of "f1_", "f2_", and the FOUND attribute
-- The output shows three rows being retrieved from the table "t1" where "f1" is greater than 1.?
-- The values of these rows are displayed in descending order,?
-- along with the value "FOUND=t" indicating that rows were successfully fetched.
output:
vsql:test.sql:25: NOTICE 2005:? cursor returned 4, four, FOUND=t
vsql:test.sql:25: NOTICE 2005:? cursor returned 3, three, FOUND=t
vsql:test.sql:25: NOTICE 2005:? cursor returned 2, two, FOUND=t
DO        

2. Bulk-creation of directed queries

The meta-function SAVE_PLANS lets you create multiple optimizer-generated directed queries from the most frequently executed queries automatically as explained in Vertica documentation.

Here is a script name save_all_queries_as_dq.sql? to demonstrate: save, activate, deactivate and delete directed queries for all queries mentioned in v_monitor.query_requests in the last 10 days.

-- Note: Before each call to the following SP run the following 2 statements

SET ESCAPE_STRING_WARNING TO OFF;
SET STANDARD_CONFORMING_STRINGS TO OFF;
\echo '############################### Create a SP to DEACTIVATE directed queries..'
CREATE OR REPLACE PROCEDURE deactivate_dq() LANGUAGE PLvSQL AS $$
DECLARE
??? queryd varchar(10000);
??? c CURSOR (key int) FOR SELECT 'DEACTIVATE DIRECTED QUERY \'' || query_name || '\''
?????????????????????????? FROM V_CATALOG.DIRECTED_QUERIES
?????????????????????????? WHERE is_active
?????????????????????????? GROUP BY query_name;
BEGIN
??? OPEN c(1);
??? queryd := FETCH c;
??? WHILE FOUND LOOP
????? -- RAISE NOTICE 'Debug note: cursor returned %, FOUND=%', queryd, FOUND;
????? EXECUTE queryd ;
????? queryd := FETCH c;
??? END LOOP;
??? EXECUTE 'COMMIT';
END;
$$;

\echo '############################### Create a SP to ACTIVATE directed queries..'
CREATE OR REPLACE PROCEDURE activate_dq() LANGUAGE PLvSQL AS $$
DECLARE
??? queryd varchar(10000);

??? c CURSOR (key int) FOR SELECT 'ACTIVATE DIRECTED QUERY \'' || query_name || '\''
?????????????????????????? FROM V_CATALOG.DIRECTED_QUERIES
?????????????????????????? WHERE NOT is_active
?????????????????????????? GROUP BY query_name;
BEGIN
??? OPEN c(1);
??? queryd := FETCH c;
??? WHILE FOUND LOOP
????? -- RAISE NOTICE 'Debug note: cursor returned %, FOUND=%', queryd, FOUND;
????? EXECUTE queryd ;
????? queryd := FETCH c;
??? END LOOP;
??? EXECUTE 'COMMIT';
END;
$$;

\echo '############################### Create a SP to DELETE directed queries..'
CREATE OR REPLACE PROCEDURE del_dq() LANGUAGE PLvSQL AS $$
DECLARE
??? queryd varchar(10000);
??? c CURSOR (key int) FOR SELECT 'DROP DIRECTED QUERY \'' || query_name || '\''
?????????????????????????? FROM V_CATALOG.DIRECTED_QUERIES
??????????????????????? ???GROUP BY query_name;
BEGIN
??? OPEN c(1);
??? queryd := FETCH c;
??? WHILE FOUND LOOP
????? -- RAISE NOTICE 'Debug note: cursor returned %, FOUND=%', queryd, FOUND;
????? EXECUTE queryd ;
????? queryd := FETCH c;
??? END LOOP;
??? EXECUTE 'COMMIT';
END;
$$;

\echo '############################### Create a SP to SAVE directed queries..'
CREATE OR REPLACE PROCEDURE save_dq() LANGUAGE PLvSQL AS $$
DECLARE
??? queryd varchar(10000);
??? c CURSOR (key int) FOR SELECT 'CREATE DIRECTED QUERY OPTIMIZER \'' ||
?????????????????????????? CASE WHEN LENGTH(label) IS NULL OR LENGTH(label) < 1
??????????????????????????????? THEN 'digest_' || digest::VARCHAR(30)
??????????????????????????????? ELSE label
?????????????????????????? END
?????? ????????????????????|| '\' ' || max(request)
?????????????????????????? FROM dc_requests_issued
?????????????????????????? WHERE???? request_type='QUERY'
???????????????????????????????? AND (sysdate() - time) < '10 days'?
?????????????????????????? GROUP BY label,digest
?????????????????????????? ORDER BY label;
BEGIN
??? OPEN c(1);
??? queryd := FETCH c;
??? WHILE FOUND LOOP
????? EXECUTE queryd ;
????? queryd := FETCH c;
??? END LOOP;
??? EXECUTE 'COMMIT';
END;
$$;

\echo '############################### Check max query length..'
SELECT max(LENGTH(request)) Max_Length,
?????? CASE WHEN max(LENGTH(request)) > 10000
??????????? THEN '### ERROR: Please enlarge queryd in Directed Queries SP because Max LENGTH(request) > 10000'
??????????? ELSE '### QA test note: Max LENGTH(request) < 10000'
?????? END AS 'QA Note'
FROM dc_requests_issued
WHERE???? request_type='QUERY'
???? AND (sysdate() - time) < '30 days';

\echo '############################### SAVE directed queries via a script log because the SP will fail if there are changes in the schema since query execute time.'
-- CALL save_dq();
\! ./save_dq_log_script.sh
SELECT query_name,is_active active,right(vertica_version,10) Ver,save_plans_version plan,left(input_query,50) input_q,
?????? left(annotated_query,40) annotated_q FROM V_CATALOG.DIRECTED_QUERIES;

\echo '############################### ACTIVATE directed queries'
CALL activate_dq();
SELECT query_name,is_active active,right(vertica_version,10) Ver,save_plans_version plan,left(input_query,50) input_q,
?????? left(annotated_query,40) annotated_q FROM V_CATALOG.DIRECTED_QUERIES;

\echo '############################### DEACTIVATE directed queries'
CALL deactivate_dq();
SELECT query_name,is_active active,right(vertica_version,10) Ver,save_plans_version plan,left(input_query,50) input_q,
?????? left(annotated_query,40) annotated_q FROM V_CATALOG.DIRECTED_QUERIES;

\echo '############################### DELETE directed queries'
CALL del_dq();
SELECT query_name,is_active active,right(vertica_version,10) Ver,save_plans_version plan,left(input_query,50) input_q,
left(annotated_query,40) annotated_q FROM V_CATALOG.DIRECTED_QUERIES;        

Here is the save DQ script: cat save_dq_log_script.sh

#!/bin/bash
echo '############################### Creating file tmp_save_directed_queries_list.sql'
vsql -P footer=off -AtXnqf - <<-EOF1 > tmp_save_directed_queries_list.sql
SET ESCAPE_STRING_WARNING TO OFF;
SET STANDARD_CONFORMING_STRINGS TO OFF;
SELECT ' ; CREATE DIRECTED QUERY OPTIMIZER \'' ||
? ? ? ?CASE WHEN LENGTH(ri.label) IS NULL OR LENGTH(ri.label) < 1
? ? ? ? ? ? THEN 'digest_' || ri.digest::VARCHAR(30)
? ? ? ? ? ? ELSE ri.label
? ? ? ?END
? ? ? ?|| '\' ' || max(ri.request)
FROM v_monitor.query_requests qr LEFT OUTER JOIN dc_requests_issued ri
? ? ?ON qr.session_id=ri.session_id AND qr.transaction_id=ri.transaction_id AND qr.statement_id=ri.statement_id
WHERE? ? ?ri.request_type='QUERY'
? ? ? AND UPPER(left(ri.request, 6)) = 'SELECT'
? ? ? AND qr.success
? ? ? AND (sysdate() - qr.END_TIMESTAMP) < '10 days'
GROUP BY ri.label,ri.digest
ORDER BY ri.label,ri.digest DESC;
EOF1

echo '############################### Saving directed queries via the statments in tmp_save_directed_queries_list.sql'
vsql -P footer=off -AtXnqf tmp_save_directed_queries_list.sql > tmp_save_directed_queries_list.log 2>&1

exit        

3. Count rows number in all tables in the current database, listed in V_CATALOG.TABLES

This use case demonstrates how to use Vertica Stored Procedures to count the number of rows in all tables in a database,?as listed in the V_CATALOG.TABLES view.

CREATE TABLE t1 (f1 varchar(300), f2 int);
SET ESCAPE_STRING_WARNING TO OFF;
SET STANDARD_CONFORMING_STRINGS TO OFF;

DO $$
DECLARE
? ? f1_ varchar(300);
? ? f2_ int;
? ? c CURSOR (key int) FOR SELECT 'INSERT INTO t1 (f1,f2) SELECT \'' || table_schema || '.' || table_name || '\', count(1) FROM ' || table_schema || '.' || table_name? FROM tables WHERE NOT is_system_table ORDER BY 1;

BEGIN
? ? OPEN c(1);
? ? f1_ := FETCH c;
? ? WHILE FOUND LOOP
? ? ? -- RAISE NOTICE 'cursor returned %, FOUND=%', f1_, FOUND;
? ? ? EXECUTE f1_ ;
? ? ? f1_ := FETCH c;
? ? END LOOP;
? ? EXECUTE 'COMMIT';
END;
$$;

SELECT f1 as 'schema.table', f2 as 'Rows number'
FROM t1
WHERE f1 != 'public.t1'
ORDER BY 2 DESC;
--? ? ? ? ? ? schema.table? ? ? ? ? ?| Rows number
-- ----------------------------------+-------------
--? public.table_a_with_1000000_rows |? ? ?1000000
--? public.table_b_with_500_rows? ? ?|? ? ? ? ?500
--? public.table_c_with_400_rows? ? ?|? ? ? ? ?400
--? public.table_d_with_300_rows? ? ?|? ? ? ? ?300
--? public.table_e_with_200_rows? ? ?|? ? ? ? ?200
--? public.table_f_with_100_rows? ? ?|? ? ? ? ?100
--? public.table_g_with_0_rows? ? ? ?|? ? ? ? ? ?0
-- (7 rows)        

4. Adding procedural logic to existing SQL procedures using Anonymous Code Blocks (DO $$)

Combined with another common use case to run a certain action on all tables with a given pattern.?

For example, truncating all staging tables before a new ETL circle.?

Thank you Maurizio Felici for this idea!

\echo Truncate all staging tables dem

CREATE TABLE staging_customers (
? ? customer_id INT,
? ? customer_name VARCHAR(50),
? ? purchase_date DATE
);

-- Insert sample data into the staging_customers table
INSERT INTO staging_customers (customer_id, customer_name, purchase_date)
VALUES
? ? (1, 'John Smith', '2023-05-10'),
? ? (2, 'Emily Davis', '2023-01-22'),
? ? (3, 'David Wilson', '2023-04-20');
COMMIT;

CREATE TABLE customers AS SELECT * FROM staging_customers;

SET ESCAPE_STRING_WARNING TO OFF;
SET STANDARD_CONFORMING_STRINGS TO OFF;

DO $$
DECLARE
? ? f1_ varchar(500);
? ? c CURSOR FOR SELECT 'TRUNCATE TABLE ' || table_schema || '.' || table_name || ';' FROM tables WHERE NOT is_system_table AND table_name ilike 'staging_%' ORDER BY 1;
BEGIN
? ? OPEN c;
? ? f1_ := FETCH c;
? ? WHILE FOUND LOOP
? ? ? RAISE NOTICE '>>> Will execute: %', f1_;
? ? ? EXECUTE f1_ ;
? ? ? f1_ := FETCH c;
? ? END LOOP;
END;
$$;

SELECT * FROM customers;
SELECT * FROM staging_customers;

Run time output:
vsql: Truncate_demo.sql:38: NOTICE 2005:? >>> Will execute: TRUNCATE TABLE public.staging_customers;
DO
SELECT * FROM customers;
?customer_id | customer_name | purchase_date
-------------+---------------+---------------
? ? ? ? ? ?1 | John Smith? ? | 2023-05-10
? ? ? ? ? ?2 | Emily Davis? ?| 2023-01-22
? ? ? ? ? ?3 | David Wilson? | 2023-04-20
(3 rows)

SELECT * FROM staging_customers;
?customer_id | customer_name | purchase_date
-------------+---------------+---------------
(0 rows)        
Marco Gessner

Database, Data Warehouse and Big Data Technologist

1 年

Nicely written, Moshe; and nice to get some code here that we can play with…

要查看或添加评论,请登录

Moshe Goldberg的更多文章

社区洞察

其他会员也浏览了