Impala User-Defined Functions

User-defined functions (frequently abbreviated as UDFs) let us code our own application logic for processing column values during an Impala query. For example, a UDF could perform calculations using an external math library, combine several column values into one, do geospatial calculations, or perform other kinds of tests and transformations that are outside the scope of the built-in SQL operators and functions.

UDFs can simplify query logic when producing reports, and they let us transform data in flexible ways when copying from one table to another with the INSERT ... SELECT syntax.

Other database products offer this feature under names such as stored functions or stored routines.

Impala supports UDFs in Impala 1.2 and higher:

  • In Impala 1.1, using UDFs in a query required using the Hive shell.
  • Starting in Impala 1.2, we can run both Java-based Hive UDFs that we might already have written and high-performance native code UDFs written in C++.
  • Impala can run scalar UDFs, which return a single value for each row of the result set, and user-defined aggregate functions (UDAFs), which return a value based on a set of rows.

Note: Impala does not currently support user-defined table functions (UDTFs) or window functions.

Impala UDF Concepts

Depending on the use case, we can write all-new functions or reuse Java UDFs that we have already written for Hive. We can code either scalar functions, which produce results one row at a time, or more complex aggregate functions, which analyze sets of rows.

a. Impala UDFs and UDAFs

Depending on the use case, the functions we write might accept or produce different numbers of input and output values.

The most general kind of UDF takes a single input value and returns a single output value. When used in a query, it is called once for each row in the result set.

For example:

  select Employee_name, is_frequent_Employee(Employee_id) from Employees;
  select obfuscate(sensitive_column) from sensitive_data;

A user-defined aggregate function (UDAF), by contrast, accepts a group of values and returns a single value. When a UDAF is called in a query that uses the GROUP BY clause, the function is called once for each combination of GROUP BY values.

For example:

  -- Evaluates multiple rows but returns a single value.
  select closest_Hotel(latitude, longitude) from places;

  -- Evaluates batches of rows and returns a separate value for each batch.
  select most_profitable_location(store_id, sales, expenses, tax_rate, depreciation) from franchise_data group by year;
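To make the scalar versus aggregate distinction concrete, here is a minimal, self-contained C++ sketch. It deliberately uses plain C++ types rather than the real Impala headers. The function names (`IsFrequentEmployee`, `SumInit`, `SumUpdate`, `SumMerge`, `SumFinalize`, `AggregateGroup`) and the threshold of 100 are hypothetical, chosen only for illustration. The init/update/merge/finalize split mirrors the general shape of an Impala aggregate function, but treat this as an illustration under those assumptions, not as deployable UDF code.

```cpp
#include <cstdint>
#include <vector>

// Scalar UDF shape: one input value in, one output value out.
// Conceptually, the engine calls this once per row of the result set.
// The "id below 100" rule is a made-up placeholder.
bool IsFrequentEmployee(int64_t employee_id) {
  return employee_id < 100;
}

// Aggregate (UDAF) shape: intermediate state that is updated once per
// row and finalized into a single value for the whole group.
struct SumState {
  int64_t total;
};

void SumInit(SumState* state) { state->total = 0; }

void SumUpdate(SumState* state, int64_t value) { state->total += value; }

// Combines two partial states, for example results computed separately
// on different nodes.
void SumMerge(SumState* dst, const SumState& src) { dst->total += src.total; }

int64_t SumFinalize(const SumState& state) { return state.total; }

// Hypothetical helper that simulates the engine feeding one group of
// rows through the aggregate.
int64_t AggregateGroup(const std::vector<int64_t>& rows) {
  SumState state;
  SumInit(&state);
  for (int64_t value : rows) SumUpdate(&state, value);
  return SumFinalize(state);
}
```

With a GROUP BY clause, the engine would run the equivalent of `AggregateGroup` once per group, whereas `IsFrequentEmployee` would run once per row.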

b. Native Impala UDF

In addition to supporting existing Hive UDFs written in Java, Impala supports UDFs written in C++. Use C++ UDFs where practical, because compiled native code can yield higher performance: UDF execution time is often 10x faster for a C++ UDF than for the equivalent Java UDF.
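As a rough idea of what a native scalar UDF looks like, the sketch below follows the general shape of a C++ function that adds two integers. So that it compiles on its own, it defines simplified stand-ins for `FunctionContext` and `IntVal` in a hypothetical `sketch` namespace. In a real build these types would come from the `impala_udf/udf.h` header shipped with the development package, and their actual definitions are richer than shown here.

```cpp
#include <cstdint>

namespace sketch {

// Simplified stand-in for the per-query context object (assumption:
// the real type offers far more than this empty struct).
struct FunctionContext {};

// Simplified stand-in for a nullable 32-bit integer value.
struct IntVal {
  bool is_null;
  int32_t val;

  IntVal() : is_null(false), val(0) {}
  explicit IntVal(int32_t v) : is_null(false), val(v) {}

  static IntVal null() {
    IntVal result;
    result.is_null = true;
    return result;
  }
};

// Scalar UDF: returns NULL if either argument is NULL, otherwise the sum.
IntVal AddUdf(FunctionContext* /*context*/, const IntVal& a, const IntVal& b) {
  if (a.is_null || b.is_null) return IntVal::null();
  return IntVal(a.val + b.val);
}

}  // namespace sketch
```

Note that NULL is carried in a flag next to the value, so the function has to check it explicitly before touching `val`.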

c. Using Hive UDF with Impala

Impala can run UDFs that were originally written for Hive, with no changes, subject to the following conditions:

  • The parameters and return value must all use scalar data types supported by Impala. For example, complex or nested types are not supported.
  • Impala does not support Hive UDFs that accept or return the TIMESTAMP type.
  • Hive UDAFs and UDTFs are not supported.

Keep in mind that a C++ UDF often runs about 10x faster than the equivalent Java UDF, so reusing a Hive UDF trades performance for convenience.

Installing the Impala UDF Development Package

To develop UDFs for Impala, first download and install the impala-udf-devel package (RHEL-based distributions) or the impala-udf-dev package (Ubuntu and Debian). This package contains header files, sample source, and build configuration files.

  • Locate the appropriate .repo or list file for your operating system version.
  • Specify impala-udf-devel or impala-udf-dev as the package name.

The UDF development code does not rely on Impala being installed on the same machine. We can write and compile UDFs on a minimal development system, then deploy them on a different one for use with Impala.

How to Write Impala UDFs

Keep these points in mind while writing Impala UDFs:

  • Remember the data type differences as values are transferred from high-level SQL to your lower-level UDF code.
  • Follow best practices for function-oriented programming:
  1. Choose arguments carefully.
  2. Avoid side effects.
  3. Make each function do a single thing.
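The guidelines above can be illustrated with a small C++ sketch. It shows one typical data type difference: a SQL string reaches native code as a pointer plus a length rather than as a null-terminated C string, and NULL travels as a separate flag. The `StringVal` and `IntVal` structs here are simplified stand-ins in a hypothetical `sketch` namespace, not the definitions from the Impala headers, and `CountVowels` and `MakeString` are names made up for this example.

```cpp
#include <cstdint>
#include <cstring>

namespace sketch {

// Simplified stand-in for a nullable string value: a byte pointer plus
// an explicit length. The bytes are NOT assumed to end with '\0'.
struct StringVal {
  bool is_null;
  int len;
  const uint8_t* ptr;
};

// Simplified stand-in for a nullable 32-bit integer value.
struct IntVal {
  bool is_null;
  int32_t val;
};

// Hypothetical helper for demos: wraps a C string literal, excluding
// its terminating '\0' from the length.
inline StringVal MakeString(const char* s) {
  return StringVal{false, static_cast<int>(std::strlen(s)),
                   reinterpret_cast<const uint8_t*>(s)};
}

// Does a single thing (counts ASCII vowels), depends only on its
// argument, and has no side effects.
IntVal CountVowels(const StringVal& input) {
  if (input.is_null) return IntVal{true, 0};
  int32_t count = 0;
  // Iterate by length; never rely on a terminating '\0'.
  for (int i = 0; i < input.len; ++i) {
    switch (input.ptr[i]) {
      case 'a': case 'e': case 'i': case 'o': case 'u':
      case 'A': case 'E': case 'I': case 'O': case 'U':
        ++count;
        break;
      default:
        break;
    }
  }
  return IntVal{false, count};
}

}  // namespace sketch
```

Because the function reads only its input and writes only its return value, calling it in any order, or more than once on the same row, gives the same result.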
