Converting Postgres Function to Impala UDF or a function in Spark

2016-07-14T00:14:23

I have a Postgres function that is called in a query. It's similar to this sample:

CREATE OR REPLACE FUNCTION test_function(id integer, dt date, days int[], accts text[], flag boolean) RETURNS float[] AS $$
  DECLARE
    pt_dates date[];
    pt_amt integer[];
    amt float[];
  BEGIN
    IF flag THEN
      pt_dates := ARRAY(SELECT dt FROM tab1);
      pt_amt := ARRAY(SELECT amt FROM tab1);
      IF array_upper(days, 1) IS NOT NULL THEN
        FOR j IN 1 .. array_upper(days, 1)
        LOOP
          -- append the j-th amount to the result array
          amt := amt || pt_amt[j]::float;
        END LOOP;
      END IF;
    END IF;
    RETURN amt;
  END;
$$ LANGUAGE plpgsql;

If I want to convert this for the Data Lake environment, what is the best way to do it: an Impala UDF, a Spark UDF, or a Hive UDF? In an Impala UDF, how do I access the Impala database? If I write a Spark UDF, can I use it in the impala-shell?

Please advise.
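For illustration, here is a minimal sketch of the row-level logic in plain Python, which could then be registered as a Spark UDF via `pyspark.sql.functions.udf`. The function name `sum_amounts` and the cumulative-sum semantics are assumptions based on the sample, since the original loop body is incomplete:

```python
from typing import List

def sum_amounts(days: List[int], amounts: List[float], flag: bool) -> List[float]:
    """Sketch of the PL/pgSQL body as a plain Python function.
    Mirrors the loop over array_upper(days, 1): when the flag is set,
    accumulate a running total of the amounts (assumed semantics)."""
    if not flag or not days:
        return []
    result = []
    total = 0.0
    for _, amt in zip(days, amounts):
        total += amt
        result.append(total)
    return result
```

In Spark this could be registered with something like `spark.udf.register("sum_amounts", sum_amounts, ArrayType(DoubleType()))` and then called from Spark SQL; note that a Spark UDF is not visible to impala-shell, since Spark and Impala have separate function registries.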

Copyright License:
Author: manmeet. Reproduced under the CC 4.0 BY-SA copyright license with link to original source & disclaimer.
Link: https://stackoverflow.com/questions/38356852/converting-postgres-function-to-impala-udf-or-a-function-in-spark

