Converting Postgres Function to Impala UDF or a function in Spark

2016-07-14T00:14:23

I have a Postgres function that is called in a query. It's similar to this sample:

CREATE OR REPLACE FUNCTION test_function(id integer, dt date, days int[], accts text[], flag boolean) RETURNS float[] AS $$
  DECLARE
    pt_dates date[];
    pt_amt integer[];
    amt float[];
  BEGIN
    IF flag THEN
      pt_dates := ARRAY(SELECT dt FROM tab1);
      pt_amt := ARRAY(SELECT amt FROM tab1);
      IF array_upper(days, 1) IS NOT NULL THEN
        FOR j IN 1 .. array_upper(days, 1)
        LOOP
          -- append the j-th amount to the result array
          amt := amt || pt_amt[j]::float;
        END LOOP;
      END IF;
    END IF;
    RETURN amt;
  END;
$$ LANGUAGE plpgsql;

If I want to convert this for the Data Lake environment, what is the best way to do it: an Impala UDF, a Spark UDF, or a Hive UDF? In an Impala UDF, how do I access the Impala database? If I write a Spark UDF, can I use it in the impala-shell?

Please advise.
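For illustration, here is a minimal sketch of the row-level logic in plain Python, which could then be registered as a Spark UDF via `pyspark.sql.functions.udf`. The function name `sum_amounts` and the cumulative-sum semantics are assumptions based on the sample, since the original loop body is incomplete:

```python
from typing import List

def sum_amounts(days: List[int], amounts: List[float], flag: bool) -> List[float]:
    """Sketch of the PL/pgSQL body as a plain Python function.
    Mirrors the loop over array_upper(days, 1): when the flag is set,
    accumulate a running total of the amounts (assumed semantics)."""
    if not flag or not days:
        return []
    result = []
    total = 0.0
    for _, amt in zip(days, amounts):
        total += amt
        result.append(total)
    return result
```

In Spark this could be registered with something like `spark.udf.register("sum_amounts", sum_amounts, ArrayType(DoubleType()))` and then called from Spark SQL; note that a Spark UDF is not visible to impala-shell, since Spark and Impala have separate function registries.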

Copyright License:
Author: manmeet. Reproduced under the CC 4.0 BY-SA copyright license with link to original source & disclaimer.
Link: https://stackoverflow.com/questions/38356852/converting-postgres-function-to-impala-udf-or-a-function-in-spark

