Impala built-in function not available when migrating from Impala to SparkSQL

2016-12-22T05:43:08

I am using a built-in function in Impala like:

select id, parse_url(my_table.url, "QUERY", "extensionId") from my_table

Now I am migrating to SparkSQL (using PySpark in a Jupyter notebook):

my_table.select(my_table.id.cast('string'), parse_url(my_table.url.cast('string'), "QUERY", "extensionId")).show()

However, I got the following error:

NameError: name 'parse_url' is not defined
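For reference, parse_url is not exported as a Python function in pyspark.sql.functions, which is why the bare call raises NameError; it is only reachable as a SQL expression. A minimal sketch of the usual workaround using expr/selectExpr, assuming a Spark version where parse_url is available as a SQL function (native in Spark 2.0+, or via a HiveContext on 1.x) and an illustrative extension_id alias:

from pyspark.sql import functions as F

# parse_url is not in pyspark.sql.functions, so wrap the SQL expression in
# expr() (or use DataFrame.selectExpr) instead of calling it as a Python function.
result = my_table.select(
    my_table.id.cast('string').alias('id'),
    F.expr("parse_url(url, 'QUERY', 'extensionId')").alias('extension_id')
)
result.show()

# Equivalent form with selectExpr:
my_table.selectExpr(
    "cast(id as string) as id",
    "parse_url(url, 'QUERY', 'extensionId') as extension_id"
).show()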

I also tried the following:

my_table.registerTempTable("my_table")

sqlContext.sql("select id, url, parse_url(url, 'QUERY', 'extensionId') as new_url from my_table").show(100)

But all the new_url values come back null.
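One thing worth checking: as far as I know, parse_url returns NULL whenever the requested key is missing from the query string (or the URL has no query string at all), so all-null output can simply mean 'extensionId' does not appear in those URLs. A small diagnostic sketch, reusing the sqlContext and the my_table temp table registered above (full_query is just an illustrative alias):

# Pull out the whole QUERY component alongside the keyed lookup to see what
# the URLs actually contain; a NULL new_url next to a non-NULL full_query
# means the key simply is not present in that URL.
sqlContext.sql(
    "select url, "
    "       parse_url(url, 'QUERY') as full_query, "
    "       parse_url(url, 'QUERY', 'extensionId') as new_url "
    "from my_table"
).show(100)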

Any idea what I missed here? Also, how do people usually handle such a problem? Thanks!

Copyright License:
Author: Edamame. Reproduced under the CC 4.0 BY-SA copyright license with link to original source & disclaimer.
Link to: https://stackoverflow.com/questions/41272377/impala-built-in-function-not-available-when-migrating-from-impala-to-sparksql
