Impala built-in function not available when migrating from Impala to SparkSQL

2016-12-22T05:43:08

I am using a built-in function in Impala like:

select id, parse_url(my_table.url, "QUERY", "extensionId") from my_table

Now I am migrating to SparkSQL (using PySpark in a Jupyter notebook):

my_table.select(my_table.id.cast('string'), parse_url(my_table.url.cast('string'), "QUERY", "extensionId")).show()

However, I got the following error:

NameError: name 'parse_url' is not defined
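For reference, parse_url is not exported as a Python function in pyspark.sql.functions, which is why the bare call raises NameError; it is only reachable as a SQL expression. A minimal sketch of the usual workaround using expr/selectExpr, assuming a Spark version where parse_url is available as a SQL function (native in Spark 2.0+, or via a HiveContext on 1.x) and an illustrative extension_id alias:

from pyspark.sql import functions as F

# parse_url is not in pyspark.sql.functions, so wrap the SQL expression in
# expr() (or use DataFrame.selectExpr) instead of calling it as a Python function.
result = my_table.select(
    my_table.id.cast('string').alias('id'),
    F.expr("parse_url(url, 'QUERY', 'extensionId')").alias('extension_id')
)
result.show()

# Equivalent form with selectExpr:
my_table.selectExpr(
    "cast(id as string) as id",
    "parse_url(url, 'QUERY', 'extensionId') as extension_id"
).show()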

I also tried the following:

my_table.registerTempTable("my_table")

sqlContext.sql("select id, url, parse_url(url, 'QUERY', 'extensionId') as new_url from my_table").show(100)

But all the new_url values come back null.
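One thing worth checking: as far as I know, parse_url returns NULL whenever the requested key is missing from the query string (or the URL has no query string at all), so all-null output can simply mean 'extensionId' does not appear in those URLs. A small diagnostic sketch, reusing the sqlContext and the my_table temp table registered above (full_query is just an illustrative alias):

# Pull out the whole QUERY component alongside the keyed lookup to see what
# the URLs actually contain; a NULL new_url next to a non-NULL full_query
# means the key simply is not present in that URL.
sqlContext.sql(
    "select url, "
    "       parse_url(url, 'QUERY') as full_query, "
    "       parse_url(url, 'QUERY', 'extensionId') as new_url "
    "from my_table"
).show(100)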

Any idea what I missed here? Also, how do people usually handle such a problem? Thanks!

Copyright License:
Author: Edamame. Reproduced under the CC 4.0 BY-SA copyright license with link to original source & disclaimer.
Link to: https://stackoverflow.com/questions/41272377/impala-built-in-function-not-available-when-migrating-from-impala-to-sparksql
