I am using a built-in function in Impala like:
select id, parse_url(my_table.url, "QUERY", "extensionId") from my_table
Now I am migrating to SparkSQL (using pyspark in Jupyter Notebook):
my_table.select(my_table.id.cast('string'), parse_url(my_table.url.cast('string'), "QUERY", "extensionId")).show()
However, I got the following error:
NameError: name 'parse_url' is not defined
I also tried the following:
my_table.registerTempTable("my_table")
sqlContext.sql("select id, url, parse_url(url, 'QUERY', 'extensionId') as new_url from my_table").show(100)
But every new_url value comes back as null.
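To make the expected result concrete, here is a minimal sketch in plain Python of what I expect parse_url(url, 'QUERY', 'extensionId') to return for one row. The helper name extract_query_param and the sample URL are hypothetical and only for illustration. It uses the standard library rather than Spark, so details such as percent-decoding may differ from the built-in function.

```python
from urllib.parse import urlparse, parse_qs


def extract_query_param(url, key):
    """Return the first value of `key` in the URL's query string, or None.

    Hypothetical helper: approximates the result expected from
    parse_url(url, 'QUERY', key); it is not Spark's implementation.
    """
    if url is None:
        return None
    # Take only the query part, e.g. "extensionId=abc123&lang=en".
    query = urlparse(url).query
    # parse_qs maps each key to a list of (percent-decoded) values.
    values = parse_qs(query).get(key)
    return values[0] if values else None


# Made-up URL for illustration; not taken from my_table.
sample_url = "http://example.com/path?extensionId=abc123&lang=en"
print(extract_query_param(sample_url, "extensionId"))  # prints abc123
```

A function like this could presumably be wrapped in a Python UDF as a fallback, though I would prefer to use the built-in function if it can be made to work.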
Any idea what I am missing here? Also, how do people usually handle built-in functions like this when migrating from Impala to SparkSQL? Thanks!
Copyright License:
Author:「Edamame」,Reproduced under the CC 4.0 BY-SA copyright license with link to original source & disclaimer.
Link to:https://stackoverflow.com/questions/41272377/impala-built-in-function-not-available-when-migrating-from-impala-to-sparksql