Spring Boot Hadoop, WebHDFS and Apache Knox

2016-05-03T19:55:31

I have a Spring Boot application which accesses HDFS through WebHDFS, proxied by Apache Knox and secured with Kerberos. I created my own KnoxWebHdfsFileSystem with a custom scheme (swebhdfsknox) as a subclass of WebHdfsFileSystem, which only changes the URLs to contain the Knox proxy prefix. It effectively remaps requests from the form:

http://host:port/webhdfs/v1/...

to the Knox one:

http://host:port/gateway/default/webhdfs/v1/...

I do this by overriding two methods (a minimal sketch follows the list):

  1. public URI getUri()
  2. URL toUrl(Op op, Path fspath, Param<?, ?>... parameters)
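
Roughly, the subclass looks like the sketch below. This is a reconstruction from the description above, not the exact code; the class and prefix names are assumptions, toUrl(...) is package-private in Hadoop 2.6 so the subclass has to live in the org.apache.hadoop.hdfs.web package to override it, and the getUri() override is omitted here:

    // Hypothetical reconstruction of the subclass described above.
    package org.apache.hadoop.hdfs.web;

    import java.io.IOException;
    import java.net.URL;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.web.resources.HttpOpParam.Op;
    import org.apache.hadoop.hdfs.web.resources.Param;

    public class KnoxWebHdfsFileSystem extends WebHdfsFileSystem {

        // Assumed Knox context path for the stock "default" topology.
        private static final String KNOX_PREFIX = "/gateway/default";

        @Override
        public String getScheme() {
            // Register under the custom scheme so swebhdfsknox:// URIs resolve here.
            return "swebhdfsknox";
        }

        @Override
        URL toUrl(Op op, Path fspath, Param<?, ?>... parameters) throws IOException {
            URL url = super.toUrl(op, fspath, parameters);
            // Remap /webhdfs/v1/... to /gateway/default/webhdfs/v1/...
            return new URL(url.getProtocol(), url.getHost(), url.getPort(),
                    KNOX_PREFIX + url.getFile());
        }
    }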

So far so good. I let Spring Boot create an FsShell for me and use it for various operations such as listing files, mkdir, etc. All of them work fine, except copyFromLocal, which, as documented, requires two steps and a redirect.
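
For concreteness, the usage looks roughly like this (a sketch with placeholder paths; FsShell is org.springframework.data.hadoop.fs.FsShell, auto-configured by Spring Boot against the swebhdfsknox:// filesystem):

    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.data.hadoop.fs.FsShell;
    import org.springframework.stereotype.Component;

    @Component
    public class HdfsOps {

        // FsShell wired by Spring Boot against the swebhdfsknox:// filesystem.
        @Autowired
        private FsShell shell;

        public void run() {
            shell.ls("/tmp");             // works
            shell.mkdir("/tmp/incoming"); // works
            // Fails on the second (redirected) PUT with "Authentication required":
            shell.copyFromLocal("data.txt", "/tmp/incoming");
        }
    }

On the second step, when the filesystem tries to PUT to the final URL it received in the Location header, it fails with this error: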

org.apache.hadoop.security.AccessControlException: Authentication required
    at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:334) ~[hadoop-hdfs-2.6.0.jar:na]
    at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:91) ~[hadoop-hdfs-2.6.0.jar:na]
    at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$FsPathOutputStreamRunner$1.close(WebHdfsFileSystem.java:787) ~[hadoop-hdfs-2.6.0.jar:na]
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:54) ~[hadoop-common-2.6.0.jar:na]
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:112) ~[hadoop-common-2.6.0.jar:na]
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:366) ~[hadoop-common-2.6.0.jar:na]
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:338) ~[hadoop-common-2.6.0.jar:na]
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:302) ~[hadoop-common-2.6.0.jar:na]
    at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1889) ~[hadoop-common-2.6.0.jar:na]
    at org.springframework.data.hadoop.fs.FsShell.copyFromLocal(FsShell.java:265) ~[spring-data-hadoop-core-2.2.0.RELEASE.jar:2.2.0.RELEASE]
    at org.springframework.data.hadoop.fs.FsShell.copyFromLocal(FsShell.java:254) ~[spring-data-hadoop-core-2.2.0.RELEASE.jar:2.2.0.RELEASE]

I suspect the problem is somehow related to the redirect, but I can't figure out what the issue might be. If I do the same requests via curl, the file is successfully uploaded to HDFS.
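
For reference, this is the two-step flow that works when driven manually — a minimal Java sketch of what the curl calls do. Host, port, topology and file names are placeholders, and authentication is omitted:

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class TwoStepCreate {
        public static void main(String[] args) throws Exception {
            // Step 1: PUT without a body; the gateway replies 307 Temporary Redirect
            // and puts the upload URL into the Location header.
            URL create = new URL(
                "http://knoxhost:8443/gateway/default/webhdfs/v1/tmp/data.txt?op=CREATE");
            HttpURLConnection c1 = (HttpURLConnection) create.openConnection();
            c1.setRequestMethod("PUT");
            c1.setInstanceFollowRedirects(false); // read the redirect ourselves
            String location = c1.getHeaderField("Location");
            c1.disconnect();

            // Step 2: PUT the actual bytes to the URL from the Location header.
            HttpURLConnection c2 = (HttpURLConnection) new URL(location).openConnection();
            c2.setRequestMethod("PUT");
            c2.setDoOutput(true);
            try (OutputStream out = c2.getOutputStream()) {
                out.write(Files.readAllBytes(Paths.get("data.txt")));
            }
            System.out.println("HTTP " + c2.getResponseCode()); // 201 Created on success
        }
    }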

Copyright License:
Author: Jan Zyka. Reproduced under the CC BY-SA 4.0 license with a link to the original source and disclaimer.
Link: https://stackoverflow.com/questions/37003468/spring-boot-hadoop-webhdfs-and-apache-knox
