Pig Cassandra cluster ClassNotFoundException: org.apache.cassandra.hadoop.ColumnFamilySplit

2011-12-09T14:51:01

I am trying to run Cassandra-0.8.5, Hadoop 0.2.0 and Pig 0.8.1. I run a very simple pig scripts as

rows = LOAD 'cassandra://pygmalion/$CF' USING CassandraStorage() AS (key, columns: bag {T: tuple(name, value)});
counted = foreach (group rows all) generate COUNT($1);
dump counted;

it worked well if i run local mode. But if I run mapreduce mode, i alway get error message as below, i run out of idea. any help or hind will be greatly appreciated. pig_813399.log:

Backend error message

java.io.IOException: java.lang.ClassNotFoundException: org.apache.cassandra.hadoop.ColumnFamilySplit
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit.readFields(PigSplit.java:225)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
        at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:349)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:611)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:416)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
        at org.apache.hadoop.mapred.Child.main(Child.java:264) Caused by: java.lang.ClassNotFoundException: org.apache.c

Pig Stack Trace

ERROR 2997: Unable to recreate exception from backed error: java.io.IOException: java.lang.ClassNotFoundException: org.apache.cassandra.hadoop.ColumnFamilySplit

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias counted. Backend error : Unable to recreate exception from backed error: java.io.IOException: java.lang.ClassNotFoundException: org.apache.cassandra.hadoop.ColumnFamilySplit
        at org.apache.pig.PigServer.openIterator(PigServer.java:753)
        at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:615)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
        at org.apache.pig.tools.grunt.GruntParser.loadScript(GruntParser.java:477)
        at org.apache.pig.tools.grunt.GruntParser.processScript(GruntParser.java:422)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.Script(PigScriptParser.java:692)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:425)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
        at org.apache.pig.Main.run(Main.java:455)
        at org.apache.pig.Main.main(Main.java:107)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2997: Unable to recreate exception from backed error: java.io.IOExcepti
on: java.lang.ClassNotFoundException: org.apache.cassandra.hadoop.ColumnFamilySplit
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:221)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:151)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:337)
        at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:382)
        at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1209)
        at org.apache.pig.PigServer.storeEx(PigServer.java:885)
        at org.apache.pig.PigServer.store(PigServer.java:827)
        at org.apache.pig.PigServer.openIterator(PigServer.java:739)

Copyright License:
Author:「user1089192」,Reproduced under the CC 4.0 BY-SA copyright license with link to original source & disclaimer.
Link to:https://stackoverflow.com/questions/8442040/pig-cassandra-cluster-classnotfoundexception-org-apache-cassandra-hadoop-column

About “Pig Cassandra cluster ClassNotFoundException: org.apache.cassandra.hadoop.ColumnFamilySplit” questions

I am trying to run Cassandra-0.8.5, Hadoop 0.2.0 and Pig 0.8.1. I run a very simple pig scripts as rows = LOAD 'cassandra://pygmalion/$CF' USING CassandraStorage() AS (key, columns: bag {T: tuple...
I am working on BI process that will read data from cassandra, create summaries using Map Reduce and write back to a different keyspace. Starting with a single node, everything worked as i expecte...
I'm trying to set up a trial cassandra + pig cluster. The cassandra wiki makes it sound like you need hadoop to integrate with pig. but the readme in cassandra-src/contrib/pig makes it sound like...
I know that cassandra has hadoop integration, but it seems to be for a combination cassandra/hadoop cluster using cassandra's own imitation. But is there a loader that can be used with pig in a se...
I am trying to connect to Cassandra from pig. But Cassandra is installed in different cluster i need to connect to connect to Cassandra remotely from pig. I am referring following link exmaple Ge...
I have a cassandra cluster with username and password set in the password.properties file, I could'n figure out how to use pig's CassandraStorage to load and write data to this cluster. Without the
I am using pig CassandraStroage() to insert a big data set into cassandra, after running 4 hours, it crashed with the following exception: java.lang.NullPointerException at org.apache.cass...
Trying to write a Pig script that will extract data from a Cassandra table. The Pig script looks like this: REGISTER ./cassandra-all-2.0.8.39.jar REGISTER ./datastax-agent-4.1.4-standalone.jar RE...
I have a Hadoop (2.7.2) setup over a Cassandra (3.7) Cluster. I have no problem with using Hadoop MapReduce. Similarly, I have no problem to create tables and keyspace in CQLSH. However, I have been
I have a 4 node Cassandra cluster which is also a hadoop cluster When I run pig script to select and count the rows of Cassandra table - it creates hadoop job with 1 map task - and it takes long t...

Copyright License:Reproduced under the CC 4.0 BY-SA copyright license with link to original source & disclaimer.