Hadoop Lucene: Error: org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=cuneyt, access=WRITE, inode="":hdfs:hdfs:rwxr-xr-x

Problematic command: /homes/cuneyt/trunk/bin/mahout lucene.vector --dir /homes/cuneyt/lucene/index --field 0 --output /homes/cuneyt/lda/vector --dictOut /homes/cuneyt/lda/dict.txt

Correct command: /homes/cuneyt/trunk/bin/mahout lucene.vector --dir /homes/cuneyt/lucene/index --field 0 --output lda/vector --dictOut /homes/cuneyt/lda/dict.txt

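Solution: Do not use an absolute HDFS path for --output (/homes/cuneyt/lda/vector). Use a relative path (lda/vector) instead. HDFS itself accepts absolute paths; the problem is permissions. A relative path is resolved against your HDFS home directory (/user/cuneyt by default), where you can write. The absolute path would require creating /homes under the HDFS root directory (the inode="" in the error message), which is owned by hdfs:hdfs with mode rwxr-xr-x, so user cuneyt has no write access there.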
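To see why the relative path works, here is a minimal Python sketch of the two rules involved, assuming default settings: a relative path is resolved against the home directory /user/<name>, and write access is checked against the owner bits if you own the inode, else the group bits if you are in its group, else the "other" bits (the order described in the HDFS Permissions Guide; superuser handling is left out). The helpers resolve and can_write are hypothetical illustrations, not Hadoop APIs, and the assumption that cuneyt is not in the hdfs group is mine.

```python
import posixpath

# Assumption: default HDFS home directory layout, /user/<name>,
# and a working directory that was never changed from it.
HOME_PREFIX = "/user"


def resolve(path: str, user: str) -> str:
    """Hypothetical helper: the absolute HDFS path a client would use."""
    if path.startswith("/"):
        return posixpath.normpath(path)  # absolute paths are used as given
    return posixpath.normpath(posixpath.join(HOME_PREFIX, user, path))


def can_write(owner: str, group: str, mode: str, user: str, groups: set) -> bool:
    """Hypothetical helper: POSIX-style write check on one inode.

    `mode` is the 9-character string from the error, e.g. "rwxr-xr-x".
    Only one permission class is consulted: owner, else group, else other.
    The HDFS superuser bypass is not modelled here.
    """
    if user == owner:
        bits = mode[0:3]
    elif group in groups:
        bits = mode[3:6]
    else:
        bits = mode[6:9]
    return bits[1] == "w"


# Values from the error message: the HDFS root is hdfs:hdfs:rwxr-xr-x.
user, groups = "cuneyt", {"cuneyt"}  # assumption: cuneyt is not in group hdfs
print(resolve("/homes/cuneyt/lda/vector", user))  # /homes/cuneyt/lda/vector
print(resolve("lda/vector", user))                # /user/cuneyt/lda/vector
print(can_write("hdfs", "hdfs", "rwxr-xr-x", user, groups))  # False
```

With the absolute path, Hadoop has to create /homes inside the root directory, and the check above falls through to the "other" bits r-x, hence access=WRITE being denied. The relative path lands under /user/cuneyt, which on a typical setup is owned by cuneyt.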


 

I could not find any way to specify the queue name in Mahout, and the Mahout developers really dropped the ball on this one. Because no queue name is passed, Mahout submits the job to the "default" queue. In my case at least, the "default" queue does not exist, so I could not run my job.

The legend says that if you edit the trunk source code and add the line conf.set("mapred.job.queue.name", "xxx"), it should work. Of course, these legend tellers never mention which source file this miraculous line goes into. In my case, I was trying to run cvb (a topic modeling algorithm), so the file should be somewhere under trunk/core/src/main/java/org/apache/mahout/clustering, but I have no idea which one it is.

I advise you to give up on Mahout and try Y!LDA for topic modeling on Hadoop. Go to https://github.com/shravanmn/Yahoo_LDA and download the zip file. Unzip it on your machine and open docs/html/index.html inside the unzipped shravanmn-Yahoo_LDA-28011b8 directory. Have fun!

