   * a bit involved ... [[]]   * a bit involved ... [[]]
 +Here are my steps to get this working (with lots of Ross's help) for rmr2 and rhdfs installation. Do this on all nodes.
 +  * Add EPEL repository to your yum installation, then
 +  * yum install R, which pulls in 
 +Make sure Java is installed properly (the same one you used for Hadoop itself) and set the environment in /etc/profile
 +export JAVA_HOME="/usr/java/latest"
 +export PATH=/usr/java/latest/bin:$PATH
 +export HADOOP_HOME=/usr/lib/hadoop-0.20
 +export HADOOP_CMD=/usr/bin/hadoop
 +export HADOOP_STREAMING=/usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u6.jar
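A quick sanity check after editing /etc/profile can save a failed R install later. The sketch below only tests that each exported variable points at something that exists on the node; `check_path` and `check_hadoop_env` are hypothetical helper names, not part of Hadoop or R.

```shell
#!/bin/sh
# Sketch: verify that the variables exported in /etc/profile point at real paths.
# check_path and check_hadoop_env are hypothetical helpers for illustration.

check_path() {
    # $1 = variable name, $2 = its value
    if [ -n "$2" ] && [ -e "$2" ]; then
        echo "ok $1=$2"
    else
        echo "MISSING $1=$2"
        return 1
    fi
}

check_hadoop_env() {
    status=0
    check_path JAVA_HOME        "${JAVA_HOME:-}"        || status=1
    check_path HADOOP_HOME      "${HADOOP_HOME:-}"      || status=1
    check_path HADOOP_CMD       "${HADOOP_CMD:-}"       || status=1
    check_path HADOOP_STREAMING "${HADOOP_STREAMING:-}" || status=1
    return $status
}

# Usage on each node, after logging in again or sourcing the profile:
#   . /etc/profile && check_hadoop_env
```

A non-zero return means at least one path is wrong on that node, which is worth fixing before running ''R CMD javareconf''.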
 +I noticed that at some point OpenJDK gets reinstalled, so I manage these links myself:
 +lrwxrwxrwx  1 root root 24 May 27 10:41 /usr/bin/jar -> /usr/java/latest/bin/jar
 +lrwxrwxrwx  1 root root 21 May 27 09:47 /usr/bin/jar-alt -> /etc/alternatives/jar
 +lrwxrwxrwx  1 root root 30 May 27 10:41 /usr/bin/jarsigner -> /usr/java/latest/bin/jarsigner
 +lrwxrwxrwx  1 root root 27 May 27 09:47 /usr/bin/jarsigner-alt -> /etc/alternatives/jarsigner
 +lrwxrwxrwx  1 root root 25 May 27 10:35 /usr/bin/java -> /usr/java/latest/bin/java
 +lrwxrwxrwx  1 root root 22 May 27 09:47 /usr/bin/java-alt -> /etc/alternatives/java
 +lrwxrwxrwx  1 root root 26 May 27 10:38 /usr/bin/javac -> /usr/java/latest/bin/javac
 +lrwxrwxrwx  1 root root 23 May 27 09:47 /usr/bin/javac-alt -> /etc/alternatives/javac
 +lrwxrwxrwx  1 root root 25 May 27 09:47 /usr/bin/javadoc -> /etc/alternatives/javadoc
 +lrwxrwxrwx  1 root root 26 May 28 09:37 /usr/bin/javah -> /usr/java/latest/bin/javah
 +lrwxrwxrwx  1 root root 23 May 27 09:47 /usr/bin/javah-alt -> /etc/alternatives/javah
 +lrwxrwxrwx  1 root root 26 May 27 10:39 /usr/bin/javap -> /usr/java/latest/bin/javap
 +lrwxrwxrwx  1 root root 23 May 27 09:47 /usr/bin/javap-alt -> /etc/alternatives/javap
 +lrwxrwxrwx  1 root root 27 May 27 10:40 /usr/bin/javaws -> /usr/java/latest/bin/javaws
 +lrwxrwxrwx. 1 root root 28 May 15 14:56 /usr/bin/javaws-alt -> /usr/java/default/bin/javaws
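If the links need to be redone after a package update, a small loop is less error-prone than typing each ''ln'' by hand. This is a sketch under assumptions: the `-alt` names in the listing suggest the original alternatives links were kept under a new name, so the hypothetical helper `relink_java_tools` does the same before repointing each tool.

```shell
#!/bin/sh
# Sketch: repoint tool links at a chosen JDK, keeping the old link as <tool>-alt.
# relink_java_tools is a hypothetical helper; run it as root for /usr/bin.

relink_java_tools() {
    java_bin=$1   # directory holding the wanted binaries, e.g. /usr/java/latest/bin
    link_dir=$2   # directory holding the links, e.g. /usr/bin
    shift 2
    for tool in "$@"; do
        if [ ! -x "$java_bin/$tool" ]; then
            echo "skipping $tool: not found in $java_bin" >&2
            continue
        fi
        # Preserve the existing symlink once, under the -alt name.
        if [ -L "$link_dir/$tool" ] && [ ! -e "$link_dir/$tool-alt" ]; then
            mv "$link_dir/$tool" "$link_dir/$tool-alt"
        fi
        ln -sfn "$java_bin/$tool" "$link_dir/$tool"
    done
}

# Usage (as root):
#   relink_java_tools /usr/java/latest/bin /usr/bin java javac jar jarsigner javah javap javaws
```

Running it a second time leaves the `-alt` links alone and simply refreshes the main links.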
 +So if the commands ''which java'' and ''java -version'' return the proper information, reconfigure Java for R at the OS prompt:
 +# at OS
 +R CMD javareconf
 +# in R
 +You could also set java in this file: $HADOOP_HOME/conf/
 +When that is successful, add the dependencies:
 +See the following files for current lists of dependencies:
 +Enter R and issue the command
 +install.packages(c("Rcpp", "RJSONIO", "bitops", "digest", "functional", "stringr", "plyr", "reshape2"))
 +If Rcpp is a problem: locate and install an older version of Rcpp in the CRAN archives (
 +# in R
 +# at OS
 +R CMD INSTALL Rcpp_0.9.8.tar.gz
 +Finally the RHadoop stuff, at the OS level
 +wget -O rmr-2.2.0.tar.gz
 +wget -O rhdfs_1.0.5.tar.gz
 +R CMD INSTALL rmr-2.2.0.tar.gz
 +R CMD INSTALL rhdfs_1.0.5.tar.gz
 +Type 'q()' to quit R.
 +> library(rmr2)
 +Loading required package: Rcpp
 +Loading required package: RJSONIO
 +Loading required package: digest
 +Loading required package: functional
 +Loading required package: stringr
 +Loading required package: plyr
 +Loading required package: reshape2
 +> library(rhdfs)
 +Loading required package: rJava
 +Be sure to run hdfs.init()
 +> sessionInfo()
 +R version 3.0.0 (2013-04-03)
 +Platform: x86_64-redhat-linux-gnu (64-bit)
 + [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 + [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 + [7] LC_PAPER=C                 LC_NAME=C
 + [9] LC_ADDRESS=C               LC_TELEPHONE=C
 +attached base packages:
 +[1] stats     graphics  grDevices utils     datasets  methods   base
 +other attached packages:
 + [1] rhdfs_1.0.5    rJava_0.9-4    rmr2_2.2.0     reshape2_1.2.2 plyr_1.8
 + [6] stringr_0.6.2  functional_0.4 digest_0.6.3   RJSONIO_1.0-3  Rcpp_0.10.3
 +Tutorial documentation: [[]]
 +R script:
 +small.ints = to.dfs(1:1000)
 +mapreduce(input = small.ints, map = function(k, v) cbind(v, v^2))
 +Then HBase for Rhbase:
 +But first Thrift, the language interface to the HBase database:
 +yum install openssl098e
 +Download Thrift: [[]]
 +yum install byacc -y
 +yum install automake libtool flex bison pkgconfig gcc-c++ boost-devel libevent-devel zlib-devel python-devel ruby-devel
 +make install
 +export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/usr/local/lib/pkgconfig/
 +pkg-config --cflags thrift
 +cp -p /usr/local/lib/ /usr/lib/
 +Configure for distributed environment: [[]]
 +  * used 3 zookeepers with quorum, see config example online
 +  * start with rolling_restart, the start & stop have a timing issue
 +  * /hbase owned by root:root
 +  * permissions reset on /hdfs, not sure why
 +  * also use /sanscratch/zookeepers
 +  * 
 ==== Perl Hadoop's native Streaming ==== ==== Perl Hadoop's native Streaming ====
 </code> </code>
 +==== Perl Hadoop's native Streaming #2 ====
 +Adapted from [[]]
 +  * Create vectors X = [x1,x2, ...] and Y = [y1,y2, ...]
 +  * And compute the element-wise product Z = [x1*y1, x2*y2, ...]
 +First, do this twice in shell
 +for i in `seq 1 1000000`
 +> do
 +> echo -e "$i,$RANDOM" >> v_data_large.txt
 +> done
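Appending one line at a time reopens the file a million times, which is slow. A possible alternative, sketched below, writes each vector in a single pass with awk. It swaps bash's `$RANDOM` for awk's `rand()` scaled to the same 0..32767 range, and `gen_vector` is a hypothetical helper name; running it twice into the same file gives every key two values, matching the "do this twice" step.

```shell
#!/bin/sh
# Sketch: emit n lines of "index,random" in one pass.
# gen_vector is a hypothetical helper; awk's rand() stands in for bash's $RANDOM.

gen_vector() {
    n=$1           # number of elements
    seed=${2:-1}   # different seeds give different vectors
    awk -v n="$n" -v seed="$seed" 'BEGIN {
        srand(seed)
        for (i = 1; i <= n; i++)
            printf "%d,%d\n", i, int(rand() * 32768)
    }'
}

# Usage: two vectors in one file, so each key appears twice.
#   gen_vector 1000000 1 >  v_data_large.txt
#   gen_vector 1000000 2 >> v_data_large.txt
```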
 +Then we'll use the mapper 
 +# convert comma delimited to tab delimited
 +        @fields = split(/,/, $line);
 +        if ($fields[0] eq '#') { next;}
 +        if($fields[0] && $fields[1]){
 +                print "$fields[0]\t$fields[1]";
 +        }
 +And the reducer from the web site
 +@fields=split(/\t/, $line);
 +$key = $fields[0];
 +$value = $fields[1];
 +if($lastKey ne "" && $key ne $lastKey){
 +print "$lastKey\t$product\n";
 +#the last key
 +print "$lastKey\t$product\n";
 +And submit the job
 + hadoop jar \
 +/usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u6.jar \
 +-input /tmp/v_data.txt  -output /tmp/v.out \
 +-file ~/ -mapper ~/ \
 +-file ~/ -reducer ~/ 
 +And that works.
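Before submitting to the cluster, the same logic can be dry-run locally, since streaming amounts to mapper, sort by key, reducer over stdin and stdout. The sketch below uses awk stand-ins for the two Perl scripts (their file names are not given above), so `map_stage`, `reduce_stage` and `local_run` are hypothetical names that mirror the described behaviour rather than the actual scripts.

```shell
#!/bin/sh
# Sketch: local dry run of the map -> sort -> reduce flow.
# map_stage and reduce_stage are awk stand-ins for the Perl mapper and reducer.

map_stage() {
    # Comma-delimited to tab-delimited; skip '#' lines and incomplete records.
    awk -F, '$1 != "#" && $1 != "" && $2 != "" { print $1 "\t" $2 }'
}

reduce_stage() {
    # Input is sorted by key; multiply the values that share a key.
    awk -F'\t' '
        NR == 1 { product = 1 }
        NR > 1 && $1 != last { print last "\t" product; product = 1 }
        { product *= $2; last = $1 }
        END { if (NR > 0) print last "\t" product }'
}

local_run() {
    # sort groups equal keys together, as the shuffle phase would.
    map_stage | LC_ALL=C sort -t "$(printf '\t')" -k1,1 | reduce_stage
}

# Usage:
#   head -1000 v_data_large.txt | local_run
```

With the real scripts in place, the same pipe shape (`cat data | mapper | sort | reducer`) should work as a smoke test before running ''hadoop jar''.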
 ==== Perl Hadoop::Streaming ==== ==== Perl Hadoop::Streaming ====
 +  * Installed on whitetail only
   * [[]]   * [[]]
Line 308: Line 569:
 cpan> install Hadoop::Streaming  cpan> install Hadoop::Streaming 
-[root@qactweet1 hmeij07]# cat hadoop_streaming.txt +
 Installing /usr/local/share/perl5/Hadoop/ Installing /usr/local/share/perl5/Hadoop/
 Installing /usr/local/share/perl5/Hadoop/Streaming/ Installing /usr/local/share/perl5/Hadoop/Streaming/
Line 319: Line 580:
 Installing /usr/local/share/perl5/Hadoop/Streaming/Reducer/Input/ Installing /usr/local/share/perl5/Hadoop/Streaming/Reducer/Input/
   * How to use this?   * How to use this?
