cluster:115 [2013/05/28 13:26] hmeij [Perl Hadoop's native Streaming #2]
cluster:115 [2013/09/03 19:38] hmeij [Rhadoop]
  * a bit involved ... [[http://

Here are my steps to get this working (with lots of Ross's help) for the rmr2 and rhdfs installation. Do this on all nodes.

  * Add the EPEL repository to your yum installation.
  * Run ''yum install R'', which pulls in:

<code>
R-core-3.0.0-2.el6.x86_64
R-java-devel-3.0.0-2.el6.x86_64
R-devel-3.0.0-2.el6.x86_64
R-core-devel-3.0.0-2.el6.x86_64
R-java-3.0.0-2.el6.x86_64
R-3.0.0-2.el6.x86_64
</code>
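To confirm on each node that the yum step really pulled in everything listed above, a small check can compare ''rpm -qa'' output against the expected package names. This is a minimal sketch: the function name ''missing_pkgs'' is hypothetical, and it assumes rpm's usual name-version-release.arch naming.

```shell
#!/bin/sh
# Sketch: report which expected R RPMs are missing.
# Reads an "rpm -qa"-style list (name-version-release.arch) on stdin and
# prints every expected package name that has no matching line.
# missing_pkgs is a hypothetical helper name.
missing_pkgs() {
    installed=$(cat)
    for name in "$@"; do
        # Match "<name>-<digit>" so "R-core" does not also match "R-core-devel".
        if ! printf '%s\n' "$installed" | grep -q "^${name}-[0-9]"; then
            echo "$name"
        fi
    done
}

# Usage on a node (needs rpm):
#   rpm -qa | missing_pkgs R R-core R-devel R-core-devel R-java R-java-devel
```

No output means all packages are present.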

Make sure Java is installed properly (the same Java you used for Hadoop itself) and set the environment variables in /

<code>
export JAVA_HOME="/
export PATH=/

export HADOOP_HOME=/
export HADOOP_CMD=/
export HADOOP_STREAMING=/
</code>
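Before going further it can help to check that these variables are set and point at something real, since rhdfs and rmr2 read ''HADOOP_CMD'' and ''HADOOP_STREAMING'' at load time. A minimal sketch, assuming ''JAVA_HOME''/''HADOOP_HOME'' should be directories and ''HADOOP_CMD''/''HADOOP_STREAMING'' should be files (the hadoop binary and the streaming jar); the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: verify the Java/Hadoop variables exported above.
# Assumption: JAVA_HOME and HADOOP_HOME are directories, HADOOP_CMD is the
# hadoop executable and HADOOP_STREAMING is the streaming jar (both files).
# check_hadoop_env is a hypothetical helper name.
check_hadoop_env() {
    status=0
    for var in JAVA_HOME HADOOP_HOME; do
        eval "val=\${$var:-}"
        if [ -z "$val" ] || [ ! -d "$val" ]; then
            echo "$var is unset or not a directory: '$val'" >&2
            status=1
        fi
    done
    for var in HADOOP_CMD HADOOP_STREAMING; do
        eval "val=\${$var:-}"
        if [ -z "$val" ] || [ ! -f "$val" ]; then
            echo "$var is unset or not a file: '$val'" >&2
            status=1
        fi
    done
    return "$status"
}
```

Run ''check_hadoop_env'' after sourcing the profile; it prints one line per problem and returns non-zero if anything is off.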

I noticed that at some point OpenJDK gets reinstalled, so I manage these links:

<code>
lrwxrwxrwx
lrwxrwxrwx
lrwxrwxrwx
lrwxrwxrwx
lrwxrwxrwx
lrwxrwxrwx
lrwxrwxrwx
lrwxrwxrwx
lrwxrwxrwx
lrwxrwxrwx
lrwxrwxrwx
lrwxrwxrwx
lrwxrwxrwx
lrwxrwxrwx
lrwxrwxrwx. 1 root root 28 May 15 14:56 /
</code>
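On RHEL/CentOS the packaged way to pick the active Java is ''alternatives --config java''; if you manage the links by hand instead, an idempotent re-point helps after each OpenJDK reinstall. A minimal sketch: the link and target paths are passed in (the actual links above are cut off), and the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: re-point a symlink at the Java you want, doing nothing if it
# already points there. Link/target paths are supplied by the caller;
# repoint_link is a hypothetical helper name.
repoint_link() {
    link=$1
    target=$2
    if [ ! -e "$target" ]; then
        echo "target does not exist: $target" >&2
        return 1
    fi
    # Already correct: leave it alone.
    if [ -L "$link" ] && [ "$(readlink "$link")" = "$target" ]; then
        return 0
    fi
    # -s symbolic, -f replace existing link, -n do not descend into a
    # link that currently points to a directory.
    ln -sfn "$target" "$link"
}
```

If the target is missing the existing link is left untouched, so a typo cannot leave a node without Java.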

So if commands ''

<code>
# at the OS level
R CMD javareconf
# in R
install.packages('
</code>

You could also set Java in this file: $HADOOP_HOME/

When that is successful, add the dependencies.

See the following files for current lists of dependencies:

[[https://
[[https://

Enter R and issue the command:

<code>
install.packages(c("
</code>
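Since this has to be repeated on every node, the same ''install.packages()'' call can be run non-interactively with ''Rscript -e''. A minimal sketch: the helper name and the default mirror URL are assumptions, and the package names in the usage line are the ones ''library(rmr2)'' loads in the Verify output further down; the dependency files linked above remain the authoritative list.

```shell
#!/bin/sh
# Sketch: build an install.packages() expression from package names.
# CRAN_REPOS and its default mirror are assumptions; r_install_expr is a
# hypothetical helper name.
r_install_expr() {
    repos=${CRAN_REPOS:-https://cloud.r-project.org}
    pkgs=""
    for p in "$@"; do
        # Comma-separate quoted names: "a", "b", ...
        pkgs="${pkgs:+$pkgs, }\"$p\""
    done
    printf 'install.packages(c(%s), repos = "%s")\n' "$pkgs" "$repos"
}

# Usage (needs R):
#   Rscript -e "$(r_install_expr Rcpp RJSONIO digest functional stringr plyr reshape2)"
```

Passing ''repos'' explicitly avoids the interactive mirror prompt that ''install.packages()'' shows otherwise.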

If Rcpp is a problem, locate and install an older version of Rcpp from the CRAN archives (http://

<code>
# in R
install.packages("
# at the OS level
wget http://
R CMD INSTALL Rcpp_0.9.8.tar.gz
</code>
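Old releases on CRAN live under a per-package Archive directory, so the download URL can be built from the package name and version. A minimal sketch, assuming CRAN's usual ''src/contrib/Archive/<pkg>/<pkg>_<version>.tar.gz'' layout and a default mirror of ''https://cran.r-project.org''; both helper names are hypothetical.

```shell
#!/bin/sh
# Sketch: fetch and install an archived CRAN package version.
# Assumes CRAN's Archive layout:
#   <mirror>/src/contrib/Archive/<pkg>/<pkg>_<version>.tar.gz
# CRAN_MIRROR, cran_archive_url and install_archived are hypothetical names.
cran_archive_url() {
    mirror=${CRAN_MIRROR:-https://cran.r-project.org}
    printf '%s/src/contrib/Archive/%s/%s_%s.tar.gz\n' "$mirror" "$1" "$1" "$2"
}

install_archived() {
    pkg=$1
    ver=$2
    tarball="${pkg}_${ver}.tar.gz"
    # Download, then install from the local tarball.
    wget -O "$tarball" "$(cran_archive_url "$pkg" "$ver")" || return 1
    R CMD INSTALL "$tarball"
}

# Usage (needs wget and R):
#   install_archived Rcpp 0.9.8
```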

Finally, install the RHadoop packages themselves, at the OS level:

<code>
wget -O rmr-2.2.0.tar.gz http://
wget -O rhdfs_1.0.5.tar.gz https://

R CMD INSTALL rmr-2.2.0.tar.gz
R CMD INSTALL rhdfs_1.0.5.tar.gz
</code>
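When scripting this across nodes it helps to stop at the first failed install rather than scrolling back through compiler output. A minimal sketch; the function name and the ''R_INSTALL_CMD'' override (default ''R CMD INSTALL'') are hypothetical, the override existing only so the loop can be dry-run.

```shell
#!/bin/sh
# Sketch: install local package tarballs in order, stopping at the first
# failure. install_tarballs and R_INSTALL_CMD are hypothetical names.
install_tarballs() {
    for tb in "$@"; do
        if [ ! -f "$tb" ]; then
            echo "missing tarball: $tb" >&2
            return 1
        fi
        echo "installing $tb"
        # Left unquoted on purpose so the default splits into "R CMD INSTALL".
        ${R_INSTALL_CMD:-R CMD INSTALL} "$tb" || {
            echo "install failed: $tb" >&2
            return 1
        }
    done
}

# Usage, after the two wget commands above:
#   install_tarballs rmr-2.2.0.tar.gz rhdfs_1.0.5.tar.gz
```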

Verify:

<code>
Type '

> library(rmr2)
Loading required package: Rcpp
Loading required package: RJSONIO
Loading required package: digest
Loading required package: functional
Loading required package: stringr
Loading required package: plyr
Loading required package: reshape2
> library(rhdfs)
Loading required package: rJava

HADOOP_CMD=/

Be sure to run hdfs.init()
> sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: x86_64-redhat-linux-gnu (64-bit)

locale:
[1] LC_CTYPE=en_US.UTF-8
[3] LC_TIME=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8
[7] LC_PAPER=C
[9] LC_ADDRESS=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats

other attached packages:
[1] rhdfs_1.0.5
[6] stringr_0.6.2

</code>
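The same check can be run non-interactively on each node. rhdfs needs ''HADOOP_CMD'' (it prints it on load, as above), so this sketch refuses to run without it. The function name and the ''RSCRIPT'' override (default ''Rscript'') are hypothetical.

```shell
#!/bin/sh
# Sketch: non-interactive version of the library() check above.
# verify_rhadoop and RSCRIPT are hypothetical names.
verify_rhadoop() {
    if [ -z "${HADOOP_CMD:-}" ]; then
        echo "HADOOP_CMD is not set" >&2
        return 1
    fi
    # Load both packages, initialise HDFS access, print session details.
    ${RSCRIPT:-Rscript} -e 'library(rmr2); library(rhdfs); hdfs.init(); print(sessionInfo())'
}
```

A non-zero exit status means one of the libraries failed to load or ''hdfs.init()'' raised an error.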

Test:

Tutorial documentation:

R script:

<code>
#

# load the RHadoop libraries and initialise HDFS access
library(rmr2)
library(rhdfs)
hdfs.init()

# write a small integer vector to HDFS, then map each value v to the pair (v, v^2)
small.ints = to.dfs(1:
mapreduce(input = small.ints, map = function(k, v) cbind(v, v^2))
</code>

Then HBase, for rhbase:

[[http://

But first Thrift, the cross-language RPC framework that rhbase uses to talk to HBase:

<code>
yum install openssl098e
</code>

Download Thrift: [[http://

<code>
yum install byacc -y

./configure
make
make install
export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/
pkg-config --cflags thrift
cp -p /
</code>
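The ''export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/...'' line leaves a stray leading '':'' when the variable starts out empty, and adds a duplicate entry if run twice. A minimal sketch of a tidier append; the function name is hypothetical, and the directory to pass is whatever Thrift's ''make install'' used (the path is cut off above).

```shell
#!/bin/sh
# Sketch: append a directory to PKG_CONFIG_PATH without a leading ":"
# and without duplicates. append_pkg_config_path is a hypothetical name.
append_pkg_config_path() {
    dir=$1
    case ":${PKG_CONFIG_PATH:-}:" in
        *":$dir:"*) ;;  # already present, nothing to do
        *) PKG_CONFIG_PATH=${PKG_CONFIG_PATH:+$PKG_CONFIG_PATH:}$dir ;;
    esac
    export PKG_CONFIG_PATH
}

# Usage, then confirm pkg-config finds thrift:
#   append_pkg_config_path /path/to/thrift/pkgconfig   # hypothetical path
#   pkg-config --cflags thrift
```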

Configure for a distributed environment:

==== Perl Hadoop'