cluster:115

Differences

This shows you the differences between two versions of the page.

cluster:115 [2013/05/28 09:51] hmeij [Rhadoop]
cluster:115 [2013/09/10 15:04] (current) hmeij [Rhadoop]
Line 2: Line 2:
 **[[cluster:0|Back]]**
  
-===== Use Hadoop (test) Cluster =====
+===== Use Hadoop Cluster =====
  
 [[cluster:114|Build Hadoop Cluster]]
Line 241: Line 241:
 install.packages('rJava')
 </code>
 +
 +You could also set JAVA_HOME in this file: $HADOOP_HOME/conf/hadoop-env.sh
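
The setting in hadoop-env.sh is JAVA_HOME, the Java installation directory, not the path of the java binary. A minimal bash sketch for deriving that value, assuming the common <JAVA_HOME>/bin/java layout; `java_home_from_bin` is a hypothetical helper, not part of Hadoop or R.

```shell
# Hypothetical helper: derive a JAVA_HOME value from the path of a java binary.
# Assumes the common <JAVA_HOME>/bin/java layout. On some JDKs the resolved
# binary is <JDK>/jre/bin/java, in which case the jre directory is returned.
java_home_from_bin() {
  local bin="$1"
  # Resolve symlinks such as /usr/bin/java -> /etc/alternatives/java -> ...
  if [ -e "$bin" ]; then
    bin="$(readlink -f "$bin")"
  fi
  # Strip the trailing /bin/java to get the installation directory
  printf '%s\n' "${bin%/bin/java}"
}
```

For example, `echo "export JAVA_HOME=$(java_home_from_bin "$(command -v java)")"` prints a line you could review and then add to hadoop-env.sh.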
  
 When that is successful, add the dependencies:
Line 273: Line 275:
 R CMD INSTALL rmr-2.2.0.tar.gz
 R CMD INSTALL rhdfs_1.0.5.tar.gz
 +</code>
 +
 +Verify the installation in R:
 +
 +<code>
 +Type 'q()' to quit R.
 +
 +> library(rmr2)
 +Loading required package: Rcpp
 +Loading required package: RJSONIO
 +Loading required package: digest
 +Loading required package: functional
 +Loading required package: stringr
 +Loading required package: plyr
 +Loading required package: reshape2
 +> library(rhdfs)
 +Loading required package: rJava
 +
 +HADOOP_CMD=/usr/bin/hadoop
 +
 +Be sure to run hdfs.init()
 +> sessionInfo()
 +R version 3.0.0 (2013-04-03)
 +Platform: x86_64-redhat-linux-gnu (64-bit)
 +
 +locale:
 + [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 + [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 + [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 + [7] LC_PAPER=C                 LC_NAME=C
 + [9] LC_ADDRESS=C               LC_TELEPHONE=C
 +[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
 +
 +attached base packages:
 +[1] stats     graphics  grDevices utils     datasets  methods   base
 +
 +other attached packages:
 + [1] rhdfs_1.0.5    rJava_0.9-4    rmr2_2.2.0     reshape2_1.2.2 plyr_1.8
 + [6] stringr_0.6.2  functional_0.4 digest_0.6.3   RJSONIO_1.0-3  Rcpp_0.10.3
 +
 </code>
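
The library(rhdfs) output above reports HADOOP_CMD, and rmr2 also expects HADOOP_STREAMING to point at the Hadoop streaming jar. A small bash pre-flight check to run before starting R, sketched under that assumption; `check_rhadoop_env` is a hypothetical helper.

```shell
# Hypothetical pre-flight check: rhdfs needs HADOOP_CMD (path to the hadoop
# command) and rmr2 additionally needs HADOOP_STREAMING (path to the jar).
check_rhadoop_env() {
  local missing=0 var
  for var in HADOOP_CMD HADOOP_STREAMING; do
    if [ -z "${!var:-}" ]; then
      echo "missing: $var" >&2   # report the unset variable on stderr
      missing=1
    else
      echo "$var=${!var}"        # show the value R will pick up
    fi
  done
  return "$missing"
}
```

Run it in the same shell (or profile) from which R is started, since R only sees exported environment variables.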
  
Line 281: Line 323:
 R script:
  
 +<code>
 #!/usr/bin/Rscript
  
Line 289: Line 332:
 small.ints = to.dfs(1:1000)
 mapreduce(input = small.ints, map = function(k, v) cbind(v, v^2))
 +</code>
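
The mapreduce() call above maps each value v of 1:1000 to the pair (v, v^2). To know what the job should produce, a Hadoop-free bash equivalent; `squares` is a hypothetical name, and tab-separated text is just a convenient format here, not the format rmr2 writes to HDFS.

```shell
# Local equivalent of the map step: for each v in 1..n, emit "v<TAB>v^2".
squares() {
  seq 1 "$1" | awk '{ print $1 "\t" $1 * $1 }'
}
```

`squares 1000 | tail -n 1` should end with the pair 1000 and 1000000.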
  
 +Then HBase, for rhbase:
  
 +[[http://hbase.apache.org/book/configuration.html]]
  
 +But first Thrift, the language interface to the HBase database:
  
 +<code>
 +yum install openssl098e
 +</code>
 +
 +Download Thrift: [[http://thrift.apache.org/download/]]
 +
 +<code>
 +yum install byacc -y
 +yum install automake libtool flex bison pkgconfig gcc-c++ boost-devel libevent-devel zlib-devel python-devel ruby-devel
 +
 +# in the unpacked Thrift source directory:
 +./configure
 +make
 +make install
 +export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/usr/local/lib/pkgconfig/
 +pkg-config --cflags thrift
 +cp -p /usr/local/lib/libthrift-0.9.0.so /usr/lib/
 +
 +HBASE_ROOT/bin/hbase thrift start &   # HBASE_ROOT = your HBase installation directory
 +lsof -i:9090   # 9090 is the Thrift server port; 9095 is the monitor (info) port
 +
 +</code>
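
lsof only sees sockets on the node running the Thrift server. To check the port from another node, a bash sketch using bash's /dev/tcp redirection and coreutils `timeout`; `port_open` is a hypothetical helper.

```shell
# Hypothetical helper: succeed if host:port accepts a TCP connection.
# bash -c exits non-zero when the /dev/tcp redirection cannot connect;
# timeout keeps an unreachable host from hanging the check.
port_open() {
  local host="$1" port="$2"
  timeout 2 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}
```

For example, `port_open whitetail 9090 && echo "thrift server is up"`.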
 +
 +Configure for a distributed environment: [[http://hbase.apache.org/book/standalone_dist.html#standalone]]
 +
 +  * Used 3 ZooKeeper servers as a quorum; see the configuration example online
 +  * Start with rolling_restart; plain start and stop have a timing issue
 +  * /hbase is owned by root:root
 +  * Permissions were reset on /hdfs; cause unknown
 +  * Also use /sanscratch/zookeepers
 +  * Some more notes below
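
hbase.zookeeper.quorum is a comma-separated list of ZooKeeper hosts, and a ZooKeeper ensemble stays available only while a majority of its servers are up, so 3 servers tolerate 1 failure. A bash sketch that prints the property block for conf/hbase-site.xml (without a description element) and the tolerated failure count; both functions are hypothetical helpers, and example1..example3 are the placeholder hosts from the configuration excerpt on this page.

```shell
# Hypothetical helper: print the hbase.zookeeper.quorum <property> block.
quorum_property() {
  local IFS=,
  local hosts="$*"   # "$*" joins the arguments with the first IFS char (,)
  cat <<EOF
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>$hosts</value>
  </property>
EOF
}

# An ensemble of n servers needs floor(n/2)+1 up, so it survives floor((n-1)/2) failures.
tolerated_failures() {
  echo $(( ($1 - 1) / 2 ))
}
```

`tolerated_failures 3` prints 1; adding a fourth server does not raise it, which is why odd ensemble sizes are usual.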
 +
 +
 +<code>
 +# in R:
 +install.packages('rJava')
 +install.packages("int64")
 +install.packages(c("Rcpp", "RJSONIO", "bitops", "digest", "functional", "stringr", "plyr", "reshape2"))
 +
 +# in the shell:
 +wget http://cran.r-project.org/src/contrib/Archive/Rcpp/Rcpp_0.9.8.tar.gz
 +wget -O rmr-2.2.0.tar.gz http://goo.gl/bhCU6
 +wget -O rhdfs_1.0.5.tar.gz https://github.com/RevolutionAnalytics/rhdfs/blob/master/build/rhdfs_1.0.5.tar.gz?raw=true
 +
 +R CMD INSTALL Rcpp_0.9.8.tar.gz
 +R CMD INSTALL rmr-2.2.0.tar.gz
 +R CMD INSTALL rhdfs_1.0.5.tar.gz
 +R CMD INSTALL rhbase_1.2.0.tar.gz
 +
 +yum install openssl098e openssl openssl-devel flex boost ruby ruby-libs ruby-devel php php-libs php-devel \
 +automake libtool bison pkgconfig gcc-c++ boost-devel libevent-devel zlib-devel python-devel
 +
 +# Boost, from the unpacked Boost source directory:
 +b2 install --prefix=/usr/local
 +
 +# Thrift, from the unpacked Thrift source directory:
 +./configure --prefix=/usr/local --with-boost=/usr/local; make
 +make install
 +
 +cp -p /usr/local/lib/libthrift-0.9.0.so /usr/lib/
 +cd /usr/lib; ln -s libthrift-0.9.0.so libthrift.so
 +
 +# SKIPPED (nasty; replaced with a straight copy, which could go to the nodes):
 +#   http://www.cpan.org
 +#   cpan> o conf commit
 +#   cpan> install Hadoop::Streaming
 +
 +# whitetail only: unpack HBase, edit conf/hbase-site.xml (excerpt below), add to /etc/rc.local
 +# also edit conf/regionservers
 +# copy /usr/local/hbase-version-dir to nodes:/usr/local
 +
 +  <property>
 +    <name>hbase.zookeeper.quorum</name>
 +    <value>example1,example2,example3</value>
 +    <description>Comma-separated list of servers in the ZooKeeper quorum.
 +    </description>
 +  </property>
 +  <property>
 +    <name>hbase.zookeeper.property.dataDir</name>
 +    <value>/export/zookeeper</value>
 +    <description>Property from ZooKeeper's config zoo.cfg.
 +    The directory where the snapshot is stored.
 +    </description>
 +  </property>
 +
 +
 +</code>
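
The notes above say to edit conf/regionservers and copy /usr/local/hbase-version-dir to the nodes. A dry-run bash sketch that prints one copy command per host listed in conf/regionservers, skipping blank lines and # comments; `copy_cmds` is hypothetical and the `scp -rp` invocation is an assumption (rsync would work as well).

```shell
# Hypothetical dry-run helper: print "scp -rp <src> <node>:/usr/local/" for
# each hostname in the given regionservers file (one host per line).
copy_cmds() {
  local servers="$1" src="$2" node
  # "|| [ -n "$node" ]" also handles a last line without a trailing newline
  while IFS= read -r node || [ -n "$node" ]; do
    node="${node%%#*}"             # drop comments
    node="${node//[[:space:]]/}"   # drop surrounding whitespace
    [ -z "$node" ] && continue     # skip blank lines
    echo "scp -rp $src $node:/usr/local/"
  done < "$servers"
}
```

Review the output of `copy_cmds /usr/local/hbase-version-dir/conf/regionservers /usr/local/hbase-version-dir`, then pipe it to `sh` to perform the copies.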
  
  
Line 484: Line 615:
  
 ==== Perl Hadoop::Streaming ====
 +
 +  * All nodes
 +
  
   * [[http://search.cpan.org/~spazm/Hadoop-Streaming-0.122420/lib/Hadoop/Streaming.pm]]
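
Hadoop::Streaming is a Perl library for writing Hadoop streaming jobs, where mapper and reducer are ordinary programs: they read lines on stdin, write key<TAB>value lines on stdout, and the framework sorts mapper output by key before the reducer sees it. The same contract sketched in bash (not Perl) as a word count, with `sort` standing in for Hadoop's shuffle; `mapper`, `reducer` and `run_local` are hypothetical names.

```shell
# Mapper: emit "word<TAB>1" for every whitespace-separated word on stdin.
mapper() {
  awk '{ for (i = 1; i <= NF; i++) print $i "\t" 1 }'
}

# Reducer: input arrives sorted by key; sum the counts of each run of equal keys.
reducer() {
  awk -F'\t' '
    $1 != key { if (NR > 1) print key "\t" sum; key = $1; sum = 0 }
    { sum += $2 }
    END { if (NR > 0) print key "\t" sum }'
}

# Local stand-in for the framework: map, sort by key, reduce.
run_local() {
  mapper | LC_ALL=C sort | reducer
}
```

On the cluster, mapper and reducer would instead be standalone executable scripts passed to the streaming jar with -mapper and -reducer, with -input and -output naming the HDFS paths.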
cluster/115.1369749068.txt.gz · Last modified: 2013/05/28 09:51 by hmeij