\\
**[[cluster:0|Back]]**

==== Hadoop Summary ====

Our production Hadoop Cluster is based on [[http://www.cloudera.com/content/cloudera/en/home.html|Cloudera]]'s CDH3U6 repository. Here are some details:

  * namenode (that is, the login node): whitetail.wesleyan.edu
    * whitetail also runs the Hadoop Scheduler and Health Monitor
      * [[http://whitetail.wesleyan.edu:50070|Health Status]]
      * [[http://whitetail.wesleyan.edu:50030|Job Tracker]]
    * ssh to it directly or from any of our other tails
  * resources: access to 600 GB of memory and 1.75 TB of Hadoop's Distributed File System (HDFS)
    * could be doubled in the near future if needed
    * HDFS is not backed up!
    * You must request a writable work area /userdata/username
    * Be sure to download your results to /home/username (that is, the regular filesystem)
    * Data to be shared (dictionaries, anagrams, etc.) can be posted in /shareddata
      * request such items to be posted there
  * Basic tools (request other tools to be installed)
    * shell scripting
    * python
    * perl (Hadoop::Streaming)
    * java (both Oracle in /usr/java and OpenJDK)
    * R+RHadoop (rmr2, rhdfs, rhbase)
    * HBase (NoSQL database)
      * [[http://whitetail.wesleyan.edu:60010|Master & Zookeepers]]
      * [[http://whitetail.wesleyan.edu:9095|Thrift server]]
    * MySQL
      * request a database to be set up for you (limited space)
  * Note: the permissions are a bit weird in HDFS, but I think it is sorted out.
    * If this turns into a problem we'll let everybody run as user hdfs ...
  * Note: some http links will not work because they point to the private network
    * If you wish to view them, launch firefox from whitetail ...

Other useful pages

  * [[cluster:114|Build Hadoop Cluster]]
  * [[cluster:115|Use Hadoop Cluster]]
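
Since HDFS is not backed up, the advice above about copying results from /userdata/username to /home/username is worth scripting. Below is a minimal sketch in Python (one of the basic tools listed), assuming the standard ''hadoop fs -get'' shell command is on your path on whitetail. The function names ''build_get_command'' and ''fetch_results'' and the directory name in the usage note are made up for illustration; they are not part of our setup.

```python
import getpass
import subprocess


def build_get_command(hdfs_path, local_dir):
    """Return the argument list that copies hdfs_path into local_dir."""
    # "hadoop fs -get <src> <localdst>" copies from HDFS to the regular filesystem
    return ["hadoop", "fs", "-get", hdfs_path, local_dir]


def fetch_results(result_dir, user=None):
    """Copy /userdata/<user>/<result_dir> from HDFS into /home/<user>.

    Assumption: your work area follows the /userdata/username layout
    described on this page; result_dir is a hypothetical directory name.
    """
    user = user or getpass.getuser()
    hdfs_path = "/userdata/{0}/{1}".format(user, result_dir)
    local_dir = "/home/{0}".format(user)
    cmd = build_get_command(hdfs_path, local_dir)
    # check_call raises CalledProcessError if the hadoop command fails
    subprocess.check_call(cmd)
    return cmd
```

For example, calling ''fetch_results("wordcount-output")'' on whitetail would try to copy /userdata/username/wordcount-output into /home/username, where "wordcount-output" stands in for whatever directory your job wrote. Building the command in a separate function makes it easy to print and inspect before anything is actually copied.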