Hadoop Summary

Our production Hadoop cluster is based on Cloudera's CDH3u6 repository. Here are some details (some links will not work because they point into the private network):

  • namenode (also the login node): whitetail.wesleyan.edu
    • whitetail also runs the Hadoop Scheduler and Health Monitor
    • ssh to it directly or from any of our other tails
  • resources: 600 GB of memory and 1.75 TB of Hadoop Distributed File System (HDFS) storage
    • could be doubled in near future if needed
  • HDFS is not backed up!
    • You must request a writable work area /userdata/username
    • Be sure to download your results to /home/username (that is, the regular filesystem)
  • Data to be shared (dictionaries, anagrams, etc.) can be posted in /shareddata
    • request such items to be posted there
  • Basic tools (request other tools to be installed)
    • shell scripting
    • python
    • perl (Hadoop::Streaming)
    • java (both Oracle in /usr/java and OpenJDK)
    • R+RHadoop (rmr2, rhdfs, rhbase)
    • HBase (NoSQL database)
    • MySQL
      • request a database to be set up
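
Since python is among the basic tools, a job can be written as a Hadoop Streaming script that reads records from standard input and writes tab-separated key/value lines to standard output. The following is only a minimal sketch of a word count, not a site-provided script: the file name wordcount.py and the map/reduce command-line argument are assumptions made for illustration.

```python
#!/usr/bin/env python
"""Word count sketch for Hadoop Streaming.

Hypothetical usage: 'wordcount.py map' as the mapper,
'wordcount.py reduce' as the reducer.
"""
import sys
from itertools import groupby


def map_lines(lines):
    """Emit one 'word<TAB>1' record per whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word.lower()


def reduce_lines(lines):
    """Sum the counts per word.

    Relies on Hadoop delivering mapper output to the reducer sorted by key,
    so all records for one word arrive consecutively.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield "%s\t%d" % (word, sum(int(count) for _, count in group))


if __name__ == "__main__" and len(sys.argv) == 2:
    # Pick the step named on the command line and stream stdin through it.
    steps = {"map": map_lines, "reduce": reduce_lines}
    for record in steps[sys.argv[1]](sys.stdin):
        print(record)
```

A script like this would typically be submitted through the Hadoop Streaming jar with its -input, -output, -mapper, -reducer and -file options; the location of that jar depends on the installation. Output written under /userdata/username can then be copied to /home/username with hadoop fs -get, since HDFS is not backed up.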

Note: the permissions in HDFS are a bit weird, but I think they are sorted out.

  • If this turns into a problem we'll let everybody run as user hdfs …


cluster/121.1379343829.txt.gz · Last modified: 2013/09/16 15:03 by hmeij