
Hadoop Summary

Our production Hadoop cluster is based on Cloudera's CDH3u6 repository. Here are some details (some links will not work from outside our private network):

  • namenode (which is also the login node): whitetail.wesleyan.edu
    • whitetail also runs the Hadoop Scheduler and Health Monitor
    • ssh to it directly or from any of our other tails
  • resources: access to 600 GB of memory and 1.75 TB of Hadoop Distributed File System (HDFS) storage
    • this could be doubled in the near future if needed
  • HDFS is not backed up!
    • You must request a writable work area, /userdata/username
    • Be sure to download your results to /home/username
  • Data to be shared (dictionaries, anagrams, etc.) can be posted in /shareddata
    • request that such items be posted there
  • Basic tools (request other tools to be installed)
    • shell scripting
    • Python
    • Perl (Hadoop::Streaming)
    • R + RHadoop (rmr2, rhdfs, rhbase)
    • HBase (NoSQL database)
    • MySQL
      • request a database to be set up
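As an illustration of the streaming tools listed above, here is a minimal word-count sketch in Python for Hadoop Streaming. It assumes only the standard streaming contract: input lines arrive on stdin, records are written to stdout as tab-separated key/value lines, and the reducer sees its input sorted by key. The file name wordcount.py and its "map"/"reduce" argument are hypothetical choices, not something already installed on whitetail.

```python
#!/usr/bin/env python
"""Word count for Hadoop Streaming (hypothetical wordcount.py, a sketch).

Invoke as "wordcount.py map" for the mapper and "wordcount.py reduce"
for the reducer. %-formatting is used so it runs under Python 2 or 3.
"""
import sys
from itertools import groupby


def mapper(lines):
    """Emit one "word<TAB>1" record per whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word.lower()


def reducer(lines):
    """Sum the counts for each word.

    Relies on the input being sorted by key, which Hadoop guarantees
    between the map and reduce phases (and `sort` does when testing locally).
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield "%s\t%d" % (word, total)


def run(stage, infile, outfile):
    """Stream records from infile through the chosen stage into outfile."""
    step = mapper if stage == "map" else reducer
    for record in step(infile):
        outfile.write(record + "\n")


# Only act when a stage argument is given, as Hadoop Streaming would pass it.
if __name__ == "__main__" and len(sys.argv) > 1:
    run(sys.argv[1], sys.stdin, sys.stdout)
```

The pipeline can be checked without the cluster: cat input.txt | python wordcount.py map | sort | python wordcount.py reduce. On the cluster it would be submitted through the Hadoop Streaming jar, roughly as hadoop jar <path to the hadoop-streaming jar> -input /userdata/username/in -output /userdata/username/out -mapper "python wordcount.py map" -reducer "python wordcount.py reduce" -file wordcount.py; the jar location and the in/out directory names are assumptions.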
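Because HDFS is not backed up, results should be copied out of the work area once a job finishes. Below is a minimal Python sketch of that step. It assumes that /userdata/username is an HDFS path and that the hadoop command is on the PATH on whitetail; the helper names fetch_command and fetch are hypothetical. It wraps the standard hadoop fs -get command, which copies a path from HDFS to the local file system.

```python
import posixpath
import subprocess


def fetch_command(username, job_dir, dest=None):
    """Build a 'hadoop fs -get' command for /userdata/<username>/<job_dir>.

    By default the copy lands in /home/<username>/<job_dir>; pass dest to
    choose another local directory. (Assumes /userdata lives in HDFS.)
    """
    src = posixpath.join("/userdata", username, job_dir)
    dst = dest or posixpath.join("/home", username, job_dir)
    return ["hadoop", "fs", "-get", src, dst]


def fetch(username, job_dir, dest=None):
    """Run the copy; raises subprocess.CalledProcessError on a non-zero exit."""
    subprocess.check_call(fetch_command(username, job_dir, dest))
```

For example, fetch("username", "out") would copy /userdata/username/out into /home/username/out. Building the argument list separately from running it keeps the paths easy to inspect before anything is copied.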

Other useful pages

Last modified: 2013/09/16 09:57 by hmeij