Hadoop Summary
Our production Hadoop cluster is based on Cloudera's CDH3u6 repository. Here are some details (some links will not work because of the private network):
- namenode (also the login node): whitetail.wesleyan.edu
- whitetail also runs the Hadoop Scheduler and Health Monitor
- ssh to it directly or from any of our other tails
- resources: access to 600 GB of memory and 1.75 TB of Hadoop Distributed File System (HDFS) storage
- could be doubled in the near future if needed
- HDFS is not backed up!
- You must request a writable work area /userdata/username
- Be sure to download your results to /home/username
- Data to be shared (dictionaries, anagrams, etc) can be posted in /shareddata
- request such items to be posted there
- Basic tools (request other tools to be installed)
- shell scripting
- python
- perl (Hadoop::Streaming)
- R+RHadoop (rmr2, rhdfs, rhbase)
- HBase (NoSQL database)
- MySQL
- request a database to be set up
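
Because HDFS is not backed up, results should be copied out of the work area once a job finishes. The following is a minimal sketch of that step, assuming /userdata/username is an HDFS path and that job output sits in a subdirectory named results (a hypothetical name, not one defined by this cluster). It relies only on the standard `hadoop fs -get` command.

```python
import subprocess


def build_get_command(username, subdir="results"):
    """Return the argument list that copies an HDFS directory to /home.

    'subdir' is a hypothetical output directory name; adjust it to
    wherever your job actually wrote its results.
    """
    src = "/userdata/{0}/{1}".format(username, subdir)  # assumed HDFS path
    dst = "/home/{0}/{1}".format(username, subdir)      # local home directory
    return ["hadoop", "fs", "-get", src, dst]


def copy_results(username, subdir="results"):
    """Run the copy; raises CalledProcessError if hadoop exits non-zero."""
    subprocess.check_call(build_get_command(username, subdir))


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(" ".join(build_get_command("username")))
```

Calling `copy_results("username")` on the login node would perform the copy. Building the command separately makes it easy to review before anything is executed.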
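
Of the tools listed above, python is typically used with Hadoop Streaming, where the mapper and reducer are ordinary programs that read lines from standard input and write tab-separated key/value lines to standard output. The following is a minimal sketch of an anagram finder in that style. The script layout and the `map`/`reduce` command-line argument are assumptions made for illustration, not an existing script on this cluster.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit 'sorted-letters<TAB>word' for every word in the input."""
    for line in lines:
        for word in line.split():
            word = word.lower()
            yield "{0}\t{1}".format("".join(sorted(word)), word)


def reducer(lines):
    """Group key-sorted mapper output and emit keys with 2+ distinct words.

    Hadoop Streaming sorts mapper output by key before the reduce step,
    so consecutive lines with the same key can be grouped with groupby.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        words = sorted(set(word for _, word in group))
        if len(words) > 1:
            yield "{0}\t{1}".format(key, " ".join(words))


if __name__ == "__main__":
    # Hypothetical convention: pass 'map' or 'reduce' as the first argument.
    mode = sys.argv[1] if len(sys.argv) > 1 else ""
    if mode == "map":
        for out in mapper(sys.stdin):
            print(out)
    elif mode == "reduce":
        for out in reducer(sys.stdin):
            print(out)
```

The same script can serve as both steps of a job by passing it to the Hadoop Streaming jar through the `-mapper` and `-reducer` options. It can also be tested locally first by piping a word list through the map step, `sort`, and the reduce step.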
Other useful pages
cluster/121.1379339833.txt.gz · Last modified: by hmeij
