Hadoop Summary
Our production Hadoop cluster is based on Cloudera's CDH3U6 repository. Here are some details:
- namenode (this is also the login node): whitetail.wesleyan.edu
- whitetail also runs the Hadoop Scheduler and Health Monitor
- ssh to it directly or from any of our other tails
- resources: access to 600 GB of memory and 1.75 TB of Hadoop Distributed File System (HDFS) storage
- could be doubled in the near future if needed
- HDFS is not backed up!
- You must request a writable work area, /userdata/username
- Be sure to download your results to /home/username (that is, the regular filesystem)
- Data to be shared (dictionaries, anagrams, etc) can be posted in /shareddata
- request such items to be posted there
- Basic tools (request other tools to be installed)
- shell scripting
- python
- perl (Hadoop::Streaming)
- java (both Oracle in /usr/java and OpenJDK)
- R+RHadoop (rmr2, rhdfs, rhbase)
- HBase (NoSQL database)
- MySQL
- request a database to be set up
- Note: the permissions are a bit weird in HDFS, but I think it is sorted out.
- If this turns into a problem we'll let everybody run as user hdfs …
- Note: some http links will not work because they point to the private network
- If you wish to view them, launch Firefox from whitetail …
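Since HDFS is not backed up, results need to be copied out of the work area into the regular filesystem. A minimal Python sketch of that step follows. It assumes that /userdata/username lives in HDFS, that the `hadoop` command is on the PATH on whitetail, and that the results sit in a directory called `wordcount/output`; that directory name and both function names are hypothetical, chosen only for illustration. By default it only prints the command so it can be checked first.

```python
import subprocess


def build_hdfs_get_command(username, relative_path):
    """Build the argv that copies one HDFS path into the user's home directory.

    Assumes the work area is /userdata/<username> in HDFS and the
    destination is /home/<username> on the regular filesystem.
    """
    source = "/userdata/{0}/{1}".format(username, relative_path.strip("/"))
    destination = "/home/{0}/".format(username)
    return ["hadoop", "fs", "-get", source, destination]


def download_results(username, relative_path, dry_run=True):
    """Print the copy command, or run it when dry_run is False."""
    command = build_hdfs_get_command(username, relative_path)
    if dry_run:
        print(" ".join(command))
    else:
        # check_call raises CalledProcessError if the copy fails
        subprocess.check_call(command)
    return command


if __name__ == "__main__":
    # "username" and "wordcount/output" are placeholders, not real paths
    download_results("username", "wordcount/output")
```

Calling `download_results("username", "wordcount/output", dry_run=False)` would run the copy for real; `hadoop fs -get` normally refuses to overwrite an existing local target, so a stale copy in /home/username may have to be moved aside first.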
Other useful pages
cluster/121.1379343985.txt.gz · Last modified: by hmeij
