**[[cluster:
==== Build Hadoop Cluster ====

[[cluster:

These are my notes on building a test Hadoop cluster on virtual machines in VMware. They consist of a blend of instructions posted by others, with my commentary added.
  * CTOvision [[http://
  * Yahoo [[http://
  * Apache [[http://
  * Noll [[http://www.michael-noll.com/
  * IBM article [[http://

And

  * White [[http://
==== Building ====

  * Deployed 8 virtual machines, Oracle Linux 6, 64 bit, bare bones.
  * Each node has 1 GB RAM and a 36 GB hard disk.

  * First get rid of OpenJDK if it's in your VMware template
  * Consult CTOvision on how to do that.
  * Then download the latest Java packages from Oracle and install (a fuller hedged sketch follows the command block below).
  * Everything below is done by root.
| + | |||
| + | < | ||
| + | # all nodes, i used pdsh to spawn commands across all nodes | ||
| + | rpm -ivh / | ||
| + | rpm -ivh / | ||
| + | alternatives --install / | ||
| + | alternatives --auto java | ||
| + | # fix this as some Hadoop scripts look at this location | ||
| + | cd /usr/java | ||
| + | ln -s ./ | ||
| + | which java | ||
| + | java -version | ||
| + | </ | ||
| + | |||
| + | * Next set up the Cloudera repository | ||
| + | |||
| + | < | ||
| + | # all nodes | ||
| + | cd / | ||
| + | wget http:// | ||
| + | yum update | ||
| + | yum install hadoop-0.20 | ||
| + | </ | ||
| + | |||
| + | * Selinux, again ... | ||
| + | |||
| + | < | ||
| + | setenforce 0 | ||
| + | # edit this file and disable | ||
| + | vi / | ||
| + | </ | ||
| + | |||
| + | * Ports, the node need to talk to each other as well as allow admin pages to load | ||
| + | |||
| + | < | ||
| + | # edit this file and restart iptables | ||
| + | vi / | ||
| + | # hadoop | ||
| + | -A INPUT -m state --state NEW -m tcp -p tcp -m iprange --src-range 129.133.x.xxx-129.133.x.xxx --dport 50070 -j ACCEPT | ||
| + | -A INPUT -m state --state NEW -m tcp -p tcp -m iprange --src-range 129.133.x.xxx-129.133.x.xxx --dport 50075 -j ACCEPT | ||
| + | -A INPUT -m state --state NEW -m tcp -p tcp -m iprange --src-range 129.133.x.xxx-129.133.x.xxx --dport 50090 -j ACCEPT | ||
| + | -A INPUT -m state --state NEW -m tcp -p tcp -m iprange --src-range 129.133.x.xxx-129.133.x.xxx --dport 50105 -j ACCEPT | ||
| + | -A INPUT -m state --state NEW -m tcp -p tcp -m iprange --src-range 129.133.x.xxx-129.133.x.xxx --dport 50030 -j ACCEPT | ||
| + | -A INPUT -m state --state NEW -m tcp -p tcp -m iprange --src-range 129.133.x.xxx-129.133.x.xxx --dport 50060 -j ACCEPT | ||
| + | -A INPUT -m state --state NEW -m tcp -p tcp -m iprange --src-range 129.133.x.xxx-129.133.x.xxx --dport 8020 -j ACCEPT | ||
| + | -A INPUT -m state --state NEW -m tcp -p tcp -m iprange --src-range 129.133.x.xxx-129.133.x.xxx --dport 50010 -j ACCEPT | ||
| + | -A INPUT -m state --state NEW -m tcp -p tcp -m iprange --src-range 129.133.x.xxx-129.133.x.xxx --dport 50020 -j ACCEPT | ||
| + | -A INPUT -m state --state NEW -m tcp -p tcp -m iprange --src-range 129.133.x.xxx-129.133.x.xxx --dport 50100 -j ACCEPT | ||
| + | -A INPUT -m state --state NEW -m tcp -p tcp -m iprange --src-range 129.133.x.xxx-129.133.x.xxx --dport 8021 -j ACCEPT | ||
| + | -A INPUT -m state --state NEW -m tcp -p tcp -m iprange --src-range 129.133.x.xxx-129.133.x.xxx --dport 9001 -j ACCEPT | ||
| + | -A INPUT -m state --state NEW -m tcp -p tcp -m iprange --src-range 129.133.x.xxx-129.133.x.xxx --dport 8012 -j ACCEPT | ||
| + | -A INPUT -m state --state NEW -m tcp -p tcp -m iprange --src-range 129.133.x.xxx-129.133.x.xxx --dport 54310 -j ACCEPT | ||
| + | -A INPUT -m state --state NEW -m tcp -p tcp -m iprange --src-range 129.133.x.xxx-129.133.x.xxx --dport 54311 -j ACCEPT | ||
| + | # plus 127.0.0.1:0 and maybe 9000 | ||
| + | # hadoop admin status | ||
| + | -A INPUT -m state --state NEW -m tcp -p tcp -s 129.133.0.0/ | ||
| + | -A INPUT -m state --state NEW -m tcp -p tcp -s 129.133.0.0/ | ||
| + | -A INPUT -m state --state NEW -m tcp -p tcp -s 129.133.0.0/ | ||
| + | </ | ||
| + | |||
| + | * Install the **headnode** node and tracker | ||
| + | |||
| + | < | ||
| + | # head node | ||
| + | yum -y install hadoop-0.20-namenode | ||
| + | yum -y install hadoop-0.20-jobtracker | ||
| + | </ | ||
| + | |||
| + | * On all the work nodes | ||
| + | |||
| + | < | ||
| + | # data node | ||
| + | yum -y install hadoop-0.20-datanode | ||
| + | yum -y install hadoop-0.20-tasktracker | ||
| + | </ | ||
| + | |||
| + | * Next set up the configuration environment | ||
| + | * Edit the conf files, consult Dakini site for content | ||
| + | * Copy those 3 files to all work nodes | ||
| + | * The display command should point to the MyCluster files | ||
| + | |||
| + | < | ||
| + | # all nodes | ||
| + | cp -r / | ||
| + | alternatives --install / | ||
| + | alternatives --set hadoop-0.20-conf / | ||
| + | alternatives --display hadoop-0.20-conf | ||
| + | vi / | ||
| + | vi / | ||
| + | vi / | ||
| + | </ | ||
| + | |||
| + | * Since this is a test cluster I located the DHFS filesystem on the OS disk | ||
| + | * In a production environment you'd want multiple dedicated disks per node | ||
| + | |||
| + | < | ||
| + | # all nodes | ||
| + | mkdir -p / | ||
| + | mkdir -p / | ||
| + | mkdir -p / | ||
| + | mkdir -p / | ||
| + | chown -R hdfs:hadoop /mnt/hdfs | ||
| + | chown -R mapred: | ||
| + | </ | ||
| + | |||
| + | * Format HDFS! Very important. Do ONLY ONCE on head node. | ||
| + | |||
| + | < | ||
| + | # headnode only | ||
| + | sudo -u hdfs hadoop namenode -format | ||
| + | </ | ||
| + | |||
| + | * Fix permissions | ||
| + | |||
| + | < | ||
| + | # all nodes | ||
| + | chgrp hdfs / | ||
| + | chmod g+rw / | ||
| + | </ | ||
| + | |||
| + | * Start Hadoop nodes and trackers | ||
| + | * If you receive the dreaded " | ||
| + | * Check the log in question it'll give a hint | ||
| + | * You may have typos in the XML files and configuration does not load | ||
| + | * File permissions may prevent nodes and trackers from starting | ||
| + | * You missed a step, like in the alternatives commands | ||
| + | * You issued the HDFS format command multiples times | ||
| + | |||
| + | < | ||
| + | # head node | ||
| + | / | ||
| + | / | ||
| + | |||
| + | # work nodes | ||
| + | / | ||
| + | / | ||
| + | </ | ||
| + | |||
| + | * Alright, lets some filesystem entries | ||
| + | |||
| + | < | ||
| + | # head node only | ||
| + | sudo -u hdfs hadoop fs -mkdir / | ||
| + | sudo -u hdfs hadoop fs -chown mapred: | ||
| + | sudo -u hdfs hadoop dfs -mkdir /tmp | ||
| + | sudo -u hdfs hadoop dfs -chmod -R 1777 /tmp | ||
| + | </ | ||
| + | |||
| + | * Command line health check | ||
| + | |||
| + | < | ||
| + | sudo -u hdfs hadoop dfsadmin -report | ||
| + | sudo -u hdfs hadoop dfs -df | ||
| + | </ | ||
| + | |||
| + | * And from a remote machine access your head node | ||
| + | * Hadoop Map/Reduce Administration | ||
| + | * [[http:// | ||
| + | * The Namenode | ||
| + | * [[http:// | ||
| + | |||
| + | TODO | ||
| + | * Run some jobs | ||
| + | * Find a MOOC course | ||
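
For the first TODO item, a hedged smoke test: the pi estimator from the examples jar, assuming the CDH packages put it under /usr/lib/hadoop-0.20 (path and jar name are assumptions):

<code>
# head node -- 4 maps, 1000 samples each
sudo -u hdfs hadoop jar /usr/lib/hadoop-0.20/hadoop-examples.jar pi 4 1000
</code>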
| \\ | \\ | ||
| **[[cluster: | **[[cluster: | ||