**[[cluster:0|Back]]**
  
===== beeGFS =====
  
A document for me to recall and note what I read in the manual pages and what still needs testing.
  
Basically, during the summer of 2016 I investigated whether the HPCC could afford enterprise-level storage. I wanted 99.999% uptime, snapshots, high availability, and other goodies such as parallel NFS. NetApp came the closest but, eh, at $42K lots of other options show up. The story is detailed at [[cluster:149|The Storage Problem]].
  
This page is best read from the bottom up.
  
==== cluster idea ====
  
  * Storage servers: buy 2 now ($4K + $4K), then a 3rd in July ($4K)?
  
  * move test users over on 2 nodes, test; the only change is $HOME

  * Home cluster
    * cottontail (mngt+admingui)
    * 2-3 new storage units (+snapshots/meta backup)
    * cottontail2 meta + n38-n45 meta, all mirrored

==== Mirror Meta ====

Definitely want metadata content mirrored; that way you can use the n38-n45 nodes with their local 15K disks, plus maybe cottontail2 (RAID 1 with hot and cold spares).

Content mirroring will require more disk space. Perhaps snapshotting to another node is more useful; that also solves the backup issue.

<code>

# enable metadata mirroring on this directory
[root@n7 ~]# beegfs-ctl --mirrormd /mnt/beegfs/hmeij-mirror
Mount: '/mnt/beegfs'; Path: '/hmeij-mirror'
Operation succeeded

# put some new content in
[root@n7 ~]# rsync -vac /home/hmeij/iozone-tests /mnt/beegfs/hmeij-mirror/

# look up the file's meta tag (EntryID)
[root@n7 ~]# beegfs-ctl --getentryinfo /mnt/beegfs/hmeij-mirror/iozone-tests/current.tar
Path: /hmeij-mirror/iozone-tests/current.tar
Mount: /mnt/beegfs
EntryID: 3-581392E1-31

# found on n38 under mirror/49.dentries, the mirrored copy of node 49's metadata
[root@sharptail ~]# ssh n38 find /data/beegfs_meta -name 3-581392E1-31
/data/beegfs_meta/mirror/49.dentries/54/6C/0-581392F0-30/#fSiDs#/3-581392E1-31

# and the primary copy on n39 (ID 49)
[root@sharptail ~]# ssh n39 find /data/beegfs_meta -name 3-581392E1-31
/data/beegfs_meta/dentries/54/6C/0-581392F0-30/#fSiDs#/3-581392E1-31

# seems to work
</code>
==== /mnt/beegfs/ ====
  
  * Source content: 110G in XFS with ~100,000 files in ~2,000 dirs
    * /home/hmeij (mix of files, nothing large) plus
    * /home/fstarr/filler (lots of tiny files)

  * File content spread across 2 storage servers
    * petaltail:/var/chroot/data/beegfs_storage
    * swallowtail:/data/beegfs_storage
    * 56G used in beegfs-storage per storage server
    * ~92,400 files per storage server
    * ~1,400 dirs per storage server, mostly in the "chunks" dir
  
  * Meta content spread across 2 meta servers (n38 and n39)
    * 338MB per beegfs-meta server, so ~0.6% of the source size for the 2 servers combined
    * ~105,000 files per metadata server
    * ~35,000 dirs spread almost evenly across "dentries" and "inodes"

  * Clients (n7 and n8) see the full source content
    * 110G in /mnt/beegfs
    * ~100,000 files
    * ~2,000 dirs

Looks like:
  
<code>
  
-[root@hmeij ~]ssh hmeij@cottontail +file content
-hmeij@cottontail's password:  +
-Last login: Thu Oct 20 09:38:40 2016 from 129.133.22.42 +
-[hmeij@cottontail ~]$+
  
-then+[root@swallowtail ~]ls -lR /data/beegfs_storage/chunks/u0/57E4/2/169-57E42E75-31 
 +/data/beegfs_storage/chunks/u0/57E4/2/169-57E42E75-31: 
 +total 672 
 +-rw-rw-rw- 1 root root 289442 Jun 26  2015 D8-57E42E89-30 
 +-rw-rw-rw- 1 root root   3854 Jun 26  2015 D9-57E42E89-30 
 +-rw-rw-rw- 1 root root  16966 Jun 26  2015 DA-57E42E89-30 
 +-rw-rw-rw- 1 root root  65779 Jun 26  2015 DB-57E42E89-30 
 +-rw-rw-rw- 1 root root  20562 Jun 26  2015 DF-57E42E89-30 
 +-rw-rw-rw- 1 root root 259271 Jun 26  2015 E0-57E42E89-30 
 +-rw-rw-rw- 1 root root    372 Jun 26  2015 E1-57E42E89-30
  
-[hmeij@cottontail ~]$ ssh enzo +[root@petaltail ~]# ls -lR /var/chroots/data/beegfs_storage/chunks/u0/57E4/2/169-57E42E75-31 
-[hmeij@enzo hmeij]$ bqueues +/var/chroots/data/beegfs_storage/chunks/u0/57E4/2/169-57E42E75-31: 
-QUEUE_NAME      PRIO STATUS          MAX JL/U JL/P JL/H NJOBS  PEND   RUN  SUSP  +total 144 
-sgi96            50   Open:Active      4                          0+-rw-rw-rw- 1 root root     40 Jun 26  2015 DC-57E42E89-30 
 +-rw-rw-rw- 1 root root  40948 Jun 26  2015 DD-57E42E89-30 
 +-rw-rw-rw- 1 root root 100077 Jun 26  2015 DE-57E42E89-30 
 + 
 +# meta content 
 + 
 +[root@sharptail ~]# ssh n38 find /data/beegfs_meta -name 169-57E42E75-31 
 +/data/beegfs_meta/inodes/6A/7E/169-57E42E75-31 
 +/data/beegfs_meta/dentries/6A/7E/169-57E42E75-31 
 + 
 +[root@sharptail ~]# ssh n39 find /data/beegfs_meta -name 169-57E42E75-31 
 +(none, no mirror)
  
 </code> </code>
==== Tuning ====
  
  * global interfaces file, preference order ib0 -> eth1 -> eth0 (sketch below)
    * connInterfacesFile = /home/tmp/global/beegfs.connInterfacesFile
    * set in /etc/beegfs/beegfs-[storage|client|meta|admon|mgmtd].conf and restart the services
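
The interfaces file itself is just a plain list, one device name per line, most preferred first; a minimal sketch matching the order above:

<code>
ib0
eth1
eth0
</code>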
  
  * backup beeGFS EA metadata, see the FAQ (sketch below)
    * attempt a restore
    * or just snapshot
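
A rough sketch of such a backup, assuming a tar build with ''--xattrs'' support; beeGFS stores its metadata in extended attributes, so a plain copy would silently lose it (the backup path here is hypothetical):

<code>
# on a metadata server; stop beegfs-meta first for a consistent copy
[root@n38 ~]# service beegfs-meta stop
[root@n38 ~]# cd /data
[root@n38 ~]# tar czf /sanscratch/backups/beegfs_meta.tar.gz --xattrs beegfs_meta
[root@n38 ~]# service beegfs-meta start
</code>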
  
  * storage server tuning
    * set on cottontail on sdb; both values were 128 (seems to help -- late summer 2016)
    * echo 4096 > /sys/block/sd?/queue/nr_requests
    * echo 4096 > /sys/block/sd?/queue/read_ahead_kb
    * set on cottontail (was 90112); added to /etc/rc.local, see sketch below
    * echo 262144 > /proc/sys/vm/min_free_kbytes
  * do the same on greentail? (done late fall 2016)
    * all original values same as cottontail (all files)
    * set on c1d1 thru c1d6
  * do the same on sharptail?
    * no such values for sdb1
    * can only find min_free_kbytes, same value as cottontail
  * stripe and chunk size, see the getentryinfo output below
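
These echo settings do not survive a reboot; a minimal sketch of persisting them via /etc/rc.local (sdb per the cottontail note above):

<code>
# appended to /etc/rc.local on cottontail
echo 4096 > /sys/block/sdb/queue/nr_requests
echo 4096 > /sys/block/sdb/queue/read_ahead_kb
echo 262144 > /proc/sys/vm/min_free_kbytes
</code>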
  
<code>
  
[root@n7 ~]# beegfs-ctl --getentryinfo /mnt/beegfs/
Path:
Mount: /mnt/beegfs
EntryID: root
Metadata node: n38 [ID: 48]
Stripe pattern details:
+ Type: RAID0
+ Chunksize: 512K
+ Number of storage targets: desired: 4
  
</code>
  * The cache type can be set in the client config file (/etc/beegfs/beegfs-client.conf), see below.
    * buffered is the default; it uses a few 100KB per file
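
Assuming the standard tuneFileCacheType client parameter, a minimal sketch of the setting (buffered, i.e. the default, shown):

<code>
# /etc/beegfs/beegfs-client.conf
tuneFileCacheType = buffered
</code>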
  
  * tuneNumWorkers in all of the /etc/beegfs/beegfs-*.conf files (sketch below)
    * for meta, storage and clients ...
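
A sketch of where that would go; the value here is only a placeholder, not a recommendation:

<code>
# /etc/beegfs/beegfs-meta.conf, /etc/beegfs/beegfs-storage.conf, ...
tuneNumWorkers = 12
</code>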

  * metadata server tuning
    * read up on this in more detail

==== Installation ====

  * made easy, see the [[http://www.beegfs.com/wiki/ManualInstallWalkThrough|manual install walkthrough]]
  * rpms pulled from the repository via petaltail into ''greentail:/sanscratch/tmp/beegfs''
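
A sketch of the walkthrough's flow using this cluster's hostnames and the node IDs from the beegfs-net output below; the setup helpers ship with the rpms, and the mgmtd path here is hypothetical:

<code>
# management daemon on cottontail
[root@cottontail ~]# /opt/beegfs/sbin/beegfs-setup-mgmtd -p /data/beegfs_mgmtd

# metadata daemon on n38 (node ID 48), pointed at the mgmtd host
[root@n38 ~]# /opt/beegfs/sbin/beegfs-setup-meta -p /data/beegfs_meta -s 48 -m cottontail

# storage daemon on swallowtail (node ID 136)
[root@swallowtail ~]# /opt/beegfs/sbin/beegfs-setup-storage -p /data/beegfs_storage -s 136 -m cottontail

# client on n7
[root@n7 ~]# /opt/beegfs/sbin/beegfs-setup-client -m cottontail
</code>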

<code>

[root@cottontail ~]# ssh n7 beegfs-net

mgmt_nodes
=============
cottontail [ID: 1]
   Connections: TCP: 1 (10.11.103.253:8008);

meta_nodes
=============
n38 [ID: 48]
   Connections: TCP: 1 (10.11.103.48:8005);
n39 [ID: 49]
   Connections: TCP: 2 (10.11.103.49:8005);

storage_nodes
=============
swallowtail [ID: 136]
   Connections: TCP: 1 (192.168.1.136:8003 [fallback route]);
petaltail [ID: 217]
   Connections: <none>

</code>
\\
**[[cluster:0|Back]]**