This shows you the differences between two versions of the page.
cluster:88 [2010/08/10 21:01] hmeij |
cluster:88 [2010/08/11 17:44] hmeij |
||
---|---|---|---|
Line 184: | Line 184: | ||
* make a backup copy of / | * make a backup copy of / | ||
* edit file, delete everything but queue ' | * edit file, delete everything but queue ' | ||
- | * (if you rename queue normal you also need to edit lsb.params) | + | * (if you rename queue normal you also need to edit lsb.params |
* remove most queue definitions and set the following | * remove most queue definitions and set the following | ||
- | * QJOBLIMIT | + | * QJOB_LIMIT |
- | * UJOBLIMIT | + | * UJOB_LIMIT |
* INTERACTIVE = no (only batch is allowed) | * INTERACTIVE = no (only batch is allowed) | ||
* EXCLUSIVE = Y (allow the bsub -x flag) | * EXCLUSIVE = Y (allow the bsub -x flag) | ||
* PRE_EXEC = / | * PRE_EXEC = / | ||
* POST_EXEC = / | * POST_EXEC = / | ||
- | * make the directories /home/apps (for compile | + | * make the directories /home/apps (for compiled |
- | * make the directory / | + | * make the directory / |
- | * be sure / | + | * be sure / |
* create the pre/post exec files (post does an rm -rf against the created directories) | * create the pre/post exec files (post does an rm -rf against the created directories) | ||
+ | * for example: | ||
< | < | ||
#!/bin/bash | #!/bin/bash | ||
Line 208: | Line 209: | ||
* ' | * ' | ||
- | * ' | + | * ' |
+ | |||
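The pre/post exec files described above (pre creates the per-job directories, post removes them with rm -rf) might look roughly like the sketch below. It is not the author's script: the paths in the original are truncated, so `SCRATCH_ROOT`, the function names, and the single-file `pre`/`post` dispatch are all hypothetical, and it assumes LSF exports `LSB_JOBID` into the pre/post exec environment.

```shell
#!/bin/bash
# Hypothetical pre/post exec helper; not the script used on the cluster.
# SCRATCH_ROOT is an assumed location, the real path is not shown in the page.
SCRATCH_ROOT="${SCRATCH_ROOT:-/tmp/scratch-demo}"

pre_exec() {
    local jobid="$1"
    # Refuse an empty job id so we never operate on SCRATCH_ROOT itself.
    [ -n "$jobid" ] || return 1
    mkdir -p "${SCRATCH_ROOT}/${jobid}"
}

post_exec() {
    local jobid="$1"
    [ -n "$jobid" ] || return 1
    # ${SCRATCH_ROOT:?} aborts if the variable is empty, guarding the rm -rf.
    rm -rf "${SCRATCH_ROOT:?}/${jobid}"
}

# Dispatch on the first argument; LSB_JOBID is assumed to be set by LSF.
case "${1:-}" in
    pre)  pre_exec  "${LSB_JOBID:-}" ;;
    post) post_exec "${LSB_JOBID:-}" ;;
esac
```

With a layout like this, PRE_EXEC and POST_EXEC would point at the same file with `pre` or `post` as the argument. The empty-id guard matters because the post step runs rm -rf unattended.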
+ | Now we're ready to submit a serial job. As a non-privileged user, create two files:
- | Now we're ready to submit jobs. As non-priviledged user create two files: | ||
* run | * run | ||
+ | |||
< | < | ||
#!/bin/bash | #!/bin/bash | ||
- | rm -f out err job*.out | + | |
+ | rm -f out err job3.out | ||
#BSUB -q normal | #BSUB -q normal | ||
#BSUB -J test | #BSUB -J test | ||
Line 239: | Line 244: | ||
* job.sh | * job.sh | ||
+ | * | ||
< | < | ||
#!/bin/bash | #!/bin/bash | ||
Line 251: | Line 257: | ||
</ | </ | ||
+ | |||
+ | * 'bsub < run' (submits) | ||
+ | * ' | ||
+ | |||
+ | |||
+ | ===== Step 6 ===== | ||
+ | |||
+ | More fun. Parallel jobs can be submitted over Ethernet interconnects but will, of course, not achieve the performance of InfiniBand interconnects.
+ | |||
+ | * yum install libibverbs; pdsh yum install libibverbs -q -y | ||
+ | * yum install gcc-c++ | ||
+ | |||
+ | On our Dell cluster we have static pre-compiled flavors of MPI and OFED. A tarball of 200 MB can be found here [[http:// |
+ | |||
+ | * download tarball, stage in / | ||
+ | * cd /opt; tar zxvf / | ||
+ | * examples in / | ||
+ | * export PATH=/ | ||
+ | * export LD_LIBRARY_PATH=/ | ||
+ | * cd / | ||
+ | * ./ring.c; ./hello.c (to test, it'll complain about no HCA card) | ||
+ | |||
+ | OK, so now we need to write a script to submit a parallel job. A parallel job is submitted with command ' |
+ | |||
+ | * irun | ||
+ | |||
+ | < | ||
+ | #!/bin/bash | ||
+ | |||
+ | rm -f err out | ||
+ | |||
+ | #BSUB -e err | ||
+ | #BSUB -o out | ||
+ | #BSUB -n 4 | ||
+ | #BSUB -q normal | ||
+ | #BSUB -J ptest | ||
+ | |||
+ | export PATH=/ | ||
+ | export LD_LIBRARY_PATH=/ | ||
+ | |||
+ | echo "make sure we have the right mpirun" | ||
+ | which mpirun | ||
+ | |||
+ | / | ||
+ | |||
+ | / | ||
+ | |||
+ | </ | ||
+ | |||
+ | |||
+ | |||
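Inside a parallel job such as irun, LSF exposes the allocated hosts in the `LSB_HOSTS` environment variable (one hostname per slot). The sketch below turns that list into a machinefile. It is only an illustration, not the launch method used in the truncated script above: `write_machinefile` is a hypothetical helper, the fallback host names are made up so the snippet runs outside LSF, and the mpirun flags in the trailing comment are an assumption that differs between MPI flavors.

```shell
#!/bin/bash
# write_machinefile is a hypothetical helper, not part of LSF or any MPI.
write_machinefile() {
    local hosts="$1" out="$2"
    # One line per slot; word splitting on $hosts is intentional.
    printf '%s\n' $hosts > "$out"
}

# LSB_HOSTS is set by LSF inside a job; the fallback list is made up.
machinefile="$(mktemp)"
write_machinefile "${LSB_HOSTS:-node1 node1 node2 node2}" "$machinefile"

# Count the slots so the rank count matches the allocation (#BSUB -n).
nslots=$(wc -l < "$machinefile" | tr -d ' ')
echo "slots: $nslots"

# Assumed invocation, check the flags for your MPI flavor before using:
# mpirun -np "$nslots" -machinefile "$machinefile" ./hello
```

Deriving the rank count from the host list keeps the job consistent with whatever `#BSUB -n` requested, instead of hard-coding the number in two places.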
\\ | \\ | ||
**[[cluster: | **[[cluster: |