
Gaussian never fixed the connectivity with Linda, so it cannot be run across multiple nodes. -- Meij, Henk 2010/12/09 10:45

Gaussian & Linda

(I wrote this up for a user so am sharing it here until we get clarification from Gaussian.com)

Hi Anthony,

I observed your job below on sharptail. It must be running with the standard g09 executable. Gaussian is a program that forks itself on the same host once for each thread you define, in your case 16. You’ll notice below that the scheduler allocated 16 job slots across 8 nodes. You’ll also notice that Gaussian forked itself 16 times, but all on the same host (bss110, with a load average of about 16); the other seven nodes are idle. This will seriously slow down your job.

The solution is to use Gaussian compiled with Linda, which provides the communication between the nodes. In that case Gaussian will fork itself only twice on each host (8 hosts x 2 threads = 16 job slots). To do this you must specify the target hosts in your Gaussian job. So here are some tips. In your submit script…

# Point to target queue with predefined hosts
#BSUB -q bss12g16

# This must always be 16
#BSUB -n 16

# Use the correct g09
export g09root="/share/apps/gaussian/g09root_amd64_linda"

# linda stuff
export GAUSS_LFLAGS="-nodefile $LSB_HOSTS -opt Tsnet.Node.lindarsharg: ssh"

Then in your Gaussian .com file …

%mem=12gb
%nprocshared=2
%lindaworkers=bss011,bss012,bss013,bss014,bss015,bss016,bss017,bss018
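If you would rather not type the host list by hand, the %lindaworkers line could be generated inside the submit script. This is only a rough sketch: it assumes the scheduler hands over a space-separated host list with one entry per job slot (as in $LSB_HOSTS), and lindaworkers_line is a made-up helper name, not something Gaussian or LSF provides.

```shell
# Hypothetical helper: build a %lindaworkers line from a host list.
# Assumption: $1 is a space-separated list with one entry per job slot,
# so each host may appear more than once (e.g. "bss011 bss011 bss012").
lindaworkers_line() {
    # $1 is deliberately unquoted so the shell splits it into one host per line;
    # sort -u drops the duplicates, paste joins the rest with commas.
    printf '%s\n' $1 | sort -u | paste -sd, - | sed 's/^/%lindaworkers=/'
}
```

Something like `lindaworkers_line "$LSB_HOSTS"` would then print the line to paste (or write) into the .com file. Note that the hosts come out sorted, not in scheduler order.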

And that should be it. However, there is a big problem, one we’re trying to resolve with Gaussian support but have not received any feedback on yet. It appears the Linda compilation of Gaussian will throw an error, something like:

‘execfile error; could not locate file’

But you could try it and see if you receive a different response.

4722    adavis0 RUN   bss24      sharptail   bss110      Fry_Lab    Nov 17 11:09
                                             bss110
                                             bss070
                                             bss070
                                             bss080
                                             bss080
                                             bss091
                                             bss091
                                             bss102
                                             bss102
                                             bss072
                                             bss072
                                             bss092
                                             bss092
                                             bss123
                                             bss123
[root@sharptail tmp]# pdsh uptime |  egrep '110|070|080|091|102|072|092|123'

bss070:  10:47:53 up 69 days, 20:10,  0 users,  load average: 0.00, 0.00, 0.00

bss072:  10:47:40 up 69 days, 19:59,  0 users,  load average: 0.00, 0.00, 0.00

bss091:  10:48:04 up 61 days, 20:37,  0 users,  load average: 0.00, 0.00, 0.00

bss092:  10:47:58 up 61 days, 20:28,  0 users,  load average: 0.00, 0.00, 0.00

bss080:  10:47:57 up 69 days,  1:07,  0 users,  load average: 0.00, 0.00, 0.00

bss102:  10:47:53 up 61 days, 19:25,  0 users,  load average: 0.00, 0.00, 0.00

bss123:  10:50:49 up 61 days, 39 min,  0 users,  load average: 0.00, 0.00, 0.00

bss110:  10:49:08 up 61 days,  1:12,  0 users,  load average: 16.51, 16.35, 16.25
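The idle nodes in a listing like the one above can also be picked out mechanically. A minimal sketch, assuming uptime output in the `host: ... load average: a, b, c` form shown here; idle_hosts is a made-up helper name and the default threshold of 0.5 is arbitrary.

```shell
# Hypothetical helper: read "host: <uptime output>" lines on stdin and print
# the hosts whose 1-minute load average is below a threshold (default 0.5).
idle_hosts() {
    awk -v max="${1:-0.5}" '
        /load average:/ {
            host = $1
            sub(/:$/, "", host)                      # strip the trailing colon pdsh adds
            n = split($0, parts, "load average: ")   # text after the label holds the loads
            split(parts[n], loads, ", ")             # loads[1] is the 1-minute average
            if (loads[1] + 0 < max + 0) print host
        }'
}
```

For example, `pdsh uptime | egrep '110|070|080' | idle_hosts` would be expected to list the allocated nodes that are doing no work.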



cluster/81.txt · Last modified: 2010/12/09 15:46 by hmeij