cluster:129

Differences

This shows you the differences between two versions of the page.

cluster:129 [2014/06/18 10:03]
hmeij
cluster:129 [2014/06/18 10:04] (current)
hmeij
Line 7: Line 7:
 When you have one or more jobs running that rely on Gaussian's internal checkpoint mechanism, heavy read/write operations may result. That traffic should definitely not hit the /home file system but the /sanscratch file system instead. That scratch space is also NFS mounted over the Infiniband interconnects (via IPoIB). The result is that this file system's IO operations will also slow our file server down tremendously (even though /sanscratch is a 5 disk RAID 0 setup).
  
-So we've been trying to figure out how to control, or throttle, the IO traffic. Usually the application itself will provide an option for this, like rsync's \-\-bwlimit option, but we've not found anything so far. However, from the operating system's point of view we do have a tool available: ionice - get/set program IO scheduling class and priority.
+So we've been trying to figure out how to control, or throttle, the IO traffic. Usually the application itself will provide an option for this, like rsync's --bwlimit option, but we've not found anything so far. However, from the operating system's point of view we do have a tool available: ionice - get/set program IO scheduling class and priority.
  
 So for those who rely on the generation of large Gaussian checkpoint files, please add the following lines to the very top of your submission script:
Line 18: Line 18:
 </code>
  
-This instructs the compute node to schedule the IO traffic with the "best effort" scheduling class and the lowest priority for the script's process ID and any processes launched from the script. This seems (surprisingly) to have a positive effect on the client's IO traffic that hits the NFS mounted filesystem. It has a tremendous positive impact when issued on the file server itself, so perhaps a monitor script is needed in the future.
+This instructs the compute node to schedule the IO traffic with the "best effort" scheduling class and the lowest priority for the script's process ID and any processes launched from the script. This seems (surprisingly, perhaps because of IPoIB?) to have a positive effect on the client's IO traffic that hits the NFS mounted filesystem. It has a tremendous positive impact when issued on the file server itself, so perhaps a monitor script is needed in the future.
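
The lines to add fall outside the hunks shown in this comparison. As a rough sketch only, assuming the util-linux ionice command is installed on the compute node, a header matching the description above (best effort class, lowest priority, applied to the script's own process ID) could look like the following. The variable names IO_CLASS and IO_LEVEL are made up for illustration and are not taken from the page.

```shell
#!/bin/bash
# Hypothetical sketch; the page's actual lines are not visible in this diff.
# In ionice, class 2 is "best effort" and level 7 is the lowest priority
# within that class (levels run from 0, highest, to 7, lowest).
IO_CLASS=2
IO_LEVEL=7

# $$ expands to the PID of this script. Processes started later from the
# script inherit its IO scheduling class and priority.
if command -v ionice >/dev/null 2>&1; then
    ionice -c "$IO_CLASS" -n "$IO_LEVEL" -p $$ \
        || echo "warning: could not change IO priority" >&2
    # Report the class and priority now in effect, for the job log.
    ionice -p $$ || true
else
    echo "warning: ionice not found, IO priority unchanged" >&2
fi

# ... the rest of the submission script follows here
```

Placing these lines at the very top matters because only processes launched after the ionice call pick up the lowered priority.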
  
  
cluster/129.txt · Last modified: 2014/06/18 10:04 by hmeij