DokuWiki

From mid-March to the end of June (~3 months), the TSM file count grew from 4,724,563 to 18,020,855.
Over the same period, the TSM volume grew from 1.6 to 2.5 TB (compressed, so multiply by 2 for the uncompressed volume).
Over the same period, the filer's disk space usage (of 4 TB) grew from 1.7 to 2.9 TB.
On one LUN, rsync took 36 hours to build the list of files present before copying even started!
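
To put the three growth figures side by side, here is a minimal sketch that turns each start/end pair into a growth factor. The dictionary keys and the `growth_factor` helper are illustrative names, not part of any tool used here.

```python
# Start and end values for the roughly three-month window, as quoted above.
observations = {
    "TSM file count": (4_724_563, 18_020_855),
    "TSM volume (TB, compressed)": (1.6, 2.5),
    "filer disk usage (TB)": (1.7, 2.9),
}


def growth_factor(start, end):
    """Return how many times larger the end value is than the start value."""
    return end / start


growth = {name: growth_factor(s, e) for name, (s, e) in observations.items()}

for name, factor in growth.items():
    print(f"{name}: {factor:.2f}x")
```

The file count grows far faster than either volume figure, which fits the picture of very many small files being created and deleted.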

LUN Sizes

Mounting the cluster_home volume on swallowtail reveals …

Col A: The size of the physical LUNs (command against “lun files” inside volume)
Col B: The size used according to the host filer3 (disk usage summary, du -hs)
Col C: The size used according to the linux host (df -h against mount point)
Col D: The size of the “holes”, that is B-C
Col E: The time it took to rsync one way
Col F: NMAP

NMAP (F)	rsync (E)	B-C in GB (D)	df -h (C)	du -hs (B)	ll -h (A)	LUN	Note
na	-	-	-	-	1T	sanscratch	(re-empty)
#10	-	-	104M	17G	1.1T	cusers	(empty)
#9		148	663G	811G	1.1T	rusers	chsu sknauert
#12		253	771G	1.1T	1.1T	rusers2	dbblum
#7	a-i:1hr j-y:7hrs z:6hrs	126	217G	343G	1.1T	rusers3	adavis02 adezieck aminei bkormos dfrohman eaaron ebarnes gng gpetersson imukerji jbodyfelt jfarnham jknee lost+found mlee03 qgu spieniazek vclapa wpringle yminami ztan
#3	-	-	104M	17G	1.1T	rusers4	(empty)
#4	8 hrs	442	324G	766G	1.1T	rusers5	ajbenson alarner lvargaslara sdixit shorowitz skong
#6	-	-	129M	17G	1.1T	rusers6	chemdata
#2	-	18	464G	482G	1.1T	rusers7	abhattachary amoreno ewheatley fstarr gconnors jlocey mspescha vscavera wdai
#5	5 hrs	179	77G	256G	1.1T	rusers8	bstewart
#1	-	64	299G	363G	1.1T	users	(all other users) 36 hrs for rsync to build file inventory, abandoned for now
Total		1,230G	2,815G	4,045G

The test LUN was NMAP #8.

The sizes of 104 MB as reported by the linux host, or 17 GB as reported by host filer3, are the overhead numbers of empty or almost empty LUNs.

The linux host reports 2.8 TB in use. The filer reports slightly over 4 TB in use (hence we filled up the volume holding the LUNs). The difference of 1.2 TB is file system space still held by deleted files, the “holes”, and needs to be reclaimed.
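
The “holes” arithmetic can be re-checked with a short sketch. The numbers are columns C and B from the table; the empty LUNs are left out, as in the table's own totals. One assumption: the "1.1T" shown for rusers2 in column B is entered as 1024 GB, since that is the value that reproduces the listed hole of 253.

```python
# (LUN, used per df -h in GB [col C], used per du -hs in GB [col B])
# rusers2's col B value is an assumption: the table shows "1.1T", and
# 1024 GB is the figure that matches its listed B-C of 253.
luns = [
    ("rusers", 663, 811),
    ("rusers2", 771, 1024),
    ("rusers3", 217, 343),
    ("rusers5", 324, 766),
    ("rusers7", 464, 482),
    ("rusers8", 77, 256),
    ("users", 299, 363),
]

# Column D: space the filer still counts as used but the file system does not.
holes = {name: b - c for name, c, b in luns}

total_c = sum(c for _, c, _ in luns)
total_b = sum(b for _, _, b in luns)
total_holes = sum(holes.values())

# List the LUNs with the largest holes first.
for name, size in sorted(holes.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:8s} {size:4d}G")
print(f"total    {total_holes}G of holes ({total_b}G - {total_c}G)")
```

With these inputs the totals come out to the same 1,230G, 2,815G and 4,045G as the last row of the table.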

Order of Restores

Delete

<hi #ffff00>sanscratch … lun deleted, waiting for reclamation, offlined lun volume</hi>
<hi #ffff00>sanscratch … create nfs dir on filer4, nfs reconfig, mount</hi>

Few “holes”

<hi #ffff00>cusers … rsynced over, nfs reconfig, mount</hi>
<hi #ffff00>rusers4 … rsynced over, nfs reconfig, mount</hi>
<hi #ffff00>rusers6 … rsynced over, nfs reconfig, mount</hi>
<hi #ffff00>rusers7 rsyncing over all</hi> nfs reconfig, mount

Many “holes”

<hi #ffff00>users … rsyncing over hpc05,</hi> nfs reconfig, mount
<hi #ffff00>rusers5 … rsynced over, umount, nfs reconfig, mount</hi>
<hi #ffff00>rusers3 … rsynced over, umount, nfs reconfig, mount</hi>
<hi #ffff00>rusers8 … rsynced over, umount, nfs reconfig, mount</hi>
<hi #fa8072>rusers … rsyncing over chsu,</hi> nfs reconfig, mount
<hi #ffff00>rusers2 … rsynced over, nfs reconfig, mount</hi>

Large Home Dirs

This will help with the restore order of the LUNs.

Size	Username
235G	./users/hpc05
663G	./rusers/chsu
771G	./rusers2/dbblum
87G	./rusers3/jbodyfelt
107G	./rusers3/ztan
72G	./rusers5/ajbenson
134G	./rusers5/sdixit
87G	./rusers5/skong
455G	./rusers7/wdai
76G	./rusers8/bstewart
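
One way to read this list when planning the restore order is to add up the large home dirs per LUN. The sketch below is only an illustration: the input is the listing above pasted in as text, and the variable names are made up for this example.

```python
from collections import defaultdict

# The "Size Username" listing from above, one entry per line.
du_lines = """\
235G ./users/hpc05
663G ./rusers/chsu
771G ./rusers2/dbblum
87G ./rusers3/jbodyfelt
107G ./rusers3/ztan
72G ./rusers5/ajbenson
134G ./rusers5/sdixit
87G ./rusers5/skong
455G ./rusers7/wdai
76G ./rusers8/bstewart
""".splitlines()

per_lun = defaultdict(int)
for line in du_lines:
    size, path = line.split()
    # path looks like "./<lun>/<user>"
    _, lun, user = path.split("/")
    per_lun[lun] += int(size.rstrip("G"))

# LUNs with the most data in large home dirs first.
ranked = sorted(per_lun.items(), key=lambda item: item[1], reverse=True)
for lun, total in ranked:
    print(f"{lun:8s} {total:4d}G")
```

The LUNs at the top of this ranking are the ones where a single rsync will run longest, so they are the ones to schedule with care.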

Musings

1,230G & 2,815G & 4,045G

Columns D, C and B from the table above. The filer reports that the home directory volume is full. The volume has 4 TB of space reserved, so yes, it is full at 4,045G used. The linux head node reports that 2.8 TB of data is found in the home dirs. That implies the filer has 1.2 TB of “holes”, areas of the file system that have not been reclaimed. Defragmentation. The rate of file deletion must be tremendous.

The TSM total number of files for the cluster grew from 4,724,563 to 18,020,855 in the last 3 months.

Yes, it appears we have a lot of files. The TSM total file count of course includes the one active and the one inactive version (if modified) of each file, or the deleted version only. Deleted files and inactive versions are kept for 30 days, so those are included in the 18 million count.

The TSM total volume for the cluster grew from 1.6 to 2.5 TB.

The linux head node reports that 2.8 TB of data is on disk. At 2x compression, that would be roughly 1.4 TB of compressed TSM data. Wow. So 2.5 minus 1.4 leaves 1.1 TB of compressed volume for the inactive and deleted versions of files. That would be the equivalent of 2.2 TB of data if on disk! Almost matches, pointing at a tremendous modification and/or deletion rate of files.
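
The same estimate written out step by step, as a rough sketch. The 2x compression ratio is the rule of thumb used in these notes, not a measured value, and the variable names are made up for this example.

```python
# Rule of thumb from these notes: TSM data compresses about 2x (an assumption).
COMPRESSION_RATIO = 2.0

on_disk_tb = 2.8    # data on disk per the linux head node
tsm_total_tb = 2.5  # compressed TSM volume for the cluster

# Compressed size of the files that are currently on disk (active versions).
active_compressed_tb = on_disk_tb / COMPRESSION_RATIO

# What remains in TSM must be inactive and deleted versions.
inactive_compressed_tb = tsm_total_tb - active_compressed_tb

# The same data expressed as if it were sitting uncompressed on disk.
inactive_on_disk_tb = inactive_compressed_tb * COMPRESSION_RATIO

print(f"active, compressed:           {active_compressed_tb:.1f} TB")
print(f"inactive/deleted, compressed: {inactive_compressed_tb:.1f} TB")
print(f"inactive/deleted, on disk:    {inactive_on_disk_tb:.1f} TB")
```

Changing `COMPRESSION_RATIO` shows how sensitive the 2.2 TB figure is to that single assumption.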