cluster:194 [2020/03/12 14:00] hmeij07 [ZFS]
cluster:194 [2021/02/01 15:44] hmeij07 [HA]
Notes. Mainly for me but might be useful/of interest to users.

Message:

Our current file server is sharptail.wesleyan.edu, which serves out home directories (/home, 10T). A new file server, hpcstore.wesleyan.edu, will be deployed to take over this function (/zfshomes, 190T). This notice is to inform you that your home directory has been cut over.

There are no changes for you. When you log into cottontail or cottontail2 you end up in your new home directory. $HOME and ~username work as usual. The only difference is that your old home was at /

If you wish to load/unload large content from your new home directory, please log into hpcstore.wesleyan.edu directly (via ssh/sftp) or, preferably, use rsync with a bandwidth throttle no larger than "

Details at\\
https://

==== Summary ====

  * **SSH** (sftp/scp)

<code>

# from outside via VPN
$ ssh hpc21@hpcstore.wesleyan.edu

hpc21@hpcstore.wesleyan.edu's password:
FreeBSD 11.2-STABLE (TrueNAS.amd64)
(banner snip ...)
Welcome to TrueNAS

# note we ended up on node "hpcstore2"
[hpc21@hpcstore2 ~]$ pwd
/
[hpc21@hpcstore2 ~]$ echo $HOME
/

# quota check
[hpc21@hpcstore2 ~]$ zfs userspace tank/
TYPE        NAME   USED  QUOTA
POSIX User  hpc21


# from inside the HPCC with ssh keys properly set up
[hpc21@cottontail ~]$ ssh hpcstore
Last login: Mon Mar 23 10:58:27 2020 from 129.133.52.222

[hpc21@cottontail ~]$ echo $HOME
/

[hpc21@hpcstore2 ~]$ df -h .
Filesystem
tank/

</code>

  * **RSYNC**

<code>

[hmeij@ThisPC]$ rsync -vac --dry-run --whole-file --bwlimit=4096 c:
sending incremental file list
...

</code>

  * **SMB/CIFS**
    * all users have shares but not class accounts (hpc101-hpc200)

Not any more. There is a serious conflict between NFS and SMB ACLs if both protocols are enabled on the same dataset, so **nobody** has a samba share. If you want to drop&

--- //

<code>

# windows command line
C:
Enter the password for '
The command completed successfully.

# or ThisPC > Map Network Drive
\\hpcstore.wesleyan.edu\username
# user is hpcc username, password is hpcc password

</code>
==== Consoles ====
  * port 5
    * set up mac
    * plug in pin2usb cable (look for device /
    * launch terminal, invoke screen
    * screen /
    * sysadmin/
    * ifconfig eth0 | grep 'inet addr' or
    * ipmitool -H 127.0.0.1 -U admin -P admin lan print
    * ipmitool -H 127.0.0.1 -U admin -P admin lan set 1 ipaddr ... (etc + netmask + defgw)
    * to set initial ips/
  * port 10 (if the 12 option netcli boot menu does not show)
    * unplug console cable, plug in pin2serial cable
    * set up windows laptop, launch hyperterminal,
    * 12 menu ''
  * port 80->443, web site (with shell of ''
    * gui
    * shell
    * all non-zfs commands are persistent across boots
      * except ssh keys and directory permissions
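The two ''ipmitool'' steps above can be sketched as a dry-run script that only prints the commands rather than executing them; the 192.168.0.x values below are placeholder assumptions, not the servers' actual BMC addresses.

```shell
#!/bin/sh
# Dry-run sketch of the BMC LAN setup steps above: print each
# ipmitool invocation instead of running it. Addresses are
# placeholders (assumptions), not the real BMC values.
IPMI='ipmitool -H 127.0.0.1 -U admin -P admin'
for args in 'lan print 1' \
            'lan set 1 ipaddr 192.168.0.50' \
            'lan set 1 netmask 255.255.255.0' \
            'lan set 1 defgw ipaddr 192.168.0.1'; do
  echo "$IPMI $args"
done
```

Drop the ''echo'' to actually apply the settings from the console session.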
==== HA ====
Virtual IP ''

Critical for Failover: Network Interfaces marked for IGB0 and IGB1 (/zfshomes via NFS) and lagg0 (vlan52)

You can always Disable Failover, to fix power feed of switches 192.168.0.0/

Check Box to Disable Failover\\
Go to WebUI > System > Failover > Click the Box > Then Click Save

This will allow you to make your network change without failing over. Then, when finished, enable Failover again.
==== SSH ====
Allowed for large content transfers using ''
TODO: rsync?
Home directories are located in ''/
TODO: write script.
TODO: add disksold sharptail:/
TODO: backup target\\
<code>
# create user, no new but set primary
# set shell, set permissions, some random passwd date +%N with symbols
# then move all dot files into ~/
# copy content over from sharptail, @hpcstore...
rsync -ac --bwlimit=4096 --whole-file --stats sharptail:/
# SSH keys in place so should be passwordless, test
ssh username@hpcstore.wesleyan.edu
</code>
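The account-setup notes above mention setting "some random passwd date +%N with symbols". The site's actual script is not shown; a minimal hypothetical sketch of that one step could be:

```shell
#!/bin/sh
# Hypothetical sketch (not the site's actual script): derive a
# throwaway password from the nanosecond counter, hashed, with
# two symbols bolted on. The "!#" suffix is an assumption.
ns=$(date +%N)                                  # nanosecond counter, varies per call
pw="$(printf '%s' "$ns" | md5sum | cut -c1-12)!#"
echo "$pw"                                      # 12 hex chars + "!#"
```

The generated password is only a placeholder until the user asks for a reset, per the SMB note below.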
==== ZFS ====

  * https://

<code>
tank/

# health
zpool status -v tank
  pool: tank
  scan: scrub repaired 0 in 0 days 00:00:02 with 0 errors on Sun Feb 2 03:00:04 2020
config:

        NAME       STATE   READ WRITE CKSUM
        tank       ONLINE
          raidz2-0
            gptid/   (11 member disks, ids truncated)
          raidz2-1
            gptid/   (11 member disks, ids truncated)
          raidz2-2
            gptid/   (11 member disks, ids truncated)
        logs
          gptid/
        cache
          gptid/
        spares
          gptid/

errors: No known data errors
</code>

==== SMB ====

SMB/CIFS (Samba) shares are also created once the homedir is up. NOT!

  * do not mix SMB and NFS on the same dataset, not supported
  * problems '
  * windows ACLs on top of a unix file system = bad

<code>

# v that plus is the problem
drwxr-xr-x+ 147 root wheel 147 Apr 27 08:17 /

# either use the ACL editor to strip it off in v13.1-U2 or

setfacl -bn /

# followed by, for example

find /

# also unsupported via shell

</code>
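The truncated ''setfacl''/''find'' cleanup above can be sketched end to end. This is a hypothetical expansion, not the commands actually run on the server: the real dataset path is truncated in the notes, so a temporary directory stands in for it, and the ''setfacl -bn'' line (which strips all ACL entries on FreeBSD) is left commented out.

```shell
#!/bin/sh
# Hypothetical ACL/permission cleanup sketch. DATASET is a stand-in
# for the real (truncated) path; a temp dir keeps this safe to run.
DATASET=$(mktemp -d)
mkdir -p "$DATASET/user1/dir"
touch "$DATASET/user1/dir/file"
# setfacl -bn "$DATASET"                       # strip ACL entries (FreeBSD)
find "$DATASET" -type d -exec chmod 755 {} +   # dirs back to rwxr-xr-x
find "$DATASET" -type f -exec chmod 644 {} +   # files back to rw-r--r--
stat -c '%a' "$DATASET/user1/dir/file"         # prints 644 (GNU stat)
rm -rf "$DATASET"
```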

  * For each user
    * mnt/
    * uncheck default permissions
    * valid users = username, @ugroup(s)
    * ''/

**Note** At user creation a random password is set. Please ask to have it reset to access SMB shares. (There should be some **self-serve password reset** functionality with email confirmation, but I cannot find it for now.) Any passwords changed outside of the database will not be persistent across reboots.
<code>
</code>

----
Change $HOME location in ''/
**Note** remove access to old $HOME ... chown root:root + chmod o-rwx \\
END OF USER ACCOUNT SETUP
----

==== NFS ====

  * maproot is needed
  * export to both private networks

<code>

root@hpcstore1[~]#

/
/

/
-maproot="
/
-maproot="

</code>
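The export lines above are truncated by the wiki diff. For reference, a FreeBSD/TrueNAS ''/etc/exports'' entry combining ''maproot'' with a network restriction generally takes the following shape; the path, the root:wheel mapping, and both networks here are illustrative assumptions, not the server's actual values:

```
/mnt/tank/zfshomes -maproot="root":"wheel" -network 192.168.0.0/24
/mnt/tank/zfshomes -maproot="root":"wheel" -network 10.10.0.0/16
```

One line per network matches the "export to both private networks" bullet above.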
==== Rollback ====
/
</code>

==== Update ====

**Change the Train** to 11.3, then apply the update in the WebUI to the passive controller first.

After it reboots, fail over to it by **rebooting** the active controller (the **current** WebUI).

This will fail over to the updated 11.3-U2.1 controller (brief interruption).

From there, go to System > Update and do the same for the new passive controller.

After that, initiate failover back to the primary via the dashboard (brief interruption).

Enable HA, click icon.

**Apply Pending Updates**\\
Upgrades both controllers. Files are downloaded to the Active Controller and then transferred to the Standby Controller. The upgrade process starts concurrently on both TrueNAS Controllers.

The server responds while HA is disabled.

The update takes 15 minutes in total.

** 11.3 U5 **

  * Check for Updates, read release notes, schedule a support ticket if needed (major update)
  * Apply Pending Update, save configuration,
  * active/
  * At 100% standby reboots, HA disables, file system ok
  * Check version on standby, Initiate Fail Over (interrupts file system)
  * Login, you end up on the updated, now active server
  * Logout/
  * Wait for HA to be enabled, check version on new standby
==== HDD ====

Two types, hard to find in stock.

<code>

8T SAS
da0: <HGST HUS728T8TAL4201 B460> Fixed Direct Access SPC-4 SCSI device
da0: Serial Number VAKM5GTL
da0: 1200.000MB/
da0: Command Queueing enabled
da0: 7630885MB (1953506646 4096 byte sectors)
exxactcorp
https://

800G SSD
da2: <WDC WUSTR6480ASS201 B925> Fixed Direct Access SPC-5 SCSI device
da2: Serial Number V6V1XGDA
da2: Command Queueing enabled
da2: 763097MB (1562824368 512 byte sectors)
exxactcorp
https://

</code>

==== Logs ====

From support:

That information is logged via syslog for the opposite controller. For example, to find the information I did here, I looked in the syslog output on the controller that was passive at the time these alerts occurred.

You can look that information up yourself by opening an SSH session to the passive controller, navigating to the /

==== Split Brain ====

When you end up in an errored failover state, try a console shutdown first. If that does not work, cut power to the controllers.

==== fndebug ====

  * first log into support, then download teamviewer
    * https://
    * get.teamviewer.com/

**Manual debug file creation**, then ftp to ftp.ixsystems.com

<code>

freenas-debug -A
tar czvf fndebug-wesleyan-20201123.tar.gz /

# next look at the bottom of fndebug/
/dev/da10 HGST:
/dev/da9 HGST:
# these drives have not failed yet but have write errors, offline/

# next look at the output of zpool status -x in fndebug/
# and the error code
# https://

NAME STATE READ WRITE CKSUM
tank DEGRADED
...
raidz2-1
gptid/
# look for checksums that have failed, like this disk in vdev raidz2-1

# clean up the spare that resilvered (INUSE status)
# then run a clear on the pool; then we'll try to get another debug

zpool detach tank gptid/
zpool clear tank

# that brought all drives back online and the vdevs show
# then via the gui added the available drive back as spare

</code>

  * Monitor the progress of the resilvering operation: ''zpool status -x''
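Such a check can be scripted; since no pool is available here, the ''zpool status -x'' output is simulated with a fixed string (on the real server, replace the assignment with ''status=$(zpool status -x)''):

```shell
#!/bin/sh
# Sketch of a scripted pool health check. The status string is
# simulated; on the server use: status=$(zpool status -x)
status="all pools are healthy"
case "$status" in
  *healthy*)  echo "pool OK" ;;
  *DEGRADED*) echo "pool degraded, check vdevs and resilver progress" ;;
  *)          echo "unknown state: $status" ;;
esac
# → pool OK
```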

**Replace a failed drive**

  * https://
  * drives mentioned above have not failed yet so we must "

<code>

1) Go into the Storage > Pools page. Click the Gear icon next to the pool and press the "
2) Find da4 and press the three-dot options button next to it, then press "
3) Go to the System > View Enclosure page, select da4 and press "
4) Physically swap the drive on the rack with its replacement.
5) Go back to the Storage > Pool > Status page, bring up the options for the removed drive,
5a) Select the member disk from the dropdown, and press "
The replacement drive may or may not have been given the name "
6) Wait for the drive to finish resilvering before proceeding to replace da3.
6a) Click the spinning icon to view progress. Pool status "
Return the drives in the original box, return label provided.

</code>
\\