</code>

==== certs ====

  * Go to System > CA and certs
  * Add a cert
  * name is "..."
  * FQDN for hpstore.wesleyan.edu, ...
  * fill in just the basics, no constraints or advanced settings
  * Add, then view the CSR section, copy and provide to the InCommon admin

  * Once you get the email back, note "Available formats:"
  * as Certificate only, PEM encoded ... download, open in notepad, copy to clipboard
  * Go to System > CA and certs
  * Select type "Import Certificate" (choosing type "Internal Certificate" changes the fields)
  * paste the public key into the certificate field
  * paste in the private key from the CSR
  * or check the "CSR exists on this system" box (this option)
  * System > General, switch certs, Save (this will restart web services)
  * check in a new browser (see the quick check below)
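
A quick way to confirm the new certificate is being served (a minimal check, assuming the FQDN from the CSR step above):

<code>
# show issuer and validity dates of the certificate the web UI presents
openssl s_client -connect hpstore.wesleyan.edu:443 -servername hpstore.wesleyan.edu \
  </dev/null 2>/dev/null | openssl x509 -noout -issuer -dates
</code>
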
==== ZFS ====

zfs userspace
zfs groupspace tank/...

# utterly bizarre: in v12 these commands change

root@hpcstore2[~]# ...
hpcstore2% ...
hpcstore2% zfs get userused@hmeij07 tank/...
NAME      PROPERTY          VALUE  SOURCE
tank/...  userused@hmeij07  ...    ...
hpcstore2% zfs get userquota@hmeij07 tank/...
NAME      PROPERTY           VALUE  SOURCE
tank/...  userquota@hmeij07  ...    ...
hpcstore2% zfs get userspace@hmeij07 tank/...
bad property list: invalid property 'userspace@hmeij07'
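
# a sketch: v12 also accepts both properties in one call
# (dataset name elided as above; zfs get takes a comma-separated property list)
zfs get userused@hmeij07,userquota@hmeij07 tank/...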

# hpc100

==== Snapshots ====

Snapshots are made easier in new releases ... traverse to the hidden directory ''/...''

<code>
129.133.52.245:/...
251T ...
</code>

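From a client with the share mounted, the hidden directory can be browsed directly (a sketch with hypothetical mount and snapshot names; every dataset exposes ''.zfs/snapshot'' at its mountpoint root):

<code>
# list available snapshots, then copy a file straight out of one (paths hypothetical)
ls /mnt/share/.zfs/snapshot/
cp /mnt/share/.zfs/snapshot/daily-2023-08-01/somefile /tmp/
</code>
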
  * Daily snapshots, one per day, kept for a year (for now)
  * check permissions on cloned volume, not windows!
  * NOTE: once had mnt/...
  * when cloning grant access to <del>192.168.0.0/...</del> ...
  * NFS mount, read only
  * maproot ''...''
  * Clone mounted on say ''...''
  * Restore actions by user
  * Delete clone when done (see the sketch below)
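
A minimal sketch of the same flow from the shell, with hypothetical snapshot, clone, and mountpoint names:

<code>
# clone the daily snapshot to a temporary dataset (names hypothetical)
zfs clone tank/home@daily-2023-08-01 tank/restore

# on the client: NFS mount the clone read-only for the user's restore actions
mount -o ro 129.133.52.245:/mnt/tank/restore /mnt/restore
umount /mnt/restore

# delete the clone when done
zfs destroy tank/restore
</code>
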
  * At 100% standby reboots, HA disables, file system ok
  * Check version on standby, Initiate Fail Over (interrupts file system)
  * Login, ...
  * Logout/...
  * when the old active server boots, confirm "Pending Update" to complete
  * check version on new standby (see the shell check below)
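
To check a node's release from the shell instead of the GUI (on TrueNAS CORE the release string lives in ''/etc/version''):

<code>
# prints the release of the node you are logged into
cat /etc/version
</code>
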
==== HDD ====
</code>


==== Console hangs ====

12.7

As for the issue of the "console hangs" ...

service middlewared stop\\
service middlewared start

"..."

Result: personality switch active vs standby, took 35 mins
In two months: ZFS feature updates patch, not interruptive, ...

Upgrade done
 --- //...//

Storage > Pool > "..."

** 12.0-U4.1 **

  * ditto above, see major release upgrade below
  * but old active did not come up, reset controller
  * click on "..."
  * hmm, something about "failed to connect failoverscratchdisk"?

** 12.0-U5.1 **

  * standby reboot 5 mins
  * fail over 1 min
  * new standby "apply pending updates"
  * this version went fine

__Not created/...__
While the underlying issues have been fixed, this setting continues to be disabled by default for additional performance investigation. To manually reactivate persistent L2ARC, log in to the TrueNAS Web Interface, go to System > Tunables, and add a new tunable with these values:
<code>
Type = sysctl
Variable = vfs.zfs.l2arc.rebuild_enabled
Value = 1
</code>
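
After the next boot, the value can be confirmed from the shell (a quick check of the sysctl named above):

<code>
# 1 = persistent L2ARC rebuild enabled
sysctl vfs.zfs.l2arc.rebuild_enabled
</code>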

From support: In an HA environment, ...

** 12.0-U6 **

  * same as 5.1, went fine, ...
  * new standby reboot 5 mins

** 12.0-U6.1 **

  * same as 6, went fine, ...
  * little flakiness on failover, "apply pending" appeared twice
  * let it go 10 mins, use ping hostname to test
  * new standby reboot 5 mins

** 12.0-U7 **

  * major OpenZFS update
  * same as update 12.0
  * no problems
  * cpu was unusually busy before upgrade
  * terminated some rsyncs

** 12.0-U8 **

  * 02/23/2022
  * no problems

** 12.0-U8.1 **

  * 05/03/2022
  * failover success at 10 mins
  * then no Pending box, just a Continue button
  * watch console messages, at 17 mins HA enabled

==== Update 13 ====

System > Update > Select (new train 13.0-STABLE)

<code>

# in shell

...

...

…10%…20%…30%…40%…50%…60%…70%…80%…90%…100%

beadm list
# (Active N = 12.0-U8.1 and R = 13.0-U3.1)

</code>

Once both have finished, reboot the passive node, then log back into the web GUI.

Once the passive node is back up, reboot the active node.

Log back into the new active node via the web GUI and wait for HA to be enabled.

Debug plus screenshots for snapshot visibility, which is visible (working in 13.0-U3.1), but the database setting is still invisible.

Took less than 35 mins.
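
If the new release misbehaves, the old boot environment offers a rollback path (a sketch; the BE name is taken from the ''beadm list'' output above):

<code>
# activate the previous boot environment, then reboot into it
beadm activate 12.0-U8.1
reboot
</code>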

<code>

# LSF: stop, then later resume, all jobs ("0" means all jobs)
bstop 0
bresume 0
# Slurm: manual, one at a time
scontrol hold joblist
# one at a time
# for i in `squeue | grep '...
# then grep '...
scontrol suspend joblist
scontrol resume
scontrol release joblist

</code>
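
A fuller version of the loop hinted at in the comments might look like this (a sketch; assumes Slurm's ''squeue'' output flags and that every running job should be suspended):

<code>
# suspend all running jobs, one at a time
for i in $(squeue -h -t R -o %i); do scontrol suspend $i; done

# after the upgrade, resume the suspended jobs
for i in $(squeue -h -t S -o %i); do scontrol resume $i; done
</code>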

** 13.0-U4 **

  * apply pending update
  * 10 mins, standby on new update
  * initiate fail over 1 min
  * look for the icon in the top bar, moving back and forth
  * finish upgrade
  * wait for HA to be enabled
  * check versions


** 13.0-U5.1 **

  * apply pending update
  * 10 mins, standby on new update
  * initiate fail over 1 min
  * look for the icon in the top bar, moving back and forth
  * finish upgrade
  * wait for HA to be enabled
  * check versions


** 13.0-U5.3 ** 08/25/2023

  * no problems

**Next support ticket: Ask if you ever need to reboot the disk shelves?** Full power off?

Hi, I'm archiving content from my TrueNAS appliance to another platform, then deleting the migrated files. I'm observing directories like this: 7.5 million files in 990 GB, or 15 million files in 7 TB. Should I be concerned that the disk shelves have never been cold rebooted? Like XFS replaying the log journal for a clean mount? My HA nodes reboot on upgrade, but I realize the disk arrays keep running, always.

Tier 1 Support: The two ES24 shelves do not require a reboot as they just house the drives themselves and provide power to them. There shouldn't ...

Rsync stats (after decompressing):

sod1/...
Number of files: 18,691,764
Total transferred file size: 13072322138140 bytes (≈13 TB)
arnt_rosetta/...
Number of files: 8,...
Total transferred file size: 1,...
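
These counters come from rsync's ''--stats'' summary; an invocation along these lines produces them (a sketch, source and destination paths hypothetical):

<code>
# archive a directory tree and log the stats summary
rsync -a --stats /mnt/tank/sod1/ archive:/export/sod1/ > rsync-sod1.log 2>&1
</code>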