Virtual IP ''
Critical

You can always Disable Failover, to fix the power feed of switches on 192.168.0.0/16 or 10.10.0.0/

Check Box to Disable Failover
Go to WebUI > System > Failover > Click the Box > Then Click Save (leave default controller setting as is)

This will allow you to make your network change without failing over.
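
A hedged console equivalent of that checkbox, assuming the ''failover'' middleware service present on these TrueNAS Enterprise HA units (the exact call may differ by version):

<code>
# show current failover settings
midclt call failover.config
# disable failover before the switch work; use "disabled": false to re-enable
midclt call failover.update '{"disabled": true, "master": true}'
</code>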
==== SSH ====
zfs userspace
zfs groupspace tank/
+ | |||
+ | # uttlerly bizarre in v12 these commands change | ||
+ | |||
+ | root@hpcstore2[~]# | ||
+ | hpcstore2% | ||
+ | hpcstore2% zfs get userused@hmeij07 tank/ | ||
+ | NAME | ||
+ | tank/ | ||
+ | hpcstore2% zfs get userquota@hmeij07 tank/ | ||
+ | NAME | ||
+ | tank/ | ||
+ | hpcstore2% zfs get userspace@hmeij07 tank/ | ||
+ | bad property list: invalid property ' | ||
+ | |||
# hpc100
* check permissions on cloned volume, not windows!
* NOTE: once had mnt/
* when cloning grant access to <del>192.168.0.0/
* NFS mount, read only (see the sketch after this list)
* maproot ''
* Clone mounted on say ''
* Restore actions by user
* Delete clone when done
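
A minimal restore sketch, assuming the clone is exported read-only; the export path, mount point, and file names below are hypothetical:

<code>
# mount the clone read-only over NFS
mount -t nfs -o ro hpcstore:/mnt/tank/home-clone /mnt/clone
# copy the lost file back into the live file system
rsync -av /mnt/clone/hmeij07/lostfile /home/hmeij07/
umount /mnt/clone
</code>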
</code>
==== Update ====

See Update 12 below for the manual update to v12, done with Anthony on 03.09.2021

**Change the Train** to 11.3, then you will apply the update first in the WebUI to the passive controller.
* At 100% standby reboots, HA disables, file system ok
* Check version on standby, Initiate Fail Over (interrupts file system)
* Login,
* Logout/
* when old active server boots, "Pending Update" appears; confirm to complete
* check version on new standby (see the sketch below)

==== HDD ====
/dev/da10 HGST:
/dev/da9 HGST:
# these drives have not failed yet but have write errors, offline/
# next look at output of zpool status -x in fndebug/
# that brought all drives back online and the vdevs show
# then via gui added the available drive back as spare
</code>
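
For reference, a hedged sketch of the zpool commands behind the offline/online dance above; pool name ''tank'' and device ''da10'' are assumptions:

<code>
zpool status -x           # show only pools that are not healthy
zpool offline tank da10   # take the write-erroring member offline
zpool online tank da10    # bring it back after cabling/enclosure checks
zpool clear tank          # reset error counters once resolved
</code>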
**Replace a failed drive**
+ | |||
+ | * https:// | ||
+ | * drives mentioned above have not failed yet so we must " | ||
<code>
1) Go into the Storage > Pools page. Click the Gear icon next to the pool and press the "
2) Find da4 and press the three-dot options button next to it, then press "
3) Go to the System > View Enclosure page, select
4) Physically swap the drive on the rack with its replacement.
5) Go back to the Storage > Pool > Status page, bring up the options for the removed drive,
5a) Select member disk from dropdown, and press "
The replacement drive may or may not have been given the name "da4".
6) Wait for the drive to finish resilvering before proceeding to replace
6a) Click spinning icon to view progress. Pool status "
Return the drives in original box, return label provided.
</code>
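
If the GUI is unavailable, a hedged CLI equivalent of steps 5/5a; device names are examples only, the replacement may not be called ''da20'':

<code>
# swap the pulled member for the new disk; resilvering starts automatically
zpool replace tank da4 da20
# watch resilver progress
zpool status tank | grep -A 2 'scan:'
</code>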
+ | |||
+ | ** Pool Unhealthy but not Degraded status** | ||
+ | |||
+ | No failed disks, no deploy of spare, but pool unhealthy. | ||
+ | |||
+ | < | ||
+ | |||
+ | Mar 21 04:03:57 hpcstore2 (da11: | ||
+ | Mar 21 04:03:57 hpcstore2 (da11: | ||
+ | Mar 21 04:03:57 hpcstore2 (da11: | ||
+ | Mar 21 04:03:57 hpcstore2 (da11: | ||
+ | Mar 21 04:03:57 hpcstore2 (da11: | ||
+ | Mar 21 04:03:57 hpcstore2 (da11: | ||
+ | |||
+ | 1) Storage > Pools. Click gear icon next to the pool and press the " | ||
+ | 2) Find da11 and press the three-dot options button next to it, then press " | ||
+ | 3) System > View Enclosure, find& | ||
+ | 4) Physically swap the drive on the rack with its replacement. | ||
+ | 5) Storage > Pool > Status page, bring up three-dot options for the removed drive, | ||
+ | 5a) Select member disk from drop down, and press " | ||
+ | 6) Wait till resilver finishes. | ||
+ | |||
+ | </ | ||
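
Before pulling a drive that logs errors but has not faulted, a hedged sanity check with smartmontools (device name taken from the log above):

<code>
# look for grown defects and logged read/write errors
smartctl -a /dev/da11 | egrep -i 'serial|defect|error'
</code>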
+ | |||
+ | |||
+ | ==== Update 12 ==== | ||
+ | |||
+ | System > Update > Select (new train 12.0-STABLE) | ||
+ | |||
+ | ** Open a console on both controllers without double ssh sessions, directly to hpcstore1/ | ||
+ | |||
+ | '' | ||
+ | '' | ||
+ | |||
+ | Then download updates on passive, check version '' | ||
+ | |||
+ | '' | ||
+ | |||
+ | '' | ||
+ | |||
+ | ...10%...20%...30%...40%...50%...60%...70%...80%...90%...100% | ||
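
The truncated commands above were dictated by iXsystems support on the call; a hedged sketch of the usual console flow for this step (''freenas-update'' ships with these releases, verbs assumed):

<code>
# on the passive controller: see what the selected train offers, then apply it
freenas-update check
freenas-update update
# confirm the running version before and after the reboot
cat /etc/version
</code>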
+ | |||
+ | reboot passive | ||
+ | |||
+ | from active ping passive heartbeat IP, when up | ||
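
A hedged one-liner for that wait; the 169.254.10.x heartbeat address is an assumption, check the internal interface for the real one:

<code>
# FreeBSD ping: -o exits after the first reply
ping -o 169.254.10.2 && echo passive is back up
</code>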
+ | |||
+ | check version passive | ||
+ | |||
+ | check boot env '' | ||
+ | |||
+ | on passive '' | ||
+ | |||
+ | now force fail over via GUI (interruptive for 6o seconds) | ||
+ | |||
+ | Anthony did a reboot on active instead, watch log for personality swap | ||
+ | |||
+ | then update the new passive | ||
+ | |||
+ | '' | ||
+ | |||
+ | '' | ||
+ | |||
+ | then check version, reboot new passive, check version, become new standby | ||
+ | |||
+ | Result: personality switch active vs standby, took 35 mins | ||
+ | |||
+ | In two months: ZFS feature updates pathch, not interruptive, | ||
+ | Upgrade done | ||
+ | --- // | ||
+ | |||
+ | Storage > Pool > " | ||
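
That pool step is the ZFS feature-flag upgrade; a hedged CLI equivalent (pool name assumed), noting it is one-way:

<code>
zpool upgrade        # list pools still missing newer feature flags
zpool upgrade tank   # enable them; older releases can then no longer import the pool
</code>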
+ | |||
+ | ** 12.0-U4.1 ** | ||
+ | |||
+ | * ditto above, see major release upgrade below | ||
+ | * but old active did not come up, reset controller | ||
+ | * click on " | ||
+ | * hmm something about failed to connect failoverscratchdisk? | ||
+ | |||
+ | ** 12.0-U5.1** | ||
+ | |||
+ | * standby reboot 5 mins | ||
+ | * fail over 1 min | ||
+ | * new standby "apply pending updates" | ||
+ | * this version went fine | ||
+ | |||
+ | __Not created/ | ||
+ | While the underlying issues have been fixed, this setting continues to be disabled by default for additional performance investigation. To manually reactivate persistent L2ARC, log in to the TrueNAS Web Interface, go to System > Tunables, and add a new tunable with these values: | ||
<code>
Type = sysctl
Variable = vfs.zfs.l2arc.rebuild_enabled
Value = 1
</code>
+ | |||
+ | From support: In an HA environment, | ||
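
A hedged way to verify the tunable is live after the next reboot (standard OpenZFS 2.0 sysctl name on FreeBSD):

<code>
sysctl vfs.zfs.l2arc.rebuild_enabled   # should report 1 once active
</code>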
+ | |||
+ | ** 12.0-U6 ** | ||
+ | |||
+ | * same as 5.1, went fine, | ||
+ | * new standby reboot 5 mins | ||
+ | |||
+ | |||
+ | ** 12.0-U6.1 ** | ||
+ | |||
+ | * same as 6, went fine, | ||
+ | * little flakiness on failover, apply pending appeared twice | ||
+ | * let it go 10 mins, use ping hostname to test | ||
+ | * new standby reboot 5 mins | ||
+ | |||
+ | ** 12.0-U7 ** | ||
+ | |||
+ | * major OpenZFS update | ||
+ | * same as update 12.0 | ||
+ | * no problems | ||
+ | * cpu was unusually busy before upgrade | ||
+ | * terminated some rsyncs | ||
+ | |||
+ | |||
\\
**[[cluster: