* **SMB/CIFS**
* all users have shares but not class accounts (hpc101-hpc200)

Not any more. There is a serious conflict between NFS and SMB ACLs if both protocols are enabled on the same dataset, so **nobody** has a Samba share. If you want to drop & ...

--- //hmeij07//
==== SMB ====

SMB/CIFS (Samba) shares are also created once the homedir is up. NOT!

* do not mix SMB and NFS on the same dataset, not supported
* problems '...'
* Windows ACLs on top of a Unix file system = bad

<code>

# v that plus is the problem
drwxr-xr-x+ 147 root wheel 147 Apr 27 08:17 /...

# either use the ACL editor in 11.3-U2 to strip it off, or

setfacl -bn /...

# followed by, for example

find /...

# also unsupported via shell

</code>
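A fuller version of the strip-and-recurse, as a minimal sketch; the dataset path ''/mnt/tank/home'' is an assumption, substitute your own mount point.

<code>
# sketch only, not a supported procedure: /mnt/tank/home is an assumed path

# strip all ACL entries from the dataset's top-level directory
setfacl -bn /mnt/tank/home

# then walk the tree and strip ACL entries from everything below it
find /mnt/tank/home -exec setfacl -bn {} \;

# verify: the trailing + on the mode bits should be gone
ls -ld /mnt/tank/home
</code>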
* For each user

==== Update ====

**Change the Train** to 11.3, then you will apply the update first in the WebUI to the passive controller.
After it reboots, you will fail over to it by **rebooting** the Active controller (the **current** WebUI).

Enable HA, click icon

**Apply Pending Updates**

Upgrades both controllers. Files are downloaded to the Active Controller and then transferred to the Standby Controller. The upgrade process starts concurrently on both TrueNAS Controllers.

The server keeps responding while HA is disabled.

The update takes about 15 minutes in total.

**11.3 U5**

* Check for Updates, read the Release Notes, schedule a support ticket if needed (major update)
* Apply Pending Updates, save the configuration ...
* active/ ...
* At 100% the standby reboots, HA disables, file system ok
* Check the version on the standby, Initiate Fail Over (interrupts the file system)
* Log in, you end up on the updated, now active server
* Logout/ ...
* Wait for HA to be enabled, check the version on the new standby (a version-check sketch follows below)
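A minimal sketch of checking the running release on each controller from a shell; the hostnames ''ctrl-a'' and ''ctrl-b'' are placeholders for your two controllers.

<code>
# hostnames are assumptions, use your controllers' addresses
ssh root@ctrl-a cat /etc/version
ssh root@ctrl-b cat /etc/version
</code>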

==== HDD ====

<code>
exxactcorp
https://...
</code>

==== Logs ====

From support:

That information is logged via syslog for the opposite controller. For example, to find the information I did here, I looked in the syslog output on the controller that was passive at the time these alerts occurred.

You can look that information up yourself by opening an SSH session to the passive controller, navigating to the /...
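A sketch of that lookup, assuming the usual FreeBSD syslog location ''/var/log/messages''; the hostname is a placeholder, and ''da10'' is just an example search term taken from the fndebug section below.

<code>
# hostname is an assumption, use the passive controller's address
ssh root@passive-controller

# current syslog plus a rotated copy; grep around the time of the alert
grep -i da10 /var/log/messages
bzcat /var/log/messages.0.bz2 | grep -i da10
</code>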
==== Split Brain ====

When you end up in a failover error state, try a console shutdown first. If that does not work, cut power to the controllers.
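If the consoles are unreachable and the chassis has a BMC, power can be cut remotely; a sketch with ''ipmitool'', where the address and credentials are placeholders.

<code>
# BMC address/credentials are assumptions; this hard-cuts power to a controller
ipmitool -I lanplus -H 10.10.10.10 -U admin -P secret chassis power off
</code>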

==== fndebug ====

* first log into support, then download TeamViewer
* https://...
* get.teamviewer.com/...

**Manual debug file creation**, then ftp to ftp.ixsystems.com

<code>

freenas-debug -A
tar czvf fndebug-wesleyan-20201123.tar.gz /...

# next look at the bottom of fndebug/...
/dev/da10 HGST:...
/dev/da9 HGST:...
# these drives have not failed yet but have write errors, offline/...

# next look at the output of zpool status -x in fndebug/...
# and the error code
# https://...

NAME        STATE     READ WRITE CKSUM
tank        DEGRADED
...
  raidz2-1
    gptid/...
# look for checksums that have failed, like this disk in vdev raidz2-1

# clean up the spare that resilvered (INUSE status)
# then run a clear on the pool. Then we'll try to get another debug.

zpool detach tank gptid/...
zpool clear tank

# that brought all drives back online and the vdevs show ...
# then via the GUI added the available drive back as a spare

</code>

* Monitor the progress of the resilvering operation: 'zpool status -x' (a watch-loop sketch follows below)
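A trivial sketch of watching that from a shell; it prints pool health once a minute until interrupted.

<code>
# re-check pool health every 60 seconds; Ctrl-C to stop
while true; do zpool status -x; sleep 60; done
</code>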

**Replace a failed drive**

* https://...
* drives mentioned above have not failed yet, so we must "..."

<code>

1) Go into the Storage > Pools page. Click the Gear icon next to the pool and press the "..."
2) Find da4 and press the three-dot options button next to it, then press "..."
3) Go to the System > View Enclosure page, select da4 and press "..."
4) Physically swap the drive on the rack with its replacement.
5) Go back to the Storage > Pool > Status page, bring up the options for the removed drive,
5a) Select the member disk from the dropdown, and press "..."
The replacement drive may or may not have been given the name "..."
6) Wait for the drive to finish resilvering before proceeding to replace da3.
6a) Click the spinning icon to view progress. Pool status "..."
Return the drives in the original box; a return label is provided.

</code>
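For context, a rough shell-level equivalent of the offline/replace steps above; the pool name stays ''tank'' but the gptid and ''da12'' are placeholders. The GUI path is preferred because it also partitions the new disk and sets up swap; this is only a sketch.

<code>
# placeholders throughout: the gptid and da12 are assumptions

# take the outgoing member offline (gptid comes from 'zpool status tank')
zpool offline tank gptid/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

# after physically swapping the disk, resilver onto the new device
zpool replace tank gptid/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx /dev/da12

# watch the resilver run
zpool status tank
</code>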