  * start account creation when replication task is **not** running (shown as "in progress"); see the gating sketch below
    * hpc100-hpc200 first
    * then active accounts only (last year from Q data)
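
A hedged way to gate account creation on replication state from the shell. It assumes midclt and python3 are present on the controller, that replication.query reports a nested job state, and that create_accounts.sh is a hypothetical stand-in for the local account-creation procedure:

<code>
# ask the middleware whether any replication task is currently running
# (assumes replication.query returns JSON with a nested state object)
STATE=$(midclt call replication.query | python3 -c '
import json, sys
tasks = json.load(sys.stdin)
print("RUNNING" if any(t["state"]["state"] == "RUNNING" for t in tasks) else "IDLE")
')

if [ "$STATE" = "IDLE" ]; then
    # safe window: create hpc100-hpc200 first, then active accounts
    ./create_accounts.sh hpc100 hpc200   # hypothetical script
else
    echo "replication in progress, try again later"
fi
</code>
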
Upon nearing cut over day
The target dataset on the receiving system is automatically created in read-only mode to protect the data. To mount or browse the data on the receiving system, create a clone of the snapshot and use the clone. We set IGNORE so it should be read/write on M40. Enable SSH on **target**

On **source** System > SSH Keypairs
  * name replication
  * generate key pair
  * save

On **source** System > SSH Connections > Add
  * name replication
  * host IP or FQDN of target
  * username root
  * discover remote ssh key (the connection can be verified from the shell, as sketched below)
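
To confirm the connection before wiring up the task, a quick check from the source shell; the key path and the target IP are placeholders for this setup:

<code>
# test the saved replication keypair against the target controller
ssh -i /root/replication.key -o StrictHostKeyChecking=accept-new \
    root@10.10.102.240 "zfs list | head -5"
</code>
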
On **source** Tasks > Replication Tasks
You could kick this off with Run NOW in the Edit menu of the task.

**Session with Marc**

  * replication and snapshots must be disabled on target
  * target host, snapshots can be same as on source
  * but possible to set to say 2 weeks via custom
  * slowly throttle this back 180 to 120 to 60 to ? (sketched below)
  * use IP rather than hostname !! in URL https://

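What throttling retention back looks like at the ZFS level; a minimal sketch for review only, assuming a FreeBSD-based controller (TrueNAS CORE date syntax) and a hypothetical dataset name tank/zfshomes:

<code>
# list snapshots older than the current retention window (120 days here)
CUTOFF=$(date -v-120d +%s)           # FreeBSD date syntax
zfs list -H -p -t snapshot -o name,creation tank/zfshomes | \
    awk -v c="$CUTOFF" '$2 < c {print $1}'
# after review, the list can be fed to: xargs -n1 zfs destroy
</code>
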
**Session with Barak**

  * postpone update till spring break
  * create "
  * create "
  * switch current replication task to use new ssh connector
  * works! when running picks up last 8 missed snapshots

==== replication and NFS ====

When M40HA zfshomes is mounted on the scratch server for testing via 10.10.0.0/

<code>
[root@greentail52 ~]# df -h /
df: ‘/
[root@greentail52 ~]# umount /
[root@greentail52 ~]# mount /
mount.nfs: access denied by server while mounting 10.10.102.240:/
[root@greentail52 ~]# echo restarting nfs service on m40ha
restarting nfs service on m40ha
[root@greentail52 ~]# sleep 5m
[root@greentail52 ~]# mount /
[root@greentail52 ~]# df -h /
Filesystem
10.10.102.240:/
[root@greentail52 ~]# date
Tue Oct 15 11:56:09 EDT 2024
[root@greentail52 ~]# df -h /
Filesystem
10.10.102.240:/
[root@greentail52 ~]# echo mount ok overnight, re-enabling replication on x20
mount ok overnight, re-enabling replication on x20
[root@greentail52 ~]# echo replication task, run now
replication task, run now
[root@greentail52 ~]# df -h /
df: ‘/
</code>
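
Once the mount behaves, an /etc/fstab entry on the client keeps it across reboots. A minimal sketch, assuming NFSv3 and a hypothetical export path /mnt/tank/zfshomes mounted at /zfshomes (the real paths are truncated in the transcript above):

<code>
# /etc/fstab on greentail52; export path and options are assumptions
10.10.102.240:/mnt/tank/zfshomes  /zfshomes  nfs  vers=3,hard,bg  0 0
</code>
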
==== failover and replication ====

Testing failover and assessing that replication continues (x20ha PUSH to m40ha; make sure both controllers have the authorized_keys for hpcstore1 - add hpcstore2)

  * Initiated failover from m40ha controller 2, an error window message pops up
    * "Node can not be reached. Node CARPS states do not agree"

Yet my web browser shows hpcm40eth0c2 and, a minute later, hpcm40eth0c1 shows up and HA is enabled.

Replication of snapshots continues ok after failover, which was the point of testing.

  * Initiated failover again and now back to controller 1
  * Controller 2 shows up a minute later (reboots)
  * No error window this time
  * Time is wrong on controller 2 ...
    * Load IPMI go to configuration,
    * that fixes the time
    * button selected back to "

Check replication again. Do this one more time before production.

12.x docs\\
Failover is not allowed if both TrueNAS controllers have the same CARP state. A critical Alert (page 303) is generated and the HA icon shows HA Unavailable.

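Before and after a failover, each node's HA role can be sanity-checked from the shell; a hedged sketch, assuming the Enterprise middleware exposes failover.status through midclt:

<code>
# on either controller: report this node's role (MASTER or BACKUP)
midclt call failover.status
# confirm replication resumed: newest snapshots should keep arriving on target
zfs list -t snapshot -o name,creation | tail -5
</code>
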
  * copy in public
  * don't click http -> https so you don't get locked out
  * when cert expires on you, just access https

==== Update 13 ====

  * apply pending update
  * 10 mins, standby on new update
  * initiate fail over on standby; 1 min
  * look for the icon in top bar, moving back and forth
  * Pending update > Continue (3 mins in) finish upgrade
  * wait for HA to be enabled (about 10 mins in)
  * check versions (see the sketch after these update notes)

  * 10 mins download and standby reboot
  * 1 min fail over
  * 8 1/2 min standby reboot
  * check HA and versions

  * no problems

  * no problems
  * standby reboot took a little longer, about 12 mins.
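
For the "check versions" steps, a hedged shell equivalent; assumes midclt is available and, for the standby controller, that failover.call_remote can proxy the call on this Enterprise HA pair:

<code>
# version on the active controller
midclt call system.version
# version on the standby controller (call_remote is an assumption)
midclt call failover.call_remote system.version
</code>
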
\\
**[[cluster: