  * start account creation when replication task is **not** running (shown as "in progress"); see the gating sketch below
    * hpc100-hpc200 first
    * then active accounts only (last year from Q data)
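
A hedged way to gate account creation on replication state from the shell. It assumes midclt and python3 are present on the controller, that replication.query reports a nested job state, and that create_accounts.sh is a hypothetical stand-in for the local account-creation procedure:

<code>
# ask the middleware whether any replication task is currently running
# (assumes replication.query returns JSON with a nested state object)
STATE=$(midclt call replication.query | python3 -c '
import json, sys
tasks = json.load(sys.stdin)
print("RUNNING" if any(t["state"]["state"] == "RUNNING" for t in tasks) else "IDLE")
')

if [ "$STATE" = "IDLE" ]; then
    # safe window: create hpc100-hpc200 first, then active accounts
    ./create_accounts.sh hpc100 hpc200   # hypothetical script
else
    echo "replication in progress, try again later"
fi
</code>
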
Upon nearing cut over day
The target dataset on the receiving system is automatically created in read-only mode to protect the data. To mount or browse the data on the receiving system, create a clone of the snapshot and use the clone. We set IGNORE so it should be read/write on M40. Enable SSH on **target**

On **source** System > SSH Keypairs
  * name replication
  * generate key pair
  * save

On **source** System > SSH Connections > Add
  * name replication
  * host IP or FQDN of target
  * username root
  * discover remote ssh key (the connection can be verified from the shell, as sketched below)
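
To confirm the connection before wiring up the task, a quick check from the source shell; the key path and the target IP are placeholders for this setup:

<code>
# test the saved replication keypair against the target controller
ssh -i /root/replication.key -o StrictHostKeyChecking=accept-new \
    root@10.10.102.240 "zfs list | head -5"
</code>
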
On **source** Tasks > Replication Tasks
You could kick this off with Run NOW in the Edit menu of the task.

**Session with Marc**

  * replication and snapshots must be disabled on target
  * target host, snapshots can be same as on source
  * but possible to set to say 2 weeks via custom
  * slowly throttle this back 180 to 120 to 60 to ? (sketched below)
  * use IP rather than hostname !! in URL https://

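What throttling retention back looks like at the ZFS level; a minimal sketch for review only, assuming a FreeBSD-based controller (TrueNAS CORE date syntax) and a hypothetical dataset name tank/zfshomes:

<code>
# list snapshots older than the current retention window (120 days here)
CUTOFF=$(date -v-120d +%s)           # FreeBSD date syntax
zfs list -H -p -t snapshot -o name,creation tank/zfshomes | \
    awk -v c="$CUTOFF" '$2 < c {print $1}'
# after review, the list can be fed to: xargs -n1 zfs destroy
</code>
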
**Session with Barak**

  * postpone update till spring break
  * create "
  * create "
  * switch current replication task to use new ssh connector
  * works! when running picks up last 8 missed snapshots

==== replication and NFS ====

When M40HA zfshomes is mounted on the scratch server for testing via 10.10.0.0/

<code>
[root@greentail52 ~]# df -h /
df: ‘/
[root@greentail52 ~]# umount /
[root@greentail52 ~]# mount /
mount.nfs: access denied by server while mounting 10.10.102.240:/
[root@greentail52 ~]# echo restarting nfs service on m40ha
restarting nfs service on m40ha
[root@greentail52 ~]# sleep 5m
[root@greentail52 ~]# mount /
[root@greentail52 ~]# df -h /
Filesystem
10.10.102.240:/
[root@greentail52 ~]# date
Tue Oct 15 11:56:09 EDT 2024
[root@greentail52 ~]# df -h /
Filesystem
10.10.102.240:/
[root@greentail52 ~]# echo mount ok overnight, re-enabling replication on x20
mount ok overnight, re-enabling replication on x20
[root@greentail52 ~]# echo replication task, run now
replication task, run now
[root@greentail52 ~]# df -h /
df: ‘/
</code>
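
Once the mount behaves, an /etc/fstab entry on the client keeps it across reboots. A minimal sketch, assuming NFSv3 and a hypothetical export path /mnt/tank/zfshomes mounted at /zfshomes (the real paths are truncated in the transcript above):

<code>
# /etc/fstab on greentail52; export path and options are assumptions
10.10.102.240:/mnt/tank/zfshomes  /zfshomes  nfs  vers=3,hard,bg  0 0
</code>
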
==== failover and replication ====

Testing failover and assessing that replication continues (x20ha PUSH to m40ha; make sure both controllers have the authorized_keys for hpcstore1 - add hpcstore2)

  * Initiated failover from m40ha controller 2, an error window message pops up
    * "Node can not be reached. Node CARPS states do not agree"

Yet my web browser shows hpcm40eth0c2 and, a minute later, hpcm40eth0c1 shows up and HA is enabled.

Replication of snapshots continues ok after failover, which was the point of testing.

  * Initiated failover again and now back to controller 1
  * Controller 2 shows up a minute later (reboots)
  * No error window this time
  * Time is wrong on controller 2 ...
    * Load IPMI go to configuration,
    * that fixes the time
    * button selected back to "

Check replication again. Do this one more time before production.

12.x docs\\
Failover is not allowed if both TrueNAS controllers have the same CARP state. A critical Alert (page 303) is generated and the HA icon shows HA Unavailable.

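Before and after a failover, each node's HA role can be sanity-checked from the shell; a hedged sketch, assuming the Enterprise middleware exposes failover.status through midclt:

<code>
# on either controller: report this node's role (MASTER or BACKUP)
midclt call failover.status
# confirm replication resumed: newest snapshots should keep arriving on target
zfs list -t snapshot -o name,creation | tail -5
</code>
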
  * copy in public
  * don't click http -> https so you don't get locked out
  * when cert expires on you, just access https

==== Update 13 ====

  * apply pending update
  * 10 mins, standby on new update
  * initiate fail over on standby; 1 min
  * look for the icon in top bar, moving back and forth
  * Pending update > Continue (3 mins in) finish upgrade
  * wait for HA to be enabled (about 10 mins in)
  * check versions (see the sketch after these update notes)

  * 10 mins download and standby reboot
  * 1 min fail over
  * 8 1/2 min standby reboot
  * check HA and versions

  * no problems

  * no problems
  * standby reboot took a little longer, about 12 mins.
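
For the "check versions" steps, a hedged shell equivalent; assumes midclt is available and, for the standby controller, that failover.call_remote can proxy the call on this Enterprise HA pair:

<code>
# version on the active controller
midclt call system.version
# version on the standby controller (call_remote is an assumption)
midclt call failover.call_remote system.version
</code>
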
\\
**[[cluster: