This shows you the differences between two versions of the page.
cluster:36 [2008/03/17 18:44] |
cluster:36 [2008/03/17 18:44] (current) |
||
---|---|---|---|
Line 1: | Line 1: | ||
+ | \\ | ||
+ | **[[cluster: | ||
+ | //ok, so this story begins with ... i thought i had met my inability to comprehend new technology when i was shown that disks can run multiple raid levels simultaneously. but this multipathing eclipses that. just weird, therefore worth describing.// | ||
+ | --- // | ||
+ | |||
+ | ===== The Problem ===== | ||
+ | |||
+ | Our new NetApp FAS 3050c device [[http:// | ||
+ | |||
+ | Well, this is totally different. | ||
+ | |||
+ | What's new is that both filers share the same [[http:// | ||
+ | |||
+ | For example, I created 3 LUNs on a 26*500 GB disk raid group attached to filer3 only! sda is the local disk. sdb, sdc and sdd are my LUNs, and so are sde, sdf and sdg: each LUN shows up twice, once per path. | ||
+ | |||
+ | * That's good because we can use this multipath to route traffic via two fiber channels to the filers. | ||
+ | * That's bad because we may end up with corruption and confusion. | ||
+ | |||
+ | < | ||
+ | [root@ionode-1 ~]# fdisk -l | ||
+ | |||
+ | Disk /dev/sda: 79.4 GB, 79456894976 bytes | ||
+ | 255 heads, 63 sectors/ | ||
+ | Units = cylinders of 16065 * 512 = 8225280 bytes | ||
+ | |||
+ | | ||
+ | / | ||
+ | / | ||
+ | / | ||
+ | / | ||
+ | / | ||
+ | |||
+ | Disk /dev/sdb: 1099.5 GB, 1099529453568 bytes | ||
+ | 255 heads, 63 sectors/ | ||
+ | Units = cylinders of 16065 * 512 = 8225280 bytes | ||
+ | |||
+ | Disk /dev/sdb doesn' | ||
+ | |||
+ | Disk /dev/sdc: 104 MB, 104857600 bytes | ||
+ | 4 heads, 50 sectors/ | ||
+ | Units = cylinders of 200 * 512 = 102400 bytes | ||
+ | |||
+ | Disk /dev/sdc doesn' | ||
+ | |||
+ | Disk /dev/sdd: 104 MB, 104857600 bytes | ||
+ | 4 heads, 50 sectors/ | ||
+ | Units = cylinders of 200 * 512 = 102400 bytes | ||
+ | |||
+ | Disk /dev/sdd doesn' | ||
+ | |||
+ | Disk /dev/sde: 1099.5 GB, 1099529453568 bytes | ||
+ | 255 heads, 63 sectors/ | ||
+ | Units = cylinders of 16065 * 512 = 8225280 bytes | ||
+ | |||
+ | Disk /dev/sde doesn' | ||
+ | |||
+ | Disk /dev/sdf: 104 MB, 104857600 bytes | ||
+ | 4 heads, 50 sectors/ | ||
+ | Units = cylinders of 200 * 512 = 102400 bytes | ||
+ | |||
+ | Disk /dev/sdf doesn' | ||
+ | |||
+ | Disk /dev/sdg: 104 MB, 104857600 bytes | ||
+ | 4 heads, 50 sectors/ | ||
+ | Units = cylinders of 200 * 512 = 102400 bytes | ||
+ | |||
+ | Disk /dev/sdg doesn' | ||
+ | </ | ||
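A quick way to spot the duplicates in a listing like the one above is to group the disks by their exact byte size. This is only a rough sketch: the header lines are copied from the fdisk output, while the regex and the function name are my own. Equal size is just a hint, not proof. The two 104 MB LUNs cannot be told apart this way, so the real identifier is the WWID that multipath reports.

```python
import re
from collections import defaultdict

# "Disk ..." header lines copied from the fdisk -l output above.
FDISK_HEADERS = """\
Disk /dev/sda: 79.4 GB, 79456894976 bytes
Disk /dev/sdb: 1099.5 GB, 1099529453568 bytes
Disk /dev/sdc: 104 MB, 104857600 bytes
Disk /dev/sdd: 104 MB, 104857600 bytes
Disk /dev/sde: 1099.5 GB, 1099529453568 bytes
Disk /dev/sdf: 104 MB, 104857600 bytes
Disk /dev/sdg: 104 MB, 104857600 bytes
"""

# Captures the device node and the exact size in bytes.
DISK_RE = re.compile(r"^Disk (/dev/\w+): .*?, (\d+) bytes$", re.MULTILINE)


def group_by_size(fdisk_text):
    """Map each byte size to the list of devices reporting that size."""
    groups = defaultdict(list)
    for dev, size in DISK_RE.findall(fdisk_text):
        groups[int(size)].append(dev)
    return dict(groups)


groups = group_by_size(FDISK_HEADERS)
for size, devs in sorted(groups.items(), reverse=True):
    # More than one device with the same size is a candidate duplicate path.
    print(size, devs)
```

The 1099.5 GB size is shared by sdb and sde only, which fits the picture of one LUN seen through two paths.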
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | ===== Multipath LUNs ===== | ||
+ | |||
+ | We install the appropriate package on the ionode, so ... | ||
+ | |||
+ | < | ||
+ | [root@ionode-1 qlafc-linux-8.01.04-3-install]# | ||
+ | device-mapper-1.02.02-3.0.RHEL4 | ||
+ | device-mapper-1.02.02-3.0.RHEL4 | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | | ||
+ | | ||
+ | =>heck where did it go? | ||
+ | | ||
+ | / | ||
+ | / | ||
+ | => oh. | ||
+ | </ | ||
+ | |||
+ | Guess OCS patched up2date, so i moved the RPM to / | ||
+ | |||
+ | < | ||
+ | [root@ionode-1 filer]# rpm -ivh device-mapper-multipath-0.4.5-21.RHEL4.x86_64.rpm | ||
+ | Preparing... | ||
+ | | ||
+ | </ | ||
+ | |||
+ | NetApp wants us to use the HBA vendor's drivers, not the Red Hat version. After downloading ... | ||
+ | |||
+ | < | ||
+ | [root@ionode-1 filer]# cd qlogic-stuff/ | ||
+ | [root@ionode-1 qlogic-stuff]# | ||
+ | [root@ionode-1 qlafc-linux-8.01.04-3-install]# | ||
+ | |||
+ | # | ||
+ | # | ||
+ | # | ||
+ | # | ||
+ | |||
+ | Kernel version: 2.6.9-34.ELsmp | ||
+ | Distribution: | ||
+ | |||
+ | Found QLogic Fibre Channel Adapter in the system | ||
+ | 1. QLE2460 | ||
+ | Installation will begin for following driver | ||
+ | 1. qla2xxx version: v8.01.04 | ||
+ | |||
+ | Preparing... | ||
+ | qla2xxx | ||
+ | |||
+ | QLA2XXX -- Building the qla2xxx driver... | ||
+ | |||
+ | QLA2XXX -- Installing the qla2xxx modules to | ||
+ | / | ||
+ | |||
+ | Setting up QLogic HBA API library... | ||
+ | Please make sure the / | ||
+ | Installing 32bit api binary for x86_64. | ||
+ | Installing 64bit api binary for x86_64. | ||
+ | Done. | ||
+ | |||
+ | Unloading any loaded drivers | ||
+ | Unloaded module qla2400 | ||
+ | Loading module qla2xxx_conf version: v8.01.04.... | ||
+ | Loaded module qla2xxx_conf | ||
+ | Loading module qla2xxx version: v8.01.04.... | ||
+ | Loaded module qla2xxx | ||
+ | Loading module qla2400 version: v8.01.04.... | ||
+ | Loaded module qla2400 | ||
+ | Installing scli.... | ||
+ | Preparing... | ||
+ | scli ################################################## | ||
+ | Installation completed successfully. | ||
+ | |||
+ | Building default persistent binding using SCLI | ||
+ | Info: No devices found on HBA port 0. Skipping target persistent | ||
+ | binding configuration. | ||
+ | Info: No devices found on HBA port 1. Skipping target persistent | ||
+ | binding configuration. | ||
+ | |||
+ | Saved copy of / | ||
+ | / | ||
+ | |||
+ | Saved copy of / | ||
+ | / | ||
+ | |||
+ | QLA2XXX -- Rebuilding ramdisk image... | ||
+ | Ramdisk created. | ||
+ | |||
+ | Reloading the QLogic FC HBA drivers.... | ||
+ | Unloaded module qla2400 | ||
+ | Loading module qla2xxx_conf version: v8.01.04.... | ||
+ | Loaded module qla2xxx_conf | ||
+ | Loading module qla2xxx version: v8.01.04.... | ||
+ | Loaded module qla2xxx | ||
+ | Loading module qla2400 version: v8.01.04.... | ||
+ | Loaded module qla2400 | ||
+ | tee: ql_device_info: | ||
+ | |||
+ | Target Information on all HBAs: | ||
+ | ============================== | ||
+ | ----------------------------------------------------------------------------- | ||
+ | HBA Port 0 - QLE2460 | ||
+ | ----------------------------------------------------------------------------- | ||
+ | Info: The selected adapter has no attached devices (HBA port 0)! | ||
+ | ----------------------------------------------------------------------------- | ||
+ | HBA Port 1 - QLE2460 | ||
+ | ----------------------------------------------------------------------------- | ||
+ | Info: The selected adapter has no attached devices (HBA port 1)! | ||
+ | |||
+ | # | ||
+ | # | ||
+ | # SANsurfer Driver installation for Linux completed | ||
+ | # | ||
+ | |||
+ | </ | ||
+ | |||
+ | Then load '' | ||
+ | |||
+ | < | ||
+ | [root@ionode-1 qlafc-linux-8.01.04-3-install]# | ||
+ | Loading module qla2400 version: v8.01.04.... | ||
+ | Loaded module qla2400 | ||
+ | [root@ionode-1 qlafc-linux-8.01.04-3-install]# | ||
+ | |||
+ | Saved copy of / | ||
+ | / | ||
+ | |||
+ | Saved copy of / | ||
+ | / | ||
+ | |||
+ | QLA2XXX -- Rebuilding ramdisk image... | ||
+ | Ramdisk created. | ||
+ | |||
+ | [root@ionode-1 qlafc-linux-8.01.04-3-install]# | ||
+ | |||
+ | [root@ionode-1 qlafc-linux-8.01.04-3-install]# | ||
+ | |||
+ | [root@ionode-1 qlafc-linux-8.01.04-3-install]# | ||
+ | </ | ||
+ | |||
+ | After the reboot we have not solved our problem. But we do have 2 functioning HBA cards. | ||
+ | |||
+ | < | ||
+ | [root@ionode-1 ~]# / | ||
+ | ----------------------------------------------------------------------------- | ||
+ | Host Name : ionode-1.local | ||
+ | HBA Model : QLE2460 | ||
+ | Port : 0 | ||
+ | Node Name : 20-00-00-E0-8B-93-AA-3A | ||
+ | Port Name : 21-00-00-E0-8B-93-AA-3A | ||
+ | Port ID : 61-16-13 | ||
+ | Serial Number | ||
+ | Driver Version | ||
+ | FCode Version | ||
+ | Firmware Version | ||
+ | OptionROM BIOS Version | ||
+ | OptionROM FCode Version | ||
+ | OptionROM EFI Version | ||
+ | OptionROM Firmware Version : 4.00.12 | ||
+ | Actual Connection Mode : Point to Point | ||
+ | Actual Data Rate : 4 Gbps | ||
+ | PortType (Topology) | ||
+ | Device Target Count : 0 | ||
+ | HBA Status | ||
+ | ----------------------------------------------------------------------------- | ||
+ | Host Name : ionode-1.local | ||
+ | HBA Model : QLE2460 | ||
+ | Port : 1 | ||
+ | Node Name : 20-00-00-E0-8B-93-AC-57 | ||
+ | Port Name : 21-00-00-E0-8B-93-AC-57 | ||
+ | Port ID : 61-0E-13 | ||
+ | Serial Number | ||
+ | Driver Version | ||
+ | FCode Version | ||
+ | Firmware Version | ||
+ | OptionROM BIOS Version | ||
+ | OptionROM FCode Version | ||
+ | OptionROM EFI Version | ||
+ | OptionROM Firmware Version : 4.00.12 | ||
+ | Actual Connection Mode : Point to Point | ||
+ | Actual Data Rate : 4 Gbps | ||
+ | PortType (Topology) | ||
+ | Device Target Count : 4 | ||
+ | HBA Status | ||
+ | -------------------------------------------------------------------------- | ||
+ | </ | ||
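To keep an eye on this without reading the whole report, the per-port target count can be pulled out of the scli text. A minimal sketch: the excerpt below keeps only fields that are complete in the output above, and the function name is made up for illustration.

```python
# Fields copied from the scli report above (only the complete ones).
SCLI_EXCERPT = """\
HBA Model : QLE2460
Port : 0
Port Name : 21-00-00-E0-8B-93-AA-3A
Device Target Count : 0
HBA Model : QLE2460
Port : 1
Port Name : 21-00-00-E0-8B-93-AC-57
Device Target Count : 4
"""


def targets_per_port(text):
    """Return {port number: device target count} from 'key : value' lines."""
    counts = {}
    port = None
    for line in text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Port":
            port = int(value)
        elif key == "Device Target Count" and port is not None:
            counts[port] = int(value)
    return counts


print(targets_per_port(SCLI_EXCERPT))
```

At this point in the story only port 1 reports targets; port 0 is logged into the fabric but sees nothing yet.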
+ | |||
+ | The problem. | ||
+ | |||
+ | < | ||
+ | [root@ionode-1 ~]# sanlun lun show all | ||
+ | filer: | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | |||
+ | |||
+ | |||
+ | [root@ionode-1 ~]# multipath -v3 -d -ll | ||
+ | # | ||
+ | # all paths : | ||
+ | # | ||
+ | 2:0:2:0 sdb 8:16 [ready] NETAPP | ||
+ | 2:0:2:1 sdc 8:32 [ready] NETAPP | ||
+ | 2:0:2:2 sdd 8:48 [ready] NETAPP | ||
+ | 2:0:3:0 sde 8:64 [ready] NETAPP | ||
+ | 2:0:3:1 sdf 8:80 [ready] NETAPP | ||
+ | 2:0:3:2 sdg 8:96 [ready] NETAPP | ||
+ | </ | ||
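The first column of that listing is the Linux SCSI address, host:channel:target:LUN. Grouping the paths by LUN number makes the duplication obvious. A small sketch using the path lines above; the function name is hypothetical, and reading targets 2 and 3 as the two routes into the filers is my assumption.

```python
from collections import defaultdict

# First three columns of the "all paths" listing above.
PATHS = """\
2:0:2:0 sdb 8:16
2:0:2:1 sdc 8:32
2:0:2:2 sdd 8:48
2:0:3:0 sde 8:64
2:0:3:1 sdf 8:80
2:0:3:2 sdg 8:96
"""


def paths_by_lun(text):
    """Group (device, host, target) tuples by LUN number."""
    luns = defaultdict(list)
    for line in text.splitlines():
        hctl, dev, _majmin = line.split()[:3]
        host, _channel, target, lun = (int(x) for x in hctl.split(":"))
        luns[lun].append((dev, host, target))
    return dict(luns)


for lun, paths in sorted(paths_by_lun(PATHS).items()):
    print("LUN", lun, "->", paths)
```

Each LUN number appears once behind target 2 and once behind target 3, all on the same HBA (host 2).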
+ | |||
+ | Now for some magic. | ||
+ | |||
+ | * add a definition for the NetApp device | ||
+ | * blacklist our local disk sda | ||
+ | * add a parameter specifying when to switch paths (based on the size of a single I/O operation that blocks while waiting to complete) | ||
+ | * and reboot | ||
+ | |||
+ | Now we see: | ||
+ | < | ||
+ | params = 1 queue_if_no_path 0 2 1 round-robin 0 1 1 8:112 500 round-robin 0 2 1 8:80 500 8:144 500 | ||
+ | status = 1 0 0 2 1 A 0 1 0 8:112 A 0 E 0 2 0 8:80 A 0 8:144 A 0 | ||
+ | mpath2 (360a9800043346d375a6f41794a597852) | ||
+ | [size=1024 GB][features=" | ||
+ | \_ round-robin 0 [active] | ||
+ | \_ 2:0:2:2 sdh 8:112 [active] | ||
+ | \_ round-robin 0 [enabled] | ||
+ | \_ 1:0:2:2 sdf 8:80 | ||
+ | \_ 2:0:3:2 sdj 8:144 [active] | ||
+ | |||
+ | params = 1 queue_if_no_path 0 2 1 round-robin 0 1 1 8:96 500 round-robin 0 2 1 8:64 500 8:128 500 | ||
+ | status = 1 0 0 2 1 A 0 1 0 8:96 A 0 E 0 2 0 8:64 A 0 8:128 A 0 | ||
+ | mpath1 (360a9800043346d375a6f41794a576176) | ||
+ | [size=1024 GB][features=" | ||
+ | \_ round-robin 0 [active] | ||
+ | \_ 2:0:2:1 sdg 8:96 | ||
+ | \_ round-robin 0 [enabled] | ||
+ | \_ 1:0:2:1 sde 8:64 | ||
+ | \_ 2:0:3:1 sdi 8:128 [active] | ||
+ | |||
+ | params = 1 queue_if_no_path 0 2 1 round-robin 0 1 1 8:32 500 round-robin 0 2 1 8:16 500 8:48 500 | ||
+ | status = 1 133034 0 2 1 E 0 1 0 8:32 F 1194 E 0 2 0 8:16 F 1195 8:48 F 1194 | ||
+ | mpath0 (360a9800043346d375a6f41744c635563) | ||
+ | [size=1024 GB][features=" | ||
+ | \_ round-robin 0 [enabled] | ||
+ | \_ 2:0:2:0 sdc 8:32 | ||
+ | \_ round-robin 0 [enabled] | ||
+ | \_ 1:0:2:0 sdb 8:16 | ||
+ | \_ 2:0:3:0 sdd 8:48 | ||
+ | </ | ||
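The `params` lines look cryptic but follow the device-mapper multipath table layout: a feature count and the features, a hardware-handler count, the number of path groups and the initial group, then for each group a selector, its argument count, the number of paths, the number of per-path arguments, and the paths themselves. Below is a rough parser for exactly that layout. The function names are mine, and the sd-name helper assumes the classic major 8 numbering with 16 minors per disk.

```python
def parse_multipath_params(params):
    """Parse a dm-multipath 'params' string into features and path groups."""
    tokens = iter(params.split())

    def take(n):
        return [next(tokens) for _ in range(n)]

    features = take(int(next(tokens)))
    hw_handler = take(int(next(tokens)))
    num_groups = int(next(tokens))
    initial_group = int(next(tokens))

    groups = []
    for _ in range(num_groups):
        selector = next(tokens)
        selector_args = take(int(next(tokens)))
        num_paths = int(next(tokens))
        num_path_args = int(next(tokens))
        # Each path is a major:minor device followed by its arguments.
        paths = [(next(tokens), take(num_path_args)) for _ in range(num_paths)]
        groups.append({"selector": selector, "args": selector_args, "paths": paths})

    return {"features": features, "hw_handler": hw_handler,
            "initial_group": initial_group, "groups": groups}


def sd_name(majmin):
    """Translate e.g. '8:112' to 'sdh' (major 8 only: sda through sdp)."""
    major, minor = (int(x) for x in majmin.split(":"))
    if major != 8 or not 0 <= minor < 256:
        raise ValueError("only SCSI disk major 8 is handled here")
    return "sd" + chr(ord("a") + minor // 16)


# The mpath2 params line from the output above.
PARAMS = ("1 queue_if_no_path 0 2 1 round-robin 0 1 1 8:112 500 "
          "round-robin 0 2 1 8:80 500 8:144 500")
parsed = parse_multipath_params(PARAMS)
for number, group in enumerate(parsed["groups"], start=1):
    print(number, group["selector"], [sd_name(dev) for dev, _ in group["paths"]])
```

For mpath2 this gives one group containing sdh and a second group containing sdf and sdj, which matches the tree printed under the params line.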
+ | |||
+ | |||
+ | This magic hoopla is achieved because something is creating these devices for us, and the manual states: | ||
+ | |||
+ | |'' | ||
+ | devices are not. After a reboot, or a restart of the HBA driver, you might find that | ||
+ | different /dev/sdx devices make up a given /dev/mapper device. The /dev/mapper | ||
+ | device, however, will always correspond to the same LUN.'' | ||
+ | |||
+ | < | ||
+ | [root@ionode-1 ~]# ls -l / | ||
+ | total 0 | ||
+ | crw------- | ||
+ | brw-rw---- | ||
+ | brw-rw---- | ||
+ | brw-rw---- | ||
+ | </ | ||
+ | |||
+ | Eureka. | ||
+ | |||
+ | Throw a filesystem on it (no need for a partition table apparently). | ||
+ | |||
+ | < | ||
+ | [root@ionode-1 ~]# mkfs -t ext3 / | ||
+ | mke2fs 1.35 (28-Feb-2004) | ||
+ | Filesystem label= | ||
+ | OS type: Linux | ||
+ | Block size=4096 (log=2) | ||
+ | Fragment size=4096 (log=2) | ||
+ | 134234112 inodes, 268439808 blocks | ||
+ | 13421990 blocks (5.00%) reserved for the super user | ||
+ | First data block=0 | ||
+ | Maximum filesystem blocks=4294967296 | ||
+ | 8193 block groups | ||
+ | 32768 blocks per group, 32768 fragments per group | ||
+ | 16384 inodes per group | ||
+ | Superblock backups stored on blocks: | ||
+ | 32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, | ||
+ | 4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968, | ||
+ | 102400000, 214990848 | ||
+ | |||
+ | Writing inode tables: done | ||
+ | Creating journal (8192 blocks): done | ||
+ | Writing superblocks and filesystem accounting information: | ||
+ | |||
+ | This filesystem will be automatically checked every 20 mounts or | ||
+ | 180 days, whichever comes first. | ||
+ | </ | ||
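The mke2fs numbers can be cross-checked with a bit of arithmetic. The sketch below uses only figures printed above; the variable names are mine. The byte total it computes equals the 1099529453568 bytes that fdisk reported for the big LUN at the top of the page, so I assume that is the LUN being formatted here.

```python
# Figures taken from the mke2fs output above.
block_size = 4096
blocks = 268439808
blocks_per_group = 32768
inodes_per_group = 16384

size_bytes = blocks * block_size                # total filesystem size
block_groups = -(-blocks // blocks_per_group)   # ceiling division
inodes = block_groups * inodes_per_group
reserved_blocks = blocks * 5 // 100             # 5% for the super user

print(size_bytes)       # 1099529453568
print(block_groups)     # 8193
print(inodes)           # 134234112
print(reserved_blocks)  # 13421990
```

All four values agree with what mke2fs printed, so the whole LUN is in use without a partition table eating into it.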
+ | |||
+ | And now we can mount and export to other nodes via /etc/fstab and / | ||
+ | |||
+ | < | ||
+ | / | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | / | ||
+ | </ | ||
+ | |||
+ | There are some Linux tunable parameters; look at the header in ionode-1:/ | ||
+ | |||
+ | Yea. Done. | ||
+ | |||
+ | And now we have multiple paths from the ionode to the filers (how they work that out is beyond me) and two HBA cards that also will take over from each other. | ||
+ | |||
+ | To rediscover LUNs, or to discover new ones: | ||
+ | |||
+ | < | ||
+ | / | ||
+ | </ | ||
+ | |||
+ | To make sure the HBA cards have logged into the filer, ssh into filer3: | ||
+ | |||
+ | < | ||
+ | filer3> igroup show | ||
+ | swallowtail (FCP) (ostype: linux): | ||
+ | 21: | ||
+ | 21: | ||
+ | </ | ||
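The filer lists initiators as colon-separated WWPNs, while scli prints the same Port Name with dashes. A tiny helper makes the two comparable. This is a sketch: the function name is hypothetical, and the lowercase output is my assumption about how the filer prints them, since the igroup entries are cut off on this page.

```python
def wwpn_colon(scli_port_name):
    """Convert '21-00-00-E0-8B-93-AA-3A' to '21:00:00:e0:8b:93:aa:3a'."""
    octets = scli_port_name.strip().split("-")
    if len(octets) != 8 or any(len(o) != 2 for o in octets):
        raise ValueError("expected 8 dash-separated octets")
    int("".join(octets), 16)  # raises ValueError if not valid hex
    return ":".join(o.lower() for o in octets)


# Port Names as reported by scli for the two HBA ports.
for name in ("21-00-00-E0-8B-93-AA-3A", "21-00-00-E0-8B-93-AC-57"):
    print(wwpn_colon(name))
```

Both results should show up in the igroup listing once each HBA port has logged in.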
+ | |||
+ | |||
+ | --- // | ||
+ | |||
+ | The bindings (mappings) are stored here ''/ | ||
+ | |||
+ | < | ||
+ | |||
+ | # Multipath bindings, Version : 1.0 | ||
+ | # NOTE: this file is automatically maintained by the multipath program. | ||
+ | # You should not need to edit this file in normal circumstances. | ||
+ | # | ||
+ | # Format: | ||
+ | # alias wwid | ||
+ | # | ||
+ | mpath0 360a9800043346d375a6f41794a597852 | ||
+ | mpath1 360a9800043346d375a6f4237427a316d | ||
+ | mpath2 360a9800043346d375a6f423743307771 | ||
+ | mpath3 360a9800043346d375a6f423743335673 | ||
+ | mpath4 360a9800043346d375a6f423743375578 | ||
+ | mpath5 360a9800043346d375a6f423743394379 | ||
+ | mpath6 360a9800043346d375a6f423837674970 | ||
+ | mpath7 360a9800043346d375a6f4238394f516c | ||
+ | mpath8 360a9800043346d375a6f423839536b52 | ||
+ | mpath9 360a9800043346d375a6f423838394b71 | ||
+ | mpath10 360a9800043346d375a6f424566792f32 | ||
+ | mpath11 360a9800043346d375a6f41794a576176 | ||
+ | |||
+ | </ | ||
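Since the alias can change over time while the WWID does not (the WWID ending in 597852 was mpath2 in the earlier listing and is mpath0 here), it helps to look things up by WWID. The sketch below parses the alias/wwid format shown above. The second helper rests on an assumption of mine: that the 24 hex digits after the `360a98000` prefix are the ASCII bytes of the LUN serial number the filer reports. That holds for the values on this page in the sense that they decode to printable text, but check it against the filer before relying on it.

```python
# Two entries copied from the bindings file above.
BINDINGS = """\
# Multipath bindings, Version : 1.0
# alias wwid
mpath0 360a9800043346d375a6f41794a597852
mpath11 360a9800043346d375a6f41794a576176
"""

NETAPP_PREFIX = "360a98000"  # prefix shared by every WWID in the file above


def parse_bindings(text):
    """Return {alias: wwid}, skipping comments and blank lines."""
    bindings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        alias, wwid = line.split()
        bindings[alias] = wwid
    return bindings


def serial_from_wwid(wwid):
    """Decode the hex tail of a WWID as ASCII (assumed to be the LUN serial)."""
    if not wwid.startswith(NETAPP_PREFIX):
        raise ValueError("unexpected WWID prefix")
    return bytes.fromhex(wwid[len(NETAPP_PREFIX):]).decode("ascii")


bindings = parse_bindings(BINDINGS)
by_wwid = {wwid: alias for alias, wwid in bindings.items()}
for alias, wwid in bindings.items():
    print(alias, wwid, serial_from_wwid(wwid))
```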
+ | |||
+ | ===== Oh baby ===== | ||
+ | |||
+ | What next? Oh yea ... | ||
+ | |||
+ | * perform extensive read/writes and fsck | ||
+ | * disable an HBA card and observe while reading/ | ||
+ | * disable a filer (filer3!) and observe if filer4 takes over (without having the raid group local!) | ||
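For the read/write part of that list, something like the following could run in a loop while an HBA or a filer is being disabled. It is only a sketch under my own assumptions: the function name, file name and sizes are invented, and the read-back may well be served from the page cache rather than the LUN, so it tests that I/O keeps flowing more than it tests the disks.

```python
import hashlib
import os


def write_and_verify(directory, size_mb=8, chunk=1024 * 1024):
    """Write random data, fsync it, read it back and compare checksums."""
    path = os.path.join(directory, "mpath_rw_test.bin")  # hypothetical name
    written = hashlib.sha256()
    with open(path, "wb") as f:
        for _ in range(size_mb):
            data = os.urandom(chunk)
            written.update(data)
            f.write(data)
        f.flush()
        os.fsync(f.fileno())  # push the data out to the block device

    read_back = hashlib.sha256()
    with open(path, "rb") as f:
        for data in iter(lambda: f.read(chunk), b""):
            read_back.update(data)

    os.remove(path)
    return written.hexdigest() == read_back.hexdigest()
```

Pointed at the mount point of the multipath filesystem, a call that blocks for a while and then returns True during a path failure would suggest that queueing and failover did their job.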
+ | |||
+ | Monday is "play at work" day. | ||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | \\ | ||
+ | **[[cluster: |