
OK, so this story begins with … I thought I had met the limits of my ability to comprehend new technology when I was shown that disks can run multiple RAID levels simultaneously. But this multipathing eclipses that. Just weird, and therefore worth describing. Henk Meij 2007/05/05 14:14

The Problem

Our new NetApp FAS 3050c device Read About It introduces some new setups. We have run into problems like this before, but back then we could fix them by applying a label to the filesystem on the partition.

Well, this is totally different. Again, I got started on this because gracious Dell installed not one but two HBA cards in the ionode. They must be giving us hidden signals. So the HBA cards were connected via fiber cable to switches #3 and #4, and, you guessed it, filer3 and filer4 are connected to those switches respectively.

What's new is that both filers share the same WWPN (World Wide Port Name, so Wikipedia isn't always useful …). That means when a Linux client sends a query to the filers, both answer.

For example, I created 3 LUNs on a 26*500 GB disk RAID group attached to filer3 only! sda is the local disk. sdb, sdc and sdd are my LUNs. And so are sde, sdf and sdg: each LUN shows up twice.

  • That's good, because we can use multipathing to route traffic to the filers over two fiber channel paths.
  • That's bad, because we may end up with corruption and confusion.
[root@ionode-1 ~]# fdisk -l

Disk /dev/sda: 79.4 GB, 79456894976 bytes
255 heads, 63 sectors/track, 9660 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1           9       72261   de  Dell Utility
/dev/sda2   *          10        1284    10241437+  83  Linux
/dev/sda3            1285        1794     4096575   82  Linux swap
/dev/sda4            1795        9660    63183645    5  Extended
/dev/sda5            1795        9660    63183613+  83  Linux

Disk /dev/sdb: 1099.5 GB, 1099529453568 bytes
255 heads, 63 sectors/track, 133676 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdb doesn't contain a valid partition table

Disk /dev/sdc: 104 MB, 104857600 bytes
4 heads, 50 sectors/track, 1024 cylinders
Units = cylinders of 200 * 512 = 102400 bytes

Disk /dev/sdc doesn't contain a valid partition table

Disk /dev/sdd: 104 MB, 104857600 bytes
4 heads, 50 sectors/track, 1024 cylinders
Units = cylinders of 200 * 512 = 102400 bytes

Disk /dev/sdd doesn't contain a valid partition table

Disk /dev/sde: 1099.5 GB, 1099529453568 bytes
255 heads, 63 sectors/track, 133676 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sde doesn't contain a valid partition table

Disk /dev/sdf: 104 MB, 104857600 bytes
4 heads, 50 sectors/track, 1024 cylinders
Units = cylinders of 200 * 512 = 102400 bytes

Disk /dev/sdf doesn't contain a valid partition table

Disk /dev/sdg: 104 MB, 104857600 bytes
4 heads, 50 sectors/track, 1024 cylinders
Units = cylinders of 200 * 512 = 102400 bytes

Disk /dev/sdg doesn't contain a valid partition table
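In the listing above, each NetApp LUN shows up twice with an identical byte count (sdb/sde, sdc/sdf, sdd/sdg). A quick sanity check is to group the disks by size. A sketch, run here against an abbreviated copy of the output (identical sizes only hint at duplicate paths; the WWID is the real proof):

```shell
# save an abbreviated copy of the `fdisk -l` output for illustration
cat > /tmp/fdisk.out <<'EOF'
Disk /dev/sdb: 1099.5 GB, 1099529453568 bytes
Disk /dev/sdc: 104 MB, 104857600 bytes
Disk /dev/sde: 1099.5 GB, 1099529453568 bytes
Disk /dev/sdf: 104 MB, 104857600 bytes
EOF

# group disks by their exact byte count; two devices with the same
# size are candidates for being two paths to the same LUN
awk '/^Disk \/dev\// { split($2, d, ":"); paths[$5] = paths[$5] " " d[1] }
     END { for (s in paths) print s ":" paths[s] }' /tmp/fdisk.out | sort
```

which prints one line per distinct size, e.g. `1099529453568: /dev/sdb /dev/sde`.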

Multipath LUNs

We install the appropriate package on the ionode, so …

[root@ionode-1 qlafc-linux-8.01.04-3-install]# rpm -q device-mapper
device-mapper-1.02.02-3.0.RHEL4
device-mapper-1.02.02-3.0.RHEL4
 1041  up2date --get device-mapper-multipath
 1043  ls /var/spool/up2date/
=> heck, where did it go?
 1044  find / -name 'device-mapper-multipath-0.4.5-21.RHEL4.x86_64.rpm'
/state/partition1/home/install/ftp.rocksclusters.org/pub/rocks/rocks-4.1.1/rocks-dist\
/rolls/updates/4.1.1/x86_64/RedHat/RPMS/device-mapper-multipath-0.4.5-21.RHEL4.x86_64.rpm
=> oh.

Guess OCS patched up2date, so I moved the RPM to /share/apps/filer, and …

[root@ionode-1 filer]# rpm -ivh device-mapper-multipath-0.4.5-21.RHEL4.x86_64.rpm
Preparing...                ########################################### [100%]
   1:device-mapper-multipath########################################### [100%]

NetApp wants us to use the HBA vendor's drivers, not the Red Hat version. After downloading …

[root@ionode-1 filer]# cd qlogic-stuff/
[root@ionode-1 qlogic-stuff]# cd qlafc-linux-8.01.04-3-install/
[root@ionode-1 qlafc-linux-8.01.04-3-install]# ./qlinstall

#*********************************************************#
#           SANsurfer Driver Installer for Linux          #
#               Installer Version:  1.01.00pre6           #
#*********************************************************#

Kernel version: 2.6.9-34.ELsmp
Distribution: Red Hat Enterprise Linux WS release 4 (Nahant Update 3)

Found QLogic Fibre Channel Adapter in the system
    1. QLE2460
Installation will begin for following driver
    1. qla2xxx version: v8.01.04

Preparing...                ##################################################
qla2xxx                     ##################################################

QLA2XXX -- Building the qla2xxx driver...

QLA2XXX -- Installing the qla2xxx modules to
/lib/modules/2.6.9-34.ELsmp/kernel/drivers/scsi/qla2xxx/...

Setting up QLogic HBA API library...
Please make sure the /usr/lib/libqlsdm.so file is not in use.
Installing 32bit api binary for x86_64.
Installing 64bit api binary for x86_64.
Done.

Unloading any loaded drivers
Unloaded module qla2400
Loading module qla2xxx_conf version: v8.01.04....
Loaded module qla2xxx_conf
Loading module qla2xxx version: v8.01.04....
Loaded module qla2xxx
Loading module qla2400 version: v8.01.04....
Loaded module qla2400
Installing scli....
Preparing...                ##################################################
scli                        ##################################################
Installation completed successfully.

Building default persistent binding using SCLI
Info: No devices found on HBA port 0. Skipping target persistent
binding configuration.
Info: No devices found on HBA port 1. Skipping target persistent
binding configuration.

Saved copy of /etc/modprobe.conf as
/usr/src/qlogic/v8.01.04-3/backup/modprobe.conf-2.6.9-34.ELsmp-050407-142853.bak

Saved copy of /boot/initrd-2.6.9-34.ELsmp.img as
/usr/src/qlogic/v8.01.04-3/backup/initrd-2.6.9-34.ELsmp.img-050407-142853.bak

QLA2XXX -- Rebuilding ramdisk image...
Ramdisk created.

Reloading the QLogic FC HBA drivers....
Unloaded module qla2400
Loading module qla2xxx_conf version: v8.01.04....
Loaded module qla2xxx_conf
Loading module qla2xxx version: v8.01.04....
Loaded module qla2xxx
Loading module qla2400 version: v8.01.04....
Loaded module qla2400
tee: ql_device_info: Permission denied

Target Information on all HBAs:
==============================
-----------------------------------------------------------------------------
HBA Port 0 - QLE2460  Port Name: 21-00-00-E0-8B-93-AA-3A Port ID: 00-00-00
-----------------------------------------------------------------------------
Info: The selected adapter has no attached devices (HBA port 0)!
-----------------------------------------------------------------------------
HBA Port 1 - QLE2460  Port Name: 21-00-00-E0-8B-93-AC-57 Port ID: 00-00-00
-----------------------------------------------------------------------------
Info: The selected adapter has no attached devices (HBA port 1)!

#*********************************************************#
#               INSTALLATION SUCCESSFUL!!                 #
#    SANsurfer Driver installation for Linux completed    #
#*********************************************************#

Then load scli and reconfigure the HBA cards, then rebuild … <hi #ffff00>Follow the NetApp documentation and don't forget to commit the changes to the cards.</hi>

[root@ionode-1 qlafc-linux-8.01.04-3-install]# ./qlinstall -l qla2400
Loading module qla2400 version: v8.01.04....
Loaded module qla2400
[root@ionode-1 qlafc-linux-8.01.04-3-install]# ./qlinstall -br -in qla2400

Saved copy of /etc/modprobe.conf as
/usr/src/qlogic/v8.01.04-3/backup/modprobe.conf-2.6.9-34.ELsmp-050407-144528.bak

Saved copy of /boot/initrd-2.6.9-34.ELsmp.img as
/usr/src/qlogic/v8.01.04-3/backup/initrd-2.6.9-34.ELsmp.img-050407-144528.bak

QLA2XXX -- Rebuilding ramdisk image...
Ramdisk created.

[root@ionode-1 qlafc-linux-8.01.04-3-install]# chkconfig --add multipathd

[root@ionode-1 qlafc-linux-8.01.04-3-install]# chkconfig multipathd on

[root@ionode-1 qlafc-linux-8.01.04-3-install]# reboot

After the reboot we have not solved our problem. But we do have 2 functioning HBA cards. Some configuration changes have been applied to the cards, one in particular: after how long a timeout a request should be routed to the other card (not shown, but detailed in the NetApp documentation, see /share/apps/filer).
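That timeout lives in the driver options. As a sketch (the exact option and value here are my assumption; the NetApp documentation in /share/apps/filer gives the authoritative line), the /etc/modprobe.conf entry would look something like:

```text
# /etc/modprobe.conf fragment -- illustrative only, value is a guess;
# controls how long the qla2xxx driver retries a downed port before
# outstanding I/O is failed over to the surviving path
options qla2xxx qlport_down_retry=10
```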

[root@ionode-1 ~]# /usr/local/bin/scli -i
-----------------------------------------------------------------------------
Host Name                  : ionode-1.local
HBA Model                  : QLE2460
Port                       : 0
Node Name                  : 20-00-00-E0-8B-93-AA-3A
Port Name                  : 21-00-00-E0-8B-93-AA-3A
Port ID                    : 61-16-13
Serial Number              : RFC0644M60202
Driver Version             : 8.01.04
FCode Version              : 1.13
Firmware Version           : 4.00.18
OptionROM BIOS Version     : 1.08
OptionROM FCode Version    : 1.13
OptionROM EFI Version      : 1.02
OptionROM Firmware Version : 4.00.12
Actual Connection Mode     : Point to Point
Actual Data Rate           : 4 Gbps
PortType (Topology)        : FPort
Device Target Count        : 0
HBA Status                 : Online
-----------------------------------------------------------------------------
Host Name                  : ionode-1.local
HBA Model                  : QLE2460
Port                       : 1
Node Name                  : 20-00-00-E0-8B-93-AC-57
Port Name                  : 21-00-00-E0-8B-93-AC-57
Port ID                    : 61-0E-13
Serial Number              : RFC0645M67628
Driver Version             : 8.01.04
FCode Version              : 1.13
Firmware Version           : 4.00.18
OptionROM BIOS Version     : 1.08
OptionROM FCode Version    : 1.13
OptionROM EFI Version      : 1.02
OptionROM Firmware Version : 4.00.12
Actual Connection Mode     : Point to Point
Actual Data Rate           : 4 Gbps
PortType (Topology)        : FPort
Device Target Count        : 4
HBA Status                 : Online
--------------------------------------------------------------------------

The problem.

[root@ionode-1 ~]# sanlun lun show all
  filer:          lun-pathname        device filename  adapter  protocol          lun size         lun state
   filer3:  /vol/cluster_home/users       /dev/sdi         host2    FCP          1.0t (1099529453568)  GOOD
   filer3:  /vol/cluster_home/users       /dev/sdg         host2    FCP          1.0t (1099529453568)  GOOD
   filer3:  /vol/cluster_home/users       /dev/sde         host1    FCP          1.0t (1099529453568)  GOOD
   filer3:  /vol/cluster_home/fstarr      /dev/sdh         host2    FCP          1.0t (1099529453568)  GOOD
   filer3:  /vol/cluster_home/fstarr      /dev/sdj         host2    FCP          1.0t (1099529453568)  GOOD
   filer3:  /vol/cluster_home/fstarr      /dev/sdf         host1    FCP          1.0t (1099529453568)  GOOD
   filer3:  /vol/cluster_scratch/lun0     /dev/sdd         host2    FCP          1.0t (1099529453568)  GOOD
   filer3:  /vol/cluster_scratch/lun0     /dev/sdc         host2    FCP          1.0t (1099529453568)  GOOD
   filer3:  /vol/cluster_scratch/lun0     /dev/sdb         host1    FCP          1.0t (1099529453568)  GOOD



[root@ionode-1 ~]# multipath -v3 -d -ll
#
# all paths :
#
  2:0:2:0 sdb 8:16  [ready] NETAPP  /LUN             /0.2
  2:0:2:1 sdc 8:32  [ready] NETAPP  /LUN             /0.2
  2:0:2:2 sdd 8:48  [ready] NETAPP  /LUN             /0.2
  2:0:3:0 sde 8:64  [ready] NETAPP  /LUN             /0.2
  2:0:3:1 sdf 8:80  [ready] NETAPP  /LUN             /0.2
  2:0:3:2 sdg 8:96  [ready] NETAPP  /LUN             /0.2

Now for some magic. We edit /etc/multipath.conf and …

  • add a definition for the NetApp device
  • blacklist our local disk sda
  • add a parameter specifying when to switch paths (after how many I/O operations one path is used before moving on to the next)
  • and reboot

Now we see:

params = 1 queue_if_no_path 0 2 1 round-robin 0 1 1 8:112 500 round-robin 0 2 1 8:80 500 8:144 500
status = 1 0 0 2 1 A 0 1 0 8:112 A 0 E 0 2 0 8:80 A 0 8:144 A 0
mpath2 (360a9800043346d375a6f41794a597852)
[size=1024 GB][features="1 queue_if_no_path"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 2:0:2:2 sdh 8:112  [active]
\_ round-robin 0 [enabled]
 \_ 1:0:2:2 sdf 8:80   [active]
 \_ 2:0:3:2 sdj 8:144  [active]

params = 1 queue_if_no_path 0 2 1 round-robin 0 1 1 8:96 500 round-robin 0 2 1 8:64 500 8:128 500
status = 1 0 0 2 1 A 0 1 0 8:96 A 0 E 0 2 0 8:64 A 0 8:128 A 0
mpath1 (360a9800043346d375a6f41794a576176)
[size=1024 GB][features="1 queue_if_no_path"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 2:0:2:1 sdg 8:96   [active]
\_ round-robin 0 [enabled]
 \_ 1:0:2:1 sde 8:64   [active]
 \_ 2:0:3:1 sdi 8:128  [active]

params = 1 queue_if_no_path 0 2 1 round-robin 0 1 1 8:32 500 round-robin 0 2 1 8:16 500 8:48 500
status = 1 133034 0 2 1 E 0 1 0 8:32 F 1194 E 0 2 0 8:16 F 1195 8:48 F 1194
mpath0 (360a9800043346d375a6f41744c635563)
[size=1024 GB][features="1 queue_if_no_path"][hwhandler="0"]
\_ round-robin 0 [enabled]
 \_ 2:0:2:0 sdc 8:32   [failed]
\_ round-robin 0 [enabled]
 \_ 1:0:2:0 sdb 8:16   [failed]
 \_ 2:0:3:0 sdd 8:48   [failed]
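Those params/status strings are dense. Roughly (my reading of the device-mapper status format, where each major:minor pair is followed by a path state, A for active or F for failed, plus a fail count), the mpath0 status above can be decoded like this:

```shell
# decode path states from a dm-multipath status string (a sketch);
# the string below is copied from the mpath0 status line above
status="1 133034 0 2 1 E 0 1 0 8:32 F 1194 E 0 2 0 8:16 F 1195 8:48 F 1194"
echo "$status" | awk '{
    for (i = 1; i < NF; i++)             # walk the tokens
        if ($i ~ /^[0-9]+:[0-9]+$/)      # a major:minor device number
            print $i, (($(i+1) == "A") ? "active" : "failed")
}'
```

which reports 8:32, 8:16 and 8:48 as failed, matching the [failed] flags in the readable output.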

This magic hoopla works because the device mapper is creating these devices for us, and the manual states:

The /dev/mapper devices are persistent across reboots, but the /dev/sdx devices are not. After a reboot, or a restart of the HBA driver, you might find that different /dev/sdx devices make up a given /dev/mapper device. The /dev/mapper device, however, will always correspond to the same LUN.
[root@ionode-1 ~]# ls -l /dev/mapper/
total 0
crw-------  1 root root  10, 63 May  5 12:29 control
brw-rw----  1 root disk 253,  0 May  5 12:29 mpath0
brw-rw----  1 root disk 253,  1 May  5 12:29 mpath1
brw-rw----  1 root disk 253,  2 May  5 12:29 mpath2

Eureka.

Throw a filesystem on it (no need for a partition table, apparently).

[root@ionode-1 ~]# mkfs -t ext3 /dev/mapper/mpath0
mke2fs 1.35 (28-Feb-2004)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
134234112 inodes, 268439808 blocks
13421990 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
8193 block groups
32768 blocks per group, 32768 fragments per group
16384 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
        102400000, 214990848

Writing inode tables: done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 20 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

And now we can mount it and export it to the other nodes via /etc/fstab and /etc/exports.

/dev/mapper/mpath0     /sanscratch    ext3   defaults   0   0
/sanscratch     10.3.1.0/255.255.255.0(rw,sync,no_root_squash)

There are some Linux tunable parameters; look at the header of ionode-1:/etc/multipath.conf.

Yea. Done.

And now we have multiple paths from the ionode to the filers (how they work that out is beyond me), and two HBA cards that will also take over from each other.

To discover new LUNs, or rediscover existing ones:

/opt/netapp/santools/qla2xxx_lun_rescan all
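That script comes with the NetApp host tools. If it is ever missing, a generic fallback on 2.6 kernels is a sysfs rescan (a sketch; the "- - -" wildcard means rescan every channel, target and LUN on each HBA host):

```shell
# rescan all SCSI hosts via sysfs; "- - -" = all channels/targets/LUNs
for scan in /sys/class/scsi_host/host*/scan; do
    [ -e "$scan" ] || continue   # skip if no SCSI hosts are present
    echo "- - -" > "$scan"
done
```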

Make sure the HBA cards have logged into the filer; ssh into filer3:

filer3> igroup show
    swallowtail (FCP) (ostype: linux):
        21:00:00:e0:8b:93:ac:57 (logged in on: vtic, 0c)
        21:00:00:e0:8b:93:aa:3a (logged in on: vtic, 0d)

Meij, Henk 2008/03/17 14:42

The bindings (mappings) are stored in /var/lib/multipath/bindings:

# Multipath bindings, Version : 1.0
# NOTE: this file is automatically maintained by the multipath program.
# You should not need to edit this file in normal circumstances.
#
# Format:
# alias wwid
#
mpath0 360a9800043346d375a6f41794a597852
mpath1 360a9800043346d375a6f4237427a316d
mpath2 360a9800043346d375a6f423743307771
mpath3 360a9800043346d375a6f423743335673
mpath4 360a9800043346d375a6f423743375578
mpath5 360a9800043346d375a6f423743394379
mpath6 360a9800043346d375a6f423837674970
mpath7 360a9800043346d375a6f4238394f516c
mpath8 360a9800043346d375a6f423839536b52
mpath9 360a9800043346d375a6f423838394b71
mpath10 360a9800043346d375a6f424566792f32
mpath11 360a9800043346d375a6f41794a576176

Oh baby

What next? Oh yea …

  • perform extensive read/writes and fsck
  • disable an HBA card and observe while reading/writing
  • disable a filer (filer3!) and observe if filer4 takes over (without having the raid group local!)

Monday is “play at work” day.



cluster/36.txt · Last modified: 2008/03/17 18:44 (external edit)