Date: Mon, 8 Dec 2014 11:14:31 +0100
From: patpro@patpro.net
To: freebsd-scsi@freebsd.org
Subject: multipath problem: active provider chosen on passive FC path?
Message-ID: <23F06C2E-558A-4E68-AD35-B3CD49760DFE@patpro.net>
Hello,

I'm not sure it's the best place to expose my problem, let me know if another mailing list is recommended.

I've installed FreeBSD 9.3 on two HP blade servers (G6), into an HP C7000 chassis. This chassis uses two Brocade FC switches (active/passive if I'm not mistaken). The blade servers use QLogic HBAs:

isp0: <Qlogic ISP 2432 PCI FC-AL Adapter> port 0x4000-0x40ff mem 0xfbff0000-0xfbff3fff irq 30 at device 0.0 on pci6
isp1: <Qlogic ISP 2432 PCI FC-AL Adapter> port 0x4400-0x44ff mem 0xfbfe0000-0xfbfe3fff irq 37 at device 0.1 on pci6

A SAN array presents a dedicated logical unit to each FreeBSD server. On a given server I see 4 paths to the presented LU that I use to create a GEOM_MULTIPATH device (from dmesg):

GEOM_MULTIPATH: SPLUNK_1 created
GEOM_MULTIPATH: da2 added to SPLUNK_1
GEOM_MULTIPATH: da2 is now active path in SPLUNK_1
GEOM_MULTIPATH: da3 added to SPLUNK_1
GEOM_MULTIPATH: da6 added to SPLUNK_1
GEOM_MULTIPATH: da7 added to SPLUNK_1

# camcontrol devlist | grep VRAID
<DGC VRAID 0532>    at scbus0 target 2 lun 0 (pass4,da2)
<DGC VRAID 0532>    at scbus0 target 3 lun 0 (pass5,da3)
<DGC VRAID 0532>    at scbus1 target 4 lun 0 (pass12,da6)
<DGC VRAID 0532>    at scbus1 target 5 lun 0 (pass13,da7)

# gmultipath status
              Name   Status  Components
multipath/SPLUNK_1  OPTIMAL  da2 (ACTIVE)
                             da3 (PASSIVE)
                             da6 (PASSIVE)
                             da7 (PASSIVE)

Unfortunately, during boot and during normal operation, the first provider (da2 here) seems faulty:

isp0: Chan 0 Abort Cmd for N-Port 0x0008 @ Port 0x090a00
(da2:isp0:0:2:0): Command Aborted
(da2:isp0:0:2:0): READ(6). CDB: 08 00 03 28 02 00
(da2:isp0:0:2:0): CAM status: CCB request aborted by the host
(da2:isp0:0:2:0): Retrying command
../..
isp0: Chan 0 Abort Cmd for N-Port 0x0008 @ Port 0x090a00
(da2:isp0:0:2:0): Command Aborted
(da2:isp0:0:2:0): WRITE(10). CDB: 2a 00 00 50 20 21 00 00 05 00
(da2:isp0:0:2:0): CAM status: CCB request aborted by the host
(da2:isp0:0:2:0): Retrying command
../..
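(For anyone scripting around this: the currently active component can be pulled out of that `gmultipath status` layout with a bit of awk. This is only a sketch assuming the output format shown above stays stable; the sample text is a captured copy so it runs without the hardware.)

```shell
#!/bin/sh
# Report which component of a gmultipath device is ACTIVE.
# Sketch: parses the `gmultipath status` layout shown above.
# On a live system, replace the literal text with:
#   status_output=$(gmultipath status SPLUNK_1)
status_output='              Name   Status  Components
multipath/SPLUNK_1  OPTIMAL  da2 (ACTIVE)
                             da3 (PASSIVE)
                             da6 (PASSIVE)
                             da7 (PASSIVE)'

# Print the field immediately preceding "(ACTIVE)".
printf '%s\n' "$status_output" |
    awk '{ for (i = 1; i < NF; i++) if ($(i+1) == "(ACTIVE)") print $i }'
# -> da2
```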
Those errors make the boot really slow (10-15 minutes), but the device is not deactivated. On both servers it's always the first provider of the multipath device that seems faulty (always the first one on scbus0), so I guess scbus0 is connected to the passive FC switch.

If I use the multipath device extensively, the faulty provider will eventually be marked as failed and another one, chosen on scbus1, will be marked ACTIVE. As soon as a provider on scbus1 is marked ACTIVE, the read/write throughput comes back to expected values.

For example, diskinfo(8) shows horrendous performance (240+ ms seek times...):

# diskinfo -t /dev/multipath/SPLUNK_1
/dev/multipath/SPLUNK_1
	512             # sectorsize
	107374181888    # mediasize in bytes (100G)
	209715199       # mediasize in sectors
	0               # stripesize
	0               # stripeoffset
	13054           # Cylinders according to firmware.
	255             # Heads according to firmware.
	63              # Sectors according to firmware.
	CKM00114800912  # Disk ident.

Seek times:
	Full stroke:      250 iter in   1.172849 sec =   4.691 msec
	Half stroke:      250 iter in   2.499101 sec =   9.996 msec
	Quarter stroke:   500 iter in 124.113431 sec = 248.227 msec
	Short forward:    400 iter in  62.483828 sec = 156.210 msec
	Short backward:   400 iter in  62.844187 sec = 157.110 msec
	Seq outer:       2048 iter in 240.999614 sec = 117.676 msec
	Seq inner:       2048 iter in 121.210282 sec =  59.185 msec

(during this test da2 is marked failed:

GEOM_MULTIPATH: Error 5, da2 in SPLUNK_1 marked FAIL
GEOM_MULTIPATH: da7 is now active path in SPLUNK_1

and the transfer-rate test goes well:)

Transfer rates:
	outside:       102400 kbytes in 1.023942 sec = 100006 kbytes/sec
	middle:        102400 kbytes in 1.104299 sec =  92729 kbytes/sec
	inside:        102400 kbytes in 1.137533 sec =  90019 kbytes/sec

# gmultipath status
              Name    Status  Components
multipath/SPLUNK_1  DEGRADED  da2 (FAIL)
                              da3 (PASSIVE)
                              da6 (PASSIVE)
                              da7 (ACTIVE)

Is there any way I can tell GEOM to use an active provider chosen on scbus1 at boot time?
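(A possible stopgap, not from the original thread and untested on this setup: force the active path off the scbus0 providers from /etc/rc.local once multipath is up, using the `fail`/`restore` verbs documented in gmultipath(8). Whether kicking the paths this early in boot is safe here is an open question.)

```shell
# /etc/rc.local fragment (sketch only).
# da2 and da3 are the scbus0 paths from the camcontrol devlist above.
gmultipath fail SPLUNK_1 da2
gmultipath fail SPLUNK_1 da3

# Once I/O has settled on a scbus1 path, re-enable the scbus0 paths
# so they remain available as passive spares.
sleep 5
gmultipath restore SPLUNK_1 da2
gmultipath restore SPLUNK_1 da3

# `gmultipath getactive SPLUNK_1` prints the currently active provider.
```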
Is there any chance I totally misunderstand the problem? (Other blades in the same chassis have been used for ESXi VMware production for years without any problem, so I guess the switches and the SAN are correctly configured.)

thanks,
Patrick

-- 
# sysctl -a | grep dev.isp
dev.isp.0.%desc: Qlogic ISP 2432 PCI FC-AL Adapter
dev.isp.0.%driver: isp
dev.isp.0.%location: slot=0 function=0 handle=\_SB_.PCI0.PT07.SLT0
dev.isp.0.%pnpinfo: vendor=0x1077 device=0x2432 subvendor=0x103c subdevice=0x1705 class=0x0c0400
dev.isp.0.%parent: pci6
dev.isp.0.wwnn: 5764963215108688473
dev.isp.0.wwpn: 5764963215108688472
dev.isp.0.loop_down_limit: 60
dev.isp.0.gone_device_time: 30
dev.isp.1.%desc: Qlogic ISP 2432 PCI FC-AL Adapter
dev.isp.1.%driver: isp
dev.isp.1.%location: slot=0 function=1 handle=\_SB_.PCI0.PT07.SLT1
dev.isp.1.%pnpinfo: vendor=0x1077 device=0x2432 subvendor=0x103c subdevice=0x1705 class=0x0c0400
dev.isp.1.%parent: pci6
dev.isp.1.wwnn: 5764963215108688475
dev.isp.1.wwpn: 5764963215108688474
dev.isp.1.loop_down_limit: 60
dev.isp.1.gone_device_time: 30
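(Side note for anyone matching these HBAs against Brocade zoning: the dev.isp wwnn/wwpn sysctls print the WWNs in decimal. A small sh sketch to render them in the usual colon-separated hex form; the conversion is plain base arithmetic, so the outputs below follow directly from the decimal values in the signature.)

```shell
#!/bin/sh
# Convert the decimal WWN values printed by the dev.isp sysctls into
# the colon-separated hex form that FC switches display.
wwn_hex() {
    printf '%016x\n' "$1" | sed -e 's/../&:/g' -e 's/:$//'
}

wwn_hex 5764963215108688472   # dev.isp.0.wwpn -> 50:01:43:80:03:bf:4a:58
wwn_hex 5764963215108688473   # dev.isp.0.wwnn -> 50:01:43:80:03:bf:4a:59
```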