Date: Tue, 13 Apr 2010 10:49:08 -0600
From: Brad Waite <freebsd@wcubed.net>
To: Gary Palmer <gpalmer@freebsd.org>
Cc: freebsd-scsi@freebsd.org
Subject: Re: QLogic 2360 FC HBAs not playing well with others
Message-ID: <4BC4A084.7050906@wcubed.net>
In-Reply-To: <20100412034937.GA24680@in-addr.com>
References: <4BC280EE.5090202@wcubed.net> <20100412034937.GA24680@in-addr.com>

Gary Palmer wrote:
> On Sun, Apr 11, 2010 at 08:09:50PM -0600, Brad Waite wrote:
>>>> Matthew Jacob wrote:
>>>>> On 04/09/2010 11:29 AM, Brad Waite wrote:
>>>>> I beseech you, oh great masters of SCSI and fibre channel, hear my
>>>>> pleas for help!
>>>>>
>>>>> My 2 QLE2360s don't appear to be waking up properly in a Dell R710
>>>>> running 7.2 AMD64. At the very least, they're not recognizing any of
>>>>> the volumes on the Sun 2540 array in the fabric. Everything works just
>>>>> fine under VMware ESXi 4.1, though.
>>>>>
>>>> Get newer firmware either by upgrading with RELENG_7 or snagging
>>>> asm_2300.h from RELENG_7 and rebuilding.
>>>>
>>>> You don't have to load all of ispfw
>>>>
>>>> isp2300_LOAD=YES
>>>>
>>>> should get you just that one module
>>>>
>>>> the latest in the FreeBSD tree is 3.03.26
>> Woot. That helped.
>>
>> Built & installed RELENG_7, but I've got some more weirdness.
>>
>> First off I've got da0 - da15 showing similar to this:
>>
>> da0 at isp0 bus 0 target 0 lun 0
>> da0: <SUN LCSM100_F 0670> Fixed Direct Access SCSI-5 device
>> da0: 200.000MB/s transfers
>> da0: Command Queueing Enabled
>> da0: 138989MB (284650656 512 byte sectors: 255H 63S/T 17718C)
>>
>> We've got a Sun Storagetek 2540 12-drive array with 4 volumes mapped to
>> this host. It would appear that it's showing the 4 volumes AND each of
>> the 12 drives. Is that normal?
>>
>> Next, I have about 20 of the following errors for each of da1, da2, da3,
>> da4, da9, da10, da11 & da12.
>>
>> (da1:isp0:0:0:1): READ(6)/WRITE(6) not supported, increasing
>> minimum_cmd_size to 10.
>> (da1:isp0:0:0:1): READ(10). CDB: 28 0 0 0 0 0 0 0 1 0
>> (da1:isp0:0:0:1): CAM Status: SCSI Status Error
>> (da1:isp0:0:0:1): SCSI Status: Check Condition
>> (da1:isp0:0:0:1): ILLEGAL REQUEST asc:94,1
>> (da1:isp0:0:0:1): Vendor Specific ASC
>> (da1:isp0:0:0:1): Unretryable error
>>
>> What's going on here?
>>
>> Is there any config I need to do for volume mapping and/or
>> multipathing? I'm a complete newb when it comes to FC on FreeBSD, so
>> forgive my ignorance.
>>
>> Thanks for the help, guys!
>
> I suspect the reason you have 16 disk devices showing up is that you
> are running multipath. You will get one da device showing up for each
> different path, and if you're running a full multipath environment
> that's likely 4 paths per device, which would lead to the 16 disks
> (unless they're not the sizes you expect, but I would tend to suspect
> it's a multipath artifact)

Thanks for pointing out what should have been obvious. The Sun 2540 has
2 ports on 2 controllers and camcontrol shows exactly that:

# camcontrol devlist
<SUN LCSM100_F 0670> at scbus0 target 0 lun 0 (da0,pass0)
<SUN LCSM100_F 0670> at scbus0 target 0 lun 1 (da1,pass1)
<SUN LCSM100_F 0670> at scbus0 target 0 lun 2 (da2,pass2)
<SUN LCSM100_F 0670> at scbus0 target 0 lun 3 (da3,pass3)
<SUN LCSM100_F 0670> at scbus0 target 1 lun 0 (da4,pass4)
<SUN LCSM100_F 0670> at scbus0 target 1 lun 1 (da5,pass5)
<SUN LCSM100_F 0670> at scbus0 target 1 lun 2 (da6,pass6)
<SUN LCSM100_F 0670> at scbus0 target 1 lun 3 (da7,pass7)
<SUN LCSM100_F 0670> at scbus1 target 0 lun 0 (da8,pass8)
<SUN LCSM100_F 0670> at scbus1 target 0 lun 1 (da9,pass9)
<SUN LCSM100_F 0670> at scbus1 target 0 lun 2 (da10,pass10)
<SUN LCSM100_F 0670> at scbus1 target 0 lun 3 (da11,pass11)
<SUN LCSM100_F 0670> at scbus1 target 1 lun 0 (da12,pass12)
<SUN LCSM100_F 0670> at scbus1 target 1 lun 1 (da13,pass13)
<SUN LCSM100_F 0670> at scbus1 target 1 lun 2 (da14,pass14)
<SUN LCSM100_F 0670> at scbus1 target 1 lun 3 (da15,pass15)

> To handle multipath you probably want to look at gmultipath(8).
>
> I'm not sure about READ/WRITE errors. You say they show up for 8
> devices? Is it possible that the array is not true active/active
> on the controllers?
> It's possible that half the paths are going to
> a controller that is rejecting the I/O until the LUN fails over,
> but that's just a guess based on the error message. If you can
> look at the controller/bus/target/lun information from dmesg and
> see if you can spot a pattern about the path to the LUNs giving
> the error that may give a better idea about what's going on.

I think you've nailed it. da4-7 & da12-15 have the following respective
lines in dmesg:

da[4-7]: 200.000MB/s transfers WWNN 0x200400a0b8388efd WWPN 0x203500a0b8388efd PortID 0x10100
da[12-15]: 200.000MB/s transfers WWNN 0x200400a0b8388efd WWPN 0x202500a0b8388efd PortID 0x10100

The two WWPNs correspond to the 2540's controllers and the write errors
are on da0-3 & da8-11.

I can't find anything yet in the docs on making the other ports active,
but I successfully labeled da7 & da15 with gmultipath, although I
couldn't add da3 & da11 due to write errors.

No real surprise, but since I can't add the label, what happens if one
of the active ports on a controller fails? I know I'd have the other
path to the active port on the other controller, but would I have to
manually add the label to the volumes from the newly active port?
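
To make the path layout above easier to see, here is a rough Python sketch that groups the camcontrol devlist entries by LUN and lists the da devices that sit on the targets currently accepting I/O. The device list is copied from the output above. The helper names (paths_by_lun, usable_paths) and the passive_target parameter are made up for illustration, and treating target 0 on both buses as the passive side is only an assumption drawn from the write errors on da0-3 & da8-11. It does not run gmultipath; it only prints candidate pairs.

```python
import re
from collections import defaultdict

# Copied from the camcontrol devlist output in this message.
DEVLIST = """\
<SUN LCSM100_F 0670> at scbus0 target 0 lun 0 (da0,pass0)
<SUN LCSM100_F 0670> at scbus0 target 0 lun 1 (da1,pass1)
<SUN LCSM100_F 0670> at scbus0 target 0 lun 2 (da2,pass2)
<SUN LCSM100_F 0670> at scbus0 target 0 lun 3 (da3,pass3)
<SUN LCSM100_F 0670> at scbus0 target 1 lun 0 (da4,pass4)
<SUN LCSM100_F 0670> at scbus0 target 1 lun 1 (da5,pass5)
<SUN LCSM100_F 0670> at scbus0 target 1 lun 2 (da6,pass6)
<SUN LCSM100_F 0670> at scbus0 target 1 lun 3 (da7,pass7)
<SUN LCSM100_F 0670> at scbus1 target 0 lun 0 (da8,pass8)
<SUN LCSM100_F 0670> at scbus1 target 0 lun 1 (da9,pass9)
<SUN LCSM100_F 0670> at scbus1 target 0 lun 2 (da10,pass10)
<SUN LCSM100_F 0670> at scbus1 target 0 lun 3 (da11,pass11)
<SUN LCSM100_F 0670> at scbus1 target 1 lun 0 (da12,pass12)
<SUN LCSM100_F 0670> at scbus1 target 1 lun 1 (da13,pass13)
<SUN LCSM100_F 0670> at scbus1 target 1 lun 2 (da14,pass14)
<SUN LCSM100_F 0670> at scbus1 target 1 lun 3 (da15,pass15)
"""

# Matches the line format shown above; other camcontrol versions may
# print the (daN,passN) pair in a different order.
LINE_RE = re.compile(r"at scbus(\d+) target (\d+) lun (\d+) \((da\d+),pass\d+\)")


def paths_by_lun(devlist):
    """Group (device, bus, target) tuples by LUN number."""
    groups = defaultdict(list)
    for line in devlist.splitlines():
        m = LINE_RE.search(line)
        if m:
            bus, target, lun, dev = m.groups()
            groups[int(lun)].append((dev, int(bus), int(target)))
    return dict(groups)


def usable_paths(groups, passive_target=0):
    """Drop paths through the target assumed to be rejecting I/O.

    ASSUMPTION: target 0 is the passive side on both buses, inferred
    from the write errors reported on da0-3 & da8-11.
    """
    return {
        lun: [dev for dev, _bus, target in paths if target != passive_target]
        for lun, paths in groups.items()
    }


if __name__ == "__main__":
    groups = paths_by_lun(DEVLIST)
    for lun, paths in sorted(groups.items()):
        print(f"lun {lun}: {len(paths)} paths")
    for lun, devs in sorted(usable_paths(groups).items()):
        print(f"lun {lun}: candidate pair {' '.join(devs)}")
```

With the list above, each of the 4 LUNs shows 4 paths, and lun 3 comes out as da7 and da15, which is the pair that accepted the gmultipath label.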