Date: Mon, 12 May 2008 12:14:04 -0500
From: Graham Allan <allan@physics.umn.edu>
To: Alexander Sack <pisymbol@gmail.com>
Cc: freebsd-scsi@freebsd.org
Subject: Re: Hang on boot in isp with QLA2342 after upgrading to 6.3
Message-ID: <20080512171404.GE25577@physics.umn.edu>
In-Reply-To: <3c0b01820805120919s7c8d5249xf5dd62934c113506@mail.gmail.com>
References: <20080509011028.GV25577@physics.umn.edu>
    <20080509215621.GX25577@physics.umn.edu>
    <482646B5.807@miralink.com>
    <482760D0.1070106@physics.umn.edu>
    <48276560.30302@miralink.com>
    <4827AD9F.50202@physics.umn.edu>
    <3c0b01820805120919s7c8d5249xf5dd62934c113506@mail.gmail.com>

On Mon, May 12, 2008 at 12:19:49PM -0400, Alexander Sack wrote:
>
> Graham, from the driver error messages it seems that the card believes
> you are on a switched fabric and that it most likely is logging into
> the SNS server to lookup names/addresses for your devices.  Are you
> sure that your switched fabric is setup correctly?  I missed part of
> this thread so I apologize if this topic has already been hashed out.
> If for some reason the host can not log into the SNS server and
> retrieve entries from the database, then you are going to be hosed (I
> agree the OS shouldn't be hung unless you are booting off the disk
> connected to the failed controller, etc.).
>
> I am very familiar with the ISP23/4xx chipset and I go digging more
> but I was wondering if you have verified that your topology is valid.

I'm happy to confess to being a SAN novice, so I'm not quite sure how I
would verify that, other than that it "seems to work" ok on the older OS
release, and also in specific circumstances on the current one - for
example, if one port of the HBA is connected directly to a device, and
the other to the fabric, it doesn't have a problem - so in that
situation it is able to log in to the fabric ok and retrieve database
information.

Even when it does hang, it does appear to have logged in to the fabric
ok, according to my interpretation of the switch output:

fcswitch_s43_2:admin> portshow 8
portName:
portHealth: No License
Authentication: None
portFlags: 0x223805b    portLbMod: 0x0   PRESENT ACTIVE F_PORT G_PORT U_PORT LOGIN NOELP LED ACCEPT WAS_EPORT
portType:  4.1
portState: 1    Online
portPhys:  6    In_Sync
portScn:   6    F_Port
portRegs: 0x81100000
portData: 0x11deb230
portId:    031800
portWwn:   20:08:00:60:69:51:4a:20
portWwn of device(s) connected:
        21:00:00:e0:8b:08:06:d2
Distance:  normal
Speed:    N2Gbps

Interrupts:        20487      Link_failure: 18         Frjt:         0
Unknown:           404        Loss_of_sync: 12295      Fbsy:         0
Lli:               13715      Loss_of_sig:  93
Proc_rqrd:         6646       Protocol_err: 0
Timed_out:         0          Invalid_word: 0
Rx_flushed:        0          Invalid_crc:  0
Tx_unavail:        0          Delim_err:    0
Free_buffer:       0          Address_err:  0
Overrun:           0          Lr_in:        36
Suspended:         0          Lr_out:       73
Parity_err:        0          Ols_in:       73

and it's listed in the switch name server (third entry down, 031800):

fcswitch_s43_2:admin> nsshow
{
 Type Pid    COS     PortName                NodeName                 TTL(sec)
 N    031300;      3;21:00:00:04:d9:60:17:6e;20:00:00:04:d9:60:17:6d; na
    FC4s: FCP
    PortSymb: [39] "UNKNOWN A.0 UNKNOWN FW:01.02 Port 1 "
    Fabric Port Name: 20:03:00:60:69:51:4a:20
 N    031500;      3;21:00:00:1b:4d:00:83:ed;20:00:00:1b:4d:00:83:ec; na
    FC4s: FCP [JetStor FreeBSD mark R4 R001]
    Fabric Port Name: 20:05:00:60:69:51:4a:20
 N    031800;      3;21:00:00:e0:8b:08:06:d2;20:00:00:e0:8b:08:06:d2; na
    FC4s: FCP
    Fabric Port Name: 20:08:00:60:69:51:4a:20
 N    031900;      3;10:00:00:06:2b:09:4f:d8;20:00:00:06:2b:09:4f:d8; na
    FC4s: FCIP FCP
    PortSymb: [47] "LSI7202P B.0 03-01001-02A FW:1.00.06 Port 0 "
    Fabric Port Name: 20:09:00:60:69:51:4a:20
 N    031a00;    2,3;10:00:00:00:c9:24:5b:04;20:00:00:00:c9:24:5b:04; na
    FC4s: FCP
    PortSymb: [49] "UNIX (emx2) KGPSA-CA S/W Rev 2.25: F/W Rev 3.93a0"
    Fabric Port Name: 20:0a:00:60:69:51:4a:20
The Local Name Server has 5 entries }

It has been pointed out to me that this kind of weird interaction isn't
exactly unknown in the SAN world, and setting up zoning on the switch
would probably make it go away. So I will also try that (it's probably a
giveaway of a SAN novice that I hadn't already done so - it certainly
does sound like it would help).

But if the hang does point to a problem in the driver, I'm also happy to
keep trying different things in the hope of revealing where the problem
actually lies.

Graham