From: Scott Long
Date: Thu, 15 May 2008 17:58:21 -0600
To: Graham Allan
Cc: freebsd-scsi@freebsd.org
Subject: Re: Hang on boot in isp with QLA2342 after upgrading to 6.3

Graham Allan wrote:
> Graham Allan wrote:
>> On Mon, May 12, 2008 at 12:14:04PM -0500, Graham Allan wrote:
>>> It has been pointed out to me that this kind of weird interaction isn't
>>> exactly unknown in the SAN world, and setting up zoning on the switch
>>> would probably make it go away. So I will also try that (it's probably
>>> a giveaway of a SAN novice that I hadn't already done so - it certainly
>>> does sound like it would help). But if the hang does point to a problem
>>> in the driver, I'm also happy to keep trying different things in the
>>> hope of revealing where the problem actually lies.
>>
>> Replying to my own message here.
>>
>> The good news for me is that setting up zoning in the switch does fix
>> (or at least hide) the problem on this server for me.
>>
>> The bad news is, I believe I'm seeing a similar kind of behaviour on a
>> completely different 6.3 setup. Haven't had time to fully characterise
>> it yet, but in short... Dell 1950 with QLA2342, connected directly to
>> an EMC CX300 array. Very often (let's say unpredictably 50% of time)
>> hangs during boot at exactly the same point as the first system, right
>> around the time it would be probing for drives.
>
> So I guess one thing I could do is build a kernel with debugging support
> (and possibly the "deadlock recipe" from the FreeBSD Handbook), and
> force it to the debugger when it hangs. Then I could at least get some
> tracebacks and other information - though as it never actually panics
> I'm not sure how useful the information will be - I guess it's likely
> stuck in a loop somehow. It should give some clue.
>
> Does that sound like a reasonable idea? Does the kernel version matter
> (eg standard 6.3 vs RELENG_6)? Is this list the most appropriate place
> for me to talk about the issue?
>
> (I also think I should double-check 6.2 again, as its release notes
> indicate it was where isp was synced from CURRENT - I'd think it should
> have the same issue).
>
> Thanks for everyone's interest,
>
> Graham

Well, is it actually deadlocking, or just holding up the boot while it
tries to individually probe many thousands of target and LUN IDs? I'd
bet it's the latter. Compiling in the debugger is the correct first
step. You can then compile in CAMDEBUG, CAM_DEBUG_LUN=-1, and
CAM_DEBUG_FLAGS=CAM_DEBUG_INFO.

Scott
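
For reference, the options named above might be added to a custom kernel
configuration file roughly as follows. This is only a sketch and is not
taken from the original message: the KDB, DDB and BREAK_TO_DEBUGGER lines
are an assumption about what "compiling in the debugger" would involve
here, and CAM_DEBUG_BUS and CAM_DEBUG_TARGET are added on the assumption
that every bus and target should be traced, not just every LUN.

```
# Hypothetical additions to a copy of the GENERIC kernel config.
makeoptions	DEBUG=-g		# build the kernel with debug symbols

# Assumed debugger setup (not spelled out in the message).
options 	KDB			# kernel debugger framework
options 	DDB			# interactive debugger backend
options 	BREAK_TO_DEBUGGER	# a BREAK on a serial console enters the debugger

# CAM debugging, as suggested in the reply.
options 	CAMDEBUG		# compile in CAM debug support
options 	CAM_DEBUG_BUS=-1	# assumption: trace all buses
options 	CAM_DEBUG_TARGET=-1	# assumption: trace all targets
options 	CAM_DEBUG_LUN=-1	# trace all LUNs (from the message)
options 	CAM_DEBUG_FLAGS=CAM_DEBUG_INFO
```

With something like this in place, a boot that is merely slow would be
expected to keep printing CAM probe messages, whereas a real hang would
stop producing output and could then be inspected from the debugger.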