Date: Wed, 15 Oct 2003 14:01:31 -0400
From: David Sze
To: Nate Lawson
Cc: freebsd-scsi@freebsd.org
Subject: Re: Dell PowerEdge 1750 and mpt

On Wed, Oct 15, 2003 at 10:12:49AM -0700, Nate Lawson wrote:
> On Wed, 15 Oct 2003, David Sze wrote:
> > The application talks to pass(4) to periodically retrieve the serial
> > numbers of all devices on the bus (the code is basically copied from
> > "camcontrol inquiry -S", plus some code to enumerate the bus).  So that is
> > consistent with how often we are seeing the crashes.  I'll go over the code
> > to make sure there are no blatant errors on my part.  The only puzzling
> > thing is that the same code runs flawlessly on a variety of similar
> > hardware, some machines also with mpt(4), but mostly ahc(4) and ahd(4)
> > controllers.
>
> Try running camcontrol inquiry -S on the same device in a loop and see if
> it gets the same panic.

Running "camcontrol inquiry pass3 -S" in a loop 1000 times was fine;
there was no panic.

> > (kgdb) fr 7
> > #7  0x80174507 in mpt_action (sim=0x923867c0, ccb=0x961a0000) at
> > ../../dev/mpt/mpt_freebsd.c:1311
> > 1311                    if (mpt_read_cfg_page(mpt, tgt, &tmp.Header)) {
> > (kgdb) print *ccb
> > stqe_next = 0x0}}, retry_count = 4, cbfcnp = 0x80129a94
> > , func_code = XPT_GET_TRAN_SETTINGS, status = 0,
>
> Why are you sending a XPT_GET_TRAN_SETTINGS CCB?  That's not even needed
> to get the serial number.  In any case, I need the output of
> print ccb->cts.

The code never sends XPT_GET_TRAN_SETTINGS, at least not directly.  So I
guess it is either sent indirectly, or the ccb that was passed in and shown
in this crashdump is complete junk.

What the code did do was send an XPT_DEV_MATCH, and for each
DEV_MATCH_DEVICE it would do a scsi_inquiry() with page_code set to
SVPD_UNIT_SERIAL_NUMBER, the same way the scsiserial() function in
camcontrol.c does it.

I confess to not really knowing anything about the cam subsystem though, so
it's entirely possible that I missed something important in copying the
code from camcontrol.c.

Since these are actually production servers, what I've done in the meantime
(after reading the cam(3) manpage, haha) is change the code to repeatedly
call cam_open_spec_device() with dev_name set to "pass" and unit
incrementing until cam_open_spec_device() returns NULL.  Then I just use
the serial_num field of the returned cam_device structures.
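
For anyone following along, the interim workaround described above might look
roughly like the sketch below.  It is not the code running on the production
servers.  The names walk_pass_units, cam_serial and fake_serial are made up
for illustration, and the sketch assumes that pass unit numbers are contiguous
(the loop stops at the first unit that fails to open) and that serial_num
should be length-limited by serial_num_len rather than treated as a C string.
The libcam part only builds on FreeBSD (link with -lcam); fake_serial is a
stand-in so the loop can be tried without any hardware.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#ifdef __FreeBSD__
#include <fcntl.h>
#include <camlib.h>	/* cam_open_spec_device(), cam_close_device() */
#endif

/*
 * Hypothetical callback: copy the serial number of pass<unit> into buf
 * (NUL-terminated) and return 0, or return -1 if the unit cannot be opened.
 */
typedef int (*serial_fn)(int unit, char *buf, size_t len);

/*
 * Probe pass0, pass1, ... and stop at the first unit that fails to open.
 * Returns the number of units that were opened successfully.
 */
int
walk_pass_units(serial_fn get_serial, FILE *out)
{
	char serial[256];
	int unit;

	for (unit = 0; get_serial(unit, serial, sizeof(serial)) == 0; unit++)
		fprintf(out, "pass%d: %s\n", unit, serial);
	return (unit);
}

/* Stand-in for testing without hardware: pretends pass0..pass2 exist. */
int
fake_serial(int unit, char *buf, size_t len)
{
	if (unit >= 3)
		return (-1);
	snprintf(buf, len, "FAKE%04d", unit);
	return (0);
}

#ifdef __FreeBSD__
/* libcam-backed callback: reads serial_num from the opened cam_device. */
int
cam_serial(int unit, char *buf, size_t len)
{
	struct cam_device *dev;
	size_t n;

	if (len == 0)
		return (-1);
	/* Passing NULL lets libcam allocate the cam_device for us. */
	dev = cam_open_spec_device("pass", unit, O_RDWR, NULL);
	if (dev == NULL)
		return (-1);
	/* Copy at most serial_num_len bytes, then terminate the string. */
	n = dev->serial_num_len;
	if (n >= len)
		n = len - 1;
	memcpy(buf, dev->serial_num, n);
	buf[n] = '\0';
	cam_close_device(dev);
	return (0);
}
#endif
```

On FreeBSD, calling walk_pass_units(cam_serial, stdout) from main() would
print one line per pass device; elsewhere, walk_pass_units(fake_serial,
stdout) exercises the same loop.  Note that this path only opens each pass
device and reads the serial number field of the returned structure, so it
does not issue the explicit scsi_inquiry() that the original code did.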