Date: Mon, 14 Nov 2011 20:10:10 GMT
Message-Id: <201111142010.pAEKAAPM083842@freefall.freebsd.org>
To: freebsd-scsi@FreeBSD.org
From: Jason Wolfe
Subject: Re: kern/154432: [xpt] run_interrupt_driven_hooks: still waiting after 60-300 seconds for xpt_config
Reply-To: Jason Wolfe
List-Id: SCSI subsystem

The following reply was made to PR kern/154432; it has been noted by GNATS.

From: Jason Wolfe
To: bug-followup@FreeBSD.org
Subject: Re: kern/154432: [xpt] run_interrupt_driven_hooks: still waiting after 60-300 seconds for xpt_config
Date: Mon, 14 Nov 2011 12:36:37 -0700

This is happening to me as well, on a Supermicro X8DTT-H chassis with an LSI2008 SAS2 controller and 12 drives, running 8.2-RELEASE-p4 with the mps driver from STABLE.

mps0@pci0:4:0:0:        class=0x010700 card=0x040015d9 chip=0x00721000 rev=0x02 hdr=0x00
    vendor     = 'LSI Logic (Was: Symbios Logic, NCR)'
    class      = mass storage
    subclass   = SAS

Though the dmesg output is identical, my problem is a bit different: the 300 second timeout passes, but in some cases it still takes the server 24+ _hours_ to finally continue booting. It hangs after the 300 second message until it manages to continue booting, and then all messages appear as normal. The cause of this is surely drives stuck in a transient state that makes them still look alive to the kernel. When the drive is popped out the server boots immediately. 300 seconds to wait for a drive that you think might be alive does seem a bit high, but I would be happy if even that was being honored in my case. This is an issue across a large pool of servers, and I have seen the behavior on ~20 different machines from different batches of chassis and drives in unique locations.
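
For anyone comparing hardware against this report, the identifiers in the pciconf line quoted in this message can be split into their component fields. The following is only a rough sketch: the function name `decode_pciconf` and the dictionary keys are made up for illustration, and it assumes the usual pciconf packing (device ID in the high 16 bits of `chip`, vendor ID in the low 16 bits, the same layout for `card`, and base class / subclass / programming interface as the three bytes of `class`).

```python
def decode_pciconf(chip: int, card: int, pci_class: int) -> dict:
    """Split packed pciconf identifiers into individual fields.

    Assumes chip = (device << 16) | vendor, card = (subdevice << 16) | subvendor,
    and class = (base_class << 16) | (subclass << 8) | prog_if.
    """
    return {
        "vendor": chip & 0xFFFF,
        "device": chip >> 16,
        "subvendor": card & 0xFFFF,
        "subdevice": card >> 16,
        "base_class": (pci_class >> 16) & 0xFF,
        "subclass": (pci_class >> 8) & 0xFF,
        "prog_if": pci_class & 0xFF,
    }


# Values copied from the pciconf output in this message.
ids = decode_pciconf(chip=0x00721000, card=0x040015D9, pci_class=0x010700)
for name, value in ids.items():
    print(f"{name:10s} 0x{value:04x}")
```

Under those assumptions the controller decodes to vendor 0x1000 and device 0x0072, with base class 0x01 and subclass 0x07, which agrees with the "mass storage" and "SAS" strings pciconf printed.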