From owner-freebsd-stable@FreeBSD.ORG Thu Jun 2 07:51:20 2011
Date: Thu, 2 Jun 2011 00:51:18 -0700
From: Jeremy Chadwick
To: Alexander Motin
Cc: "stable@freebsd.org"
Subject: Re: 8-STABLE won't boot with ZFSv28
Message-ID: <20110602075118.GA42026@icarus.home.lan>
In-Reply-To: <4DE73386.5040505@FreeBSD.org>

On Thu, Jun 02, 2011 at 09:53:58AM +0300, Alexander Motin wrote:
> Hi.
>
> Holger Kipp wrote:
> > got the same messages over and over again - panic took some time:
> >
> > unknown: WARNING - ATAPI_IDENTIFY requeued due to channel reset LBA=0
> > ata0: reinit done ..
> > ata0: reiniting channel ..
> > ata0: DISCONNECT requested
> >
> >
> >
> > ata0: p0: SATA connect time=0ms status=00000113
> > ata0: p1: SATA connect timeout status=00000000
> > ata0: reset tp1 mask=03 ostat0=00 ostat1=00
> > ata0: stat0=0x00 err=0x01 lsb=0x14 msb=0xeb
> > ata0: stat1=0x00 err=0x01 lsb=0x14 msb=0xeb
> > ata0: reset tp2 stat0=00 stat1=00 devices=0x30000
> > unknown: WARNING - ATAPI_IDENTIFY requeued due to channel reset LBA=0
> > ata0: reinit done ..
> > ata0: reiniting channel ..
> > ata0: DISCONNECT requested
>
> I see two problems here:
> 1. "devices=0x30000" means that two ATAPI devices were detected instead
> of one. I can reproduce it also with other Intel chipsets. It looks like
> a hardware bug to me. It can be workarounded by reconnecting ATAPI
> device to even (2 or 4) SATA port, or connecting any other device there.
> 2. "DISCONNECT requested" means that controller reported PHY status
> change for some device on channel, triggering infinite retry. Unluckily
> I have no ICH9 board, while I can't reproduce it with ICH10 or above.
>
> This patch should workaround the first problem in software:
> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/ata/chipsets/ata-intel.c.diff?r1=1.25;r2=1.26
> Try it please and let's see if with some luck it do something about the
> second problem.

With regard to item #1: I don't see anything in the ICH9 errata that
indicates a silicon bug when the only device attached to the controller
is an ATAPI device connected to SATA port 0 (presumably) or to an
odd-numbered port. If this problem exists on other ICHxx and/or ESBxx
chips, I sure would hope it'd be documented.
I haven't tried confirming it myself, but if need be I can set up a test
box with a SATA-based DVD drive hooked up to it and provide remote serial
console/etc. if it'd be of any help. I don't think it would be (sounds
like you have lots of hardware :-) ), but I'm willing to help in any way
I can.

With regard to item #2: could this be at all related to OOB (bit 15)
somehow being set in PCS (SATA register offset 0x92)? I doubt it, but I
thought I'd ask. My thought process, which is probably wrong (consider
it an educational discussion :-) ):

The ICH9 specification states that the default value for this register
is 0x0000, and b15=0 means "SATA controller will not retry after an OOB
failure", while b15=1 causes the controller to retry indefinitely after
an OOB failure. I imagine system BIOSes and other things can change this
default value, but we don't seem to print it anywhere in
ata_intel_chipinit() during a verbose boot.

Looking at chipsets/ata-intel.c, it looks like we only touch PCS in
ata_intel_chipinit() and ata_intel_reset(). In the former, we avoid
touching bits 4 through 15, and in the latter we mask out only what we
want to adjust (e.g. the SATA port per the ch variable).

Reference material is section 14.1.31 of the ICH9 datasheet:

http://www.intel.com/assets/pdf/datasheet/316972.pdf

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |
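
For illustration only, the PCS reasoning above can be sketched in plain
C. This is a minimal userland sketch, not code from ata-intel.c: the
macro and function names are hypothetical, and the only facts taken from
the message are the register offset (0x92), the meaning of bit 15, and
the idea of changing bits 0 through 3 while leaving bits 4 through 15
alone. In the driver, the 16-bit value would presumably come from
something like pci_read_config(dev, 0x92, 2).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the offset and bit 15 meaning are from the message. */
#define PCS_OFFSET     0x92        /* PCS config-space offset */
#define PCS_OOB_RETRY  (1u << 15)  /* b15=1: retry indefinitely after OOB failure */
#define PCS_LOW_MASK   0x000fu     /* bits 0-3, the only bits we allow changing */

/* Return 1 if bit 15 (OOB retry) is set in a PCS value, else 0. */
static int
pcs_oob_retry(uint16_t pcs)
{
	return ((pcs & PCS_OOB_RETRY) != 0);
}

/*
 * Read-modify-write helper: replace bits 0-3 with newbits and keep
 * bits 4-15 (including the OOB retry bit) exactly as they were.
 */
static uint16_t
pcs_update_low(uint16_t pcs, uint16_t newbits)
{
	/* The caller must not pass bits outside the low mask. */
	assert((newbits & ~PCS_LOW_MASK) == 0);
	return ((uint16_t)((pcs & ~PCS_LOW_MASK) | newbits));
}
```

Printing the result of pcs_oob_retry() on the value read at boot would
be one way to confirm or rule out the bit 15 theory; pcs_update_low()
shows why a write that touches only the low bits cannot set or clear
that bit by itself.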