From owner-freebsd-stable@FreeBSD.ORG Sun Jun 6 19:45:18 2010
Return-Path:
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id 4000C1065670
    for ; Sun, 6 Jun 2010 19:45:18 +0000 (UTC)
    (envelope-from jdc@koitsu.dyndns.org)
Received: from qmta04.westchester.pa.mail.comcast.net
    (qmta04.westchester.pa.mail.comcast.net [76.96.62.40])
    by mx1.freebsd.org (Postfix) with ESMTP id E19EF8FC0A
    for ; Sun, 6 Jun 2010 19:45:17 +0000 (UTC)
Received: from omta01.westchester.pa.mail.comcast.net ([76.96.62.11])
    by qmta04.westchester.pa.mail.comcast.net with comcast
    id Sjig1e0020EZKEL54jlJyR; Sun, 06 Jun 2010 19:45:18 +0000
Received: from koitsu.dyndns.org ([98.248.46.159])
    by omta01.westchester.pa.mail.comcast.net with comcast
    id SjlG1e0083S48mS3MjlHxS; Sun, 06 Jun 2010 19:45:17 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
    id 41D949B418; Sun, 6 Jun 2010 12:45:15 -0700 (PDT)
Date: Sun, 6 Jun 2010 12:45:15 -0700
From: Jeremy Chadwick
To: freebsd-stable@freebsd.org, Edwin Groothuis
Message-ID: <20100606194515.GA29230@icarus.home.lan>
References: <20100606052509.GA4744@mavetju.org>
    <20100606185551.GA267@sysmon.tcworks.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20100606185551.GA267@sysmon.tcworks.net>
User-Agent: Mutt/1.5.20 (2009-06-14)
Cc:
Subject: Re: gmirror refused to connect second disk after a reboot
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 06 Jun 2010 19:45:18 -0000

On Sun, Jun 06, 2010 at 01:55:51PM -0500, Scott Lambert wrote:
> I have one dual PIII machine doing the same to me.  I've been assuming
> my issue is with the ATA controller.  ...
>
> Dec 11 02:01:48 netmon kernel: ad2: TIMEOUT - READ_DMA retrying (1 retry left) LBA=232068607
> Dec 11 02:02:00 netmon kernel: ad2: setting PIO4 on ROSB4 chip
> Dec 11 02:02:00 netmon kernel: ad2: setting UDMA33 on ROSB4 chip
> Dec 11 02:02:00 netmon kernel: ad2: TIMEOUT - READ_DMA retrying (1 retry left) LBA=232766751
> Dec 11 02:02:10 netmon kernel: ad0: setting PIO4 on ROSB4 chip
> Dec 11 02:02:10 netmon kernel: ad0: setting UDMA33 on ROSB4 chip
> Dec 11 02:02:10 netmon kernel: ad0: TIMEOUT - READ_DMA retrying (1 retry left) LBA=232006207
> Dec 11 02:02:36 netmon kernel: ad0: setting PIO4 on ROSB4 chip
> Dec 11 02:02:36 netmon kernel: ad0: setting UDMA33 on ROSB4 chip
> Dec 11 02:02:36 netmon kernel: ad0: TIMEOUT - READ_DMA retrying (1 retry left) LBA=242232479
> Dec 11 02:02:37 netmon kernel: ad2: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=242234911
> Dec 11 02:02:37 netmon kernel: ad0: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=242235039
> Dec 11 02:02:37 netmon kernel: ad2: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=242234911
> Dec 11 02:02:37 netmon kernel: ad0: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=242235039
> Dec 11 02:02:37 netmon kernel: ad2: FAILURE - READ_DMA status=51 error=84 LBA=242234911
> Dec 11 02:02:37 netmon kernel: ad0: FAILURE - READ_DMA status=51 error=84 LBA=242235039
> Dec 11 02:02:37 netmon kernel: GEOM_MIRROR: Request failed (error=5). ad2[READ(offset=124024274432, length=65536)]
> Dec 11 02:02:37 netmon kernel: GEOM_MIRROR: Device gm0: provider ad2 disconnected.
> Dec 11 02:02:37 netmon kernel: GEOM_MIRROR: Request failed (error=5). ad0[READ(offset=124024339968, length=65536)]
> Dec 11 02:02:37 netmon kernel: g_vfs_done():mirror/gm0s1e[READ(offset=112213082112, length=131072)]error = 5
> Dec 11 02:02:47 netmon kernel: ad0: setting PIO4 on ROSB4 chip
> Dec 11 02:02:47 netmon kernel: ad0: setting UDMA33 on ROSB4 chip
> Dec 11 02:02:47 netmon kernel: ad0: TIMEOUT - READ_DMA retrying (1 retry left) LBA=242234911
> Dec 11 02:02:47 netmon kernel: ad0: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=242235039
> Dec 11 02:02:47 netmon kernel: ad0: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=242235039
> Dec 11 02:02:47 netmon kernel: ad0: FAILURE - READ_DMA status=51 error=84 LBA=242235039
> Dec 11 02:02:47 netmon kernel: g_vfs_done():mirror/gm0s1e[READ(offset=112213082112, length=131072)]error = 5
> Dec 11 02:02:50 netmon kernel: ad0: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=232478271
> Dec 11 02:02:50 netmon kernel: ad0: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=232478271
> Dec 11 02:02:50 netmon kernel: ad0: FAILURE - READ_DMA status=51 error=84 LBA=232478271
> Dec 11 02:02:50 netmon kernel: g_vfs_done():mirror/gm0s1e[READ(offset=107217682432, length=131072)]error = 5

I agree -- these look like you have either a bad PATA cable, a PATA
controller port which has gone bad, or a PATA controller which is
behaving *very* badly (internal IC problems).  ICRC errors indicate data
transmission failures between the controller and the disk.

Since these are classic PATA disks, ad0 is probably the master and ad2
is the slave -- but both are probably on the same physical cable.

The LBAs for both ad0 and ad2 are very close (ad0=242235039,
ad2=242234911), which makes sense since they're in a mirror config.  But
two disks going kaput at the same time, around the same LBA?  I have my
doubts.
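To make the "very close LBAs" point concrete, here is a minimal Python sketch that checks the numbers quoted in the log above.  It assumes 512-byte logical sectors (not stated in the thread); the variable and function names are illustrative only.  Under that assumption, each failing LBA maps exactly onto the byte offset GEOM_MIRROR reported, and the ad0 read begins right where the ad2 read ends, which is consistent with a single 128 KiB filesystem read being served as two 64 KiB halves, one per mirror component.

```python
# Assumption: 512-byte logical sectors (not stated in the thread).
SECTOR_SIZE = 512

# (device, failing LBA, GEOM_MIRROR byte offset, request length),
# copied from the quoted kernel log at 02:02:37.
failed_reads = [
    ("ad2", 242234911, 124024274432, 65536),
    ("ad0", 242235039, 124024339968, 65536),
]


def lba_to_offset(lba, sector_size=SECTOR_SIZE):
    """Convert a logical block address to a byte offset on the disk."""
    return lba * sector_size


# Each ATA-level LBA should match the byte offset GEOM_MIRROR logged.
for dev, lba, offset, length in failed_reads:
    print(dev, "LBA matches GEOM offset:", lba_to_offset(lba) == offset)

# Distance between the two failing reads, in sectors and in bytes.
(_, lba_ad2, off_ad2, len_ad2), (_, lba_ad0, off_ad0, _) = failed_reads
gap_sectors = lba_ad0 - lba_ad2
gap_bytes = gap_sectors * SECTOR_SIZE
print("gap:", gap_sectors, "sectors =", gap_bytes, "bytes")

# True when the ad0 read starts exactly where the ad2 read ends.
contiguous = off_ad2 + len_ad2 == off_ad0
print("reads are back to back:", contiguous)
```

Both disks failing on adjacent halves of the same request in the same second points at something they share (cable, port, or controller) rather than at two independent media defects.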

SMART statistics for both of the disks themselves would help determine
whether the disks are seeing issues of their own or whether they are
also seeing problems communicating with the PATA controller.  (Depends
on the age of the disks though; some older PATA disks don't have the
SMART attribute that describes this.)

What you should be worried about -- FreeBSD sees problems on both ad0
and ad2.  ad2 is offline because of the problem, but ad0 isn't.  Chances
are ad0 is going to fall off the bus eventually because of this problem.
I really hope you do backups regularly (daily) if you plan on just
ignoring this problem.

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |