Date: Sun, 6 Jun 2010 12:45:15 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: freebsd-stable@freebsd.org, Edwin Groothuis <edwin@mavetju.org>
Subject: Re: gmirror refused to connect second disk after a reboot
Message-ID: <20100606194515.GA29230@icarus.home.lan>
In-Reply-To: <20100606185551.GA267@sysmon.tcworks.net>
References: <20100606052509.GA4744@mavetju.org> <20100606185551.GA267@sysmon.tcworks.net>

On Sun, Jun 06, 2010 at 01:55:51PM -0500, Scott Lambert wrote:
> I have one dual PIII machine doing the same to me. I've been assuming
> my issue is with the ATA controller. ...
>
> Dec 11 02:01:48 netmon kernel: ad2: TIMEOUT - READ_DMA retrying (1 retry left) LBA=232068607
> Dec 11 02:02:00 netmon kernel: ad2: setting PIO4 on ROSB4 chip
> Dec 11 02:02:00 netmon kernel: ad2: setting UDMA33 on ROSB4 chip
> Dec 11 02:02:00 netmon kernel: ad2: TIMEOUT - READ_DMA retrying (1 retry left) LBA=232766751
> Dec 11 02:02:10 netmon kernel: ad0: setting PIO4 on ROSB4 chip
> Dec 11 02:02:10 netmon kernel: ad0: setting UDMA33 on ROSB4 chip
> Dec 11 02:02:10 netmon kernel: ad0: TIMEOUT - READ_DMA retrying (1 retry left) LBA=232006207
> Dec 11 02:02:36 netmon kernel: ad0: setting PIO4 on ROSB4 chip
> Dec 11 02:02:36 netmon kernel: ad0: setting UDMA33 on ROSB4 chip
> Dec 11 02:02:36 netmon kernel: ad0: TIMEOUT - READ_DMA retrying (1 retry left) LBA=242232479
> Dec 11 02:02:37 netmon kernel: ad2: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=242234911
> Dec 11 02:02:37 netmon kernel: ad0: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=242235039
> Dec 11 02:02:37 netmon kernel: ad2: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=242234911
> Dec 11 02:02:37 netmon kernel: ad0: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=242235039
> Dec 11 02:02:37 netmon kernel: ad2: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=84<ICRC,ABORTED> LBA=242234911
> Dec 11 02:02:37 netmon kernel: ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=84<ICRC,ABORTED> LBA=242235039
> Dec 11 02:02:37 netmon kernel: GEOM_MIRROR: Request failed (error=5). ad2[READ(offset=124024274432, length=65536)]
> Dec 11 02:02:37 netmon kernel: GEOM_MIRROR: Device gm0: provider ad2 disconnected.
> Dec 11 02:02:37 netmon kernel: GEOM_MIRROR: Request failed (error=5). ad0[READ(offset=124024339968, length=65536)]
> Dec 11 02:02:37 netmon kernel: g_vfs_done():mirror/gm0s1e[READ(offset=112213082112, length=131072)]error = 5
> Dec 11 02:02:47 netmon kernel: ad0: setting PIO4 on ROSB4 chip
> Dec 11 02:02:47 netmon kernel: ad0: setting UDMA33 on ROSB4 chip
> Dec 11 02:02:47 netmon kernel: ad0: TIMEOUT - READ_DMA retrying (1 retry left) LBA=242234911
> Dec 11 02:02:47 netmon kernel: ad0: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=242235039
> Dec 11 02:02:47 netmon kernel: ad0: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=242235039
> Dec 11 02:02:47 netmon kernel: ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=84<ICRC,ABORTED> LBA=242235039
> Dec 11 02:02:47 netmon kernel: g_vfs_done():mirror/gm0s1e[READ(offset=112213082112, length=131072)]error = 5
> Dec 11 02:02:50 netmon kernel: ad0: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=232478271
> Dec 11 02:02:50 netmon kernel: ad0: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=232478271
> Dec 11 02:02:50 netmon kernel: ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=84<ICRC,ABORTED> LBA=232478271
> Dec 11 02:02:50 netmon kernel: g_vfs_done():mirror/gm0s1e[READ(offset=107217682432, length=131072)]error = 5

I agree -- these look like you have either a bad PATA cable, a PATA
controller port which has gone bad, or a PATA controller which is
behaving *very* badly (internal IC problems). ICRC errors indicate data
transmission failures between the controller and the disk.

Since these are classic PATA disks, ad0 is probably the master and ad2
is the slave -- but both are probably on the same physical cable.

The LBAs for both ad0 and ad2 are very close (ad0=242235039,
ad2=242234911), which makes sense since they're in a mirror config. But
two disks going kaput at the same time, around the same LBA? I have my
doubts.
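
The LBA comparison above can be sketched in a few lines of Python. This is only an illustration, not a diagnostic tool: it parses the two FAILURE lines copied from the quoted log, decodes the two error-register bits the kernel reports (0x80 = ICRC, 0x04 = ABORTED), and computes how far apart the failing sectors are. The regular expression, the `parse_failures` name, and the 512-byte sector size are assumptions made for this sketch, though the sector size does agree with the log, since 242234911 * 512 = 124024274432, the GEOM_MIRROR offset reported for ad2.

```python
import re

# Two FAILURE lines copied verbatim from the quoted kernel log.
LOG = """\
Dec 11 02:02:37 netmon kernel: ad2: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=84<ICRC,ABORTED> LBA=242234911
Dec 11 02:02:37 netmon kernel: ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=84<ICRC,ABORTED> LBA=242235039
"""

# ATA error-register bits relevant here; the kernel prints the value in hex.
ERROR_BITS = {0x80: "ICRC", 0x04: "ABORTED"}

# Hypothetical pattern for this sketch; it only targets the FAILURE line format seen above.
FAILURE_RE = re.compile(
    r"(?P<dev>ad\d+): FAILURE - \S+ status=(?P<status>[0-9a-f]+)<[^>]*> "
    r"error=(?P<error>[0-9a-f]+)<[^>]*> LBA=(?P<lba>\d+)"
)


def parse_failures(text):
    """Return (device, decoded_error_flags, lba) for each FAILURE line in text."""
    results = []
    for m in FAILURE_RE.finditer(text):
        err = int(m.group("error"), 16)
        flags = [name for bit, name in ERROR_BITS.items() if err & bit]
        results.append((m.group("dev"), flags, int(m.group("lba"))))
    return results


failures = parse_failures(LOG)
lbas = {dev: lba for dev, _flags, lba in failures}

# Distance between the failing sectors on the two mirror members.
lba_gap = abs(lbas["ad0"] - lbas["ad2"])
gap_bytes = lba_gap * 512  # assumption: 512-byte sectors

for dev, flags, lba in failures:
    print(dev, flags, lba)
print("gap:", lba_gap, "sectors =", gap_bytes, "bytes")
```

The two failing requests turn out to be exactly one 65536-byte read apart, which matches the `length=65536` in the GEOM_MIRROR lines: both disks failed on adjacent chunks of the same region at the same second, which fits a shared cable or controller fault better than two independent media failures.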

SMART statistics for both of the disks themselves would help determine
if the disks are seeing issues or if the disks are also seeing problems
communicating with the PATA controller. (Depends on the age of the
disks though; some older PATA disks don't have the SMART attribute that
describes this).

What you should be worried about -- FreeBSD sees problems on both ad0
and ad2. ad2 is offline because of the problem, but ad0 isn't. Chances
are ad0 is going to fall off the bus eventually because of this problem.
I really hope you do backups regularly (daily) if you plan on just
ignoring this problem.

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |