Date:      Wed, 9 Jun 2010 01:47:53 -0500
From:      Scott Lambert <lambert@lambertfam.org>
To:        freebsd-stable@freebsd.org
Subject:   Re: gmirror refused to connect second disk after a reboot
Message-ID:  <20100609064753.GA46148@sysmon.tcworks.net>
In-Reply-To: <20100606194515.GA29230@icarus.home.lan>
References:  <20100606052509.GA4744@mavetju.org> <20100606185551.GA267@sysmon.tcworks.net> <20100606194515.GA29230@icarus.home.lan>

On Sun, Jun 06, 2010 at 12:45:15PM -0700, Jeremy Chadwick wrote:
> On Sun, Jun 06, 2010 at 01:55:51PM -0500, Scott Lambert wrote:
> > I have one dual PIII machine doing the same to me.  I've been assuming
> > my issue is with the ATA controller.  ...

<snip>

> I agree -- these look like you have either a bad PATA cable, an PATA
> controller port which has gone bad, or a PATA controller which is
> behaving *very* badly (internal IC problems).  ICRC errors indicate data
> transmission failures between the controller and the disk.
> 
> Since these are classic PATA disks, ad0 is probably the master and ad2
> is the slave -- but both are probably on the same physical cable.
> 
> The LBAs for both ad0 and ad2 are very close (ad0=242235039,
> ad2=242234911), which makes sense since they're in a mirror config.  But
> two disks going kaput at the same time, around the same LBA?  I have my
> doubts.

I think I actually made sure that ad0 and ad2 are on their own cables.
ad0 may be sharing with acd0 though.

Yeah, looks like it.

01:16:24 Wed Jun 09 $ sudo atacontrol list
ATA channel 0:
    Master:  ad0 <WDC WD2500JB-57REA0/20.00K20> ATA/ATAPI revision 7
    Slave:  acd0 <LG CD-ROM CRD-8521B/1.04> ATA/ATAPI revision 0
ATA channel 1:
    Master:  ad2 <WDC WD2500JB-57REA0/20.00K20> ATA/ATAPI revision 7
    Slave:       no device present


> SMART statistics for both of the disks themselves would help determine
> if the disks are seeing issues or if the disks are also seeing problems
> communicating with the PATA controller.  (Depends on the age of the disks
> though; some older PATA disks don't have the SMART attribute that
> describes this).

The drives are only a couple of years old.  The box itself is ancient.
:-)  The ICRC errors only seem to have occurred right after boot.

I'll jerk the box apart to check/change the cabling when I get a chance.
Maybe I'll just dump the cd drive.

> What you should be worried about -- FreeBSD sees problems on both ad0
> and ad2.  ad2 is offline cuz of the problem, but ad0 isn't.  Chances are
> ad0 is going to fall off the bus eventually because of this problem.  I
> really hope you do backups regularly (daily) if you plan on just
> ignoring this problem.

AMANDA takes care of things.  Also, this box is not terribly important.
I rebuilt the array Sunday.  I don't see anything terribly scary in the
smartctl output.
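
For anyone wanting to check the same thing, here is a rough sketch of how
the interface CRC counter could be pulled out of the smartctl attribute
table.  It assumes the drive exposes SMART attribute 199
(UDMA_CRC_Error_Count), which not every disk does, and that the raw value
is the tenth column of the `smartctl -A` table.  The crc_count helper name
is made up for illustration.

```shell
# Print the raw value of SMART attribute 199 (UDMA_CRC_Error_Count)
# from `smartctl -A` output read on stdin.
# Assumption: the raw value is the tenth whitespace-separated field.
crc_count() {
    awk '$1 == 199 { print $10 }'
}

# Assumed usage, with device names taken from the atacontrol listing:
#   sudo smartctl -A /dev/ad0 | crc_count
#   sudo smartctl -A /dev/ad2 | crc_count
# The mirror state after a rebuild can be checked separately with:
#   gmirror status
```

A raw value that keeps climbing between checks would point at the cable
or controller rather than the platters; a value that stays put suggests
the errors were a one-off at boot.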

Anyway, I do hope I haven't hijacked the thread for the OP.  I actually
just wanted to offer a possible matching datapoint.

-- 
Scott Lambert                    KC5MLE                       Unix SysAdmin
lambert@lambertfam.org



