Date: Wed, 9 Jun 2010 01:47:53 -0500
From: Scott Lambert
To: freebsd-stable@freebsd.org
Subject: Re: gmirror refused to connect second disk after a reboot
Message-ID: <20100609064753.GA46148@sysmon.tcworks.net>
In-Reply-To: <20100606194515.GA29230@icarus.home.lan>
References: <20100606052509.GA4744@mavetju.org> <20100606185551.GA267@sysmon.tcworks.net> <20100606194515.GA29230@icarus.home.lan>

On Sun, Jun 06, 2010 at 12:45:15PM -0700, Jeremy Chadwick wrote:
> On Sun, Jun 06, 2010 at 01:55:51PM -0500, Scott Lambert wrote:
> > I have one dual PIII machine doing the same to me.  I've been assuming
> > my issue is with the ATA controller.
...
> I agree -- these look like you have either a bad PATA cable, an PATA
> controller port which has gone bad, or a PATA controller which is
> behaving *very* badly (internal IC problems).  ICRC errors indicate data
> transmission failures between the controller and the disk.
>
> Since these are classic PATA disks, ad0 is probably the master and ad2
> is the slave -- but both are probably on the same physical cable.
>
> The LBAs for both ad0 and ad2 are very close (ad0=242235039,
> ad2=242234911), which makes sense since they're in a mirror config.  But
> two disks going kaput at the same time, around the same LBA?  I have my
> doubts.

I think I actually made sure that ad0 and ad2 are on their own cables.
ad0 may be sharing with acd0 though.  Yeah, looks like it.

01:16:24 Wed Jun 09 $ sudo atacontrol list
ATA channel 0:
    Master:  ad0 ATA/ATAPI revision 7
    Slave:  acd0 ATA/ATAPI revision 0
ATA channel 1:
    Master:  ad2 ATA/ATAPI revision 7
    Slave:       no device present

> SMART statistics for both of the disks themselves would help determine
> if the disks are seeing issues or if the disks are also seeing problems
> communicating with the PATA controller.  (Depends on the age of the disks
> though; some older PATA disks don't have the SMART attribute that
> describes this).

The drives are only a couple of years old.  The box itself is ancient. :-)

The ICRC errors only seem to have occurred right after boot.  I'll jerk
the box apart to check/change the cabling when I get a chance.  Maybe
I'll just dump the CD drive.

> What you should be worried about -- FreeBSD sees problems on both ad0
> and ad2.  ad2 is offline cuz of the problem, but ad0 isn't.  Chances are
> ad0 is going to fall off the bus eventually because of this problem.  I
> really hope you do backups regularly (daily) if you plan on just
> ignoring this problem.

AMANDA takes care of things.  Also, this box is not terribly important.

I rebuilt the array Sunday.  I don't see anything terribly scary in the
smartctl output.

Anyway, I do hope I haven't hijacked the thread for the OP.  I actually
just wanted to offer a possible matching datapoint.

-- 
Scott Lambert                    KC5MLE                       Unix SysAdmin
lambert@lambertfam.org