From owner-freebsd-stable@FreeBSD.ORG  Thu Apr 16 11:54:53 2015
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id F17D2C11
 for <freebsd-stable@FreeBSD.org>; Thu, 16 Apr 2015 11:54:52 +0000 (UTC)
Received: from woozle.rinet.ru (woozle.rinet.ru [195.54.192.68])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 801FEE5A
 for <freebsd-stable@FreeBSD.org>; Thu, 16 Apr 2015 11:54:51 +0000 (UTC)
Received: from localhost (localhost [127.0.0.1])
 by woozle.rinet.ru (8.14.5/8.14.5) with ESMTP id t3GBlRDC010708;
 Thu, 16 Apr 2015 14:47:27 +0300 (MSK) (envelope-from marck@rinet.ru)
Date: Thu, 16 Apr 2015 14:47:27 +0300 (MSK)
From: Dmitry Morozovsky <marck@rinet.ru>
To: Walter Cramer <wfc@mintsol.com>
cc: freebsd-stable@FreeBSD.org
Subject: Re: [GEOM] Disk IO error when resyncing gmirror -> massive hang in
 D state
In-Reply-To: <20150415132245.B71411@mulder.mintsol.com>
Message-ID: <alpine.BSF.2.00.1504161444140.7918@woozle.rinet.ru>
References: <alpine.BSF.2.00.1504140017170.47151@woozle.rinet.ru>
 <20150415132245.B71411@mulder.mintsol.com>
User-Agent: Alpine 2.00 (BSF 1167 2008-08-23)
X-NCC-RegID: ru.rinet
X-OpenPGP-Key-ID: 6B691B03
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3
 (woozle.rinet.ru [0.0.0.0]); Thu, 16 Apr 2015 14:48:34 +0300 (MSK)
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-stable>,
 <mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable/>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
 <mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 16 Apr 2015 11:54:53 -0000

Walter,

Thanks for your suggestions.

To answer quickly: I've already evacuated the data to the new drive (see the
last paragraph of my original message). Luckily, no critical data were on the
failed part of the disk, so rsync finished successfully on the very first pass.

The only question still actually open for me is why the kernel got stuck in
GEOM instead of returning read/write errors to the applications.

I'll try to assemble a lab machine with this drive (which is still on my work
table) and reproduce the error.
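Walter's last suggestion quoted below (a tiny C program using open(), read(), lseek() and write() that copies readable sectors and zero-fills bad ones) also works as a way to rerun the failing read on a lab machine. What follows is only a minimal sketch under stated assumptions, not a tested recovery tool. It assumes 512-byte sectors and reads in 128 KiB chunks, matching the length of the failed GEOM_MIRROR request. It uses pread()/pwrite() instead of separate lseek() and read() calls, and it expects the caller to supply the byte count. The names `rescue_copy` and `write_all` are hypothetical.

```c
/*
 * Sketch: copy every readable sector from src_fd to dst_fd, writing
 * zeros for sectors that cannot be read.
 * Assumptions (not from the thread): 512-byte sectors, 128 KiB chunks,
 * size supplied by the caller.
 */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define SECTOR 512
#define CHUNK  (128 * 1024)   /* same length as the failed GEOM request */

/* Write all len bytes of buf at offset off, retrying short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len, off_t off)
{
    while (len > 0) {
        ssize_t n = pwrite(fd, buf, len, off);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
        off += n;
    }
    return 0;
}

/*
 * Copy size bytes. A chunk that does not read completely is retried one
 * sector at a time; sectors that still fail are zero-filled and their
 * offsets are printed on stderr.
 * Returns the number of bad sectors, or -1 on a write error.
 */
long rescue_copy(int src_fd, int dst_fd, off_t size)
{
    static char buf[CHUNK];
    long bad = 0;

    assert(CHUNK % SECTOR == 0);
    for (off_t off = 0; off < size; off += CHUNK) {
        size_t len = (size - off < CHUNK) ? (size_t)(size - off) : CHUNK;

        if (pread(src_fd, buf, len, off) != (ssize_t)len) {
            /* Slow path: isolate the unreadable sectors. */
            for (size_t s = 0; s < len; s += SECTOR) {
                size_t slen = (len - s < SECTOR) ? len - s : SECTOR;
                off_t at = off + (off_t)s;

                if (pread(src_fd, buf + s, slen, at) != (ssize_t)slen) {
                    memset(buf + s, 0, slen);
                    fprintf(stderr, "unreadable sector at offset %jd\n",
                            (intmax_t)at);
                    bad++;
                }
            }
        }
        if (write_all(dst_fd, buf, len, off) != 0)
            return -1;
    }
    return bad;
}
```

A small main() would open the source read-only and the destination for writing, then call `rescue_copy()`. On a raw FreeBSD disk device, the size could come from the `DIOCGMEDIASIZE` ioctl (`<sys/disk.h>`). Pointing the copy at the ada device directly, instead of going through the mirror, might help show whether the read at offset 6566445056 comes back as EIO or stalls the way the original report describes.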


On Wed, 15 Apr 2015, Walter Cramer wrote:

> Here are a few ideas I had, if more capable people have not already sent you
> better ones:
> 
> Copy as much important data as possible from the Toshiba drive, since it could
> degrade further or die at any time.
> 
> Check whether a 'dd' command can quickly reproduce the error, so you can try
> things faster.
> 
> If the failing drive is not fairly cold, try chilling it with a strong fan.
> 
> Briefly put the drive in another system, to see if using a different power
> supply, controller, data cable, etc. would help.  Changing the orientation
> (direction of gravity on the drive) might also be good.
> 
> If nothing else helps, a tiny C program might use open(), read(),
> lseek(), write(), etc. to copy all readable sectors to your replacement disk
> (using zeros for the unreadable bad sectors).
> 
> -Walter
> 
> 
> On Tue, 14 Apr 2015, Dmitry Morozovsky wrote:
> 
> > Dear colleagues,
> > 
> > Unfortunately, the machine in question is in production, so I have no clean
> > reproduction case. I do have console logs, however.
> > 
> > prerequisites:
> > - rather fresh stable/10, amd64, SuperMicro MicroCloud 1150, X10SLD-F/HF
> > - su+j ufs2 on top of gmirror of two SATA Toshiba drives
> > - one disk died some time ago, so gmirror works in degraded state
> > 
> > trouble:
> > - inserted a new drive, labelled it, and started gmirror resync
> > - apparently the remaining drive also has read issues:
> > (ada0:ahcich1:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 10 b2 c3 40 01 00 00 01 00 00
> > (ada0:ahcich1:0:0:0): CAM status: ATA Status Error
> > (ada0:ahcich1:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
> > (ada0:ahcich1:0:0:0): RES: 41 40 04 b3 c3 40 01 00 00 00 01
> > (ada0:ahcich1:0:0:0): Error 5, Retries exhausted
> > GEOM_MIRROR: Request failed (error=5). ada0a[READ(offset=6566445056, length=131072)]
> > GEOM_MIRROR: Synchronization request failed (error=5). mirror/m0a[READ(offset=6566445056, length=131072)]
> > 
> > At this point, all disk I/O requests stall: cron jobs, syslogd, dhcpd, etc.
> > all hang.
> > 
> > The situation reproduced itself at least twice; then, as an emergency
> > measure, the new drive was labelled independently and the data rsynced over.
> > 
> > Any thoughts?
> > 
> > Thanks in advance!
> > 
> > 
> > -- 
> > Sincerely,
> > D.Marck                                     [DM5020, MCK-RIPE, DM3-RIPN]
> > [ FreeBSD committer:                                 marck@FreeBSD.org ]
> > ------------------------------------------------------------------------
> > *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru ***
> > ------------------------------------------------------------------------
> 

-- 
Sincerely,
D.Marck                                     [DM5020, MCK-RIPE, DM3-RIPN]
[ FreeBSD committer:                                 marck@FreeBSD.org ]
------------------------------------------------------------------------
*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru ***
------------------------------------------------------------------------