Date: Thu, 16 Apr 2015 14:47:27 +0300 (MSK)
From: Dmitry Morozovsky
To: Walter Cramer
cc: freebsd-stable@FreeBSD.org
Subject: Re: [GEOM] Disk IO error when resyncing gmirror -> massive hang in D state
In-Reply-To: <20150415132245.B71411@mulder.mintsol.com>
References: <20150415132245.B71411@mulder.mintsol.com>

Walter,

thanks for your suggestions.

To answer quickly: I have already evacuated the data to the new drive (see
the last paragraph of my original message). Luckily, no critical data were on
the failed part of the disk, so rsync finished cleanly on the very first pass.

The only question still open for me is why the kernel got stuck in GEOM
instead of returning read/write errors to the applications.

I'll try to assemble a lab machine with this drive (which is still by my work
table) and reproduce the error.

On Wed, 15 Apr 2015, Walter Cramer wrote:

> Here are a few ideas I had, if more capable people have not already sent you
> better ones:
>
> Copy as much important data as possible from the Toshiba drive, since it could
> degrade further or die at any time.
>
> Check whether a 'dd' command can quickly reproduce the error, so you can try
> things faster.
>
> If the failing drive is not fairly cold, try chilling it with a strong fan.
>
> Briefly put the drive in another system, to see if using a different power
> supply, controller, data cable, etc. would help. Changing the orientation
> (direction of gravity on the drive) might also be good.
>
> If nothing else helped, a tiny C language program might use open(), read(),
> lseek(), write(), etc. to copy all readable sectors to your replacement disk
> (using zeros for the unreadable bad sectors).
>
> -Walter
>
>
> On Tue, 14 Apr 2015, Dmitry Morozovsky wrote:
>
> > Dear colleagues,
> >
> > unfortunately, the machine in question is in production, so I have no clean
> > reproduction case. I do have console logs, however.
> >
> > prerequisites:
> > - rather fresh stable/10, amd64, SuperMicro MicroCloud 1150, X10SLD-F/HF
> > - su+j ufs2 on top of gmirror of two SATA Toshiba drives
> > - one disk died some time ago, so gmirror works in degraded state
> >
> > trouble:
> > - inserted new drive, labelled, started gmirror resync
> > - apparently remaining drive also has read issues:
> > (ada0:ahcich1:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 10 b2 c3 40 01 00 00 01 00 00
> > (ada0:ahcich1:0:0:0): CAM status: ATA Status Error
> > (ada0:ahcich1:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
> > (ada0:ahcich1:0:0:0): RES: 41 40 04 b3 c3 40 01 00 00 00 01
> > (ada0:ahcich1:0:0:0): Error 5, Retries exhausted
> > GEOM_MIRROR: Request failed (error=5). ada0a[READ(offset=6566445056, length=131072)]
> > GEOM_MIRROR: Synchronization request failed (error=5). mirror/m0a[READ(offset=6566445056, length=131072)]
> >
> > At this point, all disk I/O requests are stalled: all cron jobs, syslogd,
> > dhcpd, etc.
> >
> > The situation reproduced itself at least twice; then, as an emergency
> > measure, the new drive was labelled independently and the data rsynced over.
> >
> > Any thoughts?
> >
> > Thanks in advance!
> >
> > --
> > Sincerely,
> > D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
> > [ FreeBSD committer: marck@FreeBSD.org ]
> > ------------------------------------------------------------------------
> > *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru ***
> > ------------------------------------------------------------------------

-- 
Sincerely,
D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
[ FreeBSD committer: marck@FreeBSD.org ]
------------------------------------------------------------------------
*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru ***
------------------------------------------------------------------------
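
Walter's last suggestion quoted above (a tiny C program using open(), read(),
lseek() and write() to copy every readable sector and substitute zeros for the
unreadable ones) could look roughly like the sketch below. It is only an
illustration under stated assumptions, not something anyone in this thread
ran: the function name rescue_copy and its parameters are hypothetical, an
unreadable block is assumed to show up as read() failing with EIO (as in the
"Error 5" lines of the log), and on a raw disk device block_size is assumed to
be a multiple of the sector size. A failed read zero-fills the whole block, so
a smaller block_size loses less data around each bad sector.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy in_fd to out_fd in blocks of block_size bytes.
 * A block whose read fails with EIO is written as zeros and counted in
 * *bad_blocks. Returns 0 once end of input is reached, -1 on any other error.
 */
int
rescue_copy(int in_fd, int out_fd, size_t block_size, unsigned long *bad_blocks)
{
    assert(block_size > 0);

    char *buf = malloc(block_size);
    if (buf == NULL)
        return -1;

    off_t offset = 0;
    int rc = -1;
    *bad_blocks = 0;

    for (;;) {
        /* Reposition explicitly: after a failed read the input file
         * offset cannot be relied on. */
        if (lseek(in_fd, offset, SEEK_SET) == (off_t)-1 ||
            lseek(out_fd, offset, SEEK_SET) == (off_t)-1)
            break;

        ssize_t n = read(in_fd, buf, block_size);
        if (n == 0) {               /* end of input: done */
            rc = 0;
            break;
        }
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted, retry the same block */
            if (errno != EIO)
                break;              /* unexpected error, give up */
            /* Unreadable block: substitute zeros and move on. */
            memset(buf, 0, block_size);
            n = (ssize_t)block_size;
            (*bad_blocks)++;
        }

        /* Write out all n bytes, allowing for short writes. */
        ssize_t done = 0;
        while (done < n) {
            ssize_t w = write(out_fd, buf + done, (size_t)(n - done));
            if (w < 0 && errno == EINTR)
                continue;
            if (w <= 0)
                goto out;
            done += w;
        }
        offset += n;
    }
out:
    free(buf);
    return rc;
}
```

A caller would open() the failing provider read-only and the replacement
write-only, call rescue_copy() with a block size such as 4096, and report
*bad_blocks afterwards. Note that this sketch does not know the device size,
so an unreadable final partial block would be padded to a full block of zeros.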