From owner-freebsd-fs@FreeBSD.ORG Tue May 19 16:03:10 2015
Message-ID: <555B5EBB.20306@egr.msu.edu>
Date: Tue, 19 May 2015 12:03:07 -0400
From: Adam McDougall
To: freebsd-fs@freebsd.org
Subject: Re: hardware fault during ZFS send/receive blocks /dev/zfs indefinitely
References: <86wq048x8h.fsf@emacs.campese.org>
In-Reply-To: <86wq048x8h.fsf@emacs.campese.org>
(trimmed)

On 05/19/2015 10:20, Simon Campese wrote:
> Hello,
>
> I was sending/receiving a ZFS filesystem from a raidz2 pool to
> another pool consisting of a single disk when that disk failed. As a
> result, both the zfs send and the zfs receive process are now in
> uninterruptible sleep, and any new zpool or zfs command I issue
> immediately enters uninterruptible sleep as well. Is this just bad
> luck (i.e. my disk failed at the wrong moment) or might this be a
> bug?
>
> Anyway, my only remaining option is to schedule a reboot soon, as
> the machine is a file server and the operational status of ZFS is
> critical.
>
> I'm not very experienced with ZFS or the FreeBSD kernel, so I will
> just try to supply as much relevant information as possible. Please
> tell me if there is more I can do.
>
> The system runs FreeBSD 10.1-RELEASE-p6; the machine is a small
> Intel file server (eight-core Atom, 64G RAM, Supermicro board, two
> raidz2 pools connected via reflashed IBM M1015 controllers). Here
> are the relevant lines from "ps ax" (with anonymized pool/filesystem
> names):
>
> The errors showing up in /var/log/messages when my hard disk went
> west are (excerpt):
>
> May 19 15:00:48 srv0 kernel: ahcich7: Timeout on slot 0 port 0
> May 19 15:00:48 srv0 kernel: ahcich7: is 00000000 cs c000001f ss
> f800001f rs f800001f tfd 40 serr 00000000 cmd 0004dd17
> May 19 15:00:48 srv0 kernel: (ada7:ahcich7:0:0:0):
> WRITE_FPDMA_QUEUED. ACB: 61 0b 8c f3 6a 40 00 00 00 00 00 00
> May 19 15:00:48 srv0 kernel: (ada7:ahcich7:0:0:0): CAM status:
> Command timeout
> May 19 15:00:48 srv0 kernel: (ada7:ahcich7:0:0:0): Retrying command
>
> Lines of this form continued for some minutes, and after a while my
> geli volume on this disk began complaining as well:
>
> May 19 15:03:09 srv0 kernel: GEOM_ELI: Crypto WRITE request failed
> (error=6).
> label/bkp101.eli[WRITE(offset=3595775488, length=131072)]
>
> Is there any hope for me to resolve this issue without a reboot?
>
> Thanks for your help,
>
> Simon

Can you try using the geli and/or glabel command to force detach
label/bkp101.eli so that ZFS treats it as a failure? Also, I'm not
sure how geli and glabel will react, but you could try setting
kern.cam.ada.retry_count=0 with sysctl to make the kernel give up on
the disk more quickly; the "failure" might then cascade up to ZFS,
which should hopefully give up on the disk as well. I think the
problem here is that ZFS does not know about the incomplete failures
in the lower layers.
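For the force detach, something along these lines might do it (a
sketch only, untested against a hung pool here; bkp101 is the label
name taken from your log excerpt):

  # -f forces the detach even though the provider is still open/busy
  geli detach -f label/bkp101.eli

If that succeeds, the vdev disappears and ZFS should fault it instead
of waiting forever on the hung writes.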
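If the geli detach itself hangs, you could also try stopping the
glabel underneath it (again untested; stop only removes the device
node and does not touch the on-disk label metadata):

  # -f forces removal of the label's device node
  glabel stop -f bkp101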
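The sysctl change is just:

  # retire failed ATA commands immediately instead of retrying them
  sysctl kern.cam.ada.retry_count=0

Check the current value first with "sysctl kern.cam.ada.retry_count"
so you can restore it once the dead disk is out of the picture.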