From: Simon Campese
To: freebsd-fs@freebsd.org
Subject: Re: hardware fault during ZFS send/receive blocks /dev/zfs indefinitely
In-Reply-To: <86wq048x8h.fsf@emacs.campese.org>
Date: Wed, 20 May 2015 08:29:25 +0200
Message-ID: <867fs3danu.fsf@emacs.campese.org>

Hello,

>Can you try using the geli and/or glabel command to force detach
>label/bkp101.eli so zfs treats it as a failure? Also I'm not sure how
>geli and glabel will treat it but you could try sysctl
>kern.cam.ada.retry_count=0 to make the kernel give up on the disk
>quicker and the "failure" might cascade up to zfs where it should
>hopefully give up on the disk. I think the problem here is ZFS does not
>know about the incomplete failures on the lower layers.
I've already tried that (forcibly closing the geli device and removing
the label), but it doesn't change anything. In fact, this happened
automatically once the drive had stopped responding for some time. So in
the end I had to reboot.

I can replicate the situation on a test machine with the same hardware
but, strangely enough, the drive itself seems to be in perfectly fine
condition. It should be: I used it as cold storage, and although it is
5 years old it was powered up fewer than 30 times, for just a couple of
hours.

I will open another thread for this, but for the moment, here is the
situation: I created a label on the bare drive and put a geli volume on
it. On this geli volume I can read and write with dd just fine (tested
with up to 10G of data), but if I put a ZFS pool on it and write to it,
either directly or via send/receive, the drive stops responding after a
few seconds (the "write" HDD light stays on indefinitely) and I start
getting the CAM errors.

To rule out a FreeBSD-specific issue, the disk is being tested in a
Linux machine right now (where it worked before). It has already passed
an extended SMART test without any errors and is currently in the middle
of a badblocks check (also without errors so far). If no errors are
found, I will put it back into my FreeBSD test machine and try putting a
plain UFS filesystem on it.

Best,
Simon