From owner-freebsd-fs@FreeBSD.ORG  Wed May 20 06:29:36 2015
From: Simon Campese <freebsd_fs@campese.de>
To: freebsd-fs@freebsd.org
Subject: Re: hardware fault during ZFS send/receive blocks /dev/zfs
 indefinitely
In-Reply-To: <86wq048x8h.fsf@emacs.campese.org>
References: <86wq048x8h.fsf@emacs.campese.org>
Date: Wed, 20 May 2015 08:29:25 +0200
Message-ID: <867fs3danu.fsf@emacs.campese.org>

Hello,

>Can you try using the geli and/or glabel command to force detach
>label/bkp101.eli so zfs treats it as a failure?  Also I'm not sure how
>geli and glabel will treat it but you could try sysctl
>kern.cam.ada.retry_count=0 to make the kernel give up on the disk
>quicker and the "failure" might cascade up to zfs where it should
>hopefully give up on the disk.  I think the problem here is ZFS does not
>know about the incomplete failures on the lower layers.

I had already tried that (forcefully detaching the geli device and
removing the label), but it didn't change anything. In fact, the same
thing had already happened on its own once the drive stopped responding
for a while. So in the end, a reboot was unavoidable.
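
For reference, a minimal dry-run sketch of the steps suggested in the
quoted reply. The `run` helper, the `DRY_RUN` variable and the function
name are illustrative, not from the thread. The commands themselves
(`sysctl`, `geli detach -f`, `glabel destroy -f`) are standard FreeBSD
tools, but the exact name glabel expects for the label is an assumption
here and is worth checking with `glabel status` first.

```shell
#!/bin/sh
# Hypothetical helper: print each command; only execute it when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = 0 ]; then
        "$@"
    fi
}

# Try to turn a hung disk into a hard failure that ZFS can notice.
force_fail_disk() {
    label=$1    # label name, e.g. bkp101 (from label/bkp101.eli)
    # Global ada(4) tunable: stop retrying failed I/O on all ada disks.
    run sysctl kern.cam.ada.retry_count=0
    # Detach the geli provider even though ZFS still holds it open.
    run geli detach -f "label/${label}.eli"
    # Forcibly remove the glabel label underneath it (name form assumed).
    run glabel destroy -f "$label"
}
```

Calling `force_fail_disk bkp101` only prints the three commands; running
it with `DRY_RUN=0` as root would actually execute them.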
I can reproduce the situation on a test machine with the same hardware,
but, strangely enough, the drive itself seems to be in perfectly fine
condition. (It should be: I used it as cold storage, and although it is
5 years old, it has been powered up fewer than 30 times, each time for
just a couple of hours.)

I will open another thread for this, but for the moment, here is the
situation: I created a label on the bare drive and put a geli volume on
it. On this geli volume, I can read and write with dd just fine (tested
with up to 10G of data), but if I put a ZFS pool on it and write to it,
either directly or via send/receive, the drive stops responding after a
few seconds (the hdd "write" light stays on indefinitely) and I start
getting the CAM errors.
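
A hypothetical reconstruction of that test setup as a dry-run script.
The device name `ada1`, the pool name `testpool` and the default
`geli init` options are assumptions (the thread does not give them);
`bkp101` is taken from the label mentioned earlier. The send/receive
variant is left out. With `DRY_RUN=0` this really writes to the disk and
destroys its contents.

```shell
#!/bin/sh
# Hypothetical helper: print each command; only execute it when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = 0 ]; then
        "$@"
    fi
}

setup_and_test() {
    device=$1   # bare disk, e.g. ada1 (placeholder)
    label=$2    # e.g. bkp101
    pool=$3     # e.g. testpool (placeholder)

    # 1. Label the bare drive and put a geli provider on the label.
    run glabel label "$label" "$device"
    run geli init "/dev/label/$label"
    run geli attach "/dev/label/$label"

    # 2. Raw dd write and read tests, 10240 x 1 MiB (the "works fine" case).
    run dd if=/dev/zero of="/dev/label/$label.eli" bs=1m count=10240
    run dd if="/dev/label/$label.eli" of=/dev/null bs=1m count=10240

    # 3. The failing case: a ZFS pool on the same geli provider.
    run zpool create "$pool" "label/$label.eli"
}
```

For example, `setup_and_test ada1 bkp101 testpool` prints the six
commands in order without touching any device.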

To rule out a FreeBSD-specific issue, the disk is currently being tested
in a Linux machine (where it has worked before). It has already passed
an extended SMART test without any errors and is now in the middle of a
badblocks check (also without errors so far). If no errors are found, I
will put it back into my FreeBSD test machine and try a plain UFS
filesystem on it.
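
A dry-run sketch of those checks, assuming "plain" means UFS directly on
the disk with neither geli nor ZFS involved. The device paths
(`/dev/sdb` on Linux, `/dev/ada1` on FreeBSD) and the `/mnt` mount point
are placeholders, and the helper and function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print each command; only execute it when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = 0 ]; then
        "$@"
    fi
}

# On the Linux machine: extended SMART self-test, then a surface scan.
linux_checks() {
    disk=$1     # e.g. /dev/sdb (placeholder)
    run smartctl -t long "$disk"
    run smartctl -l selftest "$disk"   # read the log once the test is done
    run badblocks -sv "$disk"          # default mode is read-only
}

# Back on FreeBSD: UFS directly on the disk, then a large sequential write.
freebsd_ufs_test() {
    device=$1   # e.g. /dev/ada1 (placeholder; newfs destroys its data)
    run newfs -U "$device"
    run mount "$device" /mnt
    run dd if=/dev/zero of=/mnt/testfile bs=1m count=10240
    run umount /mnt
}
```

If the UFS write test also hangs, the problem is below ZFS and geli;
if it does not, the ZFS write pattern becomes the main suspect.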


Best,

Simon