From: Borja Marcos <borjam@sarenet.es>
Subject: Possible ZFS bug? Insufficient sanity checks
Date: Wed, 19 Feb 2014 15:47:35 +0100
Message-Id: <BF2D310C-7B8F-413D-9D20-A887236C5913@sarenet.es>
To: "freebsd-fs@FreeBSD.org Filesystems" <freebsd-fs@freebsd.org>


Hello,

Doing something stupid, I managed to corrupt a ZFS pool. I think it
shouldn't have been possible. I hope to reproduce it next week, but
it's better to share just in case.

I know what I did was quite foolish, and no dolphins were hurt as it's
just a test machine.

FreeBSD pruebassd 10.0-STABLE FreeBSD 10.0-STABLE #8: Wed Feb 12 09:32:29 UTC 2014     root@pruebassd:/usr/obj/usr/src/sys/PRUEBASSD2_10  amd64

The pool has one RAIDZ vdev, with 6 OCZ Vertex 4 SSDs.

The stupid manoeuvre was as follows:

1) Pick one of the disks at random.

2) Extract it.

So far so good. zpool warns that the pool is in a degraded state, but
everything works.

3) Take the disk to a different system. Insert it and create a new pool
on it. Just one disk; I was testing a data corruption issue with an "mfi"
adapter.

4) Do some tests.

5) Probably (not sure) destroy the newly created pool.

6) Take the SSD back to the original machine and insert it.

And here the fun comes.

7) zpool online cashopul (the previously removed disk)

8) KABOOM! zpool warns of data corruption all over the place, and most
files are corrupted.


My theory: when doing the "zpool online", ZFS just checked the disk
serial number or identification and, finding it unchanged, mixed the disk
back into the pool *without verifying the pool identity*, with disastrous
consequences.

What I think should have happened instead:

- ZFS should verify the physical disk "identity" *and* verify that the
ZFS metadata on the disk indeed belongs to the pool on which it's being
"onlined".

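To make the check I have in mind concrete, here is a minimal sketch in
Python. It is only an illustration of the idea, not how ZFS actually
implements "zpool online": the VdevLabel class, the may_online function
and the GUID values are hypothetical names and numbers made up for this
example. The only assumption taken from ZFS itself is that the on-disk
label records both a pool GUID and a per-device GUID.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class VdevLabel:
    """Hypothetical, simplified view of the identity fields in a disk label."""
    pool_guid: int   # GUID of the pool the label was written for
    vdev_guid: int   # GUID of this particular device within that pool


def may_online(label: Optional[VdevLabel],
               expected_pool_guid: int,
               expected_vdev_guid: int) -> Tuple[bool, str]:
    """Decide whether a returning disk may rejoin the pool.

    The physical identity of the disk (serial number, device path) is
    deliberately NOT consulted here: only the metadata found on the disk
    is compared with what the pool expects.
    """
    if label is None:
        # Label missing or unreadable, e.g. wiped when another pool was destroyed.
        return False, "no readable label"
    if label.pool_guid != expected_pool_guid:
        # The disk was relabelled by a different pool in the meantime.
        return False, "label belongs to a different pool"
    if label.vdev_guid != expected_vdev_guid:
        # Right pool, but not the device that was taken offline.
        return False, "label belongs to a different device"
    return True, "label matches"
```

With this check, the disk from step 6 would be refused, because its label
would carry the GUID of the test pool created in step 3 (or no usable
label at all) instead of the GUID of the original pool. A refused disk
would then have to be brought back with an explicit replace and a full
resilver rather than a plain online.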

Again, I do know that I did something very foolish (I behave in a
foolish and careless way with that machine on purpose).

I'll try to reproduce this next week (I'm waiting to receive some SAS
cables).


Cheers,


Borja.