From: Gerrit Kühn <gerrit.kuehn@aei.mpg.de>
To: freebsd-fs@freebsd.org
Date: Wed, 27 Apr 2016 15:22:44 +0200
Subject: zfs on nvme: gnop breaks pool, zfs gets stuck

Hello all,

I have a set of three NVMe SSDs on PCIe converters:

---
root@storage:~ # nvmecontrol devlist
 nvme0: SAMSUNG MZVPV512HDGL-00000
    nvme0ns1 (488386MB)
 nvme1: SAMSUNG MZVPV512HDGL-00000
    nvme1ns1 (488386MB)
 nvme2: SAMSUNG MZVPV512HDGL-00000
    nvme2ns1 (488386MB)
---

I want to use a raidz1 pool on these and created 1M-aligned partitions:

---
root@storage:~ # gpart show
=>        34  1000215149  nvd0  GPT  (477G)
          34        2014        - free -  (1.0M)
        2048  1000212480     1  freebsd-zfs  (477G)
  1000214528         655        - free -  (328K)

=>        34  1000215149  nvd1  GPT  (477G)
          34        2014        - free -  (1.0M)
        2048  1000212480     1  freebsd-zfs  (477G)
  1000214528         655        - free -  (328K)

=>        34  1000215149  nvd2  GPT  (477G)
          34        2014        - free -  (1.0M)
        2048  1000212480     1  freebsd-zfs  (477G)
  1000214528         655        - free -  (328K)
---
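For reference, I created the partitions roughly like this (the GPT labels
flash0 to flash2 are the ones showing up in the pool below; the exact
invocation is from memory, so please take it as a sketch rather than a
verbatim transcript):

---
root@storage:~ # gpart create -s gpt nvd0
root@storage:~ # gpart add -t freebsd-zfs -a 1m -l flash0 nvd0
root@storage:~ # gpart create -s gpt nvd1
root@storage:~ # gpart add -t freebsd-zfs -a 1m -l flash1 nvd1
root@storage:~ # gpart create -s gpt nvd2
root@storage:~ # gpart add -t freebsd-zfs -a 1m -l flash2 nvd2
---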
After creating a zpool I noticed that it was using ashift=9. I vaguely
remembered that SSDs usually have 4k (or even larger) sectors, so I
destroyed the pool and set up gnop providers with -S 4k to get ashift=12.
This worked as expected:

---
  pool: flash
 state: ONLINE
  scan: none requested
config:

        NAME                STATE     READ WRITE CKSUM
        flash               ONLINE       0     0     0
          raidz1-0          ONLINE       0     0     0
            gpt/flash0.nop  ONLINE       0     0     0
            gpt/flash1.nop  ONLINE       0     0     0
            gpt/flash2.nop  ONLINE       0     0     0

errors: No known data errors
---

This pool can be used, exported and imported just fine as far as I can
tell. Then I exported the pool and destroyed the gnop providers. When we
started using "advanced format" HDDs some years ago, this was the way to
make ZFS recognize the disks with ashift=12. However, destroying the gnop
devices appears to have broken the pool in this case:

---
root@storage:~ # zpool import
   pool: flash
     id: 4978839938025863522
  state: ONLINE
 status: One or more devices contains corrupted data.
 action: The pool can be imported using its name or numeric identifier.
    see: http://illumos.org/msg/ZFS-8000-4J
 config:

        flash                                           ONLINE
          raidz1-0                                      ONLINE
            11456367280316708003                        UNAVAIL  corrupted data
            gptid/55ae71aa-eb84-11e5-9298-0cc47a6c7484  ONLINE
            6761786983139564172                         UNAVAIL  corrupted data
---

How can the pool be ONLINE when two of the three devices are unavailable?
I tried to import the pool anyway, but the zpool command got stuck in
state "tx-tx". A "soft" reboot got stuck, too; I had to push the reset
button to get my system back (still with a corrupt pool). I cleared the
labels and re-did everything: the issue is perfectly reproducible.

Am I doing something utterly wrong? Why does removing the gnop nodes
tamper with the devices (I think I did exactly this dozens of times on
normal HDDs in previous years, and it always worked just fine)? And
finally, why does the zpool import fail without any error message and
require me to reset the system?

The system is running 10.2-RELEASE-p9; an update is scheduled for later
this week (just in case it would make sense to try this again with 10.3).
Any other hints are most welcome.


cu
  Gerrit
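PS: In case the exact sequence matters, this is roughly what I did after
partitioning (again typed from memory, so read it as a sketch, not a
verbatim transcript):

---
root@storage:~ # gnop create -S 4096 /dev/gpt/flash0
root@storage:~ # gnop create -S 4096 /dev/gpt/flash1
root@storage:~ # gnop create -S 4096 /dev/gpt/flash2
root@storage:~ # zpool create flash raidz1 gpt/flash0.nop gpt/flash1.nop gpt/flash2.nop
root@storage:~ # zpool export flash
root@storage:~ # gnop destroy gpt/flash0.nop gpt/flash1.nop gpt/flash2.nop
root@storage:~ # zpool import          # shows the "corrupted data" state above
root@storage:~ # zpool import flash    # this is where it gets stuck
root@storage:~ # zpool labelclear -f /dev/gpt/flash0   # likewise flash1/flash2, before retrying
---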