Date: Wed, 27 Apr 2016 17:59:04 +0100
From: Steven Hartland <killing@multiplay.co.uk>
To: freebsd-fs@freebsd.org
Subject: Re: zfs on nvme: gnop breaks pool, zfs gets stuck
Message-ID: <5720EFD8.60900@multiplay.co.uk>
In-Reply-To: <20160427141436.GA60370@in-addr.com>
References: <20160427152244.ff36ff74ae64c1f86fdc960a@aei.mpg.de> <20160427141436.GA60370@in-addr.com>
On 27/04/2016 15:14, Gary Palmer wrote:
> On Wed, Apr 27, 2016 at 03:22:44PM +0200, Gerrit Kühn wrote:
>> Hello all,
>>
>> I have a set of three NVMe SSDs on PCIe converters:
>>
>> ---
>> root@storage:~ # nvmecontrol devlist
>>  nvme0: SAMSUNG MZVPV512HDGL-00000
>>     nvme0ns1 (488386MB)
>>  nvme1: SAMSUNG MZVPV512HDGL-00000
>>     nvme1ns1 (488386MB)
>>  nvme2: SAMSUNG MZVPV512HDGL-00000
>>     nvme2ns1 (488386MB)
>> ---
>>
>> I want to use a raidz1 pool on these and created 1M-aligned partitions:
>>
>> ---
>> root@storage:~ # gpart show
>> =>        34  1000215149  nvd0  GPT  (477G)
>>           34        2014        - free -  (1.0M)
>>         2048  1000212480     1  freebsd-zfs  (477G)
>>   1000214528         655        - free -  (328K)
>>
>> =>        34  1000215149  nvd1  GPT  (477G)
>>           34        2014        - free -  (1.0M)
>>         2048  1000212480     1  freebsd-zfs  (477G)
>>   1000214528         655        - free -  (328K)
>>
>> =>        34  1000215149  nvd2  GPT  (477G)
>>           34        2014        - free -  (1.0M)
>>         2048  1000212480     1  freebsd-zfs  (477G)
>>   1000214528         655        - free -  (328K)
>> ---
>>
>> After creating a zpool I noticed that it was using ashift=9. I vaguely
>> remembered that SSDs usually have 4k (or even larger) sectors, so I
>> destroyed the pool and set up gnop providers with -S 4k to get ashift=12.
>> This worked as expected:
>>
>> ---
>>   pool: flash
>>  state: ONLINE
>>   scan: none requested
>> config:
>>
>>         NAME                STATE     READ WRITE CKSUM
>>         flash               ONLINE       0     0     0
>>           raidz1-0          ONLINE       0     0     0
>>             gpt/flash0.nop  ONLINE       0     0     0
>>             gpt/flash1.nop  ONLINE       0     0     0
>>             gpt/flash2.nop  ONLINE       0     0     0
>>
>> errors: No known data errors
>> ---
>>
>> This pool can be used, exported and imported just fine as far as I can
>> tell. Then I exported the pool and destroyed the gnop providers. When
>> starting with "advanced format" HDDs some years ago, this was the way to
>> make ZFS recognize the disks with ashift=12.
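[For reference, the traditional gnop workaround described above usually looks something like the following sketch. The GPT labels flash0..flash2 are assumed from the pool status quoted in the thread; adjust for your own labels.]

```shell
# Create 4k-sector gnop providers on top of the GPT partitions
# (labels flash0..flash2 assumed from the pool status above)
gnop create -S 4096 /dev/gpt/flash0
gnop create -S 4096 /dev/gpt/flash1
gnop create -S 4096 /dev/gpt/flash2

# Build the raidz1 pool on the .nop devices so ZFS detects 4k sectors
zpool create flash raidz1 gpt/flash0.nop gpt/flash1.nop gpt/flash2.nop

# Verify the resulting ashift in the pool configuration
zdb -C flash | grep ashift
```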
>> However, destroying the gnop devices appears to have corrupted the pool
>> in this case:
>>
>> ---
>> root@storage:~ # zpool import
>>    pool: flash
>>      id: 4978839938025863522
>>   state: ONLINE
>>  status: One or more devices contains corrupted data.
>>  action: The pool can be imported using its name or numeric identifier.
>>     see: http://illumos.org/msg/ZFS-8000-4J
>>  config:
>>
>>         flash                                           ONLINE
>>           raidz1-0                                      ONLINE
>>             11456367280316708003                        UNAVAIL  corrupted data
>>             gptid/55ae71aa-eb84-11e5-9298-0cc47a6c7484  ONLINE
>>             6761786983139564172                         UNAVAIL  corrupted data
>> ---
>>
>> How can the pool be ONLINE when two of three devices are unavailable? I
>> tried to import the pool nevertheless, but the zpool command got stuck in
>> state tx-tx. A "soft" reboot got stuck, too. I had to push the reset
>> button to get my system back (still with a corrupt pool). I cleared the
>> labels and re-did everything: the issue is perfectly reproducible.
>>
>> Am I doing something utterly wrong? Why does removing the gnop nodes
>> tamper with the devices? (I think I did exactly this dozens of times on
>> normal HDDs in previous years, and it always worked just fine.) And
>> finally, why does the zpool import fail without any error message and
>> require me to reset the system?
>>
>> The system is 10.2-RELEASE-p9; an update is scheduled for later this
>> week (just in case it would make sense to try this again with 10.3).
>> Any other hints are most welcome.
>
> Did you destroy the gnop devices with the pool online?  In the procedure
> I remember you export the pool, destroy the gnop devices, and then
> reimport the pool.
>
> Also, you only need to do the gnop trick for a single device in the pool
> for the entire pool's ashift to be changed AFAIK.
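[The ordering Gary describes matters: the gnop layer must only be torn down while the pool is exported, so that the next import finds the ZFS labels directly on the underlying partitions. A sketch, again assuming the flash0..flash2 labels from the thread:]

```shell
# Export first -- the pool must not be using the .nop providers
zpool export flash

# Only now destroy the gnop layers
gnop destroy gpt/flash0.nop gpt/flash1.nop gpt/flash2.nop

# Reimport; ZFS finds its labels on the plain GPT partitions
# and the pool keeps ashift=12
zpool import flash
```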
> There is a sysctl now too
>
>     vfs.zfs.min_auto_ashift
>
> which lets you manage the ashift on a new pool without having to try
> the gnop trick

This applies to each top-level vdev that makes up a pool, so it's not
limited to new pool creation; there should never be a reason to use the
gnop hack to set ashift.

    Regards
    Steve
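[A sketch of the sysctl-based approach Steve recommends, using the same assumed flash0..flash2 labels; because the setting applies per top-level vdev, it also covers devices added to an existing pool later:]

```shell
# Require at least 4k (2^12) sectors for any newly created top-level vdev
sysctl vfs.zfs.min_auto_ashift=12

# Create the pool directly on the GPT partitions -- no gnop needed
zpool create flash raidz1 gpt/flash0 gpt/flash1 gpt/flash2

# Confirm ashift=12 was used
zdb -C flash | grep ashift

# Persist the setting across reboots
echo 'vfs.zfs.min_auto_ashift=12' >> /etc/sysctl.conf
```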