Date: Thu, 22 Jun 2017 00:33:40 +0000
From: bugzilla-noreply@freebsd.org
To: freebsd-fs@FreeBSD.org
Subject: [Bug 219972] Unable to zpool export following some zfs recv
Message-ID: <bug-219972-3630-Ks61K16TrD@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-219972-3630@https.bugs.freebsd.org/bugzilla/>
References: <bug-219972-3630@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219972

--- Comment #3 from pfribeiro@gmail.com ---
It seems that reproducing this bug depends on the system having more than one CPU core, with the other core(s) relatively idle. On my original system, if I run 'dd if=/dev/zero of=/dev/null' while following the steps in comment #1, the bug does not reproduce. As soon as that command is killed, however, the bug becomes reproducible again.

On the forum thread linked in comment #2, another user tried to reproduce the problem, without success. I then configured two FreeBSD VMs on ESXi (the host is a Xeon-D 1518, 4 cores, with HyperThreading enabled): one running the VMDK image provided on the official FreeBSD download page, and another installed from the ISO image (with ZFS as the root filesystem) to better mirror the installation on my original system, an Intel NUC5CPYH.

Initially, I could not reproduce the bug in the VMs either, no matter how many times I tried. However, having observed the behaviour on the NUC, I changed the CPU affinity on ESXi so that the VM is allocated logical cores 0 and 2 and has a minimum of 1000 MHz reserved. By running the import/export of 'slave' repeatedly (each time after the corresponding zfs send/recv), I was eventually able to trigger the bug on the 56th import/export cycle.

Regarding the NUC installation, killing 'dd if=/dev/zero of=/dev/null' (i.e., leaving the other core largely free) only matters for the import/export of 'slave' after the 'zfs send | recv' has taken place. This suggests a race condition of some sort in the zpool export ioctl code path, one that somehow depends on a preceding recv.

I would be happy to provide more diagnostics, but I would need further guidance from you, as I am not very familiar with FreeBSD's synchronization primitives. I expect this bug will be hard to track down.
I would also like to add that I have successfully reproduced this bug (subject to the caveats above) on more than one platform, with the following versions, all on amd64:

- 11.0-RELEASE-p1 #0 r306420
- 11.0-RELEASE-p9 #0: Tue Apr 11 08:48:40 UTC 2017
- 11.1-BETA2 #0 r320072: Sun Jun 18 18:45:14 BST 2017 (I compiled my own kernel with debugging enabled)

Finally, for completeness, my rudimentary test bash script is available at: https://pastebin.com/YcKSU1LA. You're welcome to check the related forum post from comment #2 as well.
