Date: Fri, 22 Jul 2016 14:39:01 +0100
From: Steven Hartland <killing@multiplay.co.uk>
To: Andriy Gapon <avg@FreeBSD.org>, Karl Denninger <karl@denninger.net>, freebsd-stable@FreeBSD.org
Subject: Re: Panic on BETA1 in the ZFS subsystem
Message-ID: <89b66fd6-09d8-d8a2-4894-3a6e5f73a0bb@multiplay.co.uk>
In-Reply-To: <6cb46059-85c8-0c3b-7346-773647f1a962@FreeBSD.org>
References: <8f44bc09-1237-44d0-fe7a-7eb9cf4fe85b@denninger.net> <54e5974c-312e-c33c-ab83-9e1148618ddc@FreeBSD.org> <97cf5283-683b-83fd-c484-18c14973b065@denninger.net> <c2f24b1e-be84-bcdd-ea0b-515cd2aca266@FreeBSD.org> <1f064549-fa72-fe9b-d66d-85923437bb9b@denninger.net> <6cb46059-85c8-0c3b-7346-773647f1a962@FreeBSD.org>
On 21/07/2016 13:52, Andriy Gapon wrote:
> On 21/07/2016 15:25, Karl Denninger wrote:
>> The crash occurred during a backup script operating, which is (roughly)
>> the following:
>>
>> zpool import -N backup (mount the pool to copy to)
>>
>> iterate over a list of zfs filesystems and...
>>
>> zfs rename fs@zfs-base fs@zfs-old
>> zfs snapshot fs@zfs-base
>> zfs send -RI fs@zfs-old fs@zfs-base | zfs receive -Fudv backup
>> zfs destroy -vr fs@zfs-old
>>
>> The first filesystem to be done is the rootfs, that is when it panic'd,
>> and from the traceback it appears that the Zio's in there are from the
>> backup volume, so the answer to your question is "yes".
> I think that what happened here was that a quite large number of TRIM
> requests was queued by ZFS before it had a chance to learn that the
> target vdev in the backup pool did not support TRIM. So, when the
> first request failed with ENOTSUP the vdev was marked as not supporting
> TRIM. After that all subsequent requests were failed without sending
> them down the storage stack. But the way it is done means that all the
> requests were processed by the nested zio_execute() calls on the same
> stack. And that led to the stack overflow.
>
> Steve, do you think that this is a correct description of what happened?
>
> The state of the pools that you described below probably contributed to
> the avalanche of TRIMs that caused the problem.
>
Yes, that does indeed sound like what happened to me.

Regards
Steve