Date: Thu, 27 Aug 2015 15:22:41 -0500
From: Karl Denninger <karl@denninger.net>
To: freebsd-fs@freebsd.org
Subject: Re: Panic in ZFS during zfs recv (while snapshots being destroyed)
Message-ID: <55DF7191.2080409@denninger.net>
In-Reply-To: <55CF7926.1030901@denninger.net>
References: <55BB443E.8040801@denninger.net> <55CF7926.1030901@denninger.net>
On 8/15/2015 12:38, Karl Denninger wrote:
> Update:
>
> This /appears/ to be related to attempting to send or receive a
> /cloned/ snapshot.
>
> I use /beadm/ to manage boot environments, and the crashes have all
> come while send/recv-ing the root pool, which is the one where these
> clones get created. It is /not/ consistent within a given snapshot
> when it crashes, and a second attempt (which does a "recovery"
> send/receive) succeeds every time -- I have yet to have it panic twice
> sequentially.
>
> I surmise that the problem arises when a file in the cloned
> snapshot is modified, but that is a guess at this point.
>
> I'm going to try to force replication of the problem on my test system.
>
> On 7/31/2015 04:47, Karl Denninger wrote:
>> I have an automated script that runs zfs send/recv copies nightly to
>> bring a backup data set into congruence with the running copies. The
>> source has automated snapshots running on a fairly frequent basis
>> through zfs-auto-snapshot.
>>
>> Recently I have started getting a panic about once a week during the
>> backup run, but it's inconsistent: it is always in the same place,
>> yet I cannot force it to repeat.
>>
>> The trap itself is a page fault in kernel mode in the ZFS code at
>> zfs_unmount_snap(); here's the traceback from the KVM (sorry for the
>> image link, but I don't have a better option right now.)
>>
>> I'll try to get a dump; this is a production machine with encrypted
>> swap, so dumps are not normally enabled.
>>
>> Note that the pool that appears to be involved (the backup pool) has
>> passed a scrub, so I would assume the on-disk structure is OK -- but
>> that might be an unfair assumption. The panic always occurs in the
>> same dataset, although there are a half-dozen that are sync'd -- if
>> this one (the first one) completes successfully during the run, then
>> all the rest will as well (that is, whenever I restart the process it
>> has always failed here.) The source pool is also clean and passes a
>> scrub.
>>
>> The traceback is at http://www.denninger.net/kvmimage.png; apologies
>> for the image traceback, but this is coming from a remote KVM.
>>
>> I first saw this on 10.1-STABLE, and it is still happening on FreeBSD
>> 10.2-PRERELEASE #9 r285890M, which I updated to in an attempt to see
>> if the problem was something that had already been addressed.
>
> --
> Karl Denninger
> karl@denninger.net <mailto:karl@denninger.net>
> /The Market Ticker/
> /[S/MIME encrypted email preferred]/

Second update: I have now taken another panic on 10.2-STABLE, same deal,
but without any cloned snapshots in the source image. I had thought that
removing cloned snapshots might eliminate the issue; that is now out the
window.

It ONLY happens on this one filesystem (the root one, incidentally),
which was created fairly recently when I moved this machine from spinning
rust to SSDs for the OS and root pool -- and only when it is being backed
up using zfs send | zfs recv (with the receive going to a different pool
in the same machine). I have yet to be able to provoke it when using zfs
send to copy to a different machine on the same LAN, but since it cannot
be reproduced on demand, I can't be certain whether it is timing-related
(e.g. the performance difference between the two pools in question) or I
simply haven't hit the unlucky combination yet.
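For concreteness, the nightly job amounts to something like the following
sketch. The pool, dataset, and snapshot names here are placeholders for
illustration, not the actual script, which handles multiple datasets and
error recovery:

    #!/bin/sh
    # Minimal sketch of the nightly incremental sync (illustrative only).
    SRC=zroot/ROOT/default      # hypothetical source dataset (root pool)
    DST=backup/ROOT/default     # hypothetical target on the backup pool

    # Most recent snapshot already present on the backup side:
    LAST=$(zfs list -H -t snapshot -o name -s creation -d 1 "$DST" \
        | tail -1 | cut -d@ -f2)

    # Newest snapshot on the source (created by zfs-auto-snapshot):
    NEW=$(zfs list -H -t snapshot -o name -s creation -d 1 "$SRC" \
        | tail -1 | cut -d@ -f2)

    # Replicate everything from the common snapshot up to the newest one;
    # recv -F rolls the target back to its last snapshot first, which is
    # what makes the second, "recovery" pass succeed after a panic.
    zfs send -I "$SRC@$LAST" "$SRC@$NEW" | zfs recv -F "$DST"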
This looks like some sort of race condition, and I will continue to see
if I can craft a case that makes it occur "on demand".

--
Karl Denninger
karl@denninger.net <mailto:karl@denninger.net>
/The Market Ticker/
/[S/MIME encrypted email preferred]/
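If it really is a race between the receive and snapshot destruction (as
the subject line suggests), something along these lines might provoke it
on a test box. This is entirely hypothetical -- names are placeholders,
and I don't know whether it matches the actual failure window:

    #!/bin/sh
    # Hypothetical stress loop: keep an incremental send/recv running
    # while older snapshots are destroyed in parallel, hoping to hit
    # the suspected race.  Pool/dataset names are placeholders.
    SRC=testpool/root
    DST=backuppool/root

    zfs snapshot "$SRC@s0"
    zfs send "$SRC@s0" | zfs recv -F "$DST"

    i=1
    while :; do
        zfs snapshot "$SRC@s$i"
        # Receive in the background...
        zfs send -i "$SRC@s$((i - 1))" "$SRC@s$i" | zfs recv -F "$DST" &
        # ...while destroying an older snapshot on both pools.
        if [ "$i" -ge 2 ]; then
            zfs destroy "$SRC@s$((i - 2))"
            zfs destroy "$DST@s$((i - 2))" 2>/dev/null
        fi
        wait
        i=$((i + 1))
    done

The idea is simply to keep a destroy in flight on the receiving pool
while a receive is running there, since that is the combination in play
when the panic hits.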
