Date: Sat, 15 Aug 2015 12:38:46 -0500
From: Karl Denninger <karl@denninger.net>
To: freebsd-fs@freebsd.org
Subject: Re: Panic in ZFS during zfs recv (while snapshots being destroyed)
Message-ID: <55CF7926.1030901@denninger.net>
In-Reply-To: <55BB443E.8040801@denninger.net>
References: <55BB443E.8040801@denninger.net>

Update: This /appears/ to be related to attempting to send or receive a
/cloned/ snapshot.

I use /beadm/ to manage boot environments and the crashes have all come
while send/recv-ing the root pool, which is the one where these clones
get created. It is /not/ consistent within a given snapshot when it
crashes, and a second attempt (which does a "recovery" send/receive)
succeeds every time -- I've yet to have it panic twice sequentially.

I surmise that the problem comes about when a file in the cloned
snapshot is modified, but this is a guess at this point.

I'm going to try to force replication of the problem on my test system.

On 7/31/2015 04:47, Karl Denninger wrote:
> I have an automated script that runs zfs send/recv copies to bring a
> backup data set into congruence with the running copies nightly. The
> source has automated snapshots running on a fairly frequent basis
> through zfs-auto-snapshot.
>
> Recently I have started having a panic show up about once a week during
> the backup run, but it's inconsistent. It is in the same place, but I
> cannot force it to repeat.
>
> The trap itself is a page fault in kernel mode in the zfs code at
> zfs_unmount_snap(); here's the traceback from the kvm (sorry for the
> image link but I don't have a better option right now.)
>
> I'll try to get a dump, this is a production machine with encrypted swap
> so it's not normally turned on.
>
> Note that the pool that appears to be involved (the backup pool) has
> passed a scrub and thus I would assume the on-disk structure is ok.....
> but that might be an unfair assumption. It is always occurring in the
> same dataset although there are a half-dozen that are sync'd -- if this
> one (the first one) successfully completes during the run then all the
> rest will as well (that is, whenever I restart the process it has always
> failed here.) The source pool is also clean and passes a scrub.
>
> traceback is at http://www.denninger.net/kvmimage.png; apologies for the
> image traceback but this is coming from a remote KVM.
>
> I first saw this on 10.1-STABLE and it is still happening on FreeBSD
> 10.2-PRERELEASE #9 r285890M, which I updated to in an attempt to see if
> the problem was something that had been addressed.
>

--
Karl Denninger
karl@denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/
