Date: Mon, 21 Nov 2016 12:05:18 -0600
From: Karl Denninger <karl@denninger.net>
To: freebsd-stable@freebsd.org
Subject: Re: Repeatable panic on ZFS filesystem (used for backups); 11.0-STABLE
Message-ID: <ff050b05-f971-dbe4-8a5b-6c6a472b8925@denninger.net>
In-Reply-To: <4d4909b7-c44b-996e-90e1-ca446e8e4813@multiplay.co.uk>
References: <3d4f25c9-a262-a373-ec7e-755325f8810b@denninger.net>
 <9adecd24-6659-0da5-5c05-d0d3957a2cb3@denninger.net>
 <CANCZdfq5QCDNhLY5GOpmBoh5ONYy2VPteuaMhQ2=3v%2B0vcoM0g@mail.gmail.com>
 <0f58b11f-0bca-bc08-6f90-4e6e530f9956@denninger.net>
 <43a67287-f4f8-5d3e-6c5e-b3599c6adb4d@multiplay.co.uk>
 <76551fd6-0565-ee6c-b0f2-7d472ad6a4b3@denninger.net>
 <25ff3a3e-77a9-063b-e491-8d10a06e6ae2@multiplay.co.uk>
 <26e092b2-17c6-8744-5035-d0853d733870@denninger.net>
 <d2afc0b0-0e7f-e7ac-fb21-fa4ffd1c1003@multiplay.co.uk>
 <f9a4a12d-62df-482d-feeb-9d9f64de3e55@denninger.net>
 <4d4909b7-c44b-996e-90e1-ca446e8e4813@multiplay.co.uk>

On 10/17/2016 18:32, Steven Hartland wrote:
>
> On 17/10/2016 22:50, Karl Denninger wrote:
>> I will make some effort on the sandbox machine to see if I can come up
>> with a way to replicate this.  I do have plenty of spare larger drives
>> laying around that used to be in service and were obsolesced due to
>> capacity -- but what I don't know is whether the system will misbehave
>> if the source is all spinning rust.
>>
>> In other words:
>>
>> 1. Root filesystem is mirrored spinning rust (production is mirrored
>> SSDs)
>>
>> 2. Backup is mirrored spinning rust (of approx the same size)
>>
>> 3. Set up auto-snapshot exactly as the production system has now (which
>> the sandbox is NOT since I don't care about incremental recovery on that
>> machine; it's a sandbox!)
>>
>> 4. Run a bunch of build-somethings (e.g. buildworlds, cross-build for
>> the Pi2s I have here, etc) to generate a LOT of filesystem entropy
>> across lots of snapshots.
>>
>> 5. Back that up.
>>
>> 6. Export the backup pool.
>>
>> 7. Re-import it and "zfs destroy -r" the backup filesystem.
>>
>> That is what got me in a reboot loop after the *first* panic; I was
>> simply going to destroy the backup filesystem and re-run the backup, but
>> as soon as I issued that zfs destroy the machine panic'd and as soon as
>> I re-attached it after a reboot it panic'd again.  Repeat until I set
>> trim=0.
>>
>> But... if I CAN replicate it that still shouldn't be happening, and the
>> system should *certainly* survive attempting to TRIM on a vdev that
>> doesn't support TRIMs, even if the removal is for a large amount of
>> space and/or files on the target, without blowing up.
>>
>> BTW I bet it isn't that rare -- if you're taking timed snapshots on an
>> active filesystem (with lots of entropy) and then make the mistake of
>> trying to remove those snapshots (as is the case with a zfs destroy -r
>> or a zfs recv of an incremental copy that attempts to sync against a
>> source) on a pool that has been imported before the system realizes that
>> TRIM is unavailable on those vdevs.
>>
>> Noting this:
>>
>> Yes need to find some time to have a look at it, but given how rare
>> this is and with TRIM being re-implemented upstream in a totally
>> different manner I'm reticent to spend any real time on it.
>>
>> What's in-process in this regard, if you happen to have a reference?
> Looks like it may be still in review: https://reviews.csiden.org/r/263/
>
>

Having increased the kernel stack page count I have not had another
instance of this in the last couple of weeks+, and I am running daily
backup jobs as usual...

So this *does not* appear to be an infinite recursion problem...

-- 
Karl Denninger
karl@denninger.net <mailto:karl@denninger.net>
/The Market Ticker/
/[S/MIME encrypted email preferred]/
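
For anyone wanting to try the two workarounds mentioned in this thread, here is a minimal sketch of how they might look as /boot/loader.conf entries. Both tunable names and the value are assumptions: the message says only that the kernel stack page count was increased and that "trim=0" was set, without naming the knobs or the numbers used. If the kernel in use does not honour the stack size as a loader tunable, the KSTACK_PAGES kernel configuration option would be the place to set it instead.

```
# /boot/loader.conf -- sketch only; names and values are assumptions,
# not taken from this message.

# Kernel stack size in pages. The thread states only that the count
# was increased; 6 is a placeholder value.
kern.kstack_pages=6

# Assumed to be what "trim=0" refers to: disable ZFS TRIM entirely.
# Left commented out, since the message reports the larger stack
# alone has been sufficient so far.
#vfs.zfs.trim.enabled=0
```

After a reboot, `sysctl kern.kstack_pages` should report the value in effect.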
