Date: Thu, 18 Jul 2019 15:35:17 -0500 From: Karl Denninger <karl@denninger.net> To: freebsd-stable@freebsd.org Subject: Re: Kernel panic in zfs code; 12-STABLE Message-ID: <501734b7-22ec-5c01-eea5-26b458945e7e@denninger.net> In-Reply-To: <d6cf2edf-81f2-fb63-fa39-c310fe7258a7@grosbein.net> References: <61e5debd-b440-16c9-2a70-0912634e52aa@denninger.net> <d6cf2edf-81f2-fb63-fa39-c310fe7258a7@grosbein.net>
On 7/18/2019 15:19, Eugene Grosbein wrote:
> 19.07.2019 3:13, Karl Denninger wrote:
>
>> FreeBSD 12.0-STABLE #2 r349024M: Thu Jun 13 18:01:16 CDT 2019
>> karl@NewFS.denninger.net:/usr/obj/usr/src/amd64.amd64/sys/KSD-SMP
>>
>> Note -- no patches of any sort in the ZFS code; I am NOT running any of
>> my former patch set.
>>
>> NewFS.denninger.net dumped core - see /var/crash/vmcore.8
>>
>> Thu Jul 18 15:02:54 CDT 2019
>>
>> FreeBSD NewFS.denninger.net 12.0-STABLE FreeBSD 12.0-STABLE #2 r349024M:
>> Thu Jun 13 18:01:16 CDT 2019
>> karl@NewFS.denninger.net:/usr/obj/usr/src/amd64.amd64/sys/KSD-SMP amd64
>>
>> panic: double fault
> [skip]
>
>> #283 0xffffffff82748d91 in zio_vdev_io_done (zio=0xfffff8000b8b8000)
>> at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3376
>> #284 0xffffffff82744eac in zio_execute (zio=0xfffff8000b8b8000)
>> at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1786
>> #285 0xffffffff80c3b7f4 in taskqueue_run_locked (queue=0xfffff801a8b35100)
>> at /usr/src/sys/kern/subr_taskqueue.c:467
>> #286 0xffffffff80c3cb28 in taskqueue_thread_loop (arg=<value optimized out>)
>> at /usr/src/sys/kern/subr_taskqueue.c:773
>> #287 0xffffffff80b9ab23 in fork_exit (
>> callout=0xffffffff80c3ca90 <taskqueue_thread_loop>,
>> arg=0xfffff801a0577520, frame=0xfffffe009d4edc00)
>> at /usr/src/sys/kern/kern_fork.c:1063
>> #288 0xffffffff810b367e in fork_trampoline ()
>> at /usr/src/sys/amd64/amd64/exception.S:996
>> #289 0x0000000000000000 in ?? ()
>> Current language: auto; currently minimal
>> (kgdb)
> You have "double fault" and completely insane number of stack frames in the trace.
> This is obviously infinite recursion resulting in kernel stack overflow and panic.
Yes, but... why and how?
What's executing at the time is this command:
zfs send -RI $i@zfs-old $i@zfs-base | zfs receive -Fudv $BACKUP
This first deletes any old snapshots on the target that are no longer on
the source, then sends the new ones. It never gets to the sending part;
it blows up during the deletion of the OLD snapshots.
The one(s) it deletes, however, it DOES delete: after the box is
rebooted, those two snapshots on the target are indeed gone.
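For context, the replication step that triggers the panic can be sketched as a dry run. This just prints the commands the backup loop would issue; the dataset names and the $BACKUP target are placeholders, not my actual pools:

```shell
#!/bin/sh
# Dry-run sketch of the backup loop described above. Nothing here touches
# ZFS; the zfs commands are echoed rather than executed. Dataset names
# and the $BACKUP target are hypothetical.
BACKUP="backup/pool"
DATASETS="zsr/R zsr/home"

for i in $DATASETS; do
    # -R: replicate the whole dataset tree; -I: everything between the two
    # snapshots. On the receive side, -F prunes target snapshots that no
    # longer exist on the source -- the step where the panic occurs.
    cmd="zfs send -RI $i@zfs-old $i@zfs-base | zfs receive -Fudv $BACKUP"
    echo "$cmd"
done
```

The -F prune happens before any data is streamed, which matches the observed behavior: the deletions on the target complete (or partially complete), and the panic fires before the send itself starts.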
That is, it is NOT getting stuck on one snapshot (which would imply an
undetected metadata fault for that snapshot on the target filesystem,
producing a recursive call that blows the stack), and it never gets as
far as sending the new snapshots, so whatever is going on is NOT on the
source filesystem. Neither the source nor the destination shows any
filesystem errors; both pools are healthy with zero error counts.
Therefore the question -- is the system queueing enough work to blow the
stack *BUT* all of the work it queues is legitimate? If so, there's a
serious problem in the way the code now functions, in that an "ordinary"
operation can result in what amounts to kernel stack exhaustion.
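If it really is legitimate-but-deep recursion, one blunt mitigation (an assumption on my part, not a fix for whatever is recursing) would be to give kernel threads a bigger stack via the kern.kstack_pages loader tunable:

```shell
# Inspect the current kernel thread stack size, in pages
# (the amd64 default is 4):
sysctl kern.kstack_pages

# Hypothetical workaround, NOT a fix: raise the tunable at boot.
# kern.kstack_pages is a loader tunable, so this requires a reboot,
# and it only buys headroom -- genuinely unbounded recursion will
# still overflow eventually.
echo 'kern.kstack_pages=6' >> /boot/loader.conf
```

That would at most confirm the "deep but finite" theory: if the panic goes away with a larger stack, the recursion is bounded and merely too deep for the default.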
One note -- I haven't run this backup for the last five days, as I do it
manually and I've been out of town. Previously, running it on a daily
basis completed without trouble. This smells like a backlog of "things
to do" at send time triggering the allegedly-infinite recursion (which
isn't really infinite) that runs the stack out of space -- and THAT
implies the system is queueing a crazy amount of work recursively for
what is a perfectly legitimate operation -- which it should *NOT* do.
--
Karl Denninger
karl@denninger.net <mailto:karl@denninger.net>
/The Market Ticker/
/[S/MIME encrypted email preferred]/
[-- Attachment #2: S/MIME signature --]