Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 18 Jul 2019 16:17:17 -0500
From:      Karl Denninger <karl@denninger.net>
To:        freebsd-stable@freebsd.org
Subject:   Re: Kernel panic in zfs code; 12-STABLE
Message-ID:  <6a3d5e55-6d50-3e08-63e4-dd1cd2209abe@denninger.net>
In-Reply-To: <501734b7-22ec-5c01-eea5-26b458945e7e@denninger.net>
References:  <61e5debd-b440-16c9-2a70-0912634e52aa@denninger.net> <d6cf2edf-81f2-fb63-fa39-c310fe7258a7@grosbein.net> <501734b7-22ec-5c01-eea5-26b458945e7e@denninger.net>

next in thread | previous in thread | raw e-mail | index | archive | help
This is a cryptographically signed message in MIME format.

--------------ms080701050702040803020600
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

On 7/18/2019 15:35, Karl Denninger wrote:
> On 7/18/2019 15:19, Eugene Grosbein wrote:
>> 19.07.2019 3:13, Karl Denninger wrote:
>>
>>> FreeBSD 12.0-STABLE #2 r349024M: Thu Jun 13 18:01:16 CDT 2019   =20
>>> karl@NewFS.denninger.net:/usr/obj/usr/src/amd64.amd64/sys/KSD-SMP
>>>
>>> Note -- no patches of any sort in the ZFS code; I am NOT running any =
of
>>> my former patch set.
>>>
>>> NewFS.denninger.net dumped core - see /var/crash/vmcore.8
>>>
>>> Thu Jul 18 15:02:54 CDT 2019
>>>
>>> FreeBSD NewFS.denninger.net 12.0-STABLE FreeBSD 12.0-STABLE #2 r34902=
4M:
>>> Thu Jun 13 18:01:16 CDT 2019   =20
>>> karl@NewFS.denninger.net:/usr/obj/usr/src/amd64.amd64/sys/KSD-SMP  am=
d64
>>>
>>> panic: double fault
>> [skip]
>>
>>> #283 0xffffffff82748d91 in zio_vdev_io_done (zio=3D0xfffff8000b8b8000=
)
>>>     at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:=
3376
>>> #284 0xffffffff82744eac in zio_execute (zio=3D0xfffff8000b8b8000)
>>>     at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:=
1786
>>> #285 0xffffffff80c3b7f4 in taskqueue_run_locked (queue=3D0xfffff801a8=
b35100)
>>>     at /usr/src/sys/kern/subr_taskqueue.c:467
>>> #286 0xffffffff80c3cb28 in taskqueue_thread_loop (arg=3D<value optimi=
zed out>)
>>>     at /usr/src/sys/kern/subr_taskqueue.c:773
>>> #287 0xffffffff80b9ab23 in fork_exit (
>>>     callout=3D0xffffffff80c3ca90 <taskqueue_thread_loop>,
>>>     arg=3D0xfffff801a0577520, frame=3D0xfffffe009d4edc00)
>>>     at /usr/src/sys/kern/kern_fork.c:1063
>>> #288 0xffffffff810b367e in fork_trampoline ()
>>>     at /usr/src/sys/amd64/amd64/exception.S:996
>>> #289 0x0000000000000000 in ?? ()
>>> Current language:  auto; currently minimal
>>> (kgdb)
>> You have "double fault" and completely insane number of stack frames i=
n the trace.
>> This is obviously infinite recursion resulting in kernel stack overflo=
w and panic.
> Yes, but.... why and how?
>
> What's executing at the time is this command:
>
> zfs send -RI $i@zfs-old $i@zfs-base | zfs receive -Fudv $BACKUP
>
> Which in turn results in the old snapshots on the target not on the
> source being deleted, then the new ones being sent.=C2=A0 It never gets=
 to
> the sending part; it blows up during the delete of the OLD snapshots.
>
> The one(s) it deletes, however, it DOES delete.=C2=A0 When the box is
> rebooted those two snapshots on the target are indeed gone.
>
> That is, it is NOT getting "stuck" on one (which would imply there's an=

> un-detected fault in the filesystem on the target in the metadata for
> that snapshot, resulting in a recursive call that blows up the stack)
> and it never gets to send the new snapshot, so whatever is going on is
> NOT on the source filesystem.=C2=A0 Neither source or destination shows=
 any
> errors on the filesystem; both pools are healthy with zero error counts=
=2E
>
> Therefore the question -- is the system queueing enough work to blow th=
e
> stack *BUT* the work it queues is all legitimate?=C2=A0 If so there's a=

> serious problem in the way the code now functions in that an "ordinary"=

> operation can result in what amounts to kernel stack exhaustion.
>
> One note -- I haven't run this backup for the last five days, as I do i=
t
> manually and I've been out of town.=C2=A0 Previous running it on a dail=
y
> basis completed without trouble.=C2=A0 This smells like a backlog of "t=
hings
> to do" when the send runs that results in the allegedly-infinite
> recursion (that isn't really infinite) that runs the stack out of space=

> -- and THAT implies that the system is trying to queue a crazy amount o=
f
> work on a recursive basis for what is a perfectly-legitimate operation
> -- which it should *NOT* do.

Update: This looks like an OLD bug that came back.

Previously the system would go absolutely insane on the first few
accesses to spinning rust during a snapshot delete and ATTEMPT to send
thousands of TRIM requests -- which spinning rust does not support.=C2=A0=
 On
a system with mixed vdevs, where some pools are rust and some are SSD,
this was a problem since you can't turn TRIM off because you REALLY want
it on those disks.

The FIX for this was to do this on the import of said pool comprised of
spinning rust:

#
# Now try to trigger TRIM so that we don't have a storm of them
#
# echo "Attempting to disable TRIM on spinning rust...."

mount -t zfs $BACKUP/no-trim /mnt
dd if=3D/dev/random of=3D/mnt/kill-trim bs=3D128k count=3D2
echo "Performed 2 writes"
sleep 2
rm /mnt/kill-trim
echo "Performed delete of written file; wait"
sleep 35
umount /mnt
echo "Unmounted temporary filesystem"
sleep 2
echo "TRIM disable theoretically done...."

This would cause ZFS to "figure out" that TRIM doesn't work on the VDEVs
on that pool and not send them.=C2=A0 You were then ok.

When this went away I commented those out.

I just un-commented them -- and the backup is now running, and appears
to be ok -- it's well past where it blew up before, and no panic.

So it APPEARS that ZFS (still) doesn't figure out that TRIM doesn't work
on those disks on a pool import quickly enough and this leads to a CRAZY
number of queued requests which blow up the stack.=C2=A0 But, if you perf=
orm
an operation that *should* result in a small number of TRIMs being
issued and then wait a few seconds the ZFS code does correctly realize
that these are spinning drives that don't do TRIM, doesn't stack them
up, and it doesn't panic.

This wouldn't bite you on an import and then MOST ordinary I/O
sequences, because the first few blocks you delete or overwrite will
generate a TRIM attempt, it will fail, and the vdev's get properly
marked.=C2=A0 But if the first thing you do after the pool import is a "z=
fs
recv" *and* that is an incremental receive, that is, it first deletes
the non-existent snapshots on the source, you get screwed.

I don't know why this was "fixed" and now once again is not or if it was
just luck that it didn't bite me earlier after moving to 12.0 -- but
going through the backtrace I found a crap-ton of TRIMs in those frames
which just flat-out don't work, of course, on spinning drives.

IMHO ZFS should figure this out when the pool is imported and set the
flags appropriately, but it appears that it does not until you actually
attempt to TRIM, and the "attempt" in the case of a send/recv pair
results in a very large number of TRIM requests all being queued up at
once -- enough to cause a panic -- before the code figures out that it
can't do them at all.

--=20
Karl Denninger
karl@denninger.net <mailto:karl@denninger.net>
/The Market Ticker/
/[S/MIME encrypted email preferred]/

--------------ms080701050702040803020600
Content-Type: application/pkcs7-signature; name="smime.p7s"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="smime.p7s"
Content-Description: S/MIME Cryptographic Signature

MIAGCSqGSIb3DQEHAqCAMIACAQExDzANBglghkgBZQMEAgMFADCABgkqhkiG9w0BBwEAAKCC
DdgwggagMIIEiKADAgECAhMA5EiKghDOXrvfxYxjITXYDdhIMA0GCSqGSIb3DQEBCwUAMIGL
MQswCQYDVQQGEwJVUzEQMA4GA1UECAwHRmxvcmlkYTESMBAGA1UEBwwJTmljZXZpbGxlMRkw
FwYDVQQKDBBDdWRhIFN5c3RlbXMgTExDMRgwFgYDVQQLDA9DdWRhIFN5c3RlbXMgQ0ExITAf
BgNVBAMMGEN1ZGEgU3lzdGVtcyBMTEMgMjAxNyBDQTAeFw0xNzA4MTcxNjQyMTdaFw0yNzA4
MTUxNjQyMTdaMHsxCzAJBgNVBAYTAlVTMRAwDgYDVQQIDAdGbG9yaWRhMRkwFwYDVQQKDBBD
dWRhIFN5c3RlbXMgTExDMRgwFgYDVQQLDA9DdWRhIFN5c3RlbXMgQ0ExJTAjBgNVBAMMHEN1
ZGEgU3lzdGVtcyBMTEMgMjAxNyBJbnQgQ0EwggIiMA0GCSqGSIb3DQEBAQUAA4ICDwAwggIK
AoICAQC1aJotNUI+W4jP7xQDO8L/b4XiF4Rss9O0B+3vMH7Njk85fZ052QhZpMVlpaaO+sCI
KqG3oNEbuOHzJB/NDJFnqh7ijBwhdWutdsq23Ux6TvxgakyMPpT6TRNEJzcBVQA0kpby1DVD
0EKSK/FrWWBiFmSxg7qUfmIq/mMzgE6epHktyRM3OGq3dbRdOUgfumWrqHXOrdJz06xE9NzY
vc9toqZnd79FUtE/nSZVm1VS3Grq7RKV65onvX3QOW4W1ldEHwggaZxgWGNiR/D4eosAGFxn
uYeWlKEC70c99Mp1giWux+7ur6hc2E+AaTGh+fGeijO5q40OGd+dNMgK8Es0nDRw81lRcl24
SWUEky9y8DArgIFlRd6d3ZYwgc1DMTWkTavx3ZpASp5TWih6yI8ACwboTvlUYeooMsPtNa9E
6UQ1nt7VEi5syjxnDltbEFoLYcXBcqhRhFETJe9CdenItAHAtOya3w5+fmC2j/xJz29og1KH
YqWHlo3Kswi9G77an+zh6nWkMuHs+03DU8DaOEWzZEav3lVD4u76bKRDTbhh0bMAk4eXriGL
h4MUoX3Imfcr6JoyheVrAdHDL/BixbMH1UUspeRuqQMQ5b2T6pabXP0oOB4FqldWiDgJBGRd
zWLgCYG8wPGJGYgHibl5rFiI5Ix3FQncipc6SdUzOQIDAQABo4IBCjCCAQYwHQYDVR0OBBYE
FF3AXsKnjdPND5+bxVECGKtc047PMIHABgNVHSMEgbgwgbWAFBu1oRhUMNEzjODolDka5k4Q
EDBioYGRpIGOMIGLMQswCQYDVQQGEwJVUzEQMA4GA1UECAwHRmxvcmlkYTESMBAGA1UEBwwJ
TmljZXZpbGxlMRkwFwYDVQQKDBBDdWRhIFN5c3RlbXMgTExDMRgwFgYDVQQLDA9DdWRhIFN5
c3RlbXMgQ0ExITAfBgNVBAMMGEN1ZGEgU3lzdGVtcyBMTEMgMjAxNyBDQYIJAKxAy1WBo2kY
MBIGA1UdEwEB/wQIMAYBAf8CAQAwDgYDVR0PAQH/BAQDAgGGMA0GCSqGSIb3DQEBCwUAA4IC
AQCB5686UCBVIT52jO3sz9pKuhxuC2npi8ZvoBwt/IH9piPA15/CGF1XeXUdu2qmhOjHkVLN
gO7XB1G8CuluxofOIUce0aZGyB+vZ1ylHXlMeB0R82f5dz3/T7RQso55Y2Vog2Zb7PYTC5B9
oNy3ylsnNLzanYlcW3AAfzZcbxYuAdnuq0Im3EpGm8DoItUcf1pDezugKm/yKtNtY6sDyENj
tExZ377cYA3IdIwqn1Mh4OAT/Rmh8au2rZAo0+bMYBy9C11Ex0hQ8zWcvPZBDn4v4RtO8g+K
uQZQcJnO09LJNtw94W3d2mj4a7XrsKMnZKvm6W9BJIQ4Nmht4wXAtPQ1xA+QpxPTmsGAU0Cv
HmqVC7XC3qxFhaOrD2dsvOAK6Sn3MEpH/YrfYCX7a7cz5zW3DsJQ6o3pYfnnQz+hnwLlz4MK
17NIA0WOdAF9IbtQqarf44+PEyUbKtz1r0KGeGLs+VGdd2FLA0e7yuzxJDYcaBTVwqaHhU2/
Fna/jGU7BhrKHtJbb/XlLeFJ24yvuiYKpYWQSSyZu1R/gvZjHeGb344jGBsZdCDrdxtQQcVA
6OxsMAPSUPMrlg9LWELEEYnVulQJerWxpUecGH92O06wwmPgykkz//UmmgjVSh7ErNvL0lUY
UMfunYVO/O5hwhW+P4gviCXzBFeTtDZH259O7TCCBzAwggUYoAMCAQICEwCg0WvVwekjGFiO
62SckFwepz0wDQYJKoZIhvcNAQELBQAwezELMAkGA1UEBhMCVVMxEDAOBgNVBAgMB0Zsb3Jp
ZGExGTAXBgNVBAoMEEN1ZGEgU3lzdGVtcyBMTEMxGDAWBgNVBAsMD0N1ZGEgU3lzdGVtcyBD
QTElMCMGA1UEAwwcQ3VkYSBTeXN0ZW1zIExMQyAyMDE3IEludCBDQTAeFw0xNzA4MTcyMTIx
MjBaFw0yMjA4MTYyMTIxMjBaMFcxCzAJBgNVBAYTAlVTMRAwDgYDVQQIDAdGbG9yaWRhMRkw
FwYDVQQKDBBDdWRhIFN5c3RlbXMgTExDMRswGQYDVQQDDBJrYXJsQGRlbm5pbmdlci5uZXQw
ggIiMA0GCSqGSIb3DQEBAQUAA4ICDwAwggIKAoICAQC+HVSyxVtJhy3Ohs+PAGRuO//Dha9A
16l5FPATr6wude9zjX5f2lrkRyU8vhCXTZW7WbvWZKpcZ8r0dtZmiK9uF58Ec6hhvfkxJzbg
96WHBw5Fumd5ahZzuCJDtCAWW8R7/KN+zwzQf1+B3MVLmbaXAFBuKzySKhKMcHbK3/wjUYTg
y+3UK6v2SBrowvkUBC+jxNg3Wy12GsTXcUS/8FYIXgVVPgfZZrbJJb5HWOQpvvhILpPCD3xs
YJFNKEPltXKWHT7Qtc2HNqikgNwj8oqOb+PeZGMiWapsatKm8mxuOOGOEBhAoTVTwUHlMNTg
6QUCJtuWFCK38qOCyk9Haj+86lUU8RG6FkRXWgMbNQm1mWREQhw3axgGLSntjjnznJr5vsvX
SYR6c+XKLd5KQZcS6LL8FHYNjqVKHBYM+hDnrTZMqa20JLAF1YagutDiMRURU23iWS7bA9tM
cXcqkclTSDtFtxahRifXRI7Epq2GSKuEXe/1Tfb5CE8QsbCpGsfSwv2tZ/SpqVG08MdRiXxN
5tmZiQWo15IyWoeKOXl/hKxA9KPuDHngXX022b1ly+5ZOZbxBAZZMod4y4b4FiRUhRI97r9l
CxsP/EPHuuTIZ82BYhrhbtab8HuRo2ofne2TfAWY2BlA7ExM8XShMd9bRPZrNTokPQPUCWCg
CdIATQIDAQABo4IBzzCCAcswPAYIKwYBBQUHAQEEMDAuMCwGCCsGAQUFBzABhiBodHRwOi8v
b2NzcC5jdWRhc3lzdGVtcy5uZXQ6ODg4ODAJBgNVHRMEAjAAMBEGCWCGSAGG+EIBAQQEAwIF
oDAOBgNVHQ8BAf8EBAMCBeAwHQYDVR0lBBYwFAYIKwYBBQUHAwIGCCsGAQUFBwMEMDMGCWCG
SAGG+EIBDQQmFiRPcGVuU1NMIEdlbmVyYXRlZCBDbGllbnQgQ2VydGlmaWNhdGUwHQYDVR0O
BBYEFLElmNWeVgsBPe7O8NiBzjvjYnpRMIHKBgNVHSMEgcIwgb+AFF3AXsKnjdPND5+bxVEC
GKtc047PoYGRpIGOMIGLMQswCQYDVQQGEwJVUzEQMA4GA1UECAwHRmxvcmlkYTESMBAGA1UE
BwwJTmljZXZpbGxlMRkwFwYDVQQKDBBDdWRhIFN5c3RlbXMgTExDMRgwFgYDVQQLDA9DdWRh
IFN5c3RlbXMgQ0ExITAfBgNVBAMMGEN1ZGEgU3lzdGVtcyBMTEMgMjAxNyBDQYITAORIioIQ
zl6738WMYyE12A3YSDAdBgNVHREEFjAUgRJrYXJsQGRlbm5pbmdlci5uZXQwDQYJKoZIhvcN
AQELBQADggIBAJXboPFBMLMtaiUt4KEtJCXlHO/3ZzIUIw/eobWFMdhe7M4+0u3te0sr77QR
dcPKR0UeHffvpth2Mb3h28WfN0FmJmLwJk+pOx4u6uO3O0E1jNXoKh8fVcL4KU79oEQyYkbu
2HwbXBU9HbldPOOZDnPLi0whi/sbFHdyd4/w/NmnPgzAsQNZ2BYT9uBNr+jZw4SsluQzXG1X
lFL/qCBoi1N2mqKPIepfGYF6drbr1RnXEJJsuD+NILLooTNf7PMgHPZ4VSWQXLNeFfygoOOK
FiO0qfxPKpDMA+FHa8yNjAJZAgdJX5Mm1kbqipvb+r/H1UAmrzGMbhmf1gConsT5f8KU4n3Q
IM2sOpTQe7BoVKlQM/fpQi6aBzu67M1iF1WtODpa5QUPvj1etaK+R3eYBzi4DIbCIWst8MdA
1+fEeKJFvMEZQONpkCwrJ+tJEuGQmjoQZgK1HeloepF0WDcviiho5FlgtAij+iBPtwMuuLiL
shAXA5afMX1hYM4l11JXntle12EQFP1r6wOUkpOdxceCcMVDEJBBCHW2ZmdEaXgAm1VU+fnQ
qS/wNw/S0X3RJT1qjr5uVlp2Y0auG/eG0jy6TT0KzTJeR9tLSDXprYkN2l/Qf7/nT6Q03qyE
QnnKiBXWAZXveafyU/zYa7t3PTWFQGgWoC4w6XqgPo4KV44OMYIFBzCCBQMCAQEwgZIwezEL
MAkGA1UEBhMCVVMxEDAOBgNVBAgMB0Zsb3JpZGExGTAXBgNVBAoMEEN1ZGEgU3lzdGVtcyBM
TEMxGDAWBgNVBAsMD0N1ZGEgU3lzdGVtcyBDQTElMCMGA1UEAwwcQ3VkYSBTeXN0ZW1zIExM
QyAyMDE3IEludCBDQQITAKDRa9XB6SMYWI7rZJyQXB6nPTANBglghkgBZQMEAgMFAKCCAkUw
GAYJKoZIhvcNAQkDMQsGCSqGSIb3DQEHATAcBgkqhkiG9w0BCQUxDxcNMTkwNzE4MjExNzE3
WjBPBgkqhkiG9w0BCQQxQgRAUMZIK9lQq9jzx0nuhaDDTsQ5SRpTCLGZP0ppv3di1+HwP/jJ
avyJlZOm3pzYWQthmoCd+4LGDEgQMosYVChKGzBsBgkqhkiG9w0BCQ8xXzBdMAsGCWCGSAFl
AwQBKjALBglghkgBZQMEAQIwCgYIKoZIhvcNAwcwDgYIKoZIhvcNAwICAgCAMA0GCCqGSIb3
DQMCAgFAMAcGBSsOAwIHMA0GCCqGSIb3DQMCAgEoMIGjBgkrBgEEAYI3EAQxgZUwgZIwezEL
MAkGA1UEBhMCVVMxEDAOBgNVBAgMB0Zsb3JpZGExGTAXBgNVBAoMEEN1ZGEgU3lzdGVtcyBM
TEMxGDAWBgNVBAsMD0N1ZGEgU3lzdGVtcyBDQTElMCMGA1UEAwwcQ3VkYSBTeXN0ZW1zIExM
QyAyMDE3IEludCBDQQITAKDRa9XB6SMYWI7rZJyQXB6nPTCBpQYLKoZIhvcNAQkQAgsxgZWg
gZIwezELMAkGA1UEBhMCVVMxEDAOBgNVBAgMB0Zsb3JpZGExGTAXBgNVBAoMEEN1ZGEgU3lz
dGVtcyBMTEMxGDAWBgNVBAsMD0N1ZGEgU3lzdGVtcyBDQTElMCMGA1UEAwwcQ3VkYSBTeXN0
ZW1zIExMQyAyMDE3IEludCBDQQITAKDRa9XB6SMYWI7rZJyQXB6nPTANBgkqhkiG9w0BAQEF
AASCAgBgGQ0JfOZA8x521fURGxvJdmIgF5MTtcojE8NKbzSK34zzhrfDuYF4eoJWqI3izdDC
4CLxZkLhcDWHTjTfrPNwZk0/jHfJf8eIIe0x80YW2nC8k/RfAFe2tHuVd2BPobUH/XzCTeb/
RTCQHT3+IvZiXxYQisOgmaRq9z6WiwUbLvvvl2C4Pn6J0llszzsT9HMeeAavLSLkP4CT4HdJ
Y4RvxhRjj6HEepsO7wvg74FLRk+y7d5NiAxwX7zAqyAp+awqz526J1pkWWQJk5rP+hdtIVas
HPoyIlnPW0Kh1FoacLSNfR+sOeoCFrAObKTjlnxQAcpSeUhBsRvQkIFl/4guqZlT3VerTxgL
L0/zqp6xI2YuTGxmwLUni3w+UNWRoOPVBaeT0+zVYl7EOjhpOCrMezcKEKU+pR3kGsJPkjIW
es7SyktaCIoZ2NLKW2HuumJ/kBIzd5JBAFGUmVjuyyWPm6tnTHU6ck6TWfymjOWpLuUlDJ9V
7Q56dgTEG48gXZYJRWa0PgbqMLrJXsavnZO2trWiMo0JguKHFlZ30ansgRBMRoLellofAZGD
vmxy6h4eBdBRRGYvEquSmp8satlS4aYbcijgK1R98Fen//WLLFS8WXDX/Q25U5NY1QZ+ERlp
icdAaIeFumvuGCy7UbriVHAPXLIOYb9IjxshUkgPeAAAAAAAAA==
--------------ms080701050702040803020600--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?6a3d5e55-6d50-3e08-63e4-dd1cd2209abe>