From: mike tancsa
To: Mark Johnston
Cc: FreeBSD-STABLE Mailing List
Subject: Re: zfs panic RELENG_12
Date: Tue, 22 Dec 2020 14:14:43 -0500
Message-ID: <5e2bef97-f124-036f-4e71-874707925ef0@sentex.net>
In-Reply-To: <7f49e64d-c875-f12d-744e-7b174e197cbb@sentex.net>
References: <878824fe-dde2-b551-4685-e8bd27371275@sentex.net> <5b3415cb-2176-895e-9d22-4f4f0f359d85@sentex.net> <7f49e64d-c875-f12d-744e-7b174e197cbb@sentex.net>

On 12/22/2020 10:09 AM, mike tancsa wrote:
> On 12/22/2020 10:07 AM, Mark Johnston wrote:
>> Could you go to frame 11 and print zone->uz_name and
>> bucket->ub_bucket[18]? I'm wondering if the item pointer was mangled
>> somehow.
> Thank you for looking!
>
> (kgdb) frame 11
>
> #11 0xffffffff80ca47d4 in bucket_drain (zone=0xfffff800037da000,
> bucket=0xfffff801c7fd5200) at /usr/src/sys/vm/uma_core.c:758
> 758             zone->uz_release(zone->uz_arg, bucket->ub_bucket,
> bucket->ub_cnt);
> (kgdb) p zone->uz_name
> $1 = 0xffffffff8102118a "mbuf_jumbo_9k"
> (kgdb) p bucket->ub_bucket[18]
> $2 = (void *) 0xfffff80de4654000
> (kgdb) p bucket->ub_bucket
> $3 = 0xfffff801c7fd5218
>
> (kgdb)
>

Not sure if it's a coincidence or not, but previously I was running with
the ARC limited to ~30G of the 64G of RAM on the box.  I removed that
limit a few weeks ago after upgrading the box to RELENG_12 to pull in
the OpenSSL changes.  The panic seems to happen under disk load. I have
3 zfs pools that are pretty busy receiving snapshots. One day a week, we
write a full set to a 4th zfs pool off some geli-attached drives via USB
for offsite cold storage.  The crashes happened with that extra level of
disk work.  gstat shows most of the 12 drives off 2 mrsas controllers at
or close to 100% busy during the 18 hrs it takes to dump out the files.

Trying a new cold storage run now with the ARC limit back to

vfs.zfs.arc_max=29334498304

    ---Mike