From owner-freebsd-fs@freebsd.org Wed Dec 4 12:48:10 2019 Return-Path: Delivered-To: freebsd-fs@mailman.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.nyi.freebsd.org (Postfix) with ESMTP id 845CB1B6686 for ; Wed, 4 Dec 2019 12:48:10 +0000 (UTC) (envelope-from pen@lysator.liu.se) Received: from mail.lysator.liu.se (mail.lysator.liu.se [IPv6:2001:6b0:17:f0a0::3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 47Sdtc69cjz3Jq1; Wed, 4 Dec 2019 12:48:08 +0000 (UTC) (envelope-from pen@lysator.liu.se) Received: from mail.lysator.liu.se (localhost [127.0.0.1]) by mail.lysator.liu.se (Postfix) with ESMTP id 1AA4F4000D; Wed, 4 Dec 2019 13:48:06 +0100 (CET) Received: from [IPv6:2001:6b0:17:f002:8d12:4cc9:4751:6d43] (unknown [IPv6:2001:6b0:17:f002:8d12:4cc9:4751:6d43]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.lysator.liu.se (Postfix) with ESMTPSA id 998C54000E; Wed, 4 Dec 2019 13:48:05 +0100 (CET) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 13.0 \(3601.0.10\)) Subject: Re: Slow reboots due to ZFS cleanup in kern_shutdown() .. zio_fini() From: Peter Eriksson In-Reply-To: Date: Wed, 4 Dec 2019 13:48:05 +0100 Cc: Andriy Gapon Content-Transfer-Encoding: quoted-printable Message-Id: <1119144A-C6DA-4CBF-8616-108FB50D2F8D@lysator.liu.se> References: <20191202225424.GG43802@raichu> <3b71fe37-c29f-e3e5-ff96-5dce15cc7553@FreeBSD.org> <20191203162219.GI43802@raichu> <247C6990-BC6E-4E3D-8CEF-5A861D8A25EC@lysator.liu.se> <4D44B25E-2F15-4536-A653-DA242051C8A9@lysator.liu.se> To: FreeBSD FS X-Mailer: Apple Mail (2.3601.0.10) X-Virus-Scanned: ClamAV using ClamSMTP X-Rspamd-Queue-Id: 47Sdtc69cjz3Jq1 X-Spamd-Bar: --- Authentication-Results: mx1.freebsd.org; dkim=none; dmarc=pass (policy=none) header.from=liu.se; spf=pass (mx1.freebsd.org: domain of pen@lysator.liu.se designates 2001:6b0:17:f0a0::3 as permitted sender) smtp.mailfrom=pen@lysator.liu.se X-Spamd-Result: default: False [-3.91 / 15.00]; ARC_NA(0.00)[]; RCVD_VIA_SMTP_AUTH(0.00)[]; NEURAL_HAM_MEDIUM(-1.00)[-1.000,0]; FROM_HAS_DN(0.00)[]; R_SPF_ALLOW(-0.20)[+a:mail.lysator.liu.se]; TO_MATCH_ENVRCPT_ALL(0.00)[]; MIME_GOOD(-0.10)[text/plain]; MV_CASE(0.50)[]; NEURAL_HAM_LONG(-1.00)[-1.000,0]; RCVD_COUNT_THREE(0.00)[3]; RCVD_TLS_LAST(0.00)[]; TO_DN_ALL(0.00)[]; RCVD_IN_DNSWL_MED(-0.20)[3.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.a.0.f.7.1.0.0.0.b.6.0.1.0.0.2.list.dnswl.org : 127.0.11.2]; RCPT_COUNT_TWO(0.00)[2]; DMARC_POLICY_ALLOW(-0.50)[liu.se,none]; IP_SCORE(-1.41)[ip: (-6.19), ipnet: 2001:6b0::/32(-0.46), asn: 1653(-0.38), country: EU(-0.00)]; FROM_EQ_ENVFROM(0.00)[]; R_DKIM_NA(0.00)[]; MIME_TRACE(0.00)[0:+]; ASN(0.00)[asn:1653, ipnet:2001:6b0::/32, country:EU]; MID_RHS_MATCH_FROM(0.00)[] X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 04 Dec 2019 12:48:10 -0000 Going deeper=E2=80=A6 now it=E2=80=99s pointing towards pmap_remove(), = called from kmem_unback(), which is called from kmem_free(), which is = called from page_free() which is called from keg_free_slab()... kmem_unback: pmap_remove(kernel_pmap, 18446741884786954240, = 18446741884786970624) took 1 seconds kmem_free: kmem_unback(kmem_object, 18446741884786954240, 16384) took 1 = seconds page_free: kmem_free(kmem_arena, 18446741884786954240, 16384) took 1 = seconds keg_free_slab: keg->uk_freef(mem) {page_free} took 1 seconds kmem_unback: pmap_remove(kernel_pmap, 18446741884989739008, = 18446741884989755392) took 1 seconds kmem_free: kmem_unback(kmem_object, 18446741884989739008, 16384) took 1 = seconds page_free: kmem_free(kmem_arena, 18446741884989739008, 16384) took 1 = seconds keg_free_slab: keg->uk_freef(mem) {page_free} took 1 seconds kmem_unback: pmap_remove(kernel_pmap, 18446741885257895936, = 18446741885257912320) took 1 seconds kmem_free: kmem_unback(kmem_object, 18446741885257895936, 16384) took 1 = seconds page_free: kmem_free(kmem_arena, 18446741885257895936, 16384) took 1 = seconds keg_free_slab: keg->uk_freef(mem) {page_free} took 1 seconds - Peter > On 4 Dec 2019, at 11:13, Peter Eriksson wrote: >=20 > Yeah, I noticed that after I sent that email (should send stuff just = before you go to bed :-) >=20 > Anyway, some more timing/debug printfs point to the keg->uk_freef(mem) = call in keg_free_slab() which sometimes take about a second to execute. > (Should probably switch to some other time source with greater = resolution than 1s next :-) >=20 > keg_free_slab: keg->uk_freef(mem) took 1 seconds > keg_free_slab: keg->uk_freef(mem) took 1 seconds > keg_free_slab: keg->uk_freef(mem) took 1 seconds > keg_free_slab: keg->uk_freef(mem) took 1 seconds > keg_free_slab: keg->uk_freef(mem) took 1 seconds > keg_free_slab: keg->uk_freef(mem) took 1 seconds > keg_free_slab: keg->uk_freef(mem) took 1 seconds >=20 > - Peter >=20 >> On 4 Dec 2019, at 10:48, Andriy Gapon wrote: >>=20 >> On 04/12/2019 00:28, Peter Eriksson wrote: >>> =E2=80=A6 and X levels deeper it gives this (not all filesystems = mounted - I rebooted fairly quickly): >>>=20 >>> keg_drain: while (SLIST_FIRST/SLIST_REMOVE)-loop took 14 seconds = [20021 loops] >>> zone_drain_wait(): zone_foreach_keg(zone, &keg_drain) took 14 = seconds >>> zone_dtor(): zone_drain_wait(zone, M_WAITOK) took 14 seconds >>> zone_free_item(zone=3DUMA Zones): zone->uz_dtor() took 14 seconds >>> uma_zdestroy(zio_buf_12288) took 14 seconds >>> kmem_cache_destroy: uma_zdestroy(0xfffff803467c8ac0) [zio_buf_12288] = took 14 seconds >>> kmem_cache_destroy(zio_buf_cache[20]) took 14 seconds >>>=20 >>>=20 >>> Hmm=E2=80=A6 I wonder if it isn=E2=80=99t this code I = /usr/src/sys/sys/queue.h: >>>=20 >>>> #define SLIST_REMOVE(head, elm, type, field) do { = \ >>>> QMD_SAVELINK(oldnext, (elm)->field.sle_next); = \ >>>> if (SLIST_FIRST((head)) =3D=3D (elm)) { = \ >>>> SLIST_REMOVE_HEAD((head), field); = \ >>>> } = \ >>>> else { = \ >>>> QUEUE_TYPEOF(type) *curelm =3D SLIST_FIRST(head); = \ >>>> while (SLIST_NEXT(curelm, field) !=3D (elm)) = \ >>>> curelm =3D SLIST_NEXT(curelm, field); = \ >>>> SLIST_REMOVE_AFTER(curelm, field); = \ >>>> } = \ >>>> TRASHIT(*oldnext); = \ >>>> } while (0) >>>=20 >>> Combined with this in /usr/src/sys/vm/uma_core.c: keg_drain(): >>>=20 >>>> finished: >>>> KEG_UNLOCK(keg); >>>>=20 >>>> while ((slab =3D SLIST_FIRST(&freeslabs)) !=3D NULL) { >>>> SLIST_REMOVE(&freeslabs, slab, uma_slab, us_hlink); >>>> keg_free_slab(keg, slab, keg->uk_ipers); >>>> } >>>=20 >>> (The keg_drain print above is the time for the while() loop). >>>=20 >>> If I=E2=80=99m reading this right it looks like it=E2=80=99s = potentially doing a linear search through a (long) linked list, for = every entry in the free slabs list. 20000 entries in 14 seconds is 1400 = entries per second. >>=20 >> It's not a search. The loop is removing *all* items. There is no = other way to >> do it :-) >> Expanding your calculations, it takes on the order of a millisecond = to process a >> single slab. I bet that the time is spent in keg_free_slab() and = it's a lot of >> time. >>=20 >> I wonder if the OR should actually be AND in this condition: >> if (!uma_dbg_kskip(keg, slab->us_data + (keg->uk_rsize = * i)) || >> keg->uk_fini !=3D trash_fini) >> Looks like there is a "ton" of memory reading done here. >>=20 >> --=20 >> Andriy Gapon >=20 > _______________________________________________ > freebsd-fs@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"