From owner-freebsd-fs@freebsd.org  Sun Dec  1 07:19:03 2019
Return-Path: <owner-freebsd-fs@freebsd.org>
Delivered-To: freebsd-fs@mailman.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.nyi.freebsd.org (Postfix) with ESMTP id C37961C1CA5
 for <freebsd-fs@mailman.nyi.freebsd.org>; Sun,  1 Dec 2019 07:19:03 +0000 (UTC)
 (envelope-from isafonov@me.com)
Received: from mr85p00im-hyfv06021301.me.com (mr85p00im-hyfv06021301.me.com
 [17.58.23.188])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 47QfkF69fMz4XLm
 for <freebsd-fs@freebsd.org>; Sun,  1 Dec 2019 07:19:01 +0000 (UTC)
 (envelope-from isafonov@me.com)
Received: from [192.168.37.155] (ip-097-206-122-091.static.atnet.ru
 [91.122.206.97])
 by mr85p00im-hyfv06021301.me.com (Postfix) with ESMTPSA id 2086540D5A
 for <freebsd-fs@freebsd.org>; Sun,  1 Dec 2019 07:18:56 +0000 (UTC)
From: Ivan Safonov <isafonov@me.com>
Mime-Version: 1.0 (Mac OS X Mail 12.4 \(3445.104.11\))
Subject: zfs panic: solaris assert
Message-Id: <78272047-A0E6-4EE0-B3B3-1F927D29EC12@me.com>
References: <C5344C23-2C22-4F97-B555-48A7CCC682D9@me.com>
To: freebsd-fs@freebsd.org
Date: Sun, 1 Dec 2019 10:18:53 +0300
X-Mailer: Apple Mail (2.3445.104.11)
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:, ,
 definitions=2019-12-01_01:, , signatures=0
X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=1
 malwarescore=0
 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 mlxscore=0
 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx
 scancount=1 engine=8.0.1-1908290000 definitions=main-1912010064
X-Rspamd-Queue-Id: 47QfkF69fMz4XLm
X-Spamd-Bar: ---
X-Spamd-Result: default: False [-3.60 / 15.00]; RCVD_VIA_SMTP_AUTH(0.00)[];
 R_SPF_ALLOW(-0.20)[+ip4:17.58.0.0/16:c];
 FREEMAIL_FROM(0.00)[me.com]; TO_DN_NONE(0.00)[];
 MV_CASE(0.50)[]; DKIM_TRACE(0.00)[me.com:+];
 DMARC_POLICY_ALLOW(-0.50)[me.com,quarantine];
 RCVD_IN_DNSWL_LOW(-0.10)[188.23.58.17.list.dnswl.org : 127.0.5.1];
 FROM_EQ_ENVFROM(0.00)[]; MIME_TRACE(0.00)[0:+,1:+,2:~];
 FREEMAIL_ENVFROM(0.00)[me.com];
 ASN(0.00)[asn:714, ipnet:17.58.16.0/20, country:US];
 MID_RHS_MATCH_FROM(0.00)[]; ARC_NA(0.00)[];
 NEURAL_HAM_MEDIUM(-1.00)[-0.999,0];
 R_DKIM_ALLOW(-0.20)[me.com:s=1a1hai]; FROM_HAS_DN(0.00)[];
 TO_MATCH_ENVRCPT_ALL(0.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000,0];
 MIME_GOOD(-0.10)[multipart/alternative,text/plain];
 PREVIOUSLY_DELIVERED(0.00)[freebsd-fs@freebsd.org];
 IP_SCORE_FREEMAIL(0.00)[]; RCPT_COUNT_ONE(0.00)[1];
 DWL_DNSWL_LOW(-1.00)[me.com.dwl.dnswl.org : 127.0.5.1];
 IP_SCORE(0.00)[ip: (-3.62), ipnet: 17.58.16.0/20(-1.57), asn: 714(-2.31),
 country: US(-0.05)]; RCVD_COUNT_TWO(0.00)[2];
 RCVD_TLS_ALL(0.00)[]
Content-Type: text/plain;
	charset=us-ascii
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.29
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 01 Dec 2019 07:19:03 -0000

Hi there
I have a small home server with a ZFS pool created in FreeBSD 11.1 and =
updated to 12.0 -> 12.1. A few days ago, the server rebooted with panic

panic: solaris assert: size <=3D (1ULL << 24) (0x1401000 <=3D =
0x1000000), file: =
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/abd.c, line: 296
cpuid =3D 3
time =3D 1574981747
KDB: stack backtrace:
#0 0xffffffff80c1d297 at kdb_backtrace+0x67
#1 0xffffffff80bd05cd at vpanic+0x19d
#2 0xffffffff80bd0423 at panic+0x43
#3 0xffffffff82a6e22c at assfail3+0x2c
#4 0xffffffff8284a8f7 at abd_alloc+0x67
#5 0xffffffff82850319 at arc_hdr_alloc_pabd+0x99
#6 0xffffffff8284d554 at arc_hdr_alloc+0x124
#7 0xffffffff8284ef13 at arc_read+0x243
#8 0xffffffff8287942d at traverse_prefetch_metadata+0xbd
#9 0xffffffff828788cc at traverse_visitbp+0x3dc
#10 0xffffffff82878930 at traverse_visitbp+0x440
#11 0xffffffff82878930 at traverse_visitbp+0x440
#12 0xffffffff82878930 at traverse_visitbp+0x440
#13 0xffffffff82878930 at traverse_visitbp+0x440
#14 0xffffffff82879513 at traverse_dnode+0xd3
#15 0xffffffff82878c30 at traverse_visitbp+0x740
#16 0xffffffff828780a7 at traverse_impl+0x317
#17 0xffffffff8287837c at traverse_pool+0x14c
Uptime: 1m0s

I tested the memory and drives and they look OK. Then I installed =
FreeBSD 12.1 on a USB drive
zpool import

   pool: zroot
     id: 5722521002676846505
  state: ONLINE
 status: Some supported features are not enabled on the pool.
 action: The pool can be imported using its name or numeric identifier, =
though
        some features will not be available without an explicit 'zpool =
upgrade'.
 config:

        zroot                 ONLINE
          raidz1-0            ONLINE
            gpt/D0-33GTM0VGS  ONLINE
            gpt/D1-Z1E3Q7KP   ONLINE
            gpt/D2-Z4ZARG1V   ONLINE

I tried to import the pool but got the same panic
zpool import -o readonly -f -F -N -R /pool zroot

zdb -ue zroot

Uberblock:
    magic =3D 0000000000bab10c
    version =3D 5000
    txg =3D 11751485
    guid_sum =3D 14638353410936556308
    timestamp =3D 1574776136 UTC =3D Tue Nov 26 16:48:56 2019
    checkpoint_txg =3D 0

zdb -l /dev/gpt/D0-33GTM0VGS
Code:
------------------------------------
LABEL 0
------------------------------------
    version: 5000
    name: 'zroot'
    state: 0
    txg: 11750848
    pool_guid: 5722521002676846505
    hostid: 1550328424
    hostname: ''
    top_guid: 10071190156355008053
    guid: 4016597551985842896
    vdev_children: 1
    vdev_tree:
        type: 'raidz'
        id: 0
        guid: 10071190156355008053
        nparity: 1
        metaslab_array: 39
        metaslab_shift: 35
        ashift: 12
        asize: 5997325713408
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 4016597551985842896
            path: '/dev/gpt/D0-33GTM0VGS'
            whole_disk: 1
            DTL: 70406
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 18015782253695591313
            path: '/dev/gpt/D1-Z1E3Q7KP'
            whole_disk: 1
            DTL: 70405
            create_txg: 4
        children[2]:
            type: 'disk'
            id: 2
            guid: 13705750593642370773
            path: '/dev/gpt/D2-Z4ZARG1V'
            whole_disk: 1
            DTL: 70323
            create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
------------------------------------
LABEL 1
------------------------------------
    version: 5000
    name: 'zroot'
    state: 0
    txg: 11750848
    pool_guid: 5722521002676846505
    hostid: 1550328424
    hostname: ''
    top_guid: 10071190156355008053
    guid: 4016597551985842896
    vdev_children: 1
    vdev_tree:
        type: 'raidz'
        id: 0
        guid: 10071190156355008053
        nparity: 1
        metaslab_array: 39
        metaslab_shift: 35
        ashift: 12
        asize: 5997325713408
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 4016597551985842896
            path: '/dev/gpt/D0-33GTM0VGS'
            whole_disk: 1
            DTL: 70406
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 18015782253695591313
            path: '/dev/gpt/D1-Z1E3Q7KP'
            whole_disk: 1
            DTL: 70405
            create_txg: 4
        children[2]:
            type: 'disk'
            id: 2
            guid: 13705750593642370773
            path: '/dev/gpt/D2-Z4ZARG1V'
            whole_disk: 1
            DTL: 70323
            create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
------------------------------------
LABEL 2
------------------------------------
    version: 5000
    name: 'zroot'
    state: 0
    txg: 11750848
    pool_guid: 5722521002676846505
    hostid: 1550328424
    hostname: ''
    top_guid: 10071190156355008053
    guid: 4016597551985842896
    vdev_children: 1
    vdev_tree:
        type: 'raidz'
        id: 0
        guid: 10071190156355008053
        nparity: 1
        metaslab_array: 39
        metaslab_shift: 35
        ashift: 12
        asize: 5997325713408
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 4016597551985842896
            path: '/dev/gpt/D0-33GTM0VGS'
            whole_disk: 1
            DTL: 70406
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 18015782253695591313
            path: '/dev/gpt/D1-Z1E3Q7KP'
            whole_disk: 1
            DTL: 70405
            create_txg: 4
        children[2]:
            type: 'disk'
            id: 2
            guid: 13705750593642370773
            path: '/dev/gpt/D2-Z4ZARG1V'
            whole_disk: 1
            DTL: 70323
            create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
------------------------------------
LABEL 3
------------------------------------
    version: 5000
    name: 'zroot'
    state: 0
    txg: 11750848
    pool_guid: 5722521002676846505
    hostid: 1550328424
    hostname: ''
    top_guid: 10071190156355008053
    guid: 4016597551985842896
    vdev_children: 1
    vdev_tree:
        type: 'raidz'
        id: 0
        guid: 10071190156355008053
        nparity: 1
        metaslab_array: 39
        metaslab_shift: 35
        ashift: 12
        asize: 5997325713408
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 4016597551985842896
            path: '/dev/gpt/D0-33GTM0VGS'
            whole_disk: 1
            DTL: 70406
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 18015782253695591313
            path: '/dev/gpt/D1-Z1E3Q7KP'
            whole_disk: 1
            DTL: 70405
            create_txg: 4
        children[2]:
            type: 'disk'
            id: 2
            guid: 13705750593642370773
            path: '/dev/gpt/D2-Z4ZARG1V'
            whole_disk: 1
            DTL: 70323
            create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data

zdb -AAA -F -X zroot

...
                            capacity   operations   bandwidth  ---- =
errors ----
description                used avail  read write  read write  read =
write cksum
zroot                     1.47T 3.97T    82     0  351K     0     0     =
0    30
  raidz1                  1.47T 3.97T    82     0  351K     0     0     =
0   121
    /dev/gpt/D0-33GTM0VGS                27     0  146K     0     0     =
0     0
    /dev/gpt/D1-Z1E3Q7KP                 27     0  146K     0     0     =
0     0
    /dev/gpt/D2-Z4ZARG1V                 27     0  146K     0     0     =
0     0
    ...

Is it possible to somehow restore the pool or data?=

From owner-freebsd-fs@freebsd.org  Sun Dec  1 12:13:21 2019
Return-Path: <owner-freebsd-fs@freebsd.org>
Delivered-To: freebsd-fs@mailman.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.nyi.freebsd.org (Postfix) with ESMTP id 0DAB91A99D3
 for <freebsd-fs@mailman.nyi.freebsd.org>; Sun,  1 Dec 2019 12:13:21 +0000 (UTC)
 (envelope-from li-fbsd@citylink.dinoex.sub.org)
Received: from uucp.dinoex.org (uucp.dinoex.sub.de [IPv6:2001:1440:5001:1::2])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 server-signature RSA-PSS (4096 bits)
 client-signature RSA-PSS (2048 bits) client-digest SHA256)
 (Client CN "uucp.dinoex.sub.de",
 Issuer "Let's Encrypt Authority X3" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 47QnFq4VNZz3HT4
 for <freebsd-fs@FreeBSD.ORG>; Sun,  1 Dec 2019 12:13:18 +0000 (UTC)
 (envelope-from li-fbsd@citylink.dinoex.sub.org)
Received: from uucp.dinoex.sub.de (uucp.dinoex.org [185.220.148.12])
 by uucp.dinoex.org (8.16.0.41/8.16.0.41) with ESMTPS id xB1CD5XP059870
 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NO)
 for <freebsd-fs@FreeBSD.ORG>; Sun, 1 Dec 2019 13:13:06 +0100 (CET)
 (envelope-from li-fbsd@citylink.dinoex.sub.org)
X-MDaemon-Deliver-To: <freebsd-fs@FreeBSD.ORG>
X-Authentication-Warning: uucp.dinoex.sub.de: Host uucp.dinoex.org
 [185.220.148.12] claimed to be uucp.dinoex.sub.de
Received: from citylink.dinoex.sub.org (uucp@localhost)
 by uucp.dinoex.sub.de (8.16.0.41/8.16.0.41/Submit) with UUCP id xB1CD53A059869
 for freebsd-fs@FreeBSD.ORG; Sun, 1 Dec 2019 13:13:05 +0100 (CET)
 (envelope-from li-fbsd@citylink.dinoex.sub.org)
Received: from gate.oper.dinoex.org (gate-e [192.168.98.2])
 by citylink.dinoex.sub.de (8.15.2/8.15.2) with ESMTP id xB1C6rBE073197
 for <freebsd-fs@FreeBSD.ORG>; Sun, 1 Dec 2019 13:06:53 +0100 (CET)
 (envelope-from li-fbsd@citylink.dinoex.sub.org)
Received: from gate.oper.dinoex.org (gate-e [192.168.98.2])
 by gate.oper.dinoex.org (8.15.2/8.15.2) with ESMTP id xB1C411i071315
 for <freebsd-fs@FreeBSD.ORG>; Sun, 1 Dec 2019 13:04:01 +0100 (CET)
 (envelope-from li-fbsd@citylink.dinoex.sub.org)
Received: (from news@localhost)
 by gate.oper.dinoex.org (8.15.2/8.15.2/Submit) id xB1C41qC071311
 for freebsd-fs@FreeBSD.ORG; Sun, 1 Dec 2019 13:04:01 +0100 (CET)
 (envelope-from li-fbsd@citylink.dinoex.sub.org)
X-Authentication-Warning: gate.oper.dinoex.org: news set sender to
 li-fbsd@citylink.dinoex.sub.org using -f
From: "Peter Much" <pmc@citylink.dinoex.sub.org>
Subject: ZFS overgrowing arc_max -> OOM kills
Date: Sun, 01 Dec 2019 12:52:53 +0100
Organization: n/a
Message-ID: <op.0b4bafvnio2kyh@localhost>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed; delsp=yes
Content-Transfer-Encoding: 7bit
Injection-Info: oper.dinoex.de;
 logging-data="68697"; mail-complaints-to="usenet@citylink.dinoex.sub.org"
User-Agent: Opera Mail/12.16 (FreeBSD)
Sender: li-fbsd@citylink.dinoex.sub.org
To: freebsd-fs@FreeBSD.ORG
X-Milter: Spamilter (Reciever: uucp.dinoex.sub.de; Sender-ip: 185.220.148.12;
 Sender-helo: uucp.dinoex.sub.de; )
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.6.2
 (uucp.dinoex.org [185.220.148.12]); Sun, 01 Dec 2019 13:13:08 +0100 (CET)
X-Rspamd-Queue-Id: 47QnFq4VNZz3HT4
X-Spamd-Bar: ++++
Authentication-Results: mx1.freebsd.org; dkim=none; dmarc=none;
 spf=none (mx1.freebsd.org: domain of li-fbsd@citylink.dinoex.sub.org has no
 SPF policy when checking 2001:1440:5001:1::2)
 smtp.mailfrom=li-fbsd@citylink.dinoex.sub.org
X-Spamd-Result: default: False [4.03 / 15.00]; ARC_NA(0.00)[];
 RCVD_COUNT_FIVE(0.00)[5]; FROM_HAS_DN(0.00)[];
 TO_MATCH_ENVRCPT_ALL(0.00)[]; MV_CASE(0.50)[];
 MIME_GOOD(-0.10)[text/plain]; HAS_XAW(0.00)[];
 PREVIOUSLY_DELIVERED(0.00)[freebsd-fs@freebsd.org];
 AUTH_NA(1.00)[]; RCPT_COUNT_ONE(0.00)[1];
 HAS_ORG_HEADER(0.00)[]; NEURAL_SPAM_MEDIUM(0.38)[0.376,0];
 MIME_TRACE(0.00)[0:+];
 IP_SCORE(0.46)[ip: (1.23), ipnet: 2001:1440::/32(0.61), asn: 8469(0.49),
 country: DE(-0.01)]; NEURAL_SPAM_LONG(0.99)[0.992,0];
 TO_DN_NONE(0.00)[]; R_SPF_NA(0.00)[];
 FORGED_SENDER(0.30)[pmc@citylink.dinoex.sub.org,li-fbsd@citylink.dinoex.sub.org];
 DMARC_NA(0.00)[sub.org]; R_DKIM_NA(0.00)[];
 MID_RHS_NOT_FQDN(0.50)[];
 ASN(0.00)[asn:8469, ipnet:2001:1440::/32, country:DE];
 FROM_NEQ_ENVFROM(0.00)[pmc@citylink.dinoex.sub.org,li-fbsd@citylink.dinoex.sub.org];
 RCVD_TLS_LAST(0.00)[]
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 01 Dec 2019 12:13:21 -0000

Hi all,

I thought this matter was solved a long time ago! In fact, I am
using ZFS almost from the beginning, and I never observed it
seriousely overgrowing ArcMax, until now (this is 11.3-RELEASE-p3).

> System Memory:
>        9.38%   187.32  MiB Active,     0.37%   7.35    MiB Inact
>         78.60%  1.53    GiB Wired,      0.00%   0 Cache
>         1.11%   22.14   MiB Free,       10.54%  210.59  MiB Gap
>        Real Installed:                         2.00    GiB
>         Real Available:                 99.19%  1.98    GiB
>         Real Managed:                   98.33%  1.95    GiB
>        Logical Total:                          2.00    GiB
>         Logical Used:                   98.56%  1.97    GiB
>         Logical Free:                   1.44%   29.49   MiB
> Kernel Memory:                                  768.84  MiB
>         Data:                           98.21%  755.05  MiB
>         Text:                           1.79%   13.79   MiB
> Kernel Memory Map:                              1.38    GiB
>         Size:                           98.79%  1.36    GiB
>         Free:                           1.21%   17.10   MiB

See the KMEM! Obviousely, this was the last useful vital sign
   from the machine, then it stalled normal I/O and did only
network gatewaying:

> last pid: 51252;  load averages:  0.39,  3.09,  3.68    up 5+07:46:39   
> 03:09:25
> 1569 processes: 3 running, 1530 sleeping, 36 waiting
> CPU 0:  1.1% user,  0.0% nice,  5.0% system,  0.4% interrupt, 93.5% idle
> CPU 1:  0.0% user,  0.0% nice,  3.4% system,  1.9% interrupt, 94.6% idle
> Mem: 169M Active, 4K Inact, 204M Laundry, 1582M Wired, 85M Buf, 43M Free
> ARC: 1026M Total, 483M MFU, 160M MRU, 11M Anon, 188M Header, 184M Other
>      127M Compressed, 526M Uncompressed, 4.16:1 Ratio
> Swap: 8192M Total, 1965M Used, 6227M Free, 23% Inuse, 56K In

Now see the ARC:

> ARC Summary: (HEALTHY)
>         Memory Throttle Count:                  0
> ARC Misc:
>         Deleted:                                22.35m
>         Recycle Misses:                         0
>         Mutex Misses:                           22.11m
>         Evict Skips:                            733.63m
> ARC Size:                               127.01% 1016.06 MiB
>         Target Size: (Adaptive)         25.00%  200.00  MiB
>         Min Size (Hard Limit):          25.00%  200.00  MiB
>         Max Size (High Water):          4:1     800.00  MiB

This was steadily growing since 03:01 - likely due to the influx of
metadata from these daily/security/100/110 "find" jobs.

I did my calculations to cater for a 25% safety margin (which appear to
be quite accurate), but nevertheless, ZFS goes beyond that. :(

(Yes, this is i386 - it's a twenty-year-old machine. ;))

The problem started with OOM kills at 03:02 reporting "out of swap",
which is bogus (there is no swap shortage); so I found a way to switch
such annoying behaviour off:
> vm.pageout_oom_seq=512

Then, instead, I got massive "indefinite buffer wait" messages. And these
seem to come from what You can read above: out of kernel heap.

PMc

From owner-freebsd-fs@freebsd.org  Mon Dec  2 14:32:44 2019
Return-Path: <owner-freebsd-fs@freebsd.org>
Delivered-To: freebsd-fs@mailman.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.nyi.freebsd.org (Postfix) with ESMTP id 5644E1AB981
 for <freebsd-fs@mailman.nyi.freebsd.org>; Mon,  2 Dec 2019 14:32:44 +0000 (UTC)
 (envelope-from pen@lysator.liu.se)
Received: from mail.lysator.liu.se (mail.lysator.liu.se [130.236.254.3])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 47RSJB6yjfz3QY2
 for <freebsd-fs@freebsd.org>; Mon,  2 Dec 2019 14:32:42 +0000 (UTC)
 (envelope-from pen@lysator.liu.se)
Received: from mail.lysator.liu.se (localhost [127.0.0.1])
 by mail.lysator.liu.se (Postfix) with ESMTP id 0455D40015
 for <freebsd-fs@freebsd.org>; Mon,  2 Dec 2019 15:32:40 +0100 (CET)
Received: from [IPv6:2001:6b0:17:f002:cd44:8234:6c47:5d2e] (unknown
 [IPv6:2001:6b0:17:f002:cd44:8234:6c47:5d2e])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by mail.lysator.liu.se (Postfix) with ESMTPSA id DF8D240011
 for <freebsd-fs@freebsd.org>; Mon,  2 Dec 2019 15:32:39 +0100 (CET)
From: Peter Eriksson <pen@lysator.liu.se>
Content-Type: text/plain;
	charset=utf-8
Content-Transfer-Encoding: quoted-printable
Mime-Version: 1.0 (Mac OS X Mail 13.0 \(3601.0.10\))
Subject: Slow reboots due to ZFS cleanup in kern_shutdown() .. zio_fini()
Message-Id: <AD17E454-6A51-436D-A853-07F04A406EC9@lysator.liu.se>
Date: Mon, 2 Dec 2019 15:32:39 +0100
To: freebsd-fs@freebsd.org
X-Mailer: Apple Mail (2.3601.0.10)
X-Virus-Scanned: ClamAV using ClamSMTP
X-Rspamd-Queue-Id: 47RSJB6yjfz3QY2
X-Spamd-Bar: ----
Authentication-Results: mx1.freebsd.org; dkim=none;
 dmarc=pass (policy=none) header.from=liu.se;
 spf=pass (mx1.freebsd.org: domain of pen@lysator.liu.se designates
 130.236.254.3 as permitted sender) smtp.mailfrom=pen@lysator.liu.se
X-Spamd-Result: default: False [-4.12 / 15.00]; ARC_NA(0.00)[];
 RCVD_VIA_SMTP_AUTH(0.00)[]; NEURAL_HAM_MEDIUM(-1.00)[-1.000,0];
 FROM_HAS_DN(0.00)[];
 R_SPF_ALLOW(-0.20)[+a:mail.lysator.liu.se];
 TO_MATCH_ENVRCPT_ALL(0.00)[]; MIME_GOOD(-0.10)[text/plain];
 PREVIOUSLY_DELIVERED(0.00)[freebsd-fs@freebsd.org];
 TO_DN_NONE(0.00)[]; RCPT_COUNT_ONE(0.00)[1];
 NEURAL_HAM_LONG(-1.00)[-1.000,0]; RCVD_COUNT_THREE(0.00)[3];
 RCVD_TLS_LAST(0.00)[]; MV_CASE(0.50)[];
 RCVD_IN_DNSWL_MED(-0.20)[3.254.236.130.list.dnswl.org : 127.0.11.2];
 DMARC_POLICY_ALLOW(-0.50)[liu.se,none];
 IP_SCORE(-1.62)[ipnet: 130.236.0.0/16(-4.50), asn: 2843(-3.60), country:
 SE(-0.03)]; FROM_EQ_ENVFROM(0.00)[]; R_DKIM_NA(0.00)[];
 MIME_TRACE(0.00)[0:+];
 ASN(0.00)[asn:2843, ipnet:130.236.0.0/16, country:SE];
 MID_RHS_MATCH_FROM(0.00)[]
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 Dec 2019 14:32:44 -0000

I=E2=80=99ve been looking at trying to figure out why our servers take =
so long to reboot, where the most time is spent doing a =E2=80=9Cshutdown=E2=
=80=9D. We=E2=80=99ve seen examples where it has taken 10-20 minutes (or =
more).

This is Dell PowerEdge R730xd servers with 256GB RAM and ~140TB of =
disks. FreeBSD 11.3. With ~24000 filsystems per server.
We normally cap ARC to 128GB RAM.

Adding a lot of debugging printf() calls to relevant parts of the code =
points to:

kern_shutdown() ->
  EVENTHANDLER_INVOKE(shutdown_post_sync) ->
  zfsshutdown() ->=20
  zfs__fini() ->=20
  spa_fini() ->=20
  zio_fini():

Debug output from a test run:
zio_fini: kmem_cache_destroy(zio_buf_cache & zio_data_buf_cache):

kmem_cache_destroy: uma_zfree_arg(0xfffff803465eec00) [zio_buf_12288] =
took 16 seconds
kmem_cache_destroy(zio_buf_cache[20]) took 16 seconds

kmem_cache_destroy: uma_zfree_arg(0xfffff803465eeb00) [zio_buf_16384] =
took 61 seconds
kmem_cache_destroy(zio_buf_cache[28]) took 61 seconds

kmem_cache_destroy: uma_zfree_arg(0xfffff8034c9018c0) [zio_buf_131072] =
took 87 seconds
kmem_cache_destroy(zio_buf_cache[224]) took 87 seconds

kmem_cache_destroy: uma_zfree_arg(0xfffff8034c901880) =
[zio_data_buf_131072] took 5 seconds
kmem_cache_destroy(zio_data_buf_cache[224]) took 5 seconds

(I modified the code here to print the time spent if it took 2 seconds =
or more)

This is on a newly rebooted server (with all filesystems mounted). Seems =
like uma_zfree_arg() is taking really long to execute. Now that code =
isn=E2=80=99t exactly easy to read (for me atleast)=E2=80=A6 Lot=E2=80=99s=
 of barrier/locks and stuff.

I wonder why this code should take so long? There shouldn=E2=80=99t be =
any disk I/O involved and it=E2=80=99s just a cache so I wonder if there =
might be some way to get rid of it quicker? Any UMA experts online? :-)

Reason for this is that I=E2=80=99d like to be able to make sure a =
server reboots more quickly in case of problems. Now with the parallell =
ZFS mount stuff being done at boot time that part is much quicker :-).

- Peter


From owner-freebsd-fs@freebsd.org  Mon Dec  2 21:13:22 2019
Return-Path: <owner-freebsd-fs@freebsd.org>
Delivered-To: freebsd-fs@mailman.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.nyi.freebsd.org (Postfix) with ESMTP id 861741B86F2
 for <freebsd-fs@mailman.nyi.freebsd.org>; Mon,  2 Dec 2019 21:13:22 +0000 (UTC)
 (envelope-from li-fbsd@citylink.dinoex.sub.org)
Received: from uucp.dinoex.org (uucp.dinoex.sub.de [IPv6:2001:1440:5001:1::2])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 server-signature RSA-PSS (4096 bits)
 client-signature RSA-PSS (2048 bits) client-digest SHA256)
 (Client CN "uucp.dinoex.sub.de",
 Issuer "Let's Encrypt Authority X3" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 47RdBT4xQCz4Pq9
 for <freebsd-fs@FreeBSD.ORG>; Mon,  2 Dec 2019 21:13:21 +0000 (UTC)
 (envelope-from li-fbsd@citylink.dinoex.sub.org)
Received: from uucp.dinoex.sub.de (uucp.dinoex.org [185.220.148.12])
 by uucp.dinoex.org (8.16.0.41/8.16.0.41) with ESMTPS id xB2LD4XK059143
 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NO)
 for <freebsd-fs@FreeBSD.ORG>; Mon, 2 Dec 2019 22:13:05 +0100 (CET)
 (envelope-from li-fbsd@citylink.dinoex.sub.org)
X-MDaemon-Deliver-To: <freebsd-fs@FreeBSD.ORG>
X-Authentication-Warning: uucp.dinoex.sub.de: Host uucp.dinoex.org
 [185.220.148.12] claimed to be uucp.dinoex.sub.de
Received: from citylink.dinoex.sub.org (uucp@localhost)
 by uucp.dinoex.sub.de (8.16.0.41/8.16.0.41/Submit) with UUCP id xB2LD4Vr059142
 for freebsd-fs@FreeBSD.ORG; Mon, 2 Dec 2019 22:13:04 +0100 (CET)
 (envelope-from li-fbsd@citylink.dinoex.sub.org)
Received: from gate.oper.dinoex.org (gate-e [192.168.98.2])
 by citylink.dinoex.sub.de (8.15.2/8.15.2) with ESMTP id xB2KQGDQ038453
 for <freebsd-fs@FreeBSD.ORG>; Mon, 2 Dec 2019 21:26:16 +0100 (CET)
 (envelope-from li-fbsd@citylink.dinoex.sub.org)
Received: from gate.oper.dinoex.org (gate-e [192.168.98.2])
 by gate.oper.dinoex.org (8.15.2/8.15.2) with ESMTP id xB2KO1WF038018
 for <freebsd-fs@FreeBSD.ORG>; Mon, 2 Dec 2019 21:24:01 +0100 (CET)
 (envelope-from li-fbsd@citylink.dinoex.sub.org)
Received: (from news@localhost)
 by gate.oper.dinoex.org (8.15.2/8.15.2/Submit) id xB2KO0s8038017
 for freebsd-fs@FreeBSD.ORG; Mon, 2 Dec 2019 21:24:00 +0100 (CET)
 (envelope-from li-fbsd@citylink.dinoex.sub.org)
X-Authentication-Warning: gate.oper.dinoex.org: news set sender to
 li-fbsd@citylink.dinoex.sub.org using -f
From: Peter <peter@citylink.dinoex.sub.org>
Subject: Re: ZFS overgrowing arc_max -> OOM kills
Date: Mon, 02 Dec 2019 21:12:30 +0100
Organization: n/a
Message-ID: <op.0b6s24m7aas8k8@localhost>
References: <op.0b4a23t3io2kyh@localhost>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed; delsp=yes
Content-Transfer-Encoding: 7bit
Injection-Info: oper.dinoex.de;
 logging-data="35134"; mail-complaints-to="usenet@citylink.dinoex.sub.org"
User-Agent: Opera Mail/12.16 (FreeBSD)
Sender: li-fbsd@citylink.dinoex.sub.org
To: freebsd-fs@FreeBSD.ORG
X-Milter: Spamilter (Reciever: uucp.dinoex.sub.de; Sender-ip: 185.220.148.12;
 Sender-helo: uucp.dinoex.sub.de; )
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.6.2
 (uucp.dinoex.org [185.220.148.12]); Mon, 02 Dec 2019 22:13:07 +0100 (CET)
X-Rspamd-Queue-Id: 47RdBT4xQCz4Pq9
X-Spamd-Bar: ++++
Authentication-Results: mx1.freebsd.org; dkim=none; dmarc=none;
 spf=none (mx1.freebsd.org: domain of li-fbsd@citylink.dinoex.sub.org has no
 SPF policy when checking 2001:1440:5001:1::2)
 smtp.mailfrom=li-fbsd@citylink.dinoex.sub.org
X-Spamd-Result: default: False [4.27 / 15.00]; ARC_NA(0.00)[];
 RCVD_COUNT_FIVE(0.00)[5]; FROM_HAS_DN(0.00)[];
 TO_MATCH_ENVRCPT_ALL(0.00)[]; MV_CASE(0.50)[];
 MIME_GOOD(-0.10)[text/plain]; HAS_XAW(0.00)[];
 PREVIOUSLY_DELIVERED(0.00)[freebsd-fs@freebsd.org];
 AUTH_NA(1.00)[]; RCPT_COUNT_ONE(0.00)[1];
 HAS_ORG_HEADER(0.00)[]; NEURAL_SPAM_MEDIUM(0.69)[0.692,0];
 MIME_TRACE(0.00)[0:+];
 IP_SCORE(0.39)[ip: (1.04), ipnet: 2001:1440::/32(0.52), asn: 8469(0.42),
 country: DE(-0.01)]; NEURAL_SPAM_LONG(0.98)[0.982,0];
 TO_DN_NONE(0.00)[]; R_SPF_NA(0.00)[];
 FORGED_SENDER(0.30)[peter@citylink.dinoex.sub.org,li-fbsd@citylink.dinoex.sub.org];
 DMARC_NA(0.00)[sub.org]; R_DKIM_NA(0.00)[];
 MID_RHS_NOT_FQDN(0.50)[];
 ASN(0.00)[asn:8469, ipnet:2001:1440::/32, country:DE];
 FROM_NEQ_ENVFROM(0.00)[peter@citylink.dinoex.sub.org,li-fbsd@citylink.dinoex.sub.org];
 RCVD_TLS_LAST(0.00)[]
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 Dec 2019 21:13:22 -0000

It seems this issue is a combination of two different matters.

On Sun, 01 Dec 2019 12:48:29 +0100, Peter Much  
<pmc@citylink.dinoex.sub.org> wrote:

>> Kernel Memory Map:                              1.38    GiB
>>         Size:                           98.79%  1.36    GiB
>>         Free:                           1.21%   17.10   MiB

Now I worked myself thru the code in arc.c, and everything there appears  
to work correctly.

What I seem to have is about 85% metadata, and 75% of what is called  
"overhead" data (arcstats.overhead_size), and that seems not evictable. So  
in that case the ARC goes beyond it's upper limit, and it goes beyond the  
provisioned kmem just the same way, because it's the same code. So, the  
exitus is inevitable.

The other issue is the OOM kills, and that is the same issue as already  
reported by many people (bug 232800 and 231457, where Mark Millard  
declared this behaviour "works as designed"). It is probably unrelated to  
ZFS (except that ZFS does not tend to improve memory abundance).

So now I go and hack my arc.c, and probably hack the vm pageout as well,  
and see how to make that stuff behave.
(Commentary and inspiration is welcome, as usual.)

rgds,
PMc

From owner-freebsd-fs@freebsd.org  Mon Dec  2 22:39:08 2019
Return-Path: <owner-freebsd-fs@freebsd.org>
Delivered-To: freebsd-fs@mailman.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.nyi.freebsd.org (Postfix) with ESMTP id 0FB4D1BAD7E
 for <freebsd-fs@mailman.nyi.freebsd.org>; Mon,  2 Dec 2019 22:39:08 +0000 (UTC)
 (envelope-from pen@lysator.liu.se)
Received: from mail.lysator.liu.se (mail.lysator.liu.se
 [IPv6:2001:6b0:17:f0a0::3])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 47Rg5P5lWNz4VTV
 for <freebsd-fs@freebsd.org>; Mon,  2 Dec 2019 22:39:05 +0000 (UTC)
 (envelope-from pen@lysator.liu.se)
Received: from mail.lysator.liu.se (localhost [127.0.0.1])
 by mail.lysator.liu.se (Postfix) with ESMTP id 16C2340026
 for <freebsd-fs@freebsd.org>; Mon,  2 Dec 2019 23:39:02 +0100 (CET)
Received: from [192.168.1.132] (h-201-140.A785.priv.bahnhof.se
 [98.128.201.140])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by mail.lysator.liu.se (Postfix) with ESMTPSA id EA55D4001A
 for <freebsd-fs@freebsd.org>; Mon,  2 Dec 2019 23:39:01 +0100 (CET)
From: Peter Eriksson <pen@lysator.liu.se>
Content-Type: text/plain;
	charset=utf-8
Content-Transfer-Encoding: quoted-printable
Mime-Version: 1.0 (Mac OS X Mail 13.0 \(3601.0.10\))
Subject: Re: Slow reboots due to ZFS cleanup in kern_shutdown() .. zio_fini()
Date: Mon, 2 Dec 2019 23:39:01 +0100
References: <AD17E454-6A51-436D-A853-07F04A406EC9@lysator.liu.se>
To: =?utf-8?Q?Karli_Sj=C3=B6berg_via_freebsd-fs?= <freebsd-fs@freebsd.org>
In-Reply-To: <AD17E454-6A51-436D-A853-07F04A406EC9@lysator.liu.se>
Message-Id: <D2A11CE9-9B24-4E40-A51A-8D318E0288C9@lysator.liu.se>
X-Mailer: Apple Mail (2.3601.0.10)
X-Virus-Scanned: ClamAV using ClamSMTP
X-Rspamd-Queue-Id: 47Rg5P5lWNz4VTV
X-Spamd-Bar: ---
Authentication-Results: mx1.freebsd.org; dkim=none;
 dmarc=pass (policy=none) header.from=liu.se;
 spf=pass (mx1.freebsd.org: domain of pen@lysator.liu.se designates
 2001:6b0:17:f0a0::3 as permitted sender) smtp.mailfrom=pen@lysator.liu.se
X-Spamd-Result: default: False [-3.10 / 15.00]; ARC_NA(0.00)[];
 RCVD_VIA_SMTP_AUTH(0.00)[]; NEURAL_HAM_MEDIUM(-1.00)[-0.999,0];
 FROM_HAS_DN(0.00)[];
 R_SPF_ALLOW(-0.20)[+a:mail.lysator.liu.se];
 TO_MATCH_ENVRCPT_ALL(0.00)[]; MIME_GOOD(-0.10)[text/plain];
 PREVIOUSLY_DELIVERED(0.00)[freebsd-fs@freebsd.org];
 NEURAL_HAM_LONG(-1.00)[-1.000,0]; RCPT_COUNT_ONE(0.00)[1];
 RCVD_COUNT_THREE(0.00)[3]; RCVD_TLS_LAST(0.00)[];
 TO_DN_ALL(0.00)[];
 RCVD_IN_DNSWL_MED(-0.20)[3.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.a.0.f.7.1.0.0.0.b.6.0.1.0.0.2.list.dnswl.org
 : 127.0.11.2]; DMARC_POLICY_ALLOW(-0.50)[liu.se,none];
 MV_CASE(0.50)[];
 IP_SCORE(-0.60)[ip: (-2.13), ipnet: 2001:6b0::/32(-0.48), asn: 1653(-0.39),
 country: EU(-0.00)]; FROM_EQ_ENVFROM(0.00)[];
 R_DKIM_NA(0.00)[]; MIME_TRACE(0.00)[0:+];
 ASN(0.00)[asn:1653, ipnet:2001:6b0::/32, country:EU];
 MID_RHS_MATCH_FROM(0.00)[]
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 Dec 2019 22:39:08 -0000

Sigh.

Slight correction, the output below should have said uma_zdestroy() and =
not uma_zfree_arg() (wrong printf text, but the right times).

After an uptime of 7 hours, a reboot have these times (I removed the =
=E2=80=9Cuma=E2=80=9D printf in this run):

kmem_cache_destroy(zio_data_buf_cache[8]) took 2 seconds
kmem_cache_destroy(zio_buf_cache[10]) took 6 seconds
kmem_cache_destroy(zio_buf_cache[14]) took 2 seconds
kmem_cache_destroy(zio_buf_cache[16]) took 136 seconds
kmem_cache_destroy(zio_buf_cache[20]) took 31 seconds
kmem_cache_destroy(zio_buf_cache[28]) took 303 seconds
kmem_cache_destroy(zio_buf_cache[224]) took 89 seconds
kmem_cache_destroy(zio_data_buf_cache[224]) took 31 seconds

This is on a mostly idle server (well, apart from compiling the kernel =
code :-) and some snapshots being taken of all filesystems (once per =
hour).


So now on to finding out why uma_destroy() is taking so long=E2=80=A6 =
:-).

- Peter


> On 2 Dec 2019, at 15:32, Peter Eriksson <pen@lysator.liu.se> wrote:
>=20
> I=E2=80=99ve been looking at trying to figure out why our servers take =
so long to reboot, where the most time is spent doing a =E2=80=9Cshutdown=E2=
=80=9D. We=E2=80=99ve seen examples where it has taken 10-20 minutes (or =
more).
>=20
> This is Dell PowerEdge R730xd servers with 256GB RAM and ~140TB of =
disks. FreeBSD 11.3. With ~24000 filsystems per server.
> We normally cap ARC to 128GB RAM.
>=20
> Adding a lot of debugging printf() calls to relevant parts of the code =
points to:
>=20
> kern_shutdown() ->
>  EVENTHANDLER_INVOKE(shutdown_post_sync) ->
>  zfsshutdown() ->=20
>  zfs__fini() ->=20
>  spa_fini() ->=20
>  zio_fini():
>=20
> Debug output from a test run:
> zio_fini: kmem_cache_destroy(zio_buf_cache & zio_data_buf_cache):
>=20
> kmem_cache_destroy: uma_zfree_arg(0xfffff803465eec00) [zio_buf_12288] =
took 16 seconds
> kmem_cache_destroy(zio_buf_cache[20]) took 16 seconds
>=20
> kmem_cache_destroy: uma_zfree_arg(0xfffff803465eeb00) [zio_buf_16384] =
took 61 seconds
> kmem_cache_destroy(zio_buf_cache[28]) took 61 seconds
>=20
> kmem_cache_destroy: uma_zfree_arg(0xfffff8034c9018c0) [zio_buf_131072] =
took 87 seconds
> kmem_cache_destroy(zio_buf_cache[224]) took 87 seconds
>=20
> kmem_cache_destroy: uma_zfree_arg(0xfffff8034c901880) =
[zio_data_buf_131072] took 5 seconds
> kmem_cache_destroy(zio_data_buf_cache[224]) took 5 seconds
>=20
> (I modified the code here to print the time spent if it took 2 seconds =
or more)
>=20
> This is on a newly rebooted server (with all filesystems mounted). =
Seems like uma_zfree_arg() is taking really long to execute. Now that =
code isn=E2=80=99t exactly easy to read (for me atleast)=E2=80=A6 =
Lot=E2=80=99s of barrier/locks and stuff.
>=20
> I wonder why this code should take so long? There shouldn=E2=80=99t be =
any disk I/O involved and it=E2=80=99s just a cache so I wonder if there =
might be some way to get rid of it quicker? Any UMA experts online? :-)
>=20
> Reason for this is that I=E2=80=99d like to be able to make sure a =
server reboots more quickly in case of problems. Now with the parallell =
ZFS mount stuff being done at boot time that part is much quicker :-).
>=20
> - Peter
>=20
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"


From owner-freebsd-fs@freebsd.org  Mon Dec  2 22:54:30 2019
Return-Path: <owner-freebsd-fs@freebsd.org>
Delivered-To: freebsd-fs@mailman.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.nyi.freebsd.org (Postfix) with ESMTP id C9C7E1BB52A
 for <freebsd-fs@mailman.nyi.freebsd.org>; Mon,  2 Dec 2019 22:54:30 +0000 (UTC)
 (envelope-from markjdb@gmail.com)
Received: from mail-qk1-x72b.google.com (mail-qk1-x72b.google.com
 [IPv6:2607:f8b0:4864:20::72b])
 (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits)
 server-signature RSA-PSS (4096 bits)
 client-signature RSA-PSS (2048 bits) client-digest SHA256)
 (Client CN "smtp.gmail.com", Issuer "GTS CA 1O1" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 47RgR92sVgz4WVX
 for <freebsd-fs@freebsd.org>; Mon,  2 Dec 2019 22:54:29 +0000 (UTC)
 (envelope-from markjdb@gmail.com)
Received: by mail-qk1-x72b.google.com with SMTP id v23so1470351qkg.2
 for <freebsd-fs@freebsd.org>; Mon, 02 Dec 2019 14:54:29 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;
 h=sender:date:from:to:cc:subject:message-id:references:mime-version
 :content-disposition:content-transfer-encoding:in-reply-to
 :user-agent; bh=RsXJYb88LHCvWCJzesynw/K27JrgCfAiPAdKpxbJXew=;
 b=MPfq213+Kf0l8VPeXvmxCFsiefFD6BIoJoAd5u2eIjWSvNUKi+h3AM33YawmwHFqae
 hU477Utb6GZd4FQ5kH06QVhqs5ithQYYWWKBVuDCy26SfC9FWQQJxAVDm+dF5+gTIpWK
 A+BVYd/9ZQjkoq+3CJp7KrtSMHluLyzzqvtNCH6yZRojpXBZ6SrJTSVwZOhwC8r1kvk1
 3HwXGyQLGi6SqYe2e0S6v1ieNIOo7u6XxUqOg112CodZcTpwen4D8b+5zTDVUWp8BqPp
 xkx/b0VDjJ7dDIhkoRn4AKZG+4XCm6nWWHr0TdceqhO4ABThbRKUv+lMu4okKuycIRpv
 pdrQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:sender:date:from:to:cc:subject:message-id
 :references:mime-version:content-disposition
 :content-transfer-encoding:in-reply-to:user-agent;
 bh=RsXJYb88LHCvWCJzesynw/K27JrgCfAiPAdKpxbJXew=;
 b=gMmjKzev72fPmF+J75NXuKoIx1IXQtboObUIlSpKyhZJY66ySpLIEiGfV5Ckw50vw+
 EWgoGl0QonIl6UfnmUP24SNAzBOT27ODpDjzEfLUN8W48WVVd65J49XHD61Rw5T1l2qI
 ORtOeChJuYaf7p6uGaNUdvC4XVVgjw+clpADg9SZ2XG4ui6xcfk8lmEhyIOi+CEWdKIS
 7pZfo41SNA3V8NhdeTTBKq7GOAogkCqTk8IjdabC4DHFW0L+IWM5nXtawHJMsTwvkCio
 +7zMIVaa0BmYeetCK9yBYI1cUzS4A4McajiS38BNwX7K/RRyOs2XQCS0KbbJHPc4X6lm
 pioA==
X-Gm-Message-State: APjAAAU4TKPbSkGWUsnIBulw8OmZuytP7kf5FRm9U2YTW5IhI/EWsYLu
 yNWeSuhvhNhNlPQEcEVy0gLip0CxLf8=
X-Google-Smtp-Source: APXvYqwR/PF2IiaX0IBOkUhGtT17WWquYixCDNBK3J45fU8JMhlbCqUORb6BNfVLy83N6+id6Hk0ZA==
X-Received: by 2002:a37:81c6:: with SMTP id c189mr1679653qkd.223.1575327268093; 
 Mon, 02 Dec 2019 14:54:28 -0800 (PST)
Received: from raichu (toroon0560w-lp130-05-69-158-183-252.dsl.bell.ca.
 [69.158.183.252])
 by smtp.gmail.com with ESMTPSA id h28sm621893qte.54.2019.12.02.14.54.26
 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
 Mon, 02 Dec 2019 14:54:27 -0800 (PST)
Sender: Mark Johnston <markjdb@gmail.com>
Date: Mon, 2 Dec 2019 17:54:24 -0500
From: Mark Johnston <markj@freebsd.org>
To: Peter Eriksson <pen@lysator.liu.se>
Cc: freebsd-fs@freebsd.org
Subject: Re: Slow reboots due to ZFS cleanup in kern_shutdown() .. zio_fini()
Message-ID: <20191202225424.GG43802@raichu>
References: <AD17E454-6A51-436D-A853-07F04A406EC9@lysator.liu.se>
 <D2A11CE9-9B24-4E40-A51A-8D318E0288C9@lysator.liu.se>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <D2A11CE9-9B24-4E40-A51A-8D318E0288C9@lysator.liu.se>
User-Agent: Mutt/1.12.2 (2019-09-21)
X-Rspamd-Queue-Id: 47RgR92sVgz4WVX
X-Spamd-Bar: ----
Authentication-Results: mx1.freebsd.org;
 dkim=pass header.d=gmail.com header.s=20161025 header.b=MPfq213+;
 dmarc=none;
 spf=pass (mx1.freebsd.org: domain of markjdb@gmail.com designates
 2607:f8b0:4864:20::72b as permitted sender) smtp.mailfrom=markjdb@gmail.com
X-Spamd-Result: default: False [-4.43 / 15.00]; ARC_NA(0.00)[];
 RCVD_VIA_SMTP_AUTH(0.00)[];
 R_DKIM_ALLOW(-0.20)[gmail.com:s=20161025];
 NEURAL_HAM_MEDIUM(-1.00)[-1.000,0]; FROM_HAS_DN(0.00)[];
 TO_DN_SOME(0.00)[];
 R_SPF_ALLOW(-0.20)[+ip6:2607:f8b0:4000::/36];
 NEURAL_HAM_LONG(-1.00)[-1.000,0]; MIME_GOOD(-0.10)[text/plain];
 PREVIOUSLY_DELIVERED(0.00)[freebsd-fs@freebsd.org];
 DMARC_NA(0.00)[freebsd.org]; RCVD_COUNT_THREE(0.00)[3];
 MIME_TRACE(0.00)[0:+]; TO_MATCH_ENVRCPT_SOME(0.00)[];
 DKIM_TRACE(0.00)[gmail.com:+]; RCPT_COUNT_TWO(0.00)[2];
 RCVD_IN_DNSWL_NONE(0.00)[b.2.7.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.0.4.6.8.4.0.b.8.f.7.0.6.2.list.dnswl.org
 : 127.0.5.0]; 
 IP_SCORE(-2.73)[ip: (-9.40), ipnet: 2607:f8b0::/32(-2.24), asn: 15169(-1.94),
 country: US(-0.05)]; 
 FORGED_SENDER(0.30)[markj@freebsd.org,markjdb@gmail.com];
 MID_RHS_NOT_FQDN(0.50)[]; FREEMAIL_ENVFROM(0.00)[gmail.com];
 ASN(0.00)[asn:15169, ipnet:2607:f8b0::/32, country:US];
 FROM_NEQ_ENVFROM(0.00)[markj@freebsd.org,markjdb@gmail.com];
 RCVD_TLS_ALL(0.00)[]
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 Dec 2019 22:54:30 -0000

On Mon, Dec 02, 2019 at 11:39:01PM +0100, Peter Eriksson wrote:
> Sigh.
> 
> Slight correction, the output below should have said uma_zdestroy() and not uma_zfree_arg() (wrong printf text, but the right times).
> 
> After an uptime of 7 hours, a reboot have these times (I removed the “uma” printf in this run):
> 
> kmem_cache_destroy(zio_data_buf_cache[8]) took 2 seconds
> kmem_cache_destroy(zio_buf_cache[10]) took 6 seconds
> kmem_cache_destroy(zio_buf_cache[14]) took 2 seconds
> kmem_cache_destroy(zio_buf_cache[16]) took 136 seconds
> kmem_cache_destroy(zio_buf_cache[20]) took 31 seconds
> kmem_cache_destroy(zio_buf_cache[28]) took 303 seconds
> kmem_cache_destroy(zio_buf_cache[224]) took 89 seconds
> kmem_cache_destroy(zio_data_buf_cache[224]) took 31 seconds
> 
> This is on a mostly idle server (well, apart from compiling the kernel code :-) and some snapshots being taken of all filesystems (once per hour).
> 
> 
> So now on to finding out why uma_destroy() is taking so long… :-).

uma_destroy() frees all of the memory cached in the zone back to the
page allocator.  This operation takes time proportional to the number of
cached items.  I would expect most of the time to be spent in
zone_reclaim(), called by zone_dtor().

Before rebooting, could you try grabbing the output of vmstat -z?

From owner-freebsd-fs@freebsd.org  Tue Dec  3 08:34:21 2019
Return-Path: <owner-freebsd-fs@freebsd.org>
Delivered-To: freebsd-fs@mailman.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.nyi.freebsd.org (Postfix) with ESMTP id 65CB11CD677
 for <freebsd-fs@mailman.nyi.freebsd.org>; Tue,  3 Dec 2019 08:34:21 +0000 (UTC)
 (envelope-from pen@lysator.liu.se)
Received: from mail.lysator.liu.se (mail.lysator.liu.se [130.236.254.3])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 47RwJ84PcPz44s5;
 Tue,  3 Dec 2019 08:34:16 +0000 (UTC)
 (envelope-from pen@lysator.liu.se)
Received: from mail.lysator.liu.se (localhost [127.0.0.1])
 by mail.lysator.liu.se (Postfix) with ESMTP id 69BF44002A;
 Tue,  3 Dec 2019 09:34:10 +0100 (CET)
Received: from [IPv6:2001:6b0:17:f002:182f:9bd3:109e:7075] (unknown
 [IPv6:2001:6b0:17:f002:182f:9bd3:109e:7075])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by mail.lysator.liu.se (Postfix) with ESMTPSA id 321AE40005;
 Tue,  3 Dec 2019 09:34:10 +0100 (CET)
Content-Type: text/plain;
	charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 13.0 \(3601.0.10\))
Subject: Re: Slow reboots due to ZFS cleanup in kern_shutdown() .. zio_fini()
From: Peter Eriksson <pen@lysator.liu.se>
In-Reply-To: <20191202225424.GG43802@raichu>
Date: Tue, 3 Dec 2019 09:34:09 +0100
Content-Transfer-Encoding: quoted-printable
Message-Id: <BDE6D2DB-EE3F-4EEF-9C03-3E9B5D128170@lysator.liu.se>
References: <AD17E454-6A51-436D-A853-07F04A406EC9@lysator.liu.se>
 <D2A11CE9-9B24-4E40-A51A-8D318E0288C9@lysator.liu.se>
 <20191202225424.GG43802@raichu>
To: FreeBSD FS <freebsd-fs@freebsd.org>
X-Mailer: Apple Mail (2.3601.0.10)
X-Virus-Scanned: ClamAV using ClamSMTP
X-Rspamd-Queue-Id: 47RwJ84PcPz44s5
X-Spamd-Bar: -----
Authentication-Results: mx1.freebsd.org; dkim=none;
 dmarc=pass (policy=none) header.from=liu.se;
 spf=pass (mx1.freebsd.org: domain of pen@lysator.liu.se designates
 130.236.254.3 as permitted sender) smtp.mailfrom=pen@lysator.liu.se
X-Spamd-Result: default: False [-5.95 / 15.00]; ARC_NA(0.00)[];
 RCVD_VIA_SMTP_AUTH(0.00)[]; NEURAL_HAM_MEDIUM(-1.00)[-1.000,0];
 FROM_HAS_DN(0.00)[];
 R_SPF_ALLOW(-0.20)[+a:mail.lysator.liu.se];
 TO_MATCH_ENVRCPT_ALL(0.00)[]; MIME_GOOD(-0.10)[text/plain];
 MV_CASE(0.50)[]; NEURAL_HAM_LONG(-1.00)[-1.000,0];
 RCVD_COUNT_THREE(0.00)[3]; RCVD_TLS_LAST(0.00)[];
 TO_DN_ALL(0.00)[];
 RCVD_IN_DNSWL_MED(-0.20)[3.254.236.130.list.dnswl.org : 127.0.11.2];
 RCPT_COUNT_TWO(0.00)[2];
 DMARC_POLICY_ALLOW(-0.50)[liu.se,none];
 IP_SCORE(-3.45)[ip: (-9.04), ipnet: 130.236.0.0/16(-4.56), asn: 2843(-3.64),
 country: SE(-0.03)]; FROM_EQ_ENVFROM(0.00)[];
 R_DKIM_NA(0.00)[]; MIME_TRACE(0.00)[0:+];
 ASN(0.00)[asn:2843, ipnet:130.236.0.0/16, country:SE];
 MID_RHS_MATCH_FROM(0.00)[]
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 03 Dec 2019 08:34:21 -0000

Sure. Here=E2=80=99s one from just before I rebooted last evening (at =
the 7 hour uptime mark):

ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP

UMA Kegs:               384,      0,     247,       3,     247,   0,   0
UMA Zones:             4736,      0,     264,       0,     264,   0,   0
UMA Slabs:               80,      0, 2894986,      14, 2898224,   0,   0
UMA Hash:               256,      0,      48,      72,     108,   0,   0
4 Bucket:                32,      0,    3497,    9753,  338875,   0,   0
6 Bucket:                48,      0,    1700,   10501,   89938,   0,   0
8 Bucket:                64,      0,     611,   10053,  118928,  15,   0
12 Bucket:               96,      0,    3293,    6096,  131089,   0,   0
16 Bucket:              128,      0,     834,    5304,   55984,   0,   0
32 Bucket:              256,      0,    2747,    5668,  373187, 139,   0
64 Bucket:              512,      0,    1884,    4436,  483281, 139,   0
128 Bucket:            1024,      0,    4828,    2996,  613296,16078,   =
0
256 Bucket:            2048,      0,   34106,    1058, 1566552, 164,   0
vmem btag:               56,      0,  904562,    3741,  905062,6399,   0
VM OBJECT:              240,      0,   89303,    5129,24512558,   0,   0
RADIX NODE:             144,      0,  634446,   15444,60638431,   0,   0
MAP:                    240,      0,       3,      61,       3,   0,   0
KMAP ENTRY:             128,      0,      12,     639,      12,   0,   0
MAP ENTRY:              128,      0,  105906,   71972,273723932,   0,   =
0
VMSPACE:               2520,      0,     120,     294, 1064743,   0,   0
fakepg:                 104,      0,       7,    3755,    3136,   0,   0
mt_zone:              16400,      0,     420,       0,     420,   0,   0
16:                      16,      0, 1810762,    5474,247600975,   0,   =
0
32:                      32,      0,  430077,  113173,777148991,   0,   =
0
64:                      64,      0,24475835, 6313799,350608033,   0,   =
0
128:                    128,      0, 3914907,   76529,394374364,   0,   =
0
256:                    256,      0,  508512, 1081773,201807283,   0,   =
0
512:                    512,      0,   74387,  185061,131179473,   0,   =
0
1024:                  1024,      0,  127067,    8929,51960099,   0,   0
2048:                  2048,      0,   48730,    1396, 5337386,   0,   0
4096:                  4096,      0,  276060,     827, 6748826,   0,   0
8192:                  8192,      0,      54,      68, 1059804,   0,   0
16384:                16384,      0,      38,      65,   23218,   0,   0
32768:                32768,      0,     139,     129,   21968,   0,   0
65536:                65536,      0,     115,      49,  119907,   0,   0
64 pcpu:                  8,      0,    4020,    4172,    4118,   0,   0
SLEEPQUEUE:              80,      0,    2131,    2767,    2131,   0,   0
Files:                   80,      0,     454,    4796,28333466,   0,   0
filedesc0:             1104,      0,     152,     433, 1064769,   0,   0
TURNSTILE:              136,      0,    2131,    1289,    2131,   0,   0
rl_entry:                40,      0,     561,    7539,     561,   0,   0
umtx pi:                 96,      0,       0,       0,       0,   0,   0
umtx_shm:                88,      0,   31217,    3928,   34273,   0,   0
MAC labels:              40,      0,       0,       0,       0,   0,   0
PROC:                  1424,      0,     152,     434, 1064768,   0,   0
THREAD:                1552,      0,    1704,     426,    9448,   0,   0
cpuset:                  96,      0,    1208,    3712,    1529,   0,   0
audit_record:          1248,      0,       0,    4929,44799582,   0,   0
mbuf_packet:            256, 104630070,   24552,   12386, 4231181,   0,  =
 0
mbuf:                   256, 104630070,       3,   12394, 8548105,   0,  =
 0
mbuf_cluster:          2048, 16348446,   36938,     138,   60263,   0,   =
0
mbuf_jumbo_page:       4096, 8174223,       0,     141,  542447,   0,   =
0
mbuf_jumbo_9k:         9216, 2421992,       0,       0,       0,   0,   =
0
mbuf_jumbo_16k:       16384, 1362370,       0,       0,       0,   0,   =
0
DMAR_MAP_ENTRY:         120,      0,       0,       0,       0,   0,   0
ttyinq:                 160,      0,     360,     765,     990,   0,   0
ttyoutq:                256,      0,     188,     667,     517,   0,   0
FPU_save_area:          832,      0,       0,       0,       0,   0,   0
g_bio:                  376,      0,       0,    2720,11874693,   0,   0
nvme_request:           128,      0,       4,    4398,   26030,   0,   0
vtnet_tx_hdr:            24,      0,       0,       0,       0,   0,   0
cryptop:                 88,      0,       0,    4005,  647540,   0,   0
cryptodesc:              72,      0,       0,    4015,  647540,   0,   0
taskq_zone:              48,      0,       0,   35192, 2413940,   0,   0
VNODE:                  472,      0,  142242,    1286,  601710,   0,   0
VNODEPOLL:              120,      0,       2,     625,       9,   0,   0
BUF TRIE:               144,      0,       0,  105948,       0,   0,   0
S VFS Cache:            108,      0,   89310,    2950, 1595636,   0,   0
STS VFS Cache:          148,      0,       0,       0,       0,   0,   0
L VFS Cache:            328,      0,      67,    1037,    1791,   0,   0
LTS VFS Cache:          368,      0,       0,       0,       0,   0,   0
NAMEI:                 1024,      0,       0,     704,154021593,   0,   =
0
rentr:                   24,      0,       0,    8183,    2858,   0,   0
NCLNODE:                576,      0,       0,       0,       0,   0,   0
DIRHASH:               1024,      0,       0,       0,       0,   0,   0
pipe:                   760,      0,      32,     568,  297208,   0,   0
procdesc:               136,      0,       1,    1710,     897,   0,   0
AIO:                    208,      0,       0,       0,       0,   0,   0
AIOP:                    32,      0,       0,       0,       0,   0,   0
AIOCB:                  752,      0,       0,       0,       0,   0,   0
AIOLIO:                 280,      0,       0,       0,       0,   0,   0
range_seg_cache:         72,      0, 1117410,   81040,23301057,   0,   0
metaslab_alloc_trace_cache:     72,      0,       0,   10065, 4408239,   =
0,   0
zio_cache:             1048,      0,      73,  115556,241609900,   0,   =
0
zio_link_cache:          48,      0,       0,  116200,64873334,   0,   0
zio_buf_512:            512,      0,  555291,    1877,54193375,   0,   0
zio_data_buf_512:       512,      0,    1805,    1331, 1491667,   0,   0
zio_buf_1024:          1024,      0,    1159,   24081,  224958,   0,   0
zio_data_buf_1024:     1024,      0,     968,     632,   29721,   0,   0
zio_buf_1536:          1536,      0,     342,   24328,  210638,   0,   0
zio_data_buf_1536:     1536,      0,    1129,     401,   49580,   0,   0
zio_buf_2048:          2048,      0,   48731,     531,  630275,   0,   0
zio_data_buf_2048:     2048,      0,    2176,     160,   21869,   0,   0
zio_buf_2560:          2560,      0,      64,     119,  179116,   0,   0
zio_data_buf_2560:     2560,      0,    2181,     132,   20371,   0,   0
zio_buf_3072:          3072,      0,      46,     110,  177331,   0,   0
zio_data_buf_3072:     3072,      0,    1720,     116,   14832,   0,   0
zio_buf_3584:          3584,      0,      40,      50,  174997,   0,   0
zio_data_buf_3584:     3584,      0,    1262,      94,   15376,   0,   0
zio_buf_4096:          4096,      0,     527,    9091, 8627750,   0,   0
zio_data_buf_4096:     4096,      0,    1139,     376,  236336,   0,   0
zio_buf_5120:          5120,      0,     752,     145,  345173,   0,   0
zio_data_buf_5120:     5120,      0,    1584,     136,   28763,   0,   0
zio_buf_6144:          6144,      0,    5941,    3678,  346446,   0,   0
zio_data_buf_6144:     6144,      0,    1059,      61,   24013,   0,   0
zio_buf_7168:          7168,      0,     167,     177,  325035,   0,   0
zio_data_buf_7168:     7168,      0,     880,      56,   21846,   0,   0
zio_buf_8192:          8192,      0,    1110,    1539,  871622,   0,   0
zio_data_buf_8192:     8192,      0,     698,      81,   24196,   0,   0
zio_buf_10240:        10240,      0,  131387,   66570,  521772,   0,   0
zio_data_buf_10240:   10240,      0,    1177,      39,   47154,   0,   0
zio_buf_12288:        12288,      0,   23008,   22661,  630894,   0,   0
zio_data_buf_12288:   12288,      0,    1499,     252,   39611,   0,   0
zio_buf_14336:        14336,      0,       4,     108,    5683,   0,   0
zio_data_buf_14336:   14336,      0,     884,     532,   32480,   0,   0
zio_buf_16384:        16384,      0,  321809,  122568, 1774887,   0,   0
zio_data_buf_16384:   16384,      0,     826,      67,   32998,   0,   0
zio_buf_20480:        20480,      0,      24,     131,  109109,   0,   0
zio_data_buf_20480:   20480,      0,    1414,      97,   49762,   0,   0
zio_buf_24576:        24576,      0,       5,     201,  208036,   0,   0
zio_data_buf_24576:   24576,      0,     788,      94,   46730,   0,   0
zio_buf_28672:        28672,      0,       0,     517,  260314,   0,   0
zio_data_buf_28672:   28672,      0,     563,      50,   44097,   0,   0
zio_buf_32768:        32768,      0,       2,     121,   71174,   0,   0
zio_data_buf_32768:   32768,      0,     485,      59,   44268,   0,   0
zio_buf_40960:        40960,      0,       5,     189,  120351,   0,   0
zio_data_buf_40960:   40960,      0,     846,     148,   88458,   0,   0
zio_buf_49152:        49152,      0,       2,     496,  114293,   0,   0
zio_data_buf_49152:   49152,      0,    1252,      91,   36323,   0,   0
zio_buf_57344:        57344,      0,       0,     206,   93761,   0,   0
zio_data_buf_57344:   57344,      0,     617,     298,   21824,   0,   0
zio_buf_65536:        65536,      0,       1,     213,   50490,   0,   0
zio_data_buf_65536:   65536,      0,     504,      96,   20772,   0,   0
zio_buf_81920:        81920,      0,       2,     124,   76214,   0,   0
zio_data_buf_81920:   81920,      0,     772,     227,   38083,   0,   0
zio_buf_98304:        98304,      0,       1,     139,   77399,   0,   0
zio_data_buf_98304:   98304,      0,     568,      79,   33028,   0,   0
zio_buf_114688:      114688,      0,       2,     122,   69859,   0,   0
zio_data_buf_114688: 114688,      0,     467,      70,   34427,   0,   0
zio_buf_131072:      131072,      0,  126330,    2110,  803167,   0,   0
zio_data_buf_131072: 131072,      0,   38681,     195,  542909,   0,   0
zio_buf_163840:      163840,      0,       0,     119,   68167,   0,   0
zio_data_buf_163840: 163840,      0,       0,       0,       0,   0,   0
zio_buf_196608:      196608,      0,       0,     116,   26793,   0,   0
zio_data_buf_196608: 196608,      0,       0,       0,       0,   0,   0
zio_buf_229376:      229376,      0,       0,      44,    8159,   0,   0
zio_data_buf_229376: 229376,      0,       0,       0,       0,   0,   0
zio_buf_262144:      262144,      0,       0,      43,    5843,   0,   0
zio_data_buf_262144: 262144,      0,       0,       0,       0,   0,   0
zio_buf_327680:      327680,      0,       0,      44,    7784,   0,   0
zio_data_buf_327680: 327680,      0,       0,       0,       0,   0,   0
zio_buf_393216:      393216,      0,       0,      42,    4732,   0,   0
zio_data_buf_393216: 393216,      0,       0,       0,       0,   0,   0
zio_buf_458752:      458752,      0,       0,      42,    3733,   0,   0
zio_data_buf_458752: 458752,      0,       0,       0,       0,   0,   0
zio_buf_524288:      524288,      0,       0,      42,    1996,   0,   0
zio_data_buf_524288: 524288,      0,       0,       0,       0,   0,   0
zio_buf_655360:      655360,      0,       0,     109,    2377,   0,   0
zio_data_buf_655360: 655360,      0,       0,       0,       0,   0,   0
zio_buf_786432:      786432,      0,       0,      41,     959,   0,   0
zio_data_buf_786432: 786432,      0,       0,       0,       0,   0,   0
zio_buf_917504:      917504,      0,       0,      38,     478,   0,   0
zio_data_buf_917504: 917504,      0,       0,       0,       0,   0,   0
zio_buf_1048576:     1048576,      0,       0,      39,     556,   0,   =
0
zio_data_buf_1048576: 1048576,      0,       0,       0,       0,   0,   =
0
zio_buf_1310720:     1310720,      0,       0,       3,       4,   0,   =
0
zio_data_buf_1310720: 1310720,      0,       0,       0,       0,   0,   =
0
zio_buf_1572864:     1572864,      0,       0,       0,       0,   0,   =
0
zio_data_buf_1572864: 1572864,      0,       0,       0,       0,   0,   =
0
zio_buf_1835008:     1835008,      0,       0,       0,       0,   0,   =
0
zio_data_buf_1835008: 1835008,      0,       0,       0,       0,   0,   =
0
zio_buf_2097152:     2097152,      0,       0,       0,       0,   0,   =
0
zio_data_buf_2097152: 2097152,      0,       0,       0,       0,   0,   =
0
zio_buf_2621440:     2621440,      0,       0,       0,       0,   0,   =
0
zio_data_buf_2621440: 2621440,      0,       0,       0,       0,   0,   =
0
zio_buf_3145728:     3145728,      0,       0,       0,       0,   0,   =
0
zio_data_buf_3145728: 3145728,      0,       0,       0,       0,   0,   =
0
zio_buf_3670016:     3670016,      0,       0,       0,       0,   0,   =
0
zio_data_buf_3670016: 3670016,      0,       0,       0,       0,   0,   =
0
zio_buf_4194304:     4194304,      0,       0,       0,       0,   0,   =
0
zio_data_buf_4194304: 4194304,      0,       0,       0,       0,   0,   =
0
zio_buf_5242880:     5242880,      0,       0,       0,       0,   0,   =
0
zio_data_buf_5242880: 5242880,      0,       0,       0,       0,   0,   =
0
zio_buf_6291456:     6291456,      0,       0,       0,       0,   0,   =
0
zio_data_buf_6291456: 6291456,      0,       0,       0,       0,   0,   =
0
zio_buf_7340032:     7340032,      0,       0,       0,       0,   0,   =
0
zio_data_buf_7340032: 7340032,      0,       0,       0,       0,   0,   =
0
zio_buf_8388608:     8388608,      0,       0,       0,       0,   0,   =
0
zio_data_buf_8388608: 8388608,      0,       0,       0,       0,   0,   =
0
zio_buf_10485760:    10485760,      0,       0,       0,       0,   0,   =
0
zio_data_buf_10485760: 10485760,      0,       0,       0,       0,   0, =
  0
zio_buf_12582912:    12582912,      0,       0,       0,       0,   0,   =
0
zio_data_buf_12582912: 12582912,      0,       0,       0,       0,   0, =
  0
zio_buf_14680064:    14680064,      0,       0,       0,       0,   0,   =
0
zio_data_buf_14680064: 14680064,      0,       0,       0,       0,   0, =
  0
zio_buf_16777216:    16777216,      0,       0,       0,       0,   0,   =
0
zio_data_buf_16777216: 16777216,      0,       0,       0,       0,   0, =
  0
lz4_ctx:              16384,      0,       0,     167, 2371452,   0,   0
abd_chunk:             4096,      0, 1498299,  193360,66794913,   0,   0
sa_cache:               144,      0,  117910,    1970,  610708,   0,   0
dnode_t:                728,      0, 1118459,    1106, 2539158,   0,   0
arc_buf_hdr_t_full:     224,      0,  985562,    1662,19179064,   0,   0
arc_buf_hdr_t_l2only:     88,      0,       0,    4005,   25678,   0,   =
0
arc_buf_t:               64,      0,  729765,  134763,19547924,   0,   0
dmu_buf_impl_t:         240,      0, 1236172,   97444,16369819,   0,   0
zil_lwb_cache:          320,      0,       3,     345,      54,   0,   0
zil_zcw_cache:           80,      0,       0,    1650,      51,   0,   0
sio_cache:              128,      0,       0,       0,       0,   0,   0
zfs_znode_cache:        264,      0,  117910,    1235,  553283,   0,   0
Mountpoints:            816,      0,   24268,     202,   24268,   0,   0
ksiginfo:               112,      0,     570,    6570,   61686,   0,   0
itimer:                 352,      0,       1,     175,       9,   0,   0
KNOTE:                  128,      0,      45,    3458,   66190,   0,   0
socket:                 872, 8384332,     130,     562,  923394,   0,   =
0
ipq:                     56,  51262,       0,       0,       0,   0,   0
udp_inpcb:              464, 8384336,      25,    1199,  274109,   0,   =
0
udpcb:                   32, 8384375,      25,    8100,  274109,   0,   =
0
tcp_inpcb:              464, 8384336,      36,     892,   36547,   0,   =
0
tcpcb:                 1048, 8384334,      36,     270,   36547,   0,   =
0
tcptw:                   88,  27810,       0,    2880,     897,   0,   0
syncache:               160,  15375,       0,    1900,   34233,   0,   0
hostcache:              128,  15376,      21,    1126,      23,   0,   0
sackhole:                32,      0,       0,       0,       0,   0,   0
tcpreass:                48, 1021813,       0,    1328,       9,   0,   =
0
sctp_ep:               1248, 8384334,       0,       0,       0,   0,   =
0
sctp_asoc:             2408,  40000,       0,       0,       0,   0,   0
sctp_laddr:              48,  80012,       0,    1577,       6,   0,   0
sctp_raddr:             736,  80000,       0,       0,       0,   0,   0
sctp_chunk:             152, 400010,       0,       0,       0,   0,   0
sctp_readq:             152, 400010,       0,       0,       0,   0,   0
sctp_stream_msg_out:    112, 400015,       0,       0,       0,   0,   0
sctp_asconf:             40, 400000,       0,       0,       0,   0,   0
sctp_asconf_ack:         48, 400060,       0,       0,       0,   0,   0
udplite_inpcb:          464, 8384336,       0,       0,       0,   0,   =
0
ripcb:                  464, 8384336,       2,     902,    1792,   0,   =
0
unpcb:                  240, 8384336,      65,    2447,  610936,   0,   =
0
IPsec SA lft_c:          16,      0,       0,       0,       0,   0,   0
rtentry:                208,      0,      18,     723,      20,   0,   0
selfd:                   64,      0,     677,    8995,47464291,   0,   0
swpctrie:               144, 32696973,       0,       0,       0,   0,   =
0
swblk:                  136, 32696949,       0,       0,       0,   0,   =
0
pf mtags:                48,      0,       0,    2822,      41,   0,   0
pf states:              296, 100009,     200,    1113,  109141,   0,   0
pf state keys:           88,      0,     200,    5065,  109141,   0,   0
pf source nodes:        136,  50025,       0,       0,       0,   0,   0
pf table entries:       216, 200016,      18,      54,      18,   0,   0
pf frags:               112,      0,       0,       0,       0,   0,   0
pf frag entries:         40,  50000,       0,       0,       0,   0,   0
pf state scrubs:         40,      0,       0,       0,       0,   0,   0




If I compare that one from a production server that has been running for =
56 days and have about 500 active users (mostly SMB but some NFS), then =
the zio_* values that differ much are:

ITEM                           SIZE  LIMIT     USED     FREE      REQ =
FAIL SLEEP
zio_buf_512:                512,      0,  23957617,  20604807,  =
28379999753,   0,   0
zio_buf_2048:            2048,      0,        48613,       10045,       =
152698166,   0,   0
zio_buf_5120:            5120,      0,      147618,     145002,        =
114926222,   0,   0
zio_buf_6144:            6144,      0,       213681,    296569,          =
96756130,   0,   0
zio_buf_10240:        10240,      0,        15485,       28955,          =
26953062,   0,   0
zio_buf_12288:        12288,      0,          8795,       29201,        =
383299702,   0,   0.  (zio_buf_cache[20])
zio_buf_16384:        16384,      0,    1300372,    391818,       =
1519180982,   0,   0.  (zio_buf_cache[28])


The test server for comparison:

ITEM                          SIZE  LIMIT     USED     FREE      REQ =
FAIL SLEEP
zio_buf_512:                512,      0,  555291,      1877,  54193375,  =
 0,   0
zio_buf_5120:            5120,      0,        752,        145,      =
345173,   0,   0
zio_buf_6144:            6144,      0,       5941,      3678,     =
346446,   0,   0
zio_buf_10240:        10240,      0,   131387,    66570,      521772,   =
0,   0
zio_buf_12288:        12288,      0,     23008,    22661,      630894,   =
0,   0
zio_buf_16384:        16384,      0,   321809,  122568,    1774887,   0, =
  0


If the test server at 7 hours uptime takes 10 minutes to clean up the =
zio_buffers, then I can see why it sometimes will seem to take forever =
on a production server and why we thought it had freezed up on us (we =
usually give up and send it a hard reset after a while).

hmm.. I wonder if perhaps one could add a progress counter to =
uma_destroy() somewhere. *Reads the code some more*  :-)

- Peter


> On 2 Dec 2019, at 23:54, Mark Johnston <markj@freebsd.org> wrote:
>=20
> On Mon, Dec 02, 2019 at 11:39:01PM +0100, Peter Eriksson wrote:
>> Sigh.
>>=20
>> Slight correction, the output below should have said uma_zdestroy() =
and not uma_zfree_arg() (wrong printf text, but the right times).
>>=20
>> After an uptime of 7 hours, a reboot have these times (I removed the =
=E2=80=9Cuma=E2=80=9D printf in this run):
>>=20
>> kmem_cache_destroy(zio_data_buf_cache[8]) took 2 seconds
>> kmem_cache_destroy(zio_buf_cache[10]) took 6 seconds
>> kmem_cache_destroy(zio_buf_cache[14]) took 2 seconds
>> kmem_cache_destroy(zio_buf_cache[16]) took 136 seconds
>> kmem_cache_destroy(zio_buf_cache[20]) took 31 seconds
>> kmem_cache_destroy(zio_buf_cache[28]) took 303 seconds
>> kmem_cache_destroy(zio_buf_cache[224]) took 89 seconds
>> kmem_cache_destroy(zio_data_buf_cache[224]) took 31 seconds
>>=20
>> This is on a mostly idle server (well, apart from compiling the =
kernel code :-) and some snapshots being taken of all filesystems (once =
per hour).
>>=20
>>=20
>> So now on to finding out why uma_destroy() is taking so long=E2=80=A6 =
:-).
>=20
> uma_destroy() frees all of the memory cached in the zone back to the
> page allocator.  This operation takes time proportional to the number =
of
> cached items.  I would expect most of the time to be spent in
> zone_reclaim(), called by zone_dtor().
>=20
> Before rebooting, could you try grabbing the output of vmstat -z?


From owner-freebsd-fs@freebsd.org  Tue Dec  3 09:03:16 2019
Return-Path: <owner-freebsd-fs@freebsd.org>
Delivered-To: freebsd-fs@mailman.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.nyi.freebsd.org (Postfix) with ESMTP id B806E1CE83F
 for <freebsd-fs@mailman.nyi.freebsd.org>; Tue,  3 Dec 2019 09:03:16 +0000 (UTC)
 (envelope-from agapon@gmail.com)
Received: from mail-lj1-f196.google.com (mail-lj1-f196.google.com
 [209.85.208.196])
 (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits)
 server-signature RSA-PSS (4096 bits)
 client-signature RSA-PSS (2048 bits) client-digest SHA256)
 (Client CN "smtp.gmail.com", Issuer "GTS CA 1O1" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 47Rwxb5qcdz4738;
 Tue,  3 Dec 2019 09:03:15 +0000 (UTC)
 (envelope-from agapon@gmail.com)
Received: by mail-lj1-f196.google.com with SMTP id c19so2813369lji.11;
 Tue, 03 Dec 2019 01:03:15 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:subject:to:cc:references:from:openpgp:autocrypt
 :message-id:date:user-agent:mime-version:in-reply-to
 :content-language:content-transfer-encoding;
 bh=dYGIdAVrHvAt1eIcBwh8g6Q3s+NpUpIKvFp6uJlcI/k=;
 b=bIotJoaY1M/nYec/yAk/bGVL8AHYRefQlVHyikCBsYzCENwBtAsDJra5tRsLsjR1Yw
 Zo5/imXX8GL1rOWZWwqhlP3ayYaBYL4coe9x5DcYh3XIR5g1qp0F7B4F45SBXdAeJX1h
 qXCmP5wNtx7SFca0hP/ef9tMG+InanfaJVpFWNJKerCYfz/DofuchdqWr7YKyrUHvDnv
 8aMBI8tcu8JAfnz/cSSUVWNrAo+hsIDMv/9OQum91Ef/vm8jSn198/ap23L+8fkFMIio
 nc2IuqPg7k4FNejQZxlUtWUFAA2g3aASA8h6JyXDSp3Z/yH205I70470bdsCQXcGkJZH
 t6nA==
X-Gm-Message-State: APjAAAWifm96t0mxDEGeVqMJAVVOxfOJp3+qvgq4s7oWBBVtyhbqHHSQ
 zB6ClIWZN2XaaKyquqpC/sk08vM46zs=
X-Google-Smtp-Source: APXvYqz+0noW63ra/dt7v9sFeb0Ixl8IOhTMRpMHVAjx5SOGs+ofWo/bzDnuUGCVUgOpZg1HVSRB6A==
X-Received: by 2002:a05:651c:327:: with SMTP id
 b7mr1828072ljp.22.1575363793518; 
 Tue, 03 Dec 2019 01:03:13 -0800 (PST)
Received: from [192.168.0.88] (east.meadow.volia.net. [93.72.151.96])
 by smtp.googlemail.com with ESMTPSA id v13sm927264lfq.7.2019.12.03.01.03.12
 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128);
 Tue, 03 Dec 2019 01:03:12 -0800 (PST)
Subject: Re: Slow reboots due to ZFS cleanup in kern_shutdown() .. zio_fini()
To: Mark Johnston <markj@freebsd.org>, Peter Eriksson <pen@lysator.liu.se>
Cc: freebsd-fs@freebsd.org
References: <AD17E454-6A51-436D-A853-07F04A406EC9@lysator.liu.se>
 <D2A11CE9-9B24-4E40-A51A-8D318E0288C9@lysator.liu.se>
 <20191202225424.GG43802@raichu>
From: Andriy Gapon <avg@FreeBSD.org>
Openpgp: preference=signencrypt
Autocrypt: addr=avg@FreeBSD.org; prefer-encrypt=mutual; keydata=
 mQINBFm4LIgBEADNB/3lT7f15UKeQ52xCFQx/GqHkSxEdVyLFZTmY3KyNPQGBtyvVyBfprJ7
 mAeXZWfhat6cKNRAGZcL5EmewdQuUfQfBdYmKjbw3a9GFDsDNuhDA2QwFt8BmkiVMRYyvI7l
 N0eVzszWCUgdc3qqM6qqcgBaqsVmJluwpvwp4ZBXmch5BgDDDb1MPO8AZ2QZfIQmplkj8Y6Z
 AiNMknkmgaekIINSJX8IzRzKD5WwMsin70psE8dpL/iBsA2cpJGzWMObVTtCxeDKlBCNqM1i
 gTXta1ukdUT7JgLEFZk9ceYQQMJJtUwzWu1UHfZn0Fs29HTqawfWPSZVbulbrnu5q55R4PlQ
 /xURkWQUTyDpqUvb4JK371zhepXiXDwrrpnyyZABm3SFLkk2bHlheeKU6Yql4pcmSVym1AS4
 dV8y0oHAfdlSCF6tpOPf2+K9nW1CFA8b/tw4oJBTtfZ1kxXOMdyZU5fiG7xb1qDgpQKgHUX8
 7Rd2T1UVLVeuhYlXNw2F+a2ucY+cMoqz3LtpksUiBppJhw099gEXehcN2JbUZ2TueJdt1FdS
 ztnZmsHUXLxrRBtGwqnFL7GSd6snpGIKuuL305iaOGODbb9c7ne1JqBbkw1wh8ci6vvwGlzx
 rexzimRaBzJxlkjNfMx8WpCvYebGMydNoeEtkWldtjTNVsUAtQARAQABtB5BbmRyaXkgR2Fw
 b24gPGF2Z0BGcmVlQlNELm9yZz6JAlQEEwEIAD4WIQS+LEO7ngQnXA4Bjr538m7TUc1yjwUC
 WbgsiAIbIwUJBaOagAULCQgHAgYVCAkKCwIEFgIDAQIeAQIXgAAKCRB38m7TUc1yj+JAEACV
 l9AK/nOWAt/9cufV2fRj0hdOqB1aCshtSrwHk/exXsDa4/FkmegxXQGY+3GWX3deIyesbVRL
 rYdtdK0dqJyT1SBqXK1h3/at9rxr9GQA6KWOxTjUFURsU7ok/6SIlm8uLRPNKO+yq0GDjgaO
 LzN+xykuBA0FlhQAXJnpZLcVfPJdWv7sSHGedL5ln8P8rxR+XnmsA5TUaaPcbhTB+mG+iKFj
 GghASDSfGqLWFPBlX/fpXikBDZ1gvOr8nyMY9nXhgfXpq3B6QCRYKPy58ChrZ5weeJZ29b7/
 QdEO8NFNWHjSD9meiLdWQaqo9Y7uUxN3wySc/YUZxtS0bhAd8zJdNPsJYG8sXgKjeBQMVGuT
 eCAJFEYJqbwWvIXMfVWop4+O4xB+z2YE3jAbG/9tB/GSnQdVSj3G8MS80iLS58frnt+RSEw/
 psahrfh0dh6SFHttE049xYiC+cM8J27Aaf0i9RflyITq57NuJm+AHJoU9SQUkIF0nc6lfA+o
 JRiyRlHZHKoRQkIg4aiKaZSWjQYRl5Txl0IZUP1dSWMX4s3XTMurC/pnja45dge/4ESOtJ9R
 8XuIWg45Oq6MeIWdjKddGhRj3OohsltKgkEU3eLKYtB6qRTQypHHUawCXz88uYt5e3w4V16H
 lCpSTZV/EVHnNe45FVBlvK7k7HFfDDkryLkCDQRZuCyIARAAlq0slcsVboY/+IUJdcbEiJRW
 be9HKVz4SUchq0z9MZPX/0dcnvz/gkyYA+OuM78dNS7Mbby5dTvOqfpLJfCuhaNYOhlE0wY+
 1T6Tf1f4c/uA3U/YiadukQ3+6TJuYGAdRZD5EqYFIkreARTVWg87N9g0fT9BEqLw9lJtEGDY
 EWUE7L++B8o4uu3LQFEYxcrb4K/WKmgtmFcm77s0IKDrfcX4doV92QTIpLiRxcOmCC/OCYuO
 jB1oaaqXQzZrCutXRK0L5XN1Y1PYjIrEzHMIXmCDlLYnpFkK+itlXwlE2ZQxkfMruCWdQXye
 syl2fynAe8hvp7Mms9qU2r2K9EcJiR5N1t1C2/kTKNUhcRv7Yd/vwusK7BqJbhlng5ZgRx0m
 WxdntU/JLEntz3QBsBsWM9Y9wf2V4tLv6/DuDBta781RsCB/UrU2zNuOEkSixlUiHxw1dccI
 6CVlaWkkJBxmHX22GdDFrcjvwMNIbbyfQLuBq6IOh8nvu9vuItup7qemDG3Ms6TVwA7BD3j+
 3fGprtyW8Fd/RR2bW2+LWkMrqHffAr6Y6V3h5kd2G9Q8ZWpEJk+LG6Mk3fhZhmCnHhDu6CwN
 MeUvxXDVO+fqc3JjFm5OxhmfVeJKrbCEUJyM8ESWLoNHLqjywdZga4Q7P12g8DUQ1mRxYg/L
 HgZY3zfKOqcAEQEAAYkCPAQYAQgAJhYhBL4sQ7ueBCdcDgGOvnfybtNRzXKPBQJZuCyIAhsM
 BQkFo5qAAAoJEHfybtNRzXKPBVwQAKfFy9P7N3OsLDMB56A4Kf+ZT+d5cIx0Yiaf4n6w7m3i
 ImHHHk9FIetI4Xe54a2IXh4Bq5UkAGY0667eIs+Z1Ea6I2i27Sdo7DxGwq09Qnm/Y65ADvXs
 3aBvokCcm7FsM1wky395m8xUos1681oV5oxgqeRI8/76qy0hD9WR65UW+HQgZRIcIjSel9vR
 XDaD2HLGPTTGr7u4v00UeTMs6qvPsa2PJagogrKY8RXdFtXvweQFz78NbXhluwix2Tb9ETPk
 LIpDrtzV73CaE2aqBG/KrboXT2C67BgFtnk7T7Y7iKq4/XvEdDWscz2wws91BOXuMMd4c/c4
 OmGW9m3RBLufFrOag1q5yUS9QbFfyqL6dftJP3Zq/xe+mr7sbWbhPVCQFrH3r26mpmy841ym
 dwQnNcsbIGiBASBSKksOvIDYKa2Wy8htPmWFTEOPRpFXdGQ27awcjjnB42nngyCK5ukZDHi6
 w0qK5DNQQCkiweevCIC6wc3p67jl1EMFY5+z+zdTPb3h7LeVnGqW0qBQl99vVFgzLxchKcl0
 R/paSFgwqXCZhAKMuUHncJuynDOP7z5LirUeFI8qsBAJi1rXpQoLJTVcW72swZ42IdPiboqx
 NbTMiNOiE36GqMcTPfKylCbF45JNX4nF9ElM0E+Y8gi4cizJYBRr2FBJgay0b9Cp
Message-ID: <3b71fe37-c29f-e3e5-ff96-5dce15cc7553@FreeBSD.org>
Date: Tue, 3 Dec 2019 11:03:11 +0200
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:60.0) Gecko/20100101
 Thunderbird/60.8.0
MIME-Version: 1.0
In-Reply-To: <20191202225424.GG43802@raichu>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 8bit
X-Rspamd-Queue-Id: 47Rwxb5qcdz4738
X-Spamd-Bar: ---
Authentication-Results: mx1.freebsd.org; dkim=none; dmarc=none;
 spf=pass (mx1.freebsd.org: domain of agapon@gmail.com designates
 209.85.208.196 as permitted sender) smtp.mailfrom=agapon@gmail.com
X-Spamd-Result: default: False [-3.09 / 15.00]; ARC_NA(0.00)[];
 RCVD_VIA_SMTP_AUTH(0.00)[]; MID_RHS_MATCH_FROM(0.00)[];
 FROM_HAS_DN(0.00)[]; RCPT_COUNT_THREE(0.00)[3];
 R_SPF_ALLOW(-0.20)[+ip4:209.85.128.0/17:c];
 RCVD_TLS_ALL(0.00)[]; MIME_GOOD(-0.10)[text/plain];
 DMARC_NA(0.00)[FreeBSD.org]; NEURAL_HAM_LONG(-1.00)[-1.000,0];
 TO_DN_SOME(0.00)[]; RCVD_COUNT_THREE(0.00)[3];
 TO_MATCH_ENVRCPT_SOME(0.00)[];
 RCVD_IN_DNSWL_NONE(0.00)[196.208.85.209.list.dnswl.org : 127.0.5.0];
 NEURAL_HAM_MEDIUM(-1.00)[-1.000,0];
 IP_SCORE(-1.09)[ip: (-0.31), ipnet: 209.85.128.0/17(-3.15), asn: 15169(-1.94),
 country: US(-0.05)]; 
 FORGED_SENDER(0.30)[avg@FreeBSD.org,agapon@gmail.com];
 RWL_MAILSPIKE_POSSIBLE(0.00)[196.208.85.209.rep.mailspike.net : 127.0.0.17];
 MIME_TRACE(0.00)[0:+]; R_DKIM_NA(0.00)[];
 FREEMAIL_ENVFROM(0.00)[gmail.com];
 ASN(0.00)[asn:15169, ipnet:209.85.128.0/17, country:US];
 FROM_NEQ_ENVFROM(0.00)[avg@FreeBSD.org,agapon@gmail.com];
 RECEIVED_SPAMHAUS_PBL(0.00)[96.151.72.93.khpj7ygk5idzvmvt5x4ziurxhy.zen.dq.spamhaus.net
 : 127.0.0.10]
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 03 Dec 2019 09:03:16 -0000

On 03/12/2019 00:54, Mark Johnston wrote:
> On Mon, Dec 02, 2019 at 11:39:01PM +0100, Peter Eriksson wrote:
>> Sigh.
>>
>> Slight correction, the output below should have said uma_zdestroy() and not uma_zfree_arg() (wrong printf text, but the right times).
>>
>> After an uptime of 7 hours, a reboot have these times (I removed the “uma” printf in this run):
>>
>> kmem_cache_destroy(zio_data_buf_cache[8]) took 2 seconds
>> kmem_cache_destroy(zio_buf_cache[10]) took 6 seconds
>> kmem_cache_destroy(zio_buf_cache[14]) took 2 seconds
>> kmem_cache_destroy(zio_buf_cache[16]) took 136 seconds
>> kmem_cache_destroy(zio_buf_cache[20]) took 31 seconds
>> kmem_cache_destroy(zio_buf_cache[28]) took 303 seconds
>> kmem_cache_destroy(zio_buf_cache[224]) took 89 seconds
>> kmem_cache_destroy(zio_data_buf_cache[224]) took 31 seconds
>>
>> This is on a mostly idle server (well, apart from compiling the kernel code :-) and some snapshots being taken of all filesystems (once per hour).
>>
>>
>> So now on to finding out why uma_destroy() is taking so long… :-).
> 
> uma_destroy() frees all of the memory cached in the zone back to the
> page allocator.  This operation takes time proportional to the number of
> cached items.  I would expect most of the time to be spent in
> zone_reclaim(), called by zone_dtor().

But spending *minutes* there is really unexpected.
I have never seen anything like that.
I wonder if there is anything untypical about the system's hardware (like a very
big number of processors) or configuration.

-- 
Andriy Gapon

From owner-freebsd-fs@freebsd.org  Tue Dec  3 09:29:25 2019
Return-Path: <owner-freebsd-fs@freebsd.org>
Delivered-To: freebsd-fs@mailman.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.nyi.freebsd.org (Postfix) with ESMTP id 91FC21CF34E
 for <freebsd-fs@mailman.nyi.freebsd.org>; Tue,  3 Dec 2019 09:29:25 +0000 (UTC)
 (envelope-from pen@lysator.liu.se)
Received: from mail.lysator.liu.se (mail.lysator.liu.se
 [IPv6:2001:6b0:17:f0a0::3])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 47RxWn2R5Pz48YJ;
 Tue,  3 Dec 2019 09:29:24 +0000 (UTC)
 (envelope-from pen@lysator.liu.se)
Received: from mail.lysator.liu.se (localhost [127.0.0.1])
 by mail.lysator.liu.se (Postfix) with ESMTP id 7094040049;
 Tue,  3 Dec 2019 10:29:22 +0100 (CET)
Received: from [IPv6:2001:6b0:17:f002:182f:9bd3:109e:7075] (unknown
 [IPv6:2001:6b0:17:f002:182f:9bd3:109e:7075])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by mail.lysator.liu.se (Postfix) with ESMTPSA id 5A77F40005;
 Tue,  3 Dec 2019 10:29:22 +0100 (CET)
From: Peter Eriksson <pen@lysator.liu.se>
Message-Id: <DAD21732-AB7C-4B8D-99BF-25C7DD238A31@lysator.liu.se>
Mime-Version: 1.0 (Mac OS X Mail 13.0 \(3601.0.10\))
Subject: Re: Slow reboots due to ZFS cleanup in kern_shutdown() .. zio_fini()
Date: Tue, 3 Dec 2019 10:29:22 +0100
In-Reply-To: <3b71fe37-c29f-e3e5-ff96-5dce15cc7553@FreeBSD.org>
Cc: Mark Johnston <markj@freebsd.org>,
 Andriy Gapon <avg@FreeBSD.org>
To: FreeBSD FS <freebsd-fs@freebsd.org>
References: <AD17E454-6A51-436D-A853-07F04A406EC9@lysator.liu.se>
 <D2A11CE9-9B24-4E40-A51A-8D318E0288C9@lysator.liu.se>
 <20191202225424.GG43802@raichu>
 <3b71fe37-c29f-e3e5-ff96-5dce15cc7553@FreeBSD.org>
X-Mailer: Apple Mail (2.3601.0.10)
X-Virus-Scanned: ClamAV using ClamSMTP
X-Rspamd-Queue-Id: 47RxWn2R5Pz48YJ
X-Spamd-Bar: -----
Authentication-Results: mx1.freebsd.org;
	none
X-Spamd-Result: default: False [-5.99 / 15.00];
 NEURAL_HAM_MEDIUM(-0.99)[-0.989,0]; REPLY(-4.00)[];
 NEURAL_HAM_LONG(-1.00)[-0.997,0]
Content-Type: text/plain;
	charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.29
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 03 Dec 2019 09:29:25 -0000

It=E2=80=99s a fairly standard Dell PowerEdge R730xd server with Intel =
Xeon E5-2620v4 CPUs & 256GB of RAM=E2=80=A6 (and an LSI SAS3 HBA and =
Intel 10GE ethernet)

CPU: Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz (2100.04-MHz K8-class =
CPU)
  Origin=3D"GenuineIntel"  Id=3D0x406f1  Family=3D0x6  Model=3D0x4f  =
Stepping=3D1
  =
Features=3D0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE=
,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  =
Features2=3D0x7ffefbff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SS=
SE3,SDBG,FMA,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSC=
DLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
  AMD Features=3D0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
  AMD Features2=3D0x121<LAHF,ABM,Prefetch>
  Structured Extended =
Features=3D0x21cbfbb<FSGSBASE,TSCADJ,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,=
RTM,PQM,NFPUSG,PQE,RDSEED,ADX,SMAP,PROCTRACE>
  Structured Extended =
Features3=3D0x9c000400<MD_CLEAR,IBPB,STIBP,L1DFL,SSBD>
  XSAVE Features=3D0x1<XSAVEOPT>
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID,VID,PostIntr
  TSC: P-state invariant, performance statistics
real memory  =3D 274869518336 (262136 MB)
avail memory =3D 267244859392 (254864 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <DELL   PE_SC3  >
FreeBSD/SMP: Multiprocessor System Detected: 32 CPUs
FreeBSD/SMP: 2 package(s) x 8 core(s) x 2 hardware threads

- Peter

>>=20
>> uma_destroy() frees all of the memory cached in the zone back to the
>> page allocator.  This operation takes time proportional to the number =
of
>> cached items.  I would expect most of the time to be spent in
>> zone_reclaim(), called by zone_dtor().
>=20
> But spending *minutes* there is really unexpected.
> I have never seen anything like that.
> I wonder if there is anything untypical about the system's hardware =
(like a very
> big number of processors) or configuration.
>=20
> --=20
> Andriy Gapon


From owner-freebsd-fs@freebsd.org  Tue Dec  3 09:32:58 2019
Return-Path: <owner-freebsd-fs@freebsd.org>
Delivered-To: freebsd-fs@mailman.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.nyi.freebsd.org (Postfix) with ESMTP id 4F4FF1CF564
 for <freebsd-fs@mailman.nyi.freebsd.org>; Tue,  3 Dec 2019 09:32:58 +0000 (UTC)
 (envelope-from agapon@gmail.com)
Received: from mail-lj1-f195.google.com (mail-lj1-f195.google.com
 [209.85.208.195])
 (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits)
 server-signature RSA-PSS (4096 bits)
 client-signature RSA-PSS (2048 bits) client-digest SHA256)
 (Client CN "smtp.gmail.com", Issuer "GTS CA 1O1" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 47Rxbs2Mgyz48rt;
 Tue,  3 Dec 2019 09:32:57 +0000 (UTC)
 (envelope-from agapon@gmail.com)
Received: by mail-lj1-f195.google.com with SMTP id j6so2994744lja.2;
 Tue, 03 Dec 2019 01:32:57 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:subject:to:cc:references:from:openpgp:autocrypt
 :message-id:date:user-agent:mime-version:in-reply-to
 :content-language:content-transfer-encoding;
 bh=NOgwZByuqL/Hej0TLtDWZTupdDgX9+N4Sd2+ZcgQG1M=;
 b=DBgZExfFvaOjIflG/VxwR2+zKl7JNizRGYJytO3gTfoE2x72gJzqXvkibp9zRN+/Q2
 eYAurtYam525doUr/iwqHFJREQAV1cTSDSD6Qv1XvXlfpHMqGxTTrJHpYH1R6m0pEbSe
 DN2C8hgrkKsWmpIPFRY1z3sqByzKM+0eADKhyT/7SBQrpJAPL4jfRB0E0LmbiUK8vu+3
 1rfldLzxD08qQLlbF1OqicmrDnXJ+yyX5mTkbrp2uV9TpE/pmfYgzHDD3PYFzjj+BHDi
 8ps9FsYYD+aeBc2ntFjZvyctgvBAwzcSnhir8r78eOvxTxe1EiAjRBuFmQAMVUEuQVnh
 UcWA==
X-Gm-Message-State: APjAAAWFmmmph3OJrmKOdDAXv+JTcTQc1A50iQJDHtneqmTc6ASelBIL
 ZlPx7YFiG0gm1jS89vmTwzQqYZPwyLs=
X-Google-Smtp-Source: APXvYqwpNZpMXnspwFLikifrny6pgHR2CAV+NX4ACJVxyDoU/e6smKB6nLDyYURrnmyi4v8SMvz7BQ==
X-Received: by 2002:a2e:7a1a:: with SMTP id v26mr1972150ljc.76.1575365575470; 
 Tue, 03 Dec 2019 01:32:55 -0800 (PST)
Received: from [192.168.0.88] (east.meadow.volia.net. [93.72.151.96])
 by smtp.googlemail.com with ESMTPSA id a23sm993242ljn.79.2019.12.03.01.32.54
 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128);
 Tue, 03 Dec 2019 01:32:54 -0800 (PST)
Subject: Re: Slow reboots due to ZFS cleanup in kern_shutdown() .. zio_fini()
To: Peter Eriksson <pen@lysator.liu.se>, FreeBSD FS <freebsd-fs@freebsd.org>
References: <AD17E454-6A51-436D-A853-07F04A406EC9@lysator.liu.se>
 <D2A11CE9-9B24-4E40-A51A-8D318E0288C9@lysator.liu.se>
 <20191202225424.GG43802@raichu>
 <3b71fe37-c29f-e3e5-ff96-5dce15cc7553@FreeBSD.org>
 <DAD21732-AB7C-4B8D-99BF-25C7DD238A31@lysator.liu.se>
From: Andriy Gapon <avg@FreeBSD.org>
Openpgp: preference=signencrypt
Autocrypt: addr=avg@FreeBSD.org; prefer-encrypt=mutual; keydata=
 mQINBFm4LIgBEADNB/3lT7f15UKeQ52xCFQx/GqHkSxEdVyLFZTmY3KyNPQGBtyvVyBfprJ7
 mAeXZWfhat6cKNRAGZcL5EmewdQuUfQfBdYmKjbw3a9GFDsDNuhDA2QwFt8BmkiVMRYyvI7l
 N0eVzszWCUgdc3qqM6qqcgBaqsVmJluwpvwp4ZBXmch5BgDDDb1MPO8AZ2QZfIQmplkj8Y6Z
 AiNMknkmgaekIINSJX8IzRzKD5WwMsin70psE8dpL/iBsA2cpJGzWMObVTtCxeDKlBCNqM1i
 gTXta1ukdUT7JgLEFZk9ceYQQMJJtUwzWu1UHfZn0Fs29HTqawfWPSZVbulbrnu5q55R4PlQ
 /xURkWQUTyDpqUvb4JK371zhepXiXDwrrpnyyZABm3SFLkk2bHlheeKU6Yql4pcmSVym1AS4
 dV8y0oHAfdlSCF6tpOPf2+K9nW1CFA8b/tw4oJBTtfZ1kxXOMdyZU5fiG7xb1qDgpQKgHUX8
 7Rd2T1UVLVeuhYlXNw2F+a2ucY+cMoqz3LtpksUiBppJhw099gEXehcN2JbUZ2TueJdt1FdS
 ztnZmsHUXLxrRBtGwqnFL7GSd6snpGIKuuL305iaOGODbb9c7ne1JqBbkw1wh8ci6vvwGlzx
 rexzimRaBzJxlkjNfMx8WpCvYebGMydNoeEtkWldtjTNVsUAtQARAQABtB5BbmRyaXkgR2Fw
 b24gPGF2Z0BGcmVlQlNELm9yZz6JAlQEEwEIAD4WIQS+LEO7ngQnXA4Bjr538m7TUc1yjwUC
 WbgsiAIbIwUJBaOagAULCQgHAgYVCAkKCwIEFgIDAQIeAQIXgAAKCRB38m7TUc1yj+JAEACV
 l9AK/nOWAt/9cufV2fRj0hdOqB1aCshtSrwHk/exXsDa4/FkmegxXQGY+3GWX3deIyesbVRL
 rYdtdK0dqJyT1SBqXK1h3/at9rxr9GQA6KWOxTjUFURsU7ok/6SIlm8uLRPNKO+yq0GDjgaO
 LzN+xykuBA0FlhQAXJnpZLcVfPJdWv7sSHGedL5ln8P8rxR+XnmsA5TUaaPcbhTB+mG+iKFj
 GghASDSfGqLWFPBlX/fpXikBDZ1gvOr8nyMY9nXhgfXpq3B6QCRYKPy58ChrZ5weeJZ29b7/
 QdEO8NFNWHjSD9meiLdWQaqo9Y7uUxN3wySc/YUZxtS0bhAd8zJdNPsJYG8sXgKjeBQMVGuT
 eCAJFEYJqbwWvIXMfVWop4+O4xB+z2YE3jAbG/9tB/GSnQdVSj3G8MS80iLS58frnt+RSEw/
 psahrfh0dh6SFHttE049xYiC+cM8J27Aaf0i9RflyITq57NuJm+AHJoU9SQUkIF0nc6lfA+o
 JRiyRlHZHKoRQkIg4aiKaZSWjQYRl5Txl0IZUP1dSWMX4s3XTMurC/pnja45dge/4ESOtJ9R
 8XuIWg45Oq6MeIWdjKddGhRj3OohsltKgkEU3eLKYtB6qRTQypHHUawCXz88uYt5e3w4V16H
 lCpSTZV/EVHnNe45FVBlvK7k7HFfDDkryLkCDQRZuCyIARAAlq0slcsVboY/+IUJdcbEiJRW
 be9HKVz4SUchq0z9MZPX/0dcnvz/gkyYA+OuM78dNS7Mbby5dTvOqfpLJfCuhaNYOhlE0wY+
 1T6Tf1f4c/uA3U/YiadukQ3+6TJuYGAdRZD5EqYFIkreARTVWg87N9g0fT9BEqLw9lJtEGDY
 EWUE7L++B8o4uu3LQFEYxcrb4K/WKmgtmFcm77s0IKDrfcX4doV92QTIpLiRxcOmCC/OCYuO
 jB1oaaqXQzZrCutXRK0L5XN1Y1PYjIrEzHMIXmCDlLYnpFkK+itlXwlE2ZQxkfMruCWdQXye
 syl2fynAe8hvp7Mms9qU2r2K9EcJiR5N1t1C2/kTKNUhcRv7Yd/vwusK7BqJbhlng5ZgRx0m
 WxdntU/JLEntz3QBsBsWM9Y9wf2V4tLv6/DuDBta781RsCB/UrU2zNuOEkSixlUiHxw1dccI
 6CVlaWkkJBxmHX22GdDFrcjvwMNIbbyfQLuBq6IOh8nvu9vuItup7qemDG3Ms6TVwA7BD3j+
 3fGprtyW8Fd/RR2bW2+LWkMrqHffAr6Y6V3h5kd2G9Q8ZWpEJk+LG6Mk3fhZhmCnHhDu6CwN
 MeUvxXDVO+fqc3JjFm5OxhmfVeJKrbCEUJyM8ESWLoNHLqjywdZga4Q7P12g8DUQ1mRxYg/L
 HgZY3zfKOqcAEQEAAYkCPAQYAQgAJhYhBL4sQ7ueBCdcDgGOvnfybtNRzXKPBQJZuCyIAhsM
 BQkFo5qAAAoJEHfybtNRzXKPBVwQAKfFy9P7N3OsLDMB56A4Kf+ZT+d5cIx0Yiaf4n6w7m3i
 ImHHHk9FIetI4Xe54a2IXh4Bq5UkAGY0667eIs+Z1Ea6I2i27Sdo7DxGwq09Qnm/Y65ADvXs
 3aBvokCcm7FsM1wky395m8xUos1681oV5oxgqeRI8/76qy0hD9WR65UW+HQgZRIcIjSel9vR
 XDaD2HLGPTTGr7u4v00UeTMs6qvPsa2PJagogrKY8RXdFtXvweQFz78NbXhluwix2Tb9ETPk
 LIpDrtzV73CaE2aqBG/KrboXT2C67BgFtnk7T7Y7iKq4/XvEdDWscz2wws91BOXuMMd4c/c4
 OmGW9m3RBLufFrOag1q5yUS9QbFfyqL6dftJP3Zq/xe+mr7sbWbhPVCQFrH3r26mpmy841ym
 dwQnNcsbIGiBASBSKksOvIDYKa2Wy8htPmWFTEOPRpFXdGQ27awcjjnB42nngyCK5ukZDHi6
 w0qK5DNQQCkiweevCIC6wc3p67jl1EMFY5+z+zdTPb3h7LeVnGqW0qBQl99vVFgzLxchKcl0
 R/paSFgwqXCZhAKMuUHncJuynDOP7z5LirUeFI8qsBAJi1rXpQoLJTVcW72swZ42IdPiboqx
 NbTMiNOiE36GqMcTPfKylCbF45JNX4nF9ElM0E+Y8gi4cizJYBRr2FBJgay0b9Cp
Message-ID: <381ab886-fc3b-75e9-afa7-7bde1f712055@FreeBSD.org>
Date: Tue, 3 Dec 2019 11:32:53 +0200
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:60.0) Gecko/20100101
 Thunderbird/60.8.0
MIME-Version: 1.0
In-Reply-To: <DAD21732-AB7C-4B8D-99BF-25C7DD238A31@lysator.liu.se>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 8bit
X-Rspamd-Queue-Id: 47Rxbs2Mgyz48rt
X-Spamd-Bar: ---
Authentication-Results: mx1.freebsd.org; dkim=none; dmarc=none;
 spf=pass (mx1.freebsd.org: domain of agapon@gmail.com designates
 209.85.208.195 as permitted sender) smtp.mailfrom=agapon@gmail.com
X-Spamd-Result: default: False [-3.07 / 15.00]; ARC_NA(0.00)[];
 RCVD_VIA_SMTP_AUTH(0.00)[]; RCVD_TLS_ALL(0.00)[];
 FROM_HAS_DN(0.00)[]; RCPT_COUNT_THREE(0.00)[3];
 R_SPF_ALLOW(-0.20)[+ip4:209.85.128.0/17:c];
 NEURAL_HAM_LONG(-1.00)[-1.000,0]; MIME_GOOD(-0.10)[text/plain];
 MIME_TRACE(0.00)[0:+]; DMARC_NA(0.00)[FreeBSD.org];
 RCVD_COUNT_THREE(0.00)[3]; TO_MATCH_ENVRCPT_SOME(0.00)[];
 TO_DN_ALL(0.00)[];
 RCVD_IN_DNSWL_NONE(0.00)[195.208.85.209.list.dnswl.org : 127.0.5.0];
 NEURAL_HAM_MEDIUM(-1.00)[-1.000,0];
 IP_SCORE(-1.07)[ip: (-0.18), ipnet: 209.85.128.0/17(-3.15), asn: 15169(-1.94),
 country: US(-0.05)]; 
 FORGED_SENDER(0.30)[avg@FreeBSD.org,agapon@gmail.com];
 RWL_MAILSPIKE_POSSIBLE(0.00)[195.208.85.209.rep.mailspike.net : 127.0.0.17];
 R_DKIM_NA(0.00)[]; FREEMAIL_ENVFROM(0.00)[gmail.com];
 ASN(0.00)[asn:15169, ipnet:209.85.128.0/17, country:US];
 FROM_NEQ_ENVFROM(0.00)[avg@FreeBSD.org,agapon@gmail.com];
 MID_RHS_MATCH_FROM(0.00)[];
 RECEIVED_SPAMHAUS_PBL(0.00)[96.151.72.93.khpj7ygk5idzvmvt5x4ziurxhy.zen.dq.spamhaus.net
 : 127.0.0.10]
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 03 Dec 2019 09:32:58 -0000

On 03/12/2019 11:29, Peter Eriksson wrote:
> It’s a fairly standard Dell PowerEdge R730xd server with Intel Xeon E5-2620v4
> CPUs & 256GB of RAM… (and an LSI SAS3 HBA and Intel 10GE ethernet)
> 
> CPU: Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz (2100.04-MHz K8-class CPU)
>   Origin="GenuineIntel"  Id=0x406f1  Family=0x6  Model=0x4f  Stepping=1
>  
> Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
>  
> Features2=0x7ffefbff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
>   AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
>   AMD Features2=0x121<LAHF,ABM,Prefetch>
>   Structured Extended
> Features=0x21cbfbb<FSGSBASE,TSCADJ,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,PQM,NFPUSG,PQE,RDSEED,ADX,SMAP,PROCTRACE>
>   Structured Extended Features3=0x9c000400<MD_CLEAR,IBPB,STIBP,L1DFL,SSBD>
>   XSAVE Features=0x1<XSAVEOPT>
>   VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID,VID,PostIntr
>   TSC: P-state invariant, performance statistics
> real memory  = 274869518336 (262136 MB)
> avail memory = 267244859392 (254864 MB)
> Event timer "LAPIC" quality 600
> ACPI APIC Table: <DELL   PE_SC3  >
> FreeBSD/SMP: Multiprocessor System Detected: 32 CPUs
> FreeBSD/SMP: 2 package(s) x 8 core(s) x 2 hardware threads
> 
> - Peter
> 
>>>
>>> uma_destroy() frees all of the memory cached in the zone back to the
>>> page allocator.  This operation takes time proportional to the number of
>>> cached items.  I would expect most of the time to be spent in
>>> zone_reclaim(), called by zone_dtor().
>>
>> But spending *minutes* there is really unexpected.
>> I have never seen anything like that.
>> I wonder if there is anything untypical about the system's hardware (like a very
>> big number of processors) or configuration.

Yes, this is pretty typical.
If you don't have root on ZFS (or can arrange that) maybe you can experiment
with kldunload-ing of zfs.  That way you will have more tools at your disposal
for diagnosing the problem.


-- 
Andriy Gapon

From owner-freebsd-fs@freebsd.org  Tue Dec  3 16:22:24 2019
Return-Path: <owner-freebsd-fs@freebsd.org>
Delivered-To: freebsd-fs@mailman.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.nyi.freebsd.org (Postfix) with ESMTP id C1C821B45BA
 for <freebsd-fs@mailman.nyi.freebsd.org>; Tue,  3 Dec 2019 16:22:24 +0000 (UTC)
 (envelope-from markjdb@gmail.com)
Received: from mail-qt1-x843.google.com (mail-qt1-x843.google.com
 [IPv6:2607:f8b0:4864:20::843])
 (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits)
 server-signature RSA-PSS (4096 bits)
 client-signature RSA-PSS (2048 bits) client-digest SHA256)
 (Client CN "smtp.gmail.com", Issuer "GTS CA 1O1" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 47S6hJ4l7Mz4ZFL;
 Tue,  3 Dec 2019 16:22:24 +0000 (UTC)
 (envelope-from markjdb@gmail.com)
Received: by mail-qt1-x843.google.com with SMTP id s8so1125260qte.2;
 Tue, 03 Dec 2019 08:22:24 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;
 h=sender:date:from:to:cc:subject:message-id:references:mime-version
 :content-disposition:content-transfer-encoding:in-reply-to
 :user-agent; bh=fhnwZruAW+hV89B1Oe4xe+b1SQSnVof2N96jwNOigJ8=;
 b=rr3933G5ZB8A7IMwlvKTtxUsyvXxyCBHLq4BVFLR2dV41bbUFugvTjTyyMleibpP4a
 y4mOCWexf1uv7VuXjcp5U0XqIgSI9qPQeZ6k+AMiPbRdPry0n35wHq8EAGgiwG7zFwTU
 4PBor1/qqMsUYu3Y8wQqeIoNAlYacRsEYW9Hak/8CW8iUAO0iSEBSouWf87Kxcmf9ALU
 PK/k75xgRuRnk8nUKtNnWtyzZGqPStvc0qB+C1zNwxoCmd4f1fDE3/Z6QoinnLnP9njN
 4A+5FWFe4OXM5+EGEgVACsEurEyxVTdNq0UylipeA2J+t9sWXrfMhIbgvdlJTOz5ILkX
 xe4A==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:sender:date:from:to:cc:subject:message-id
 :references:mime-version:content-disposition
 :content-transfer-encoding:in-reply-to:user-agent;
 bh=fhnwZruAW+hV89B1Oe4xe+b1SQSnVof2N96jwNOigJ8=;
 b=hTBrRISIpnslGbKAXdJGPON5tGep4Knkdw8kvkdfeGBGnRlegiWeCL/J3hebfHfenJ
 cJcbH+dES0oIU1bcscG9pkzkHK8IDbGNhQcu4YT+VfDvHCl84rkvUS6YM1M8jjGeQonN
 CktrRLYlUy1VuLFheZTK1WadIkcchbQPUM5sugxPghEljUBFy4MK3bqqrdd1CAabhizd
 b8I2kYOrvRLxSOnwR1BVme+VH1w8a32x7b4r9JWwktEJS6wLVvCaC+KaJUqNxca4yDi+
 4M9z0Q1ZpW8Al3aTlGJqCuYw7PVEL6zCy66Gxr6x6NdAO01ZGUZZgTfOWkZX0uRltAKw
 KddQ==
X-Gm-Message-State: APjAAAV4zcfMaGT77SxzmbLp682mAEsX6As83jhUKUSBg+B/PTBA6oR+
 Kp/6qx1dAjvXB5g4a1q8br4r34eLLzc=
X-Google-Smtp-Source: APXvYqzUWJWybkoCBPzhH0yum8OiM1dEubYFPg3CmvpNBw+UhVzhwAjwLbYS8m+Tla8wImXprozoOA==
X-Received: by 2002:ac8:3946:: with SMTP id t6mr5864326qtb.278.1575390142260; 
 Tue, 03 Dec 2019 08:22:22 -0800 (PST)
Received: from raichu (toroon0560w-lp130-05-69-158-183-252.dsl.bell.ca.
 [69.158.183.252])
 by smtp.gmail.com with ESMTPSA id u12sm2054577qta.79.2019.12.03.08.22.21
 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
 Tue, 03 Dec 2019 08:22:21 -0800 (PST)
Sender: Mark Johnston <markjdb@gmail.com>
Date: Tue, 3 Dec 2019 11:22:19 -0500
From: Mark Johnston <markj@freebsd.org>
To: Andriy Gapon <avg@freebsd.org>
Cc: Peter Eriksson <pen@lysator.liu.se>, freebsd-fs@freebsd.org
Subject: Re: Slow reboots due to ZFS cleanup in kern_shutdown() .. zio_fini()
Message-ID: <20191203162219.GI43802@raichu>
References: <AD17E454-6A51-436D-A853-07F04A406EC9@lysator.liu.se>
 <D2A11CE9-9B24-4E40-A51A-8D318E0288C9@lysator.liu.se>
 <20191202225424.GG43802@raichu>
 <3b71fe37-c29f-e3e5-ff96-5dce15cc7553@FreeBSD.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <3b71fe37-c29f-e3e5-ff96-5dce15cc7553@FreeBSD.org>
User-Agent: Mutt/1.12.2 (2019-09-21)
X-Rspamd-Queue-Id: 47S6hJ4l7Mz4ZFL
X-Spamd-Bar: -----
Authentication-Results: mx1.freebsd.org;
	none
X-Spamd-Result: default: False [-6.00 / 15.00];
 NEURAL_HAM_MEDIUM(-1.00)[-0.996,0]; REPLY(-4.00)[];
 NEURAL_HAM_LONG(-1.00)[-0.999,0]
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 03 Dec 2019 16:22:24 -0000

On Tue, Dec 03, 2019 at 11:03:11AM +0200, Andriy Gapon wrote:
> On 03/12/2019 00:54, Mark Johnston wrote:
> > On Mon, Dec 02, 2019 at 11:39:01PM +0100, Peter Eriksson wrote:
> >> Sigh.
> >>
> >> Slight correction, the output below should have said uma_zdestroy() and not uma_zfree_arg() (wrong printf text, but the right times).
> >>
> >> After an uptime of 7 hours, a reboot have these times (I removed the “uma” printf in this run):
> >>
> >> kmem_cache_destroy(zio_data_buf_cache[8]) took 2 seconds
> >> kmem_cache_destroy(zio_buf_cache[10]) took 6 seconds
> >> kmem_cache_destroy(zio_buf_cache[14]) took 2 seconds
> >> kmem_cache_destroy(zio_buf_cache[16]) took 136 seconds
> >> kmem_cache_destroy(zio_buf_cache[20]) took 31 seconds
> >> kmem_cache_destroy(zio_buf_cache[28]) took 303 seconds
> >> kmem_cache_destroy(zio_buf_cache[224]) took 89 seconds
> >> kmem_cache_destroy(zio_data_buf_cache[224]) took 31 seconds
> >>
> >> This is on a mostly idle server (well, apart from compiling the kernel code :-) and some snapshots being taken of all filesystems (once per hour).
> >>
> >>
> >> So now on to finding out why uma_destroy() is taking so long… :-).
> > 
> > uma_destroy() frees all of the memory cached in the zone back to the
> > page allocator.  This operation takes time proportional to the number of
> > cached items.  I would expect most of the time to be spent in
> > zone_reclaim(), called by zone_dtor().
> 
> But spending *minutes* there is really unexpected.
> I have never seen anything like that.
> I wonder if there is anything untypical about the system's hardware (like a very
> big number of processors) or configuration.

Indeed, looking at the vmstat -z output this doesn't really make sense.
It would be useful to confirm that we are in fact spending most of the
time reclaiming items.

From owner-freebsd-fs@freebsd.org  Tue Dec  3 17:03:05 2019
Return-Path: <owner-freebsd-fs@freebsd.org>
Delivered-To: freebsd-fs@mailman.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.nyi.freebsd.org (Postfix) with ESMTP id D64A51B5FEC
 for <freebsd-fs@mailman.nyi.freebsd.org>; Tue,  3 Dec 2019 17:03:05 +0000 (UTC)
 (envelope-from pen@lysator.liu.se)
Received: from mail.lysator.liu.se (mail.lysator.liu.se
 [IPv6:2001:6b0:17:f0a0::3])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 47S7bD2VFgz4dq6;
 Tue,  3 Dec 2019 17:03:04 +0000 (UTC)
 (envelope-from pen@lysator.liu.se)
Received: from mail.lysator.liu.se (localhost [127.0.0.1])
 by mail.lysator.liu.se (Postfix) with ESMTP id AD20F4000A;
 Tue,  3 Dec 2019 18:03:00 +0100 (CET)
Received: from [IPv6:2001:6b0:17:f002:182f:9bd3:109e:7075] (unknown
 [IPv6:2001:6b0:17:f002:182f:9bd3:109e:7075])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by mail.lysator.liu.se (Postfix) with ESMTPSA id 7CAA140006;
 Tue,  3 Dec 2019 18:03:00 +0100 (CET)
From: Peter Eriksson <pen@lysator.liu.se>
Message-Id: <DD98024A-4DA5-45C1-953E-0C515A9F7DFB@lysator.liu.se>
Mime-Version: 1.0 (Mac OS X Mail 13.0 \(3601.0.10\))
Subject: Re: Slow reboots due to ZFS cleanup in kern_shutdown() .. zio_fini()
Date: Tue, 3 Dec 2019 18:03:00 +0100
In-Reply-To: <uh75zixfiqw.fsf@P142.sics.se>
Cc: Andriy Gapon <avg@FreeBSD.org>,
 Bengt Ahlgren <bengt.ahlgren@ri.se>
To: FreeBSD FS <freebsd-fs@freebsd.org>
References: <AD17E454-6A51-436D-A853-07F04A406EC9@lysator.liu.se>
 <D2A11CE9-9B24-4E40-A51A-8D318E0288C9@lysator.liu.se>
 <20191202225424.GG43802@raichu>
 <3b71fe37-c29f-e3e5-ff96-5dce15cc7553@FreeBSD.org>
 <DAD21732-AB7C-4B8D-99BF-25C7DD238A31@lysator.liu.se>
 <uh75zixfiqw.fsf@P142.sics.se>
X-Mailer: Apple Mail (2.3601.0.10)
X-Virus-Scanned: ClamAV using ClamSMTP
X-Rspamd-Queue-Id: 47S7bD2VFgz4dq6
X-Spamd-Bar: ---
Authentication-Results: mx1.freebsd.org; dkim=none;
 dmarc=pass (policy=none) header.from=liu.se;
 spf=pass (mx1.freebsd.org: domain of pen@lysator.liu.se designates
 2001:6b0:17:f0a0::3 as permitted sender) smtp.mailfrom=pen@lysator.liu.se
X-Spamd-Result: default: False [-3.42 / 15.00]; ARC_NA(0.00)[];
 RCVD_VIA_SMTP_AUTH(0.00)[]; NEURAL_HAM_MEDIUM(-1.00)[-1.000,0];
 FROM_HAS_DN(0.00)[]; RCPT_COUNT_THREE(0.00)[3];
 R_SPF_ALLOW(-0.20)[+a:mail.lysator.liu.se]; MV_CASE(0.50)[];
 MIME_GOOD(-0.10)[multipart/alternative,text/plain];
 RCVD_TLS_LAST(0.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000,0];
 RCVD_COUNT_THREE(0.00)[3]; TO_MATCH_ENVRCPT_SOME(0.00)[];
 TO_DN_ALL(0.00)[];
 RCVD_IN_DNSWL_MED(-0.20)[3.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.a.0.f.7.1.0.0.0.b.6.0.1.0.0.2.list.dnswl.org
 : 127.0.11.2]; DMARC_POLICY_ALLOW(-0.50)[liu.se,none];
 IP_SCORE(-0.92)[ip: (-3.77), ipnet: 2001:6b0::/32(-0.46), asn: 1653(-0.38),
 country: EU(-0.00)]; FROM_EQ_ENVFROM(0.00)[];
 R_DKIM_NA(0.00)[]; MIME_TRACE(0.00)[0:+,1:+,2:~];
 ASN(0.00)[asn:1653, ipnet:2001:6b0::/32, country:EU];
 MID_RHS_MATCH_FROM(0.00)[]
Content-Type: text/plain;
	charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.29
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 03 Dec 2019 17:03:05 -0000

We run without swap on our servers=E2=80=A6

I=E2=80=99m trying to instrument the uma_zdestroy() code now, should =
provide some more details in an hour or two=E2=80=A6 :-)

- Peter


> On 3 Dec 2019, at 17:59, Bengt Ahlgren <bengt.ahlgren@ri.se> wrote:
>=20
> Peter Eriksson <pen@lysator.liu.se <mailto:pen@lysator.liu.se>> =
writes:
>=20
>>>> uma_destroy() frees all of the memory cached in the zone back to =
the
>>>> page allocator.  This operation takes time proportional to the =
number of
>>>> cached items.  I would expect most of the time to be spent in
>>>> zone_reclaim(), called by zone_dtor().
>>>=20
>>> But spending *minutes* there is really unexpected.
>>> I have never seen anything like that.
>>> I wonder if there is anything untypical about the system's hardware
>>> (like a very
>>> big number of processors) or configuration.
>>>=20
>>> --=20
>>> Andriy Gapon
>>=20
>> It=E2=80=99s a fairly standard Dell PowerEdge R730xd server with =
Intel Xeon
>> E5-2620v4 CPUs & 256GB of RAM=E2=80=A6 (and an LSI SAS3 HBA and Intel =
10GE
>> ethernet)
>>=20
>> CPU: Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz (2100.04-MHz K8-class =
CPU)
>>  Origin=3D"GenuineIntel"  Id=3D0x406f1  Family=3D0x6  Model=3D0x4f  =
Stepping=3D1
>>  =
Features=3D0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE=
,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
>>  =
Features2=3D0x7ffefbff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SS=
SE3,SDBG,FMA,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSC=
DLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
>>  AMD Features=3D0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
>>  AMD Features2=3D0x121<LAHF,ABM,Prefetch>
>>  Structured Extended
>> =
Features=3D0x21cbfbb<FSGSBASE,TSCADJ,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,=
RTM,PQM,NFPUSG,PQE,RDSEED,ADX,SMAP,PROCTRACE>
>>  Structured Extended =
Features3=3D0x9c000400<MD_CLEAR,IBPB,STIBP,L1DFL,SSBD>
>>  XSAVE Features=3D0x1<XSAVEOPT>
>>  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID,VID,PostIntr
>>  TSC: P-state invariant, performance statistics
>> real memory  =3D 274869518336 (262136 MB)
>> avail memory =3D 267244859392 (254864 MB)
>> Event timer "LAPIC" quality 600
>> ACPI APIC Table: <DELL   PE_SC3  >
>> FreeBSD/SMP: Multiprocessor System Detected: 32 CPUs
>> FreeBSD/SMP: 2 package(s) x 8 core(s) x 2 hardware threads
>>=20
>> - Peter
>=20
> I also have a Dell server, but older (R515, 128GB RAM, AMD 2x8 cores,
> ~140TB disk, ARC cap 64GB) that has experienced the same behaviour.  I
> get the impression the shutdown time is related to used swap, but I =
have
> not investigated further.  I think the system disk (with swap -
> mechanical!) activity light is on during this time.  Now running =
11.3-R,
> but I'm sure I saw this with late 10.x as well.
>=20
> Bengt


From owner-freebsd-fs@freebsd.org  Tue Dec  3 17:31:52 2019
Return-Path: <owner-freebsd-fs@freebsd.org>
Delivered-To: freebsd-fs@mailman.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.nyi.freebsd.org (Postfix) with ESMTP id 806E11B6E83
 for <freebsd-fs@mailman.nyi.freebsd.org>; Tue,  3 Dec 2019 17:31:52 +0000 (UTC)
 (envelope-from pen@lysator.liu.se)
Received: from mail.lysator.liu.se (mail.lysator.liu.se
 [IPv6:2001:6b0:17:f0a0::3])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 47S8DR2h7lz3Bv5;
 Tue,  3 Dec 2019 17:31:51 +0000 (UTC)
 (envelope-from pen@lysator.liu.se)
Received: from mail.lysator.liu.se (localhost [127.0.0.1])
 by mail.lysator.liu.se (Postfix) with ESMTP id 94B7440006;
 Tue,  3 Dec 2019 18:31:49 +0100 (CET)
Received: from [IPv6:2001:6b0:17:f002:182f:9bd3:109e:7075] (unknown
 [IPv6:2001:6b0:17:f002:182f:9bd3:109e:7075])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by mail.lysator.liu.se (Postfix) with ESMTPSA id 77FF540005;
 Tue,  3 Dec 2019 18:31:49 +0100 (CET)
Content-Type: text/plain;
	charset=us-ascii
Mime-Version: 1.0 (Mac OS X Mail 13.0 \(3601.0.10\))
Subject: Re: Slow reboots due to ZFS cleanup in kern_shutdown() .. zio_fini()
From: Peter Eriksson <pen@lysator.liu.se>
In-Reply-To: <20191203162219.GI43802@raichu>
Date: Tue, 3 Dec 2019 18:31:49 +0100
Cc: Andriy Gapon <avg@freebsd.org>,
 Mark Johnston <markj@freebsd.org>
Content-Transfer-Encoding: quoted-printable
Message-Id: <247C6990-BC6E-4E3D-8CEF-5A861D8A25EC@lysator.liu.se>
References: <AD17E454-6A51-436D-A853-07F04A406EC9@lysator.liu.se>
 <D2A11CE9-9B24-4E40-A51A-8D318E0288C9@lysator.liu.se>
 <20191202225424.GG43802@raichu>
 <3b71fe37-c29f-e3e5-ff96-5dce15cc7553@FreeBSD.org>
 <20191203162219.GI43802@raichu>
To: FreeBSD FS <freebsd-fs@freebsd.org>
X-Mailer: Apple Mail (2.3601.0.10)
X-Virus-Scanned: ClamAV using ClamSMTP
X-Rspamd-Queue-Id: 47S8DR2h7lz3Bv5
X-Spamd-Bar: ---
Authentication-Results: mx1.freebsd.org; dkim=none;
 dmarc=pass (policy=none) header.from=liu.se;
 spf=pass (mx1.freebsd.org: domain of pen@lysator.liu.se designates
 2001:6b0:17:f0a0::3 as permitted sender) smtp.mailfrom=pen@lysator.liu.se
X-Spamd-Result: default: False [-3.67 / 15.00]; ARC_NA(0.00)[];
 RCVD_VIA_SMTP_AUTH(0.00)[]; NEURAL_HAM_MEDIUM(-1.00)[-1.000,0];
 FROM_HAS_DN(0.00)[]; RCPT_COUNT_THREE(0.00)[3];
 R_SPF_ALLOW(-0.20)[+a:mail.lysator.liu.se];
 TO_MATCH_ENVRCPT_ALL(0.00)[]; MIME_GOOD(-0.10)[text/plain];
 MV_CASE(0.50)[]; NEURAL_HAM_LONG(-1.00)[-1.000,0];
 RCVD_COUNT_THREE(0.00)[3]; RCVD_TLS_LAST(0.00)[];
 TO_DN_ALL(0.00)[];
 RCVD_IN_DNSWL_MED(-0.20)[3.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.a.0.f.7.1.0.0.0.b.6.0.1.0.0.2.list.dnswl.org
 : 127.0.11.2]; DMARC_POLICY_ALLOW(-0.50)[liu.se,none];
 IP_SCORE(-1.17)[ip: (-5.00), ipnet: 2001:6b0::/32(-0.46), asn: 1653(-0.38),
 country: EU(-0.00)]; FROM_EQ_ENVFROM(0.00)[];
 R_DKIM_NA(0.00)[]; MIME_TRACE(0.00)[0:+];
 ASN(0.00)[asn:1653, ipnet:2001:6b0::/32, country:EU];
 MID_RHS_MATCH_FROM(0.00)[]
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 03 Dec 2019 17:31:52 -0000

zio_fini: kmem_cache_destroy(zio_buf_cache & zio_data_buf_cache):

zone_free_item(zone=3DUMA Zones): zone->uz_dtor() took 16 seconds
uma_zdestroy(zio_buf_12288) took 16 seconds
kmem_cache_destroy: uma_zdestroy(0xfffff8034678e980) [zio_buf_12288] =
took 16 seconds
kmem_cache_destroy(zio_buf_cache[20]) took 16 seconds

zone_free_item(zone=3DUMA Zones): zone->uz_dtor() took 60 seconds
uma_zdestroy(zio_buf_16384) took 60 seconds
kmem_cache_destroy: uma_zdestroy(0xfffff8034678e880) [zio_buf_16384] =
took 60 seconds
kmem_cache_destroy(zio_buf_cache[28]) took 60 seconds


Looks like it is the call to zone->uz_dtor that is taking time (added =
time check around the various parts of zone_free_item().

- Peter

> Indeed, looking at the vmstat -z output this doesn't really make =
sense.
> It would be useful to confirm that we are in fact spending most of the
> time reclaiming items.


From owner-freebsd-fs@freebsd.org  Tue Dec  3 22:28:54 2019
Return-Path: <owner-freebsd-fs@freebsd.org>
Delivered-To: freebsd-fs@mailman.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.nyi.freebsd.org (Postfix) with ESMTP id 149671BFCA5
 for <freebsd-fs@mailman.nyi.freebsd.org>; Tue,  3 Dec 2019 22:28:54 +0000 (UTC)
 (envelope-from pen@lysator.liu.se)
Received: from mail.lysator.liu.se (mail.lysator.liu.se [130.236.254.3])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 47SGq72CSbz42rV
 for <freebsd-fs@freebsd.org>; Tue,  3 Dec 2019 22:28:50 +0000 (UTC)
 (envelope-from pen@lysator.liu.se)
Received: from mail.lysator.liu.se (localhost [127.0.0.1])
 by mail.lysator.liu.se (Postfix) with ESMTP id 91CCA4000A
 for <freebsd-fs@freebsd.org>; Tue,  3 Dec 2019 23:28:48 +0100 (CET)
Received: from [192.168.1.132] (h-201-140.A785.priv.bahnhof.se
 [98.128.201.140])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by mail.lysator.liu.se (Postfix) with ESMTPSA id 5414A40006
 for <freebsd-fs@freebsd.org>; Tue,  3 Dec 2019 23:28:48 +0100 (CET)
From: Peter Eriksson <pen@lysator.liu.se>
Content-Type: text/plain;
	charset=utf-8
Content-Transfer-Encoding: quoted-printable
Mime-Version: 1.0 (Mac OS X Mail 13.0 \(3601.0.10\))
Subject: Re: Slow reboots due to ZFS cleanup in kern_shutdown() .. zio_fini()
Date: Tue, 3 Dec 2019 23:28:48 +0100
References: <AD17E454-6A51-436D-A853-07F04A406EC9@lysator.liu.se>
 <D2A11CE9-9B24-4E40-A51A-8D318E0288C9@lysator.liu.se>
 <20191202225424.GG43802@raichu>
 <3b71fe37-c29f-e3e5-ff96-5dce15cc7553@FreeBSD.org>
 <20191203162219.GI43802@raichu>
 <247C6990-BC6E-4E3D-8CEF-5A861D8A25EC@lysator.liu.se>
To: FreeBSD FS <freebsd-fs@freebsd.org>
In-Reply-To: <247C6990-BC6E-4E3D-8CEF-5A861D8A25EC@lysator.liu.se>
Message-Id: <4D44B25E-2F15-4536-A653-DA242051C8A9@lysator.liu.se>
X-Mailer: Apple Mail (2.3601.0.10)
X-Virus-Scanned: ClamAV using ClamSMTP
X-Rspamd-Queue-Id: 47SGq72CSbz42rV
X-Spamd-Bar: -----
Authentication-Results: mx1.freebsd.org; dkim=none;
 dmarc=pass (policy=none) header.from=liu.se;
 spf=pass (mx1.freebsd.org: domain of pen@lysator.liu.se designates
 130.236.254.3 as permitted sender) smtp.mailfrom=pen@lysator.liu.se
X-Spamd-Result: default: False [-6.00 / 15.00]; ARC_NA(0.00)[];
 RCVD_VIA_SMTP_AUTH(0.00)[]; NEURAL_HAM_MEDIUM(-1.00)[-1.000,0];
 FROM_HAS_DN(0.00)[];
 R_SPF_ALLOW(-0.20)[+a:mail.lysator.liu.se];
 TO_MATCH_ENVRCPT_ALL(0.00)[]; MIME_GOOD(-0.10)[text/plain];
 PREVIOUSLY_DELIVERED(0.00)[freebsd-fs@freebsd.org];
 NEURAL_HAM_LONG(-1.00)[-1.000,0]; RCPT_COUNT_ONE(0.00)[1];
 RCVD_COUNT_THREE(0.00)[3]; RCVD_TLS_LAST(0.00)[];
 TO_DN_ALL(0.00)[];
 RCVD_IN_DNSWL_MED(-0.20)[3.254.236.130.list.dnswl.org : 127.0.11.2];
 DMARC_POLICY_ALLOW(-0.50)[liu.se,none]; MV_CASE(0.50)[];
 IP_SCORE(-3.50)[ip: (-9.17), ipnet: 130.236.0.0/16(-4.60), asn: 2843(-3.68),
 country: SE(-0.03)]; FROM_EQ_ENVFROM(0.00)[];
 R_DKIM_NA(0.00)[]; MIME_TRACE(0.00)[0:+];
 ASN(0.00)[asn:2843, ipnet:130.236.0.0/16, country:SE];
 MID_RHS_MATCH_FROM(0.00)[]
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 03 Dec 2019 22:28:54 -0000

=E2=80=A6 and X levels deeper it gives this (not all filesystems mounted =
- I rebooted fairly quickly):

keg_drain: while (SLIST_FIRST/SLIST_REMOVE)-loop took 14 seconds [20021 =
loops]
zone_drain_wait(): zone_foreach_keg(zone, &keg_drain) took 14 seconds
zone_dtor(): zone_drain_wait(zone, M_WAITOK) took 14 seconds
zone_free_item(zone=3DUMA Zones): zone->uz_dtor() took 14 seconds
uma_zdestroy(zio_buf_12288) took 14 seconds
kmem_cache_destroy: uma_zdestroy(0xfffff803467c8ac0) [zio_buf_12288] =
took 14 seconds
kmem_cache_destroy(zio_buf_cache[20]) took 14 seconds


Hmm=E2=80=A6 I wonder if it isn=E2=80=99t this code I =
/usr/src/sys/sys/queue.h:

> #define SLIST_REMOVE(head, elm, type, field) do {                      =
 \
>        QMD_SAVELINK(oldnext, (elm)->field.sle_next);                   =
\
>        if (SLIST_FIRST((head)) =3D=3D (elm)) {                         =
    \
>                SLIST_REMOVE_HEAD((head), field);                       =
\
>        }                                                               =
\
>        else {                                                          =
\
>                QUEUE_TYPEOF(type) *curelm =3D SLIST_FIRST(head);       =
  \
>                while (SLIST_NEXT(curelm, field) !=3D (elm))            =
  \
>                        curelm =3D SLIST_NEXT(curelm, field);           =
  \
>                SLIST_REMOVE_AFTER(curelm, field);                      =
\
>        }                                                               =
\
>        TRASHIT(*oldnext);                                              =
\
> } while (0)

Combined with this in /usr/src/sys/vm/uma_core.c: keg_drain():

> finished:
>        KEG_UNLOCK(keg);
>
>        while ((slab =3D SLIST_FIRST(&freeslabs)) !=3D NULL) {
>                SLIST_REMOVE(&freeslabs, slab, uma_slab, us_hlink);
>                keg_free_slab(keg, slab, keg->uk_ipers);
>        }

(The keg_drain print above is the time for the while() loop).

If I=E2=80=99m reading this right it looks like it=E2=80=99s potentially =
doing a linear search through a (long) linked list, for every entry in =
the free slabs list. 20000 entries in 14 seconds is 1400 entries per =
second.

Hmmm=E2=80=A6 It feels like this code could be optimised at bit.

- Peter


> On 3 Dec 2019, at 18:31, Peter Eriksson <pen@lysator.liu.se> wrote:
>=20
> zio_fini: kmem_cache_destroy(zio_buf_cache & zio_data_buf_cache):
>=20
> zone_free_item(zone=3DUMA Zones): zone->uz_dtor() took 16 seconds
> uma_zdestroy(zio_buf_12288) took 16 seconds
> kmem_cache_destroy: uma_zdestroy(0xfffff8034678e980) [zio_buf_12288] =
took 16 seconds
> kmem_cache_destroy(zio_buf_cache[20]) took 16 seconds
>=20
> zone_free_item(zone=3DUMA Zones): zone->uz_dtor() took 60 seconds
> uma_zdestroy(zio_buf_16384) took 60 seconds
> kmem_cache_destroy: uma_zdestroy(0xfffff8034678e880) [zio_buf_16384] =
took 60 seconds
> kmem_cache_destroy(zio_buf_cache[28]) took 60 seconds
>=20
>=20
> Looks like it is the call to zone->uz_dtor that is taking time (added =
time check around the various parts of zone_free_item().
>=20
> - Peter
>=20
>> Indeed, looking at the vmstat -z output this doesn't really make =
sense.
>> It would be useful to confirm that we are in fact spending most of =
the
>> time reclaiming items.
>=20
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"


From owner-freebsd-fs@freebsd.org  Wed Dec  4 09:48:32 2019
Return-Path: <owner-freebsd-fs@freebsd.org>
Delivered-To: freebsd-fs@mailman.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.nyi.freebsd.org (Postfix) with ESMTP id 441E01CF410
 for <freebsd-fs@mailman.nyi.freebsd.org>; Wed,  4 Dec 2019 09:48:32 +0000 (UTC)
 (envelope-from agapon@gmail.com)
Received: from mail-lj1-f170.google.com (mail-lj1-f170.google.com
 [209.85.208.170])
 (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits)
 server-signature RSA-PSS (4096 bits)
 client-signature RSA-PSS (2048 bits) client-digest SHA256)
 (Client CN "smtp.gmail.com", Issuer "GTS CA 1O1" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 47SYvM1wNwz4f0s
 for <freebsd-fs@freebsd.org>; Wed,  4 Dec 2019 09:48:31 +0000 (UTC)
 (envelope-from agapon@gmail.com)
Received: by mail-lj1-f170.google.com with SMTP id j6so7344112lja.2
 for <freebsd-fs@freebsd.org>; Wed, 04 Dec 2019 01:48:31 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:subject:to:references:from:openpgp:autocrypt
 :message-id:date:user-agent:mime-version:in-reply-to
 :content-language:content-transfer-encoding;
 bh=pB/zAWZTSgoeeyV0AaGHYdDBdi0bWMiwwMr2wAGCAk4=;
 b=CRuX9EuBALqO/VblvJu/i2+6wYgaFETqlwn9NExsKs+m8tdK6LVg4E4jgm/pHQH2Ef
 iG0sfFEiT1VzqFiHTKPD8vvX5dH3c2I0KB6d1JgpILY5e+7zDHjv+ZxBdijh1egMtq4T
 Svv1xeSgWWWFRuW4GfZEWCQ+aNAU+eisslgRrx2/F0/oTL3hvUS0COiU6fqm6/87Ob8x
 4VwITOnRq5tP1aZ+PpuEWR3amVEZmMpHUVqL26+CkHO/TFv62JtWIVmINtQxNxbxoMHk
 js59YyfbknWjO7B8WNWegg0P7J+YZa1hRZccHu9nR6etYEVJPwXYlLnQBIDmCdFD5ueC
 qO2Q==
X-Gm-Message-State: APjAAAWrggIHyh8yG5hpW+qmMXcPBZtwcuUAsSKFa2DZRr6XOxwwlaQZ
 y/l1KG/yT1u1akWiD5O8QuAqDqhNY0g=
X-Google-Smtp-Source: APXvYqwvBECiCFHl+Tgizy9xP9nja/04hJFLJMX4GVVRcXtWMCBexIXvQVFee+I+qTvRUk1X5I5/PQ==
X-Received: by 2002:a2e:9855:: with SMTP id e21mr1379063ljj.170.1575452909195; 
 Wed, 04 Dec 2019 01:48:29 -0800 (PST)
Received: from [192.168.0.88] (east.meadow.volia.net. [93.72.151.96])
 by smtp.googlemail.com with ESMTPSA id n19sm2909290lfl.85.2019.12.04.01.48.27
 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128);
 Wed, 04 Dec 2019 01:48:28 -0800 (PST)
Subject: Re: Slow reboots due to ZFS cleanup in kern_shutdown() .. zio_fini()
To: Peter Eriksson <pen@lysator.liu.se>, FreeBSD FS <freebsd-fs@freebsd.org>
References: <AD17E454-6A51-436D-A853-07F04A406EC9@lysator.liu.se>
 <D2A11CE9-9B24-4E40-A51A-8D318E0288C9@lysator.liu.se>
 <20191202225424.GG43802@raichu>
 <3b71fe37-c29f-e3e5-ff96-5dce15cc7553@FreeBSD.org>
 <20191203162219.GI43802@raichu>
 <247C6990-BC6E-4E3D-8CEF-5A861D8A25EC@lysator.liu.se>
 <4D44B25E-2F15-4536-A653-DA242051C8A9@lysator.liu.se>
From: Andriy Gapon <avg@FreeBSD.org>
Openpgp: preference=signencrypt
Autocrypt: addr=avg@FreeBSD.org; prefer-encrypt=mutual; keydata=
 mQINBFm4LIgBEADNB/3lT7f15UKeQ52xCFQx/GqHkSxEdVyLFZTmY3KyNPQGBtyvVyBfprJ7
 mAeXZWfhat6cKNRAGZcL5EmewdQuUfQfBdYmKjbw3a9GFDsDNuhDA2QwFt8BmkiVMRYyvI7l
 N0eVzszWCUgdc3qqM6qqcgBaqsVmJluwpvwp4ZBXmch5BgDDDb1MPO8AZ2QZfIQmplkj8Y6Z
 AiNMknkmgaekIINSJX8IzRzKD5WwMsin70psE8dpL/iBsA2cpJGzWMObVTtCxeDKlBCNqM1i
 gTXta1ukdUT7JgLEFZk9ceYQQMJJtUwzWu1UHfZn0Fs29HTqawfWPSZVbulbrnu5q55R4PlQ
 /xURkWQUTyDpqUvb4JK371zhepXiXDwrrpnyyZABm3SFLkk2bHlheeKU6Yql4pcmSVym1AS4
 dV8y0oHAfdlSCF6tpOPf2+K9nW1CFA8b/tw4oJBTtfZ1kxXOMdyZU5fiG7xb1qDgpQKgHUX8
 7Rd2T1UVLVeuhYlXNw2F+a2ucY+cMoqz3LtpksUiBppJhw099gEXehcN2JbUZ2TueJdt1FdS
 ztnZmsHUXLxrRBtGwqnFL7GSd6snpGIKuuL305iaOGODbb9c7ne1JqBbkw1wh8ci6vvwGlzx
 rexzimRaBzJxlkjNfMx8WpCvYebGMydNoeEtkWldtjTNVsUAtQARAQABtB5BbmRyaXkgR2Fw
 b24gPGF2Z0BGcmVlQlNELm9yZz6JAlQEEwEIAD4WIQS+LEO7ngQnXA4Bjr538m7TUc1yjwUC
 WbgsiAIbIwUJBaOagAULCQgHAgYVCAkKCwIEFgIDAQIeAQIXgAAKCRB38m7TUc1yj+JAEACV
 l9AK/nOWAt/9cufV2fRj0hdOqB1aCshtSrwHk/exXsDa4/FkmegxXQGY+3GWX3deIyesbVRL
 rYdtdK0dqJyT1SBqXK1h3/at9rxr9GQA6KWOxTjUFURsU7ok/6SIlm8uLRPNKO+yq0GDjgaO
 LzN+xykuBA0FlhQAXJnpZLcVfPJdWv7sSHGedL5ln8P8rxR+XnmsA5TUaaPcbhTB+mG+iKFj
 GghASDSfGqLWFPBlX/fpXikBDZ1gvOr8nyMY9nXhgfXpq3B6QCRYKPy58ChrZ5weeJZ29b7/
 QdEO8NFNWHjSD9meiLdWQaqo9Y7uUxN3wySc/YUZxtS0bhAd8zJdNPsJYG8sXgKjeBQMVGuT
 eCAJFEYJqbwWvIXMfVWop4+O4xB+z2YE3jAbG/9tB/GSnQdVSj3G8MS80iLS58frnt+RSEw/
 psahrfh0dh6SFHttE049xYiC+cM8J27Aaf0i9RflyITq57NuJm+AHJoU9SQUkIF0nc6lfA+o
 JRiyRlHZHKoRQkIg4aiKaZSWjQYRl5Txl0IZUP1dSWMX4s3XTMurC/pnja45dge/4ESOtJ9R
 8XuIWg45Oq6MeIWdjKddGhRj3OohsltKgkEU3eLKYtB6qRTQypHHUawCXz88uYt5e3w4V16H
 lCpSTZV/EVHnNe45FVBlvK7k7HFfDDkryLkCDQRZuCyIARAAlq0slcsVboY/+IUJdcbEiJRW
 be9HKVz4SUchq0z9MZPX/0dcnvz/gkyYA+OuM78dNS7Mbby5dTvOqfpLJfCuhaNYOhlE0wY+
 1T6Tf1f4c/uA3U/YiadukQ3+6TJuYGAdRZD5EqYFIkreARTVWg87N9g0fT9BEqLw9lJtEGDY
 EWUE7L++B8o4uu3LQFEYxcrb4K/WKmgtmFcm77s0IKDrfcX4doV92QTIpLiRxcOmCC/OCYuO
 jB1oaaqXQzZrCutXRK0L5XN1Y1PYjIrEzHMIXmCDlLYnpFkK+itlXwlE2ZQxkfMruCWdQXye
 syl2fynAe8hvp7Mms9qU2r2K9EcJiR5N1t1C2/kTKNUhcRv7Yd/vwusK7BqJbhlng5ZgRx0m
 WxdntU/JLEntz3QBsBsWM9Y9wf2V4tLv6/DuDBta781RsCB/UrU2zNuOEkSixlUiHxw1dccI
 6CVlaWkkJBxmHX22GdDFrcjvwMNIbbyfQLuBq6IOh8nvu9vuItup7qemDG3Ms6TVwA7BD3j+
 3fGprtyW8Fd/RR2bW2+LWkMrqHffAr6Y6V3h5kd2G9Q8ZWpEJk+LG6Mk3fhZhmCnHhDu6CwN
 MeUvxXDVO+fqc3JjFm5OxhmfVeJKrbCEUJyM8ESWLoNHLqjywdZga4Q7P12g8DUQ1mRxYg/L
 HgZY3zfKOqcAEQEAAYkCPAQYAQgAJhYhBL4sQ7ueBCdcDgGOvnfybtNRzXKPBQJZuCyIAhsM
 BQkFo5qAAAoJEHfybtNRzXKPBVwQAKfFy9P7N3OsLDMB56A4Kf+ZT+d5cIx0Yiaf4n6w7m3i
 ImHHHk9FIetI4Xe54a2IXh4Bq5UkAGY0667eIs+Z1Ea6I2i27Sdo7DxGwq09Qnm/Y65ADvXs
 3aBvokCcm7FsM1wky395m8xUos1681oV5oxgqeRI8/76qy0hD9WR65UW+HQgZRIcIjSel9vR
 XDaD2HLGPTTGr7u4v00UeTMs6qvPsa2PJagogrKY8RXdFtXvweQFz78NbXhluwix2Tb9ETPk
 LIpDrtzV73CaE2aqBG/KrboXT2C67BgFtnk7T7Y7iKq4/XvEdDWscz2wws91BOXuMMd4c/c4
 OmGW9m3RBLufFrOag1q5yUS9QbFfyqL6dftJP3Zq/xe+mr7sbWbhPVCQFrH3r26mpmy841ym
 dwQnNcsbIGiBASBSKksOvIDYKa2Wy8htPmWFTEOPRpFXdGQ27awcjjnB42nngyCK5ukZDHi6
 w0qK5DNQQCkiweevCIC6wc3p67jl1EMFY5+z+zdTPb3h7LeVnGqW0qBQl99vVFgzLxchKcl0
 R/paSFgwqXCZhAKMuUHncJuynDOP7z5LirUeFI8qsBAJi1rXpQoLJTVcW72swZ42IdPiboqx
 NbTMiNOiE36GqMcTPfKylCbF45JNX4nF9ElM0E+Y8gi4cizJYBRr2FBJgay0b9Cp
Message-ID: <e15d66e2-dde4-aaff-87fb-cacd15c1e719@FreeBSD.org>
Date: Wed, 4 Dec 2019 11:48:26 +0200
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:60.0) Gecko/20100101
 Thunderbird/60.8.0
MIME-Version: 1.0
In-Reply-To: <4D44B25E-2F15-4536-A653-DA242051C8A9@lysator.liu.se>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 8bit
X-Rspamd-Queue-Id: 47SYvM1wNwz4f0s
X-Spamd-Bar: ---
Authentication-Results: mx1.freebsd.org; dkim=none; dmarc=none;
 spf=pass (mx1.freebsd.org: domain of agapon@gmail.com designates
 209.85.208.170 as permitted sender) smtp.mailfrom=agapon@gmail.com
X-Spamd-Result: default: False [-3.11 / 15.00]; ARC_NA(0.00)[];
 RCVD_VIA_SMTP_AUTH(0.00)[]; RCVD_TLS_ALL(0.00)[];
 FROM_HAS_DN(0.00)[]; R_SPF_ALLOW(-0.20)[+ip4:209.85.128.0/17];
 NEURAL_HAM_LONG(-1.00)[-1.000,0]; MIME_GOOD(-0.10)[text/plain];
 PREVIOUSLY_DELIVERED(0.00)[freebsd-fs@freebsd.org];
 DMARC_NA(0.00)[FreeBSD.org];
 NEURAL_HAM_MEDIUM(-1.00)[-1.000,0]; RCVD_COUNT_THREE(0.00)[3];
 MIME_TRACE(0.00)[0:+]; TO_DN_ALL(0.00)[];
 RCPT_COUNT_TWO(0.00)[2];
 RCVD_IN_DNSWL_NONE(0.00)[170.208.85.209.list.dnswl.org : 127.0.5.0];
 TO_MATCH_ENVRCPT_SOME(0.00)[];
 IP_SCORE(-1.11)[ip: (-0.42), ipnet: 209.85.128.0/17(-3.15), asn: 15169(-1.93),
 country: US(-0.05)]; 
 FORGED_SENDER(0.30)[avg@FreeBSD.org,agapon@gmail.com];
 RWL_MAILSPIKE_POSSIBLE(0.00)[170.208.85.209.rep.mailspike.net : 127.0.0.17];
 R_DKIM_NA(0.00)[]; FREEMAIL_ENVFROM(0.00)[gmail.com];
 ASN(0.00)[asn:15169, ipnet:209.85.128.0/17, country:US];
 FROM_NEQ_ENVFROM(0.00)[avg@FreeBSD.org,agapon@gmail.com];
 MID_RHS_MATCH_FROM(0.00)[];
 RECEIVED_SPAMHAUS_PBL(0.00)[96.151.72.93.khpj7ygk5idzvmvt5x4ziurxhy.zen.dq.spamhaus.net
 : 127.0.0.10]
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 04 Dec 2019 09:48:32 -0000

On 04/12/2019 00:28, Peter Eriksson wrote:
> … and X levels deeper it gives this (not all filesystems mounted - I rebooted fairly quickly):
> 
> keg_drain: while (SLIST_FIRST/SLIST_REMOVE)-loop took 14 seconds [20021 loops]
> zone_drain_wait(): zone_foreach_keg(zone, &keg_drain) took 14 seconds
> zone_dtor(): zone_drain_wait(zone, M_WAITOK) took 14 seconds
> zone_free_item(zone=UMA Zones): zone->uz_dtor() took 14 seconds
> uma_zdestroy(zio_buf_12288) took 14 seconds
> kmem_cache_destroy: uma_zdestroy(0xfffff803467c8ac0) [zio_buf_12288] took 14 seconds
> kmem_cache_destroy(zio_buf_cache[20]) took 14 seconds
> 
> 
> Hmm… I wonder if it isn’t this code I /usr/src/sys/sys/queue.h:
> 
>> #define SLIST_REMOVE(head, elm, type, field) do {                       \
>>        QMD_SAVELINK(oldnext, (elm)->field.sle_next);                   \
>>        if (SLIST_FIRST((head)) == (elm)) {                             \
>>                SLIST_REMOVE_HEAD((head), field);                       \
>>        }                                                               \
>>        else {                                                          \
>>                QUEUE_TYPEOF(type) *curelm = SLIST_FIRST(head);         \
>>                while (SLIST_NEXT(curelm, field) != (elm))              \
>>                        curelm = SLIST_NEXT(curelm, field);             \
>>                SLIST_REMOVE_AFTER(curelm, field);                      \
>>        }                                                               \
>>        TRASHIT(*oldnext);                                              \
>> } while (0)
> 
> Combined with this in /usr/src/sys/vm/uma_core.c: keg_drain():
> 
>> finished:
>>        KEG_UNLOCK(keg);
>>
>>        while ((slab = SLIST_FIRST(&freeslabs)) != NULL) {
>>                SLIST_REMOVE(&freeslabs, slab, uma_slab, us_hlink);
>>                keg_free_slab(keg, slab, keg->uk_ipers);
>>        }
> 
> (The keg_drain print above is the time for the while() loop).
> 
> If I’m reading this right it looks like it’s potentially doing a linear search through a (long) linked list, for every entry in the free slabs list. 20000 entries in 14 seconds is 1400 entries per second.

It's not a search.  The loop is removing *all* items.  There is no other way to
do it :-)
Expanding your calculations, it takes on the order of a millisecond to process a
single slab.  I bet that the time is spent in keg_free_slab() and it's a lot of
time.

I wonder if the OR should actually be AND in this condition:
                if (!uma_dbg_kskip(keg, slab->us_data + (keg->uk_rsize * i)) ||
                    keg->uk_fini != trash_fini)
Looks like there is a "ton" of memory reading done here.

-- 
Andriy Gapon

From owner-freebsd-fs@freebsd.org  Wed Dec  4 10:13:48 2019
Return-Path: <owner-freebsd-fs@freebsd.org>
Delivered-To: freebsd-fs@mailman.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.nyi.freebsd.org (Postfix) with ESMTP id 241C01CFBC7
 for <freebsd-fs@mailman.nyi.freebsd.org>; Wed,  4 Dec 2019 10:13:48 +0000 (UTC)
 (envelope-from pen@lysator.liu.se)
Received: from mail.lysator.liu.se (mail.lysator.liu.se [130.236.254.3])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 47SZSW66psz4gDc;
 Wed,  4 Dec 2019 10:13:47 +0000 (UTC)
 (envelope-from pen@lysator.liu.se)
Received: from mail.lysator.liu.se (localhost [127.0.0.1])
 by mail.lysator.liu.se (Postfix) with ESMTP id 2859B40017;
 Wed,  4 Dec 2019 11:13:44 +0100 (CET)
Received: from [IPv6:2001:6b0:17:f002:8d12:4cc9:4751:6d43] (unknown
 [IPv6:2001:6b0:17:f002:8d12:4cc9:4751:6d43])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by mail.lysator.liu.se (Postfix) with ESMTPSA id 0846F4000D;
 Wed,  4 Dec 2019 11:13:44 +0100 (CET)
From: Peter Eriksson <pen@lysator.liu.se>
Message-Id: <BDE8C3F8-7D55-4C4F-8E40-0DF2FD732D5A@lysator.liu.se>
Mime-Version: 1.0 (Mac OS X Mail 13.0 \(3601.0.10\))
Subject: Re: Slow reboots due to ZFS cleanup in kern_shutdown() .. zio_fini()
Date: Wed, 4 Dec 2019 11:13:43 +0100
In-Reply-To: <e15d66e2-dde4-aaff-87fb-cacd15c1e719@FreeBSD.org>
Cc: Andriy Gapon <avg@FreeBSD.org>
To: FreeBSD FS <freebsd-fs@freebsd.org>
References: <AD17E454-6A51-436D-A853-07F04A406EC9@lysator.liu.se>
 <D2A11CE9-9B24-4E40-A51A-8D318E0288C9@lysator.liu.se>
 <20191202225424.GG43802@raichu>
 <3b71fe37-c29f-e3e5-ff96-5dce15cc7553@FreeBSD.org>
 <20191203162219.GI43802@raichu>
 <247C6990-BC6E-4E3D-8CEF-5A861D8A25EC@lysator.liu.se>
 <4D44B25E-2F15-4536-A653-DA242051C8A9@lysator.liu.se>
 <e15d66e2-dde4-aaff-87fb-cacd15c1e719@FreeBSD.org>
X-Mailer: Apple Mail (2.3601.0.10)
X-Virus-Scanned: ClamAV using ClamSMTP
X-Rspamd-Queue-Id: 47SZSW66psz4gDc
X-Spamd-Bar: -----
Authentication-Results: mx1.freebsd.org;
	none
X-Spamd-Result: default: False [-5.99 / 15.00];
 NEURAL_HAM_MEDIUM(-0.99)[-0.993,0];
 NEURAL_HAM_LONG(-1.00)[-0.998,0]; REPLY(-4.00)[]
Content-Type: text/plain;
	charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.29
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 04 Dec 2019 10:13:48 -0000

Yeah, I noticed that after I sent that email (should send stuff just =
before you go to bed :-)

Anyway, some more timing/debug printfs point to the keg->uk_freef(mem) =
call in keg_free_slab() which sometimes take about a second to execute.
(Should probably switch to some other time source with greater =
resolution than 1s next :-)

keg_free_slab: keg->uk_freef(mem) took 1 seconds
keg_free_slab: keg->uk_freef(mem) took 1 seconds
keg_free_slab: keg->uk_freef(mem) took 1 seconds
keg_free_slab: keg->uk_freef(mem) took 1 seconds
keg_free_slab: keg->uk_freef(mem) took 1 seconds
keg_free_slab: keg->uk_freef(mem) took 1 seconds
keg_free_slab: keg->uk_freef(mem) took 1 seconds

- Peter

> On 4 Dec 2019, at 10:48, Andriy Gapon <avg@FreeBSD.org> wrote:
>=20
> On 04/12/2019 00:28, Peter Eriksson wrote:
>> =E2=80=A6 and X levels deeper it gives this (not all filesystems =
mounted - I rebooted fairly quickly):
>>=20
>> keg_drain: while (SLIST_FIRST/SLIST_REMOVE)-loop took 14 seconds =
[20021 loops]
>> zone_drain_wait(): zone_foreach_keg(zone, &keg_drain) took 14 seconds
>> zone_dtor(): zone_drain_wait(zone, M_WAITOK) took 14 seconds
>> zone_free_item(zone=3DUMA Zones): zone->uz_dtor() took 14 seconds
>> uma_zdestroy(zio_buf_12288) took 14 seconds
>> kmem_cache_destroy: uma_zdestroy(0xfffff803467c8ac0) [zio_buf_12288] =
took 14 seconds
>> kmem_cache_destroy(zio_buf_cache[20]) took 14 seconds
>>=20
>>=20
>> Hmm=E2=80=A6 I wonder if it isn=E2=80=99t this code I =
/usr/src/sys/sys/queue.h:
>>=20
>>> #define SLIST_REMOVE(head, elm, type, field) do {                    =
   \
>>>       QMD_SAVELINK(oldnext, (elm)->field.sle_next);                  =
 \
>>>       if (SLIST_FIRST((head)) =3D=3D (elm)) {                        =
     \
>>>               SLIST_REMOVE_HEAD((head), field);                      =
 \
>>>       }                                                              =
 \
>>>       else {                                                         =
 \
>>>               QUEUE_TYPEOF(type) *curelm =3D SLIST_FIRST(head);      =
   \
>>>               while (SLIST_NEXT(curelm, field) !=3D (elm))           =
   \
>>>                       curelm =3D SLIST_NEXT(curelm, field);          =
   \
>>>               SLIST_REMOVE_AFTER(curelm, field);                     =
 \
>>>       }                                                              =
 \
>>>       TRASHIT(*oldnext);                                             =
 \
>>> } while (0)
>>=20
>> Combined with this in /usr/src/sys/vm/uma_core.c: keg_drain():
>>=20
>>> finished:
>>>       KEG_UNLOCK(keg);
>>>=20
>>>       while ((slab =3D SLIST_FIRST(&freeslabs)) !=3D NULL) {
>>>               SLIST_REMOVE(&freeslabs, slab, uma_slab, us_hlink);
>>>               keg_free_slab(keg, slab, keg->uk_ipers);
>>>       }
>>=20
>> (The keg_drain print above is the time for the while() loop).
>>=20
>> If I=E2=80=99m reading this right it looks like it=E2=80=99s =
potentially doing a linear search through a (long) linked list, for =
every entry in the free slabs list. 20000 entries in 14 seconds is 1400 =
entries per second.
>=20
> It's not a search.  The loop is removing *all* items.  There is no =
other way to
> do it :-)
> Expanding your calculations, it takes on the order of a millisecond to =
process a
> single slab.  I bet that the time is spent in keg_free_slab() and it's =
a lot of
> time.
>=20
> I wonder if the OR should actually be AND in this condition:
>                if (!uma_dbg_kskip(keg, slab->us_data + (keg->uk_rsize =
* i)) ||
>                    keg->uk_fini !=3D trash_fini)
> Looks like there is a "ton" of memory reading done here.
>=20
> --=20
> Andriy Gapon


From owner-freebsd-fs@freebsd.org  Wed Dec  4 12:48:10 2019
Return-Path: <owner-freebsd-fs@freebsd.org>
Delivered-To: freebsd-fs@mailman.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.nyi.freebsd.org (Postfix) with ESMTP id 845CB1B6686
 for <freebsd-fs@mailman.nyi.freebsd.org>; Wed,  4 Dec 2019 12:48:10 +0000 (UTC)
 (envelope-from pen@lysator.liu.se)
Received: from mail.lysator.liu.se (mail.lysator.liu.se
 [IPv6:2001:6b0:17:f0a0::3])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 47Sdtc69cjz3Jq1;
 Wed,  4 Dec 2019 12:48:08 +0000 (UTC)
 (envelope-from pen@lysator.liu.se)
Received: from mail.lysator.liu.se (localhost [127.0.0.1])
 by mail.lysator.liu.se (Postfix) with ESMTP id 1AA4F4000D;
 Wed,  4 Dec 2019 13:48:06 +0100 (CET)
Received: from [IPv6:2001:6b0:17:f002:8d12:4cc9:4751:6d43] (unknown
 [IPv6:2001:6b0:17:f002:8d12:4cc9:4751:6d43])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by mail.lysator.liu.se (Postfix) with ESMTPSA id 998C54000E;
 Wed,  4 Dec 2019 13:48:05 +0100 (CET)
Content-Type: text/plain;
	charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 13.0 \(3601.0.10\))
Subject: Re: Slow reboots due to ZFS cleanup in kern_shutdown() .. zio_fini()
From: Peter Eriksson <pen@lysator.liu.se>
In-Reply-To: <BDE8C3F8-7D55-4C4F-8E40-0DF2FD732D5A@lysator.liu.se>
Date: Wed, 4 Dec 2019 13:48:05 +0100
Cc: Andriy Gapon <avg@freebsd.org>
Content-Transfer-Encoding: quoted-printable
Message-Id: <1119144A-C6DA-4CBF-8616-108FB50D2F8D@lysator.liu.se>
References: <AD17E454-6A51-436D-A853-07F04A406EC9@lysator.liu.se>
 <D2A11CE9-9B24-4E40-A51A-8D318E0288C9@lysator.liu.se>
 <20191202225424.GG43802@raichu>
 <3b71fe37-c29f-e3e5-ff96-5dce15cc7553@FreeBSD.org>
 <20191203162219.GI43802@raichu>
 <247C6990-BC6E-4E3D-8CEF-5A861D8A25EC@lysator.liu.se>
 <4D44B25E-2F15-4536-A653-DA242051C8A9@lysator.liu.se>
 <e15d66e2-dde4-aaff-87fb-cacd15c1e719@FreeBSD.org>
 <BDE8C3F8-7D55-4C4F-8E40-0DF2FD732D5A@lysator.liu.se>
To: FreeBSD FS <freebsd-fs@freebsd.org>
X-Mailer: Apple Mail (2.3601.0.10)
X-Virus-Scanned: ClamAV using ClamSMTP
X-Rspamd-Queue-Id: 47Sdtc69cjz3Jq1
X-Spamd-Bar: ---
Authentication-Results: mx1.freebsd.org; dkim=none;
 dmarc=pass (policy=none) header.from=liu.se;
 spf=pass (mx1.freebsd.org: domain of pen@lysator.liu.se designates
 2001:6b0:17:f0a0::3 as permitted sender) smtp.mailfrom=pen@lysator.liu.se
X-Spamd-Result: default: False [-3.91 / 15.00]; ARC_NA(0.00)[];
 RCVD_VIA_SMTP_AUTH(0.00)[]; NEURAL_HAM_MEDIUM(-1.00)[-1.000,0];
 FROM_HAS_DN(0.00)[];
 R_SPF_ALLOW(-0.20)[+a:mail.lysator.liu.se];
 TO_MATCH_ENVRCPT_ALL(0.00)[]; MIME_GOOD(-0.10)[text/plain];
 MV_CASE(0.50)[]; NEURAL_HAM_LONG(-1.00)[-1.000,0];
 RCVD_COUNT_THREE(0.00)[3]; RCVD_TLS_LAST(0.00)[];
 TO_DN_ALL(0.00)[];
 RCVD_IN_DNSWL_MED(-0.20)[3.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.a.0.f.7.1.0.0.0.b.6.0.1.0.0.2.list.dnswl.org
 : 127.0.11.2]; RCPT_COUNT_TWO(0.00)[2];
 DMARC_POLICY_ALLOW(-0.50)[liu.se,none];
 IP_SCORE(-1.41)[ip: (-6.19), ipnet: 2001:6b0::/32(-0.46), asn: 1653(-0.38),
 country: EU(-0.00)]; FROM_EQ_ENVFROM(0.00)[];
 R_DKIM_NA(0.00)[]; MIME_TRACE(0.00)[0:+];
 ASN(0.00)[asn:1653, ipnet:2001:6b0::/32, country:EU];
 MID_RHS_MATCH_FROM(0.00)[]
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 04 Dec 2019 12:48:10 -0000

Going deeper=E2=80=A6 now it=E2=80=99s pointing towards pmap_remove(), =
called from kmem_unback(), which is called from kmem_free(), which is =
called from page_free() which is called from keg_free_slab()...


kmem_unback: pmap_remove(kernel_pmap, 18446741884786954240, =
18446741884786970624) took 1 seconds
kmem_free: kmem_unback(kmem_object, 18446741884786954240, 16384) took 1 =
seconds
page_free: kmem_free(kmem_arena, 18446741884786954240, 16384) took 1 =
seconds
keg_free_slab: keg->uk_freef(mem) {page_free} took 1 seconds

kmem_unback: pmap_remove(kernel_pmap, 18446741884989739008, =
18446741884989755392) took 1 seconds
kmem_free: kmem_unback(kmem_object, 18446741884989739008, 16384) took 1 =
seconds
page_free: kmem_free(kmem_arena, 18446741884989739008, 16384) took 1 =
seconds
keg_free_slab: keg->uk_freef(mem) {page_free} took 1 seconds

kmem_unback: pmap_remove(kernel_pmap, 18446741885257895936, =
18446741885257912320) took 1 seconds
kmem_free: kmem_unback(kmem_object, 18446741885257895936, 16384) took 1 =
seconds
page_free: kmem_free(kmem_arena, 18446741885257895936, 16384) took 1 =
seconds
keg_free_slab: keg->uk_freef(mem) {page_free} took 1 seconds


- Peter

> On 4 Dec 2019, at 11:13, Peter Eriksson <pen@lysator.liu.se> wrote:
>=20
> Yeah, I noticed that after I sent that email (should send stuff just =
before you go to bed :-)
>=20
> Anyway, some more timing/debug printfs point to the keg->uk_freef(mem) =
call in keg_free_slab() which sometimes take about a second to execute.
> (Should probably switch to some other time source with greater =
resolution than 1s next :-)
>=20
> keg_free_slab: keg->uk_freef(mem) took 1 seconds
> keg_free_slab: keg->uk_freef(mem) took 1 seconds
> keg_free_slab: keg->uk_freef(mem) took 1 seconds
> keg_free_slab: keg->uk_freef(mem) took 1 seconds
> keg_free_slab: keg->uk_freef(mem) took 1 seconds
> keg_free_slab: keg->uk_freef(mem) took 1 seconds
> keg_free_slab: keg->uk_freef(mem) took 1 seconds
>=20
> - Peter
>=20
>> On 4 Dec 2019, at 10:48, Andriy Gapon <avg@FreeBSD.org> wrote:
>>=20
>> On 04/12/2019 00:28, Peter Eriksson wrote:
>>> =E2=80=A6 and X levels deeper it gives this (not all filesystems =
mounted - I rebooted fairly quickly):
>>>=20
>>> keg_drain: while (SLIST_FIRST/SLIST_REMOVE)-loop took 14 seconds =
[20021 loops]
>>> zone_drain_wait(): zone_foreach_keg(zone, &keg_drain) took 14 =
seconds
>>> zone_dtor(): zone_drain_wait(zone, M_WAITOK) took 14 seconds
>>> zone_free_item(zone=3DUMA Zones): zone->uz_dtor() took 14 seconds
>>> uma_zdestroy(zio_buf_12288) took 14 seconds
>>> kmem_cache_destroy: uma_zdestroy(0xfffff803467c8ac0) [zio_buf_12288] =
took 14 seconds
>>> kmem_cache_destroy(zio_buf_cache[20]) took 14 seconds
>>>=20
>>>=20
>>> Hmm=E2=80=A6 I wonder if it isn=E2=80=99t this code I =
/usr/src/sys/sys/queue.h:
>>>=20
>>>> #define SLIST_REMOVE(head, elm, type, field) do {                   =
    \
>>>>      QMD_SAVELINK(oldnext, (elm)->field.sle_next);                  =
 \
>>>>      if (SLIST_FIRST((head)) =3D=3D (elm)) {                        =
     \
>>>>              SLIST_REMOVE_HEAD((head), field);                      =
 \
>>>>      }                                                              =
 \
>>>>      else {                                                         =
 \
>>>>              QUEUE_TYPEOF(type) *curelm =3D SLIST_FIRST(head);      =
   \
>>>>              while (SLIST_NEXT(curelm, field) !=3D (elm))           =
   \
>>>>                      curelm =3D SLIST_NEXT(curelm, field);          =
   \
>>>>              SLIST_REMOVE_AFTER(curelm, field);                     =
 \
>>>>      }                                                              =
 \
>>>>      TRASHIT(*oldnext);                                             =
 \
>>>> } while (0)
>>>=20
>>> Combined with this in /usr/src/sys/vm/uma_core.c: keg_drain():
>>>=20
>>>> finished:
>>>>      KEG_UNLOCK(keg);
>>>>=20
>>>>      while ((slab =3D SLIST_FIRST(&freeslabs)) !=3D NULL) {
>>>>              SLIST_REMOVE(&freeslabs, slab, uma_slab, us_hlink);
>>>>              keg_free_slab(keg, slab, keg->uk_ipers);
>>>>      }
>>>=20
>>> (The keg_drain print above is the time for the while() loop).
>>>=20
>>> If I=E2=80=99m reading this right it looks like it=E2=80=99s =
potentially doing a linear search through a (long) linked list, for =
every entry in the free slabs list. 20000 entries in 14 seconds is 1400 =
entries per second.
>>=20
>> It's not a search.  The loop is removing *all* items.  There is no =
other way to
>> do it :-)
>> Expanding your calculations, it takes on the order of a millisecond =
to process a
>> single slab.  I bet that the time is spent in keg_free_slab() and =
it's a lot of
>> time.
>>=20
>> I wonder if the OR should actually be AND in this condition:
>>               if (!uma_dbg_kskip(keg, slab->us_data + (keg->uk_rsize =
* i)) ||
>>                   keg->uk_fini !=3D trash_fini)
>> Looks like there is a "ton" of memory reading done here.
>>=20
>> --=20
>> Andriy Gapon
>=20
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"


From owner-freebsd-fs@freebsd.org  Wed Dec  4 15:20:10 2019
Return-Path: <owner-freebsd-fs@freebsd.org>
Delivered-To: freebsd-fs@mailman.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.nyi.freebsd.org (Postfix) with ESMTP id 03EA51C6FB8
 for <freebsd-fs@mailman.nyi.freebsd.org>; Wed,  4 Dec 2019 15:20:10 +0000 (UTC)
 (envelope-from pen@lysator.liu.se)
Received: from mail.lysator.liu.se (mail.lysator.liu.se [130.236.254.3])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 47SjFz1r8Hz3xYm;
 Wed,  4 Dec 2019 15:20:07 +0000 (UTC)
 (envelope-from pen@lysator.liu.se)
Received: from mail.lysator.liu.se (localhost [127.0.0.1])
 by mail.lysator.liu.se (Postfix) with ESMTP id 5344F4000E;
 Wed,  4 Dec 2019 16:20:04 +0100 (CET)
Received: from [IPv6:2001:6b0:17:f002:8d12:4cc9:4751:6d43] (unknown
 [IPv6:2001:6b0:17:f002:8d12:4cc9:4751:6d43])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by mail.lysator.liu.se (Postfix) with ESMTPSA id 1D96A4000D;
 Wed,  4 Dec 2019 16:20:04 +0100 (CET)
From: Peter Eriksson <pen@lysator.liu.se>
Message-Id: <F8C0E18B-98D1-4A03-B98D-A5D49F559675@lysator.liu.se>
Mime-Version: 1.0 (Mac OS X Mail 13.0 \(3601.0.10\))
Subject: Re: Slow reboots due to ZFS cleanup in kern_shutdown() .. zio_fini()
Date: Wed, 4 Dec 2019 16:20:03 +0100
In-Reply-To: <1119144A-C6DA-4CBF-8616-108FB50D2F8D@lysator.liu.se>
Cc: Andriy Gapon <avg@freebsd.org>
To: FreeBSD FS <freebsd-fs@freebsd.org>
References: <AD17E454-6A51-436D-A853-07F04A406EC9@lysator.liu.se>
 <D2A11CE9-9B24-4E40-A51A-8D318E0288C9@lysator.liu.se>
 <20191202225424.GG43802@raichu>
 <3b71fe37-c29f-e3e5-ff96-5dce15cc7553@FreeBSD.org>
 <20191203162219.GI43802@raichu>
 <247C6990-BC6E-4E3D-8CEF-5A861D8A25EC@lysator.liu.se>
 <4D44B25E-2F15-4536-A653-DA242051C8A9@lysator.liu.se>
 <e15d66e2-dde4-aaff-87fb-cacd15c1e719@FreeBSD.org>
 <BDE8C3F8-7D55-4C4F-8E40-0DF2FD732D5A@lysator.liu.se>
 <1119144A-C6DA-4CBF-8616-108FB50D2F8D@lysator.liu.se>
X-Mailer: Apple Mail (2.3601.0.10)
X-Virus-Scanned: ClamAV using ClamSMTP
X-Rspamd-Queue-Id: 47SjFz1r8Hz3xYm
X-Spamd-Bar: ----
Authentication-Results: mx1.freebsd.org; dkim=none;
 dmarc=pass (policy=none) header.from=liu.se;
 spf=pass (mx1.freebsd.org: domain of pen@lysator.liu.se designates
 130.236.254.3 as permitted sender) smtp.mailfrom=pen@lysator.liu.se
X-Spamd-Result: default: False [-4.91 / 15.00]; ARC_NA(0.00)[];
 RCVD_VIA_SMTP_AUTH(0.00)[]; NEURAL_HAM_MEDIUM(-1.00)[-1.000,0];
 FROM_HAS_DN(0.00)[];
 R_SPF_ALLOW(-0.20)[+a:mail.lysator.liu.se];
 TO_MATCH_ENVRCPT_ALL(0.00)[];
 MIME_GOOD(-0.10)[multipart/alternative,text/plain];
 MV_CASE(0.50)[]; NEURAL_HAM_LONG(-1.00)[-1.000,0];
 URI_COUNT_ODD(1.00)[3]; RCVD_COUNT_THREE(0.00)[3];
 RCVD_TLS_LAST(0.00)[]; TO_DN_ALL(0.00)[];
 RCVD_IN_DNSWL_MED(-0.20)[3.254.236.130.list.dnswl.org : 127.0.11.2];
 RCPT_COUNT_TWO(0.00)[2];
 DMARC_POLICY_ALLOW(-0.50)[liu.se,none];
 IP_SCORE(-3.41)[ip: (-8.93), ipnet: 130.236.0.0/16(-4.51), asn: 2843(-3.61),
 country: SE(-0.03)]; FROM_EQ_ENVFROM(0.00)[];
 R_DKIM_NA(0.00)[]; MIME_TRACE(0.00)[0:+,1:+,2:~];
 ASN(0.00)[asn:2843, ipnet:130.236.0.0/16, country:SE];
 MID_RHS_MATCH_FROM(0.00)[]
Content-Type: text/plain;
	charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.29
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 04 Dec 2019 15:20:10 -0000

Yep, it=E2=80=99s pmap_remove() that occasionally takes ~1s instead of =
being very quick. A test I ran:


kmem_unback: pmap_remove(kernel_pmap, 18446741878385815552, =
18446741878385827840) took 1 seconds
kmem_free: kmem_unback(kmem_object, 18446741878385815552, 12288) took 1 =
seconds
page_free: kmem_free(kmem_arena, 18446741878385815552, 12288) took 1 =
seconds
keg_free_slab: keg->uk_freef(mem) {page_free} took 1 seconds

keg_drain: while()-keg_free_slab-loop took 17 seconds [23136 loops, 17 =
slow calls]
zone_drain_wait(): zone_foreach_keg(zone, &keg_drain) took 17 seconds
zone_dtor(): zone_drain_wait(zone, M_WAITOK) took 17 seconds
zone_free_item(zone=3DUMA Zones): zone->uz_dtor() took 17 seconds
uma_zdestroy(zio_buf_12288) took 17 seconds
kmem_cache_destroy: uma_zdestroy(0xfffff803465dd980) [zio_buf_12288] =
took 17 seconds
kmem_cache_destroy(zio_buf_cache[20]) took 17 seconds


Ie, 17 out of 23136 calls to =
keg_free_slab()->page_free()->kmem_free()->kmem_unback()->pmap_remove() =
was slow (took atleast 1s).

I created a bug report (242427) on bugs.freebsd.org =
<http://bugs.freebsd.org/> for this now in case someone knows how to =
debug the PMAP stuff=E2=80=A6 :-)

- Peter