Date:      Sat, 26 May 2018 21:54:10 +0200
From:      Alexander Leidinger <Alexander@leidinger.net>
To:        freebsd-current@freebsd.org
Subject:   Re: Deadlocks / hangs in ZFS
Message-ID:  <20180526215410.Horde.TLpIgePvctlYUqw9QcqlgGR@webmail.leidinger.net>
In-Reply-To: <fa263af4-9bf7-88f8-8d23-21456daf7960@FreeBSD.org>
References:  <20180522101749.Horde.Wxz9gSxx1xArxkYMQqTL0iZ@webmail.leidinger.net> <fa263af4-9bf7-88f8-8d23-21456daf7960@FreeBSD.org>

Quoting Steve Wills <swills@freebsd.org> (from Tue, 22 May 2018 08:17:00 -0400):

> I may be seeing similar issues. Have you tried leaving top -SHa
> running and seeing what threads are using CPU when it hangs? I did
> and saw pid 17 [zfskern{txg_thread_enter}] using lots of CPU but no
> disk activity happening. Do you see similar?

For me it is a different zfs process/kthread, l2arc_feed_thread.
Please note that there are still 31 GB free, so it doesn't look like
resource exhaustion. What I consider strange is the swap usage. I
watched the system and it started to use swap while there were >30 GB
listed as free (in/out rates visible from time to time, and plenty of
RAM free... ???).

last pid: 93392;  load averages:  0.16,  0.44,  1.03    up 1+15:36:34  22:35:45
1509 processes:17 running, 1392 sleeping, 3 zombie, 97 waiting
CPU:  0.1% user,  0.0% nice,  0.0% system,  0.0% interrupt, 99.9% idle
Mem: 597M Active, 1849M Inact, 6736K Laundry, 25G Wired, 31G Free
ARC: 20G Total, 9028M MFU, 6646M MRU, 2162M Anon, 337M Header, 1935M Other
     14G Compressed, 21G Uncompressed, 1.53:1 Ratio
Swap: 4096M Total, 1640M Used, 2455M Free, 40% Inuse

  PID     JID USERNAME      PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
   10       0 root          155 ki31     0K   256K CPU1    1  35.4H 100.00% [idle{idle: cpu1}]
   10       0 root          155 ki31     0K   256K CPU11  11  35.2H 100.00% [idle{idle: cpu11}]
   10       0 root          155 ki31     0K   256K CPU3    3  35.2H 100.00% [idle{idle: cpu3}]
   10       0 root          155 ki31     0K   256K CPU15  15  35.1H 100.00% [idle{idle: cpu15}]
   10       0 root          155 ki31     0K   256K RUN     9  35.1H 100.00% [idle{idle: cpu9}]
   10       0 root          155 ki31     0K   256K CPU5    5  35.0H 100.00% [idle{idle: cpu5}]
   10       0 root          155 ki31     0K   256K CPU14  14  35.0H 100.00% [idle{idle: cpu14}]
   10       0 root          155 ki31     0K   256K CPU0    0  35.8H  99.12% [idle{idle: cpu0}]
   10       0 root          155 ki31     0K   256K CPU6    6  35.3H  98.79% [idle{idle: cpu6}]
   10       0 root          155 ki31     0K   256K CPU8    8  35.1H  98.31% [idle{idle: cpu8}]
   10       0 root          155 ki31     0K   256K CPU12  12  35.0H  97.24% [idle{idle: cpu12}]
   10       0 root          155 ki31     0K   256K CPU4    4  35.4H  96.71% [idle{idle: cpu4}]
   10       0 root          155 ki31     0K   256K CPU10  10  35.0H  92.37% [idle{idle: cpu10}]
   10       0 root          155 ki31     0K   256K CPU7    7  35.2H  92.20% [idle{idle: cpu7}]
   10       0 root          155 ki31     0K   256K CPU13  13  35.1H  91.90% [idle{idle: cpu13}]
   10       0 root          155 ki31     0K   256K CPU2    2  35.4H  90.97% [idle{idle: cpu2}]
   11       0 root          -60    -     0K   816K WAIT    0  15:08   0.82% [intr{swi4: clock (0)}]
   31       0 root          -16    -     0K    80K pwait   0  44:54   0.60% [pagedaemon{dom0}]
45453       0 root           20    0 16932K  7056K CPU9    9   4:12   0.24% top -SHaj
   24       0 root           -8    -     0K   256K l2arc_  0   4:12   0.21% [zfskern{l2arc_feed_thread}]
 2375       0 root           20    0 16872K  6868K select 11   3:52   0.20% top -SHua
 7007      12    235         20    0 18017M   881M uwait  12   0:00   0.19% [java{ESH-thingHandler-35}]
   32       0 root          -16    -     0K    16K psleep 15   5:03   0.11% [vmdaemon]
41037       0 netchild       27    0 18036K  9136K select  4   2:20   0.09% tmux: server (/tmp/tmux-1001/default) (t
   36       0 root          -16    -     0K    16K -       6   2:02   0.09% [racctd]
 7007      12    235         20    0 18017M   881M uwait   9   1:24   0.07% [java{java}]
 4746       0 root           20    0 13020K  3792K nanslp  8   0:52   0.05% zpool iostat space 1
    0       0 root          -76    -     0K 10304K -       4   0:16   0.05% [kernel{if_io_tqg_4}]
 5550       8    933         20    0  2448M   607M uwait   8   0:41   0.03% [java{java}]
 5550       8    933         20    0  2448M   607M uwait  13   0:03   0.03% [java{Timer-1}]
 7007      12    235         20    0 18017M   881M uwait   0   0:39   0.02% [java{java}]
 5655       8    560         20    0 21524K  4840K select  6   0:21   0.02% /usr/local/sbin/hald{hald}
   30       0 root          -16    -     0K    16K -       4   0:25   0.01% [rand_harvestq]
 1259       0 root           20    0 18780K 18860K select 14   0:19   0.01% /usr/sbin/ntpd -c /etc/ntp.conf -p /var/
    0       0 root          -76    -     0K 10304K -      12   0:19   0.01% [kernel{if_config_tqg_0}]
   31       0 root          -16    -     0K    80K psleep  0   0:38   0.01% [pagedaemon{dom1}]
    0       0 root          -76    -     0K 10304K -       5   0:04   0.01% [kernel{if_io_tqg_5}]
 7007      12    235         20    0 18017M   881M uwait   1   0:16   0.01% [java{Karaf Lock Monitor }]
12622       2     88         20    0  1963M   247M uwait   7   0:13   0.01% [mysqld{mysqld}]
27043       0 netchild       20    0 18964K  9124K select  6   0:01   0.01% sshd: netchild@pts/0 (sshd)
 7007      12    235         20    0 18017M   881M uwait   8   0:10   0.01% [java{openHAB-job-schedul}]
 7007      12    235         20    0 18017M   881M uwait   6   0:10   0.01% [java{openHAB-job-schedul}]
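
The oddity described above (swap in use while top lists plenty of RAM as free) could be checked automatically against captured top headers. What follows is only a minimal Python sketch, assuming the "Mem:" and "Swap:" header lines keep the layout shown in this mail and that the K/M/G suffixes are binary units. The names parse_fields and swap_despite_free and the 1 GiB threshold are made up for illustration; they are not part of top or FreeBSD.

```python
import re

# Size suffixes as printed in the top header (binary units assumed).
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_fields(line):
    """Turn 'Mem: 597M Active, 31G Free' into {'Active': bytes, 'Free': bytes}."""
    fields = {}
    for size, unit, name in re.findall(r"(\d+)([KMGT])\s+(\w+)", line):
        fields[name] = int(size) * UNITS[unit]
    return fields


def swap_despite_free(top_header, min_free_bytes=1024**3):
    """True if swap is in use while at least min_free_bytes are listed as free.

    The threshold is a hypothetical default, not a value taken from the system.
    """
    mem, swap = {}, {}
    for line in top_header.splitlines():
        if line.startswith("Mem:"):
            mem = parse_fields(line)
        elif line.startswith("Swap:"):
            swap = parse_fields(line)
    return swap.get("Used", 0) > 0 and mem.get("Free", 0) >= min_free_bytes


# The two header lines from the top output in this mail.
header = """\
Mem: 597M Active, 1849M Inact, 6736K Laundry, 25G Wired, 31G Free
Swap: 4096M Total, 1640M Used, 2455M Free, 40% Inuse
"""
print(swap_despite_free(header))
```

Fed with headers saved at intervals, a check like this would show when swap starts being used relative to free memory. It says nothing about why the kernel decided to page out.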


> On 05/22/18 04:17, Alexander Leidinger wrote:
>> Hi,
>>
>> does someone else experience deadlocks / hangs in ZFS?
>>
>> What I see is that if on a 2 socket / 4 cores -> 16 threads system
>> I do a lot in parallel (e.g. updating ports in several jails), then
>> the system may get into a state where I can log in, but any exit
>> (e.g. from top) or logout of shell blocks somewhere. Sometimes it
>> helps to CTRL-C all updates to get the system into a good shape
>> again, but most of the time it doesn't.
>>
>> On another system at the same rev (333966) with a lot fewer CPUs
>> (and AMD instead of Intel), I don't see such behavior.
>>
>> Bye,
>> Alexander.
>>

-- 
http://www.Leidinger.net Alexander@Leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org    netchild@FreeBSD.org  : PGP 0x8F31830F9F2772BF
