From: Alexander Leidinger
To: freebsd-current@freebsd.org
Date: Sat, 26 May 2018 21:54:10 +0200
Subject: Re: Deadlocks / hangs in ZFS

Quoting Steve Wills (from Tue, 22 May 2018 08:17:00 -0400):

> I may be seeing similar issues. Have you tried leaving top -SHa
> running and seeing what threads are using CPU when it hangs? I did
> and saw pid 17 [zfskern{txg_thread_enter}] using lots of CPU but no
> disk activity happening. Do you see similar?

For me it is a different zfs process/kthread, l2arc_feed_thread.
Please note that there is still 31 GB free, so it doesn't look like
resource exhaustion. What I consider strange is the swap usage. I
watched the system and it started to use swap while there were >30 GB
listed as free (in/out rates visible from time to time, and plenty of
RAM free... ???).

last pid: 93392;  load averages:  0.16,  0.44,  1.03    up 1+15:36:34  22:35:45
1509 processes:17 running, 1392 sleeping, 3 zombie, 97 waiting
CPU:  0.1% user,  0.0% nice,  0.0% system,  0.0% interrupt, 99.9% idle
Mem: 597M Active, 1849M Inact, 6736K Laundry, 25G Wired, 31G Free
ARC: 20G Total, 9028M MFU, 6646M MRU, 2162M Anon, 337M Header, 1935M Other
     14G Compressed, 21G Uncompressed, 1.53:1 Ratio
Swap: 4096M Total, 1640M Used, 2455M Free, 40% Inuse

  PID JID USERNAME PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
   10   0 root     155 ki31     0K   256K CPU1    1  35.4H 100.00% [idle{idle: cpu1}]
   10   0 root     155 ki31     0K   256K CPU11  11  35.2H 100.00% [idle{idle: cpu11}]
   10   0 root     155 ki31     0K   256K CPU3    3  35.2H 100.00% [idle{idle: cpu3}]
   10   0 root     155 ki31     0K   256K CPU15  15  35.1H 100.00% [idle{idle: cpu15}]
   10   0 root     155 ki31     0K   256K RUN     9  35.1H 100.00% [idle{idle: cpu9}]
   10   0 root     155 ki31     0K   256K CPU5    5  35.0H 100.00% [idle{idle: cpu5}]
   10   0 root     155 ki31     0K   256K CPU14  14  35.0H 100.00% [idle{idle: cpu14}]
   10   0 root     155 ki31     0K   256K CPU0    0  35.8H  99.12% [idle{idle: cpu0}]
   10   0 root     155 ki31     0K   256K CPU6    6  35.3H  98.79% [idle{idle: cpu6}]
   10   0 root     155 ki31     0K   256K CPU8    8  35.1H  98.31% [idle{idle: cpu8}]
   10   0 root     155 ki31     0K   256K CPU12  12  35.0H  97.24% [idle{idle: cpu12}]
   10   0 root     155 ki31     0K   256K CPU4    4  35.4H  96.71% [idle{idle: cpu4}]
   10   0 root     155 ki31     0K   256K CPU10  10  35.0H  92.37% [idle{idle: cpu10}]
   10   0 root     155 ki31     0K   256K CPU7    7  35.2H  92.20% [idle{idle: cpu7}]
   10   0 root     155 ki31     0K   256K CPU13  13  35.1H  91.90% [idle{idle: cpu13}]
   10   0 root     155 ki31     0K   256K CPU2    2  35.4H  90.97% [idle{idle: cpu2}]
   11   0 root     -60    -     0K   816K WAIT    0  15:08   0.82% [intr{swi4: clock (0)}]
   31   0 root     -16    -     0K    80K pwait   0  44:54   0.60% [pagedaemon{dom0}]
45453   0 root      20    0 16932K  7056K CPU9    9   4:12   0.24% top -SHaj
   24   0 root      -8    -     0K   256K l2arc_  0   4:12   0.21% [zfskern{l2arc_feed_thread}]
 2375   0 root      20    0 16872K  6868K select 11   3:52   0.20% top -SHua
 7007  12 235       20    0 18017M   881M uwait  12   0:00   0.19% [java{ESH-thingHandler-35}]
   32   0 root     -16    -     0K    16K psleep 15   5:03   0.11% [vmdaemon]
41037   0 netchild  27    0 18036K  9136K select  4   2:20   0.09% tmux: server (/tmp/tmux-1001/default) (t
   36   0 root     -16    -     0K    16K -       6   2:02   0.09% [racctd]
 7007  12 235       20    0 18017M   881M uwait   9   1:24   0.07% [java{java}]
 4746   0 root      20    0 13020K  3792K nanslp  8   0:52   0.05% zpool iostat space 1
    0   0 root     -76    -     0K 10304K -       4   0:16   0.05% [kernel{if_io_tqg_4}]
 5550   8 933       20    0  2448M   607M uwait   8   0:41   0.03% [java{java}]
 5550   8 933       20    0  2448M   607M uwait  13   0:03   0.03% [java{Timer-1}]
 7007  12 235       20    0 18017M   881M uwait   0   0:39   0.02% [java{java}]
 5655   8 560       20    0 21524K  4840K select  6   0:21   0.02% /usr/local/sbin/hald{hald}
   30   0 root     -16    -     0K    16K -       4   0:25   0.01% [rand_harvestq]
 1259   0 root      20    0 18780K 18860K select 14   0:19   0.01% /usr/sbin/ntpd -c /etc/ntp.conf -p /var/
    0   0 root     -76    -     0K 10304K -      12   0:19   0.01% [kernel{if_config_tqg_0}]
   31   0 root     -16    -     0K    80K psleep  0   0:38   0.01% [pagedaemon{dom1}]
    0   0 root     -76    -     0K 10304K -       5   0:04   0.01% [kernel{if_io_tqg_5}]
 7007  12 235       20    0 18017M   881M uwait   1   0:16   0.01% [java{Karaf Lock Monitor }]
12622   2 88        20    0  1963M   247M uwait   7   0:13   0.01% [mysqld{mysqld}]
27043   0 netchild  20    0 18964K  9124K select  6   0:01   0.01% sshd: netchild@pts/0 (sshd)
 7007  12 235       20    0 18017M   881M uwait   8   0:10   0.01% [java{openHAB-job-schedul}]
 7007  12 235       20    0 18017M   881M uwait   6   0:10   0.01% [java{openHAB-job-schedul}]

> On 05/22/18 04:17, Alexander Leidinger wrote:
>> Hi,
>>
>> does someone else experience deadlocks / hangs in ZFS?
>>
>> What I see is that if on a 2 socket / 4 cores -> 16 threads system
>> I do a lot in parallel (e.g. updating ports in several jails), then
>> the system may get into a state where I can log in, but any exit
>> (e.g. from top) or logout of shell blocks somewhere. Sometimes it
>> helps to CTRL-C all updates to get the system into a good shape
>> again, but most of the time it doesn't.
>>
>> On another system at the same rev (333966) with a lot fewer CPUs
>> (and AMD instead of Intel), I don't see such a behavior.
>>
>> Bye,
>> Alexander.

-- 
http://www.Leidinger.net Alexander@Leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org    netchild@FreeBSD.org  : PGP 0x8F31830F9F2772BF