Date: Mon, 4 Mar 2019 10:44:22 +0100
From: Ole <ole@free.de>
To: freebsd-questions@freebsd.org
Subject: Re: ZFS deadlock with virtio (was: ZFS deadlock on parallel ZFS operations FreeBSD 11.2 and 12.0)
Message-ID: <20190304104422.443a8c20.ole@free.de>
In-Reply-To: <20190219101717.61526ab1.ole@free.de>
References: <20190215113423.01edabe9.ole@free.de> <20190219101717.61526ab1.ole@free.de>
Hello,

I have done some investigations. I think there are two different
problems, so let's focus on the bhyve VM. I can now reproduce the
behaviour very reliably. It seems to be connected to the virtio disks.

The disk stack is:

  GELI encryption
  zpool (mirror)
  zvol
  virtio
  zpool (inside the VM)

- Host system is FreeBSD 11.2
- VM is FreeBSD 12.0 (VM raw image + an additional disk for the zpool)
- VM is controlled by vm-bhyve
- inside the VM there are 5 to 10 running jails (managed with iocage)

If I start the bhyve VM and let the backups run (~10 operations per
hour), the zpool inside the VM will crash after 1 to 2 days. If I
change the disk from virtio-blk to ahci-hd, the VM stays stable.
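
For reference, the stabilizing change is a one-line edit in the vm-bhyve
guest config. This is only a rough sketch: the config path, guest name,
and disk index below are examples and depend on the actual setup; the
remaining diskN_* lines stay as they are.

  # $vm_dir/jails1/jails1.conf -- example path and guest name
  #
  # Before: the data disk backing the guest zpool was a virtio block device.
  #disk1_type="virtio-blk"
  #
  # After: attach the same disk as an emulated AHCI disk instead.
  disk1_type="ahci-hd"

After a 'vm stop jails1' / 'vm start jails1' cycle the disk shows up
inside the guest as ada(4) instead of vtbd(4); ZFS finds the pool by its
on-disk labels, so the renamed device does not affect the zpool import.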
regards
Ole


Tue, 19 Feb 2019 10:17:17 +0100 - Ole <ole@free.de>:

> Hi,
>
> ok, now I got an unkillable ZFS process again. It is only one
> 'zfs send' command. Any idea how to kill this process without
> powering off the machine?
>
> root@jails1:/usr/home/admin # ps aux | grep 'zfs send'
> root  17617  0.0  0.0  12944  3856  -  Is  Sat04  0:00.00 sudo zfs send -e -I cryptopool/iocage/jails/2fe7ae89-760e-423c-8e7f-4f504e0f08bf@2019-
> root  17618  0.0  0.0  12980  4036  -  D   Sat04  0:00.01 zfs send -e -I cryptopool/iocage/jails/2fe7ae89-760e-423c-8e7f-4f504e0f08bf@2019-02-16
> root  19299  0.0  0.0  11320  2588  3  S+  09:53  0:00.00 grep zfs send
> root@jails1:/usr/home/admin # kill -9 17618
> root@jails1:/usr/home/admin # ps aux | grep 'zfs send'
> root  17617  0.0  0.0  12944  3856  -  Is  Sat04  0:00.00 sudo zfs send -e -I cryptopool/iocage/jails/2fe7ae89-760e-423c-8e7f-4f504e0f08bf@2019-
> root  17618  0.0  0.0  12980  4036  -  D   Sat04  0:00.01 zfs send -e -I cryptopool/iocage/jails/2fe7ae89-760e-423c-8e7f-4f504e0f08bf@2019-02-16
> root  19304  0.0  0.0  11320  2588  3  S+  09:53  0:00.00 grep zfs send
>
> It is a FreeBSD 12.0 VM image running in a bhyve VM. There is
> basically only py36-iocage installed, and there are 7 running jails.
>
> There is 30G of RAM and sysctl vfs.zfs.arc_max is set to 20G. It
> seems that the whole zpool is in some kind of deadlock. All jails
> have crashed, they are unkillable, and I cannot run any command
> inside them.
>
> regards
> Ole
>
>
> Fri, 15 Feb 2019 11:34:23 +0100 - Ole <ole@free.de>:
>
> > Hi,
> >
> > I observed that FreeBSD systems with ZFS will run into a deadlock
> > if there are many parallel zfs send/receive/snapshot processes.
> >
> > I observed this on bare metal and in virtual machines, with FreeBSD
> > 11.2 and 12.0 and with 20 to 64G of RAM.
> >
> > If the system itself is on ZFS, the whole system crashes. With only
> > the jails on ZFS they freeze, but the host system stays stable.
> > Either way you can't kill -9 the zfs processes; only a poweroff
> > stops the machine.
> >
> > On a FreeBSD 12.0 VM (bhyve) with 30G RAM and 5 CPUs, about 30 zfs
> > operations, mostly send and receive, will crash the system.
> >
> > There is no heavy load on the machine:
> >
> > # top | head -8
> > last pid: 91503;  load averages: 0.34, 0.31, 0.29  up 0+22:50:47  11:24:00
> > 536 processes: 1 running, 529 sleeping, 6 zombie
> > CPU:  0.9% user,  0.0% nice,  1.5% system,  0.2% interrupt, 97.4% idle
> > Mem: 165M Active, 872M Inact, 19G Wired, 264M Buf, 9309M Free
> > ARC: 11G Total, 2450M MFU, 7031M MRU, 216M Anon, 174M Header, 1029M Other
> >      8423M Compressed, 15G Uncompressed, 1.88:1 Ratio
> > Swap: 1024M Total, 1024M Free
> >
> > I wonder if this is a bug or normal behaviour. I could live with a
> > limited number of parallel ZFS operations, but I don't want the
> > whole system to crash.
> >
> > Reducing vfs.zfs.arc_max won't help.
> >
> > Any idea how to handle this?
> >
> > regards
> > Ole
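
On the point in the quoted message about living with a limited number
of parallel ZFS operations: one way to cap the concurrency on the
sending side is to run every backup job under a single lockf(1) lock.
This is a rough sketch only; the lock path, snapshot names, and target
dataset are placeholders, and it works around the symptom rather than
the deadlock itself.

  #!/bin/sh
  # Serialize the backup jobs: lockf(1) holds an exclusive lock on $lock
  # while the command runs, so only one send/receive pipeline is active
  # at a time.  -k keeps the lock file between runs, -t 3600 waits at
  # most an hour for the lock before giving up.
  lock=/var/run/zfs-backup.lock

  lockf -k -t 3600 "$lock" sh -c '
      zfs send -e -I cryptopool/iocage/jails@2019-02-15 \
                     cryptopool/iocage/jails@2019-02-16 |
          zfs receive -du backuppool/jails
  '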