Date: Mon, 4 Mar 2019 12:39:18 +0200
From: Runer <run00er@gmail.com>
To: freebsd-questions@freebsd.org
Subject: Re: ZFS deadlock with virtio
Message-ID: <029a4f4f-ab7c-137a-22f3-bdd9d906d7ba@gmail.com>
In-Reply-To: <20190304104422.443a8c20.ole@free.de>
References: <20190215113423.01edabe9.ole@free.de>
 <20190219101717.61526ab1.ole@free.de>
 <20190304104422.443a8c20.ole@free.de>
Most likely you are right! I noticed the same bhyve behaviour with ZFS.
My searches led me to these links:

https://smartos.org/bugview/OS-7300
https://smartos.org/bugview/OS-7314
https://smartos.org/bugview/OS-6912

Most likely Illumos will roll out the patches, but I could not figure
out when these changes will land in FreeBSD. Good luck!

On 04.03.2019 11:44, Ole wrote:
> Hello,
> 
> I have done some investigation. I think there are two different
> problems, so let's focus on the bhyve VM. I can now reproduce the
> behaviour very reliably. It seems to be connected to the virtio disks.
> 
> The disk stack is:
> 
> GELI encryption
> zpool (mirror)
> zvol
> virtio
> zpool (inside the VM)
> 
> - host system is FreeBSD 11.2
> - VM is FreeBSD 12.0 (raw VM image + an additional disk for the zpool)
> - VM is controlled by vm-bhyve
> - inside the VM there are 5 to 10 running jails (managed with iocage)
> 
> If I start the bhyve VM and let the backups run (~10 operations per
> hour), the zpool inside the VM will crash after 1 to 2 days.
> 
> If I change the disk from virtio-blk to ahci-hd, the VM stays stable.
> 
> regards
> Ole
> 
> Tue, 19 Feb 2019 10:17:17 +0100 - Ole <ole@free.de>:
> 
>> Hi,
>>
>> OK, now I again got an unkillable ZFS process. It is only one
>> 'zfs send' command. Any idea how to kill this process without
>> powering off the machine?
>>
>> root@jails1:/usr/home/admin # ps aux | grep 'zfs send'
>> root 17617  0.0  0.0 12944 3856  -  Is  Sat04  0:00.00 sudo zfs send -e -I cryptopool/iocage/jails/2fe7ae89-760e-423c-8e7f-4f504e0f08bf@2019-
>> root 17618  0.0  0.0 12980 4036  -  D   Sat04  0:00.01 zfs send -e -I cryptopool/iocage/jails/2fe7ae89-760e-423c-8e7f-4f504e0f08bf@2019-02-16
>> root 19299  0.0  0.0 11320 2588  3  S+  09:53  0:00.00 grep zfs send
>> root@jails1:/usr/home/admin # kill -9 17618
>> root@jails1:/usr/home/admin # ps aux | grep 'zfs send'
>> root 17617  0.0  0.0 12944 3856  -  Is  Sat04  0:00.00 sudo zfs send -e -I cryptopool/iocage/jails/2fe7ae89-760e-423c-8e7f-4f504e0f08bf@2019-
>> root 17618  0.0  0.0 12980 4036  -  D   Sat04  0:00.01 zfs send -e -I cryptopool/iocage/jails/2fe7ae89-760e-423c-8e7f-4f504e0f08bf@2019-02-16
>> root 19304  0.0  0.0 11320 2588  3  S+  09:53  0:00.00 grep zfs send
>>
>> It is a FreeBSD 12.0 VM image running in a bhyve VM. There is
>> basically only py36-iocage installed, and there are 7 running jails.
>>
>> There is 30G of RAM and the sysctl vfs.zfs.arc_max is set to 20G. It
>> seems that the whole zpool is in some kind of deadlock. All jails
>> have crashed, they are unkillable and I cannot run any command
>> inside them.
>>
>> regards
>> Ole
>>
>>
>> Fri, 15 Feb 2019 11:34:23 +0100 - Ole <ole@free.de>:
>>
>>> Hi,
>>>
>>> I have observed that FreeBSD systems with ZFS run into a deadlock
>>> if there are many parallel zfs send/receive/snapshot processes.
>>>
>>> I observed this on bare metal and on virtual machines with FreeBSD
>>> 11.2 and 12.0, with RAM from 20 to 64G.
>>>
>>> If the system itself is also on ZFS, the whole system crashes. With
>>> only the jails on ZFS they freeze, but the host system stays stable.
>>> You can't kill -9 the zfs processes either; only a poweroff stops
>>> the machine.
>>>
>>> On a FreeBSD 12.0 VM (bhyve) with 30G RAM and 5 CPUs, about 30 zfs
>>> operations, mostly send and receive, will crash the system.
>>>
>>> There is no heavy load on the machine:
>>>
>>> # top | head -8
>>> last pid: 91503;  load averages: 0.34, 0.31, 0.29  up 0+22:50:47  11:24:00
>>> 536 processes: 1 running, 529 sleeping, 6 zombie
>>> CPU:  0.9% user,  0.0% nice,  1.5% system,  0.2% interrupt, 97.4% idle
>>> Mem: 165M Active, 872M Inact, 19G Wired, 264M Buf, 9309M Free
>>> ARC: 11G Total, 2450M MFU, 7031M MRU, 216M Anon, 174M Header, 1029M Other
>>>      8423M Compressed, 15G Uncompressed, 1.88:1 Ratio
>>> Swap: 1024M Total, 1024M Free
>>>
>>> I wonder if this is a bug or normal behaviour. I could live with a
>>> limited number of parallel ZFS operations, but I don't want the
>>> whole system to crash.
>>>
>>> Reducing vfs.zfs.arc_max does not help.
>>>
>>> Any idea how to handle this?
>>>
>>> regards
>>> Ole
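
For anyone else hitting this: the virtio-blk -> ahci-hd workaround Ole
describes is a one-line change in the vm-bhyve guest configuration. A
rough sketch only; the config path, disk names and sizes below are
placeholders, not Ole's actual setup:

    # $vm_dir/jails1/jails1.conf  (path depends on your vm-bhyve datastore)
    cpu=5
    memory=30G
    disk0_type="virtio-blk"      # system disk (raw VM image)
    disk0_name="disk0.img"
    #disk1_type="virtio-blk"     # data disk for the guest zpool - hung under load
    disk1_type="ahci-hd"         # workaround: emulated AHCI instead of virtio-blk
    disk1_dev="sparse-zvol"
    disk1_name="disk1"

The guest has to be restarted (vm stop / vm start) before the new disk
emulation takes effect.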
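
And until the underlying bug is fixed, one way to live with "a limited
number of parallel ZFS operations" is to serialise the backup jobs
through lockf(1), so only one zfs send/receive runs at a time. A minimal
sketch, assuming the jobs are started from a shell wrapper; the lock
file, dataset and target names are made up for illustration:

    #!/bin/sh
    # Serialise backup jobs: only one zfs send/receive at a time.
    DS="cryptopool/iocage/jails/myjail"

    # -k keeps the lock file around, -t 600 waits up to 10 minutes
    # for a running job before giving up.
    lockf -k -t 600 /var/run/zfs-backup.lock sh -c \
        "zfs send -e -I @2019-03-01 ${DS}@2019-03-04 | ssh backuphost zfs receive -u backup/myjail"

This does not fix the deadlock; it only reduces the chance of triggering
it by keeping the number of concurrent send/receive streams at one.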
