Subject: Re: ZFS deadlock with virtio
To: freebsd-questions@freebsd.org
From: Runer <run00er@gmail.com>
Date: Mon, 4 Mar 2019 12:39:18 +0200
Message-ID: <029a4f4f-ab7c-137a-22f3-bdd9d906d7ba@gmail.com>
In-Reply-To: <20190304104422.443a8c20.ole@free.de>

Most likely you are right! I noticed the same bhyve behavior with ZFS.
My searches led me to these links:

https://smartos.org/bugview/OS-7300
https://smartos.org/bugview/OS-7314
https://smartos.org/bugview/OS-6912

Most likely Illumos will roll out patches for these, but I could not
find out when the changes will land in the FreeBSD branch.

Good luck!
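As a stopgap until then, your own test below points at a workaround:
switch the guest's data disk from virtio-blk to ahci-hd. With vm-bhyve
that should be a one-line change in the guest configuration (a minimal
sketch, assuming vm-bhyve's usual diskN_type variable; the guest name
"jails1" and the disk index are illustrative):

# open the guest configuration in $EDITOR
vm configure jails1

# change the emulation type of the zpool disk from
disk1_type="virtio-blk"
# to
disk1_type="ahci-hd"

# then restart the guest to apply it
vm stop jails1
vm start jails1

ahci-hd emulation is slower than virtio-blk, but it avoids the code
path that hangs here.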
On 04.03.2019 11:44, Ole wrote:

> Hello,
>
> I have done some investigations. I think that there are two different
> problems, so let's focus on the bhyve VM. I can now reproduce the
> behaviour very well. It seems to be connected to the virtio disks.
>
> The disk stack is:
>
> GELI encryption
> Zpool (mirror)
> Zvol
> virtio
> Zpool
>
> - Host system is FreeBSD 11.2
> - VM is FreeBSD 12.0 (VM raw image + an additional disk for the zpool)
> - VM is controlled by vm-bhyve
> - inside the VM there are 5 to 10 running jails (managed with iocage)
>
> If I start the bhyve VM and let the backups run (~10 operations per
> hour), the zpool inside the VM will crash after 1 to 2 days.
>
> If I change the disk from virtio-blk to ahci-hd, the VM stays stable.
>
> regards
> Ole
>
> Tue, 19 Feb 2019 10:17:17 +0100 - Ole:
>
>> Hi,
>>
>> OK, now I again have an unkillable ZFS process. It is only one
>> 'zfs send' command. Any idea how to kill this process without
>> powering off the machine?
>>
>> root@jails1:/usr/home/admin # ps aux | grep 'zfs send'
>> root 17617  0.0  0.0 12944 3856  -  Is  Sat04  0:00.00 sudo zfs send -e -I cryptopool/iocage/jails/2fe7ae89-760e-423c-8e7f-4f504e0f08bf@2019-
>> root 17618  0.0  0.0 12980 4036  -  D   Sat04  0:00.01 zfs send -e -I cryptopool/iocage/jails/2fe7ae89-760e-423c-8e7f-4f504e0f08bf@2019-02-16
>> root 19299  0.0  0.0 11320 2588  3  S+  09:53  0:00.00 grep zfs send
>> root@jails1:/usr/home/admin # kill -9 17618
>> root@jails1:/usr/home/admin # ps aux | grep 'zfs send'
>> root 17617  0.0  0.0 12944 3856  -  Is  Sat04  0:00.00 sudo zfs send -e -I cryptopool/iocage/jails/2fe7ae89-760e-423c-8e7f-4f504e0f08bf@2019-
>> root 17618  0.0  0.0 12980 4036  -  D   Sat04  0:00.01 zfs send -e -I cryptopool/iocage/jails/2fe7ae89-760e-423c-8e7f-4f504e0f08bf@2019-02-16
>> root 19304  0.0  0.0 11320 2588  3  S+  09:53  0:00.00 grep zfs send
>>
>> It is a FreeBSD 12.0 VM image running in a bhyve VM. There is
>> basically only py36-iocage installed, and there are 7 running jails.
>>
>> There is 30G of RAM, and the sysctl vfs.zfs.arc_max is set to 20G.
>> It seems that the whole zpool is in some kind of deadlock. All jails
>> have crashed, are unkillable, and I cannot run any command inside
>> them.
>>
>> regards
>> Ole
>>
>> Fri, 15 Feb 2019 11:34:23 +0100 - Ole:
>>
>>> Hi,
>>>
>>> I have observed that FreeBSD systems with ZFS will run into a
>>> deadlock if there are many parallel zfs send/receive/snapshot
>>> processes.
>>>
>>> I observed this on bare metal and in virtual machines, with
>>> FreeBSD 11.2 and 12.0 and with 20 to 64G of RAM.
>>>
>>> If the system itself is on ZFS, the whole system crashes. With only
>>> the jails on ZFS they freeze, but the host system stays stable.
>>> Either way you can't kill -9 the zfs processes; only a poweroff
>>> stops the machine.
>>>
>>> On a FreeBSD 12.0 VM (bhyve) with 30G RAM and 5 CPUs, about 30
>>> parallel zfs operations, mostly send and receive, will crash the
>>> system.
>>>
>>> There is no heavy load on the machine:
>>>
>>> # top | head -8
>>> last pid: 91503;  load averages: 0.34, 0.31, 0.29  up 0+22:50:47  11:24:00
>>> 536 processes: 1 running, 529 sleeping, 6 zombie
>>> CPU:  0.9% user,  0.0% nice,  1.5% system,  0.2% interrupt, 97.4% idle
>>> Mem: 165M Active, 872M Inact, 19G Wired, 264M Buf, 9309M Free
>>> ARC: 11G Total, 2450M MFU, 7031M MRU, 216M Anon, 174M Header, 1029M Other
>>>      8423M Compressed, 15G Uncompressed, 1.88:1 Ratio
>>> Swap: 1024M Total, 1024M Free
>>>
>>> I wonder if this is a bug or normal behaviour. I could live with a
>>> limited number of parallel ZFS operations, but I don't want the
>>> whole system to crash.
>>>
>>> Reducing vfs.zfs.arc_max won't help.
>>>
>>> Any idea how to handle this?
>>>
>>> regards
>>> Ole
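P.S. On the original question of limiting parallel ZFS operations:
until the underlying bug is fixed, you can serialize the backup jobs
so that only one send/receive runs at a time. A minimal sketch using
FreeBSD's lockf(1); the wrapper name and lock file path are
illustrative:

#!/bin/sh
# zfs-serial.sh -- run one ZFS pipeline under a global lock so that
# concurrent backup jobs queue up instead of running in parallel.
# Usage: zfs-serial.sh 'zfs send ... | zfs receive ...'
LOCK=/var/run/zfs-serial.lock

# lockf(1) holds an exclusive lock on $LOCK for the duration of the
# command; by default it waits for the lock, so overlapping cron jobs
# line up one after another (add -t 0 to fail fast instead).
exec lockf "$LOCK" /bin/sh -c "$1"

Each backup job would then call something like (dataset and snapshot
names here are placeholders):

zfs-serial.sh "zfs send -e -I pool/dataset@old pool/dataset@new | zfs receive -u backup/dataset"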