From owner-freebsd-stable@freebsd.org Wed Oct 11 19:37:18 2017
Message-ID: <59DE72E9.1050006@omnilan.de>
Date: Wed, 11 Oct 2017 21:37:13 +0200
From: Harry Schmalzbauer
Organization: OmniLAN
MIME-Version: 1.0
To: freebsd-stable@freebsd.org, FreeBSD virtualization
Subject: bhyve ppt usage can cause severe RAM corruption [Was: Re: panic: Memory modified after free in zio_create, passthru in use]
References: <59369A15.2010901@omnilan.de> <593D1D5C.907@omnilan.de>
In-Reply-To: <593D1D5C.907@omnilan.de>
Content-Type: multipart/mixed; boundary="------------090102070804030709070902"
List-Id: Production branch of FreeBSD source code

This is a multi-part message in MIME format.
--------------090102070804030709070902
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 8bit

Regarding Harry Schmalzbauer's message of 11.06.2017 12:37 (localtime):
> Regarding Harry Schmalzbauer's message of 06.06.2017 14:03 (localtime):
>> Hello,
>>
>> suddenly, I'm getting this error:
>> /lib/libc.so.7: Undefined symbol "xdr_accepted_reply"
>>
>> Very mysterious: It showed up on a running system, which worked
>> flawlessly for some hours. And that host has root-fs (/) mounted
>> readonly from a memorydisk. So to my understanding, it's completely
>> impossible that /lib/libc.so.7 is corrupted since last boot.
>>
>> I'm completely out of ideas what could cause this strange error during
>> "normal" operation.
>>
>> Normal operation in this case is serving as a bhyve test machine.
>> I first noticed that error after one guest - with passthru device
>> attached - was shut down.
>>
>> My suspicion is some undiscovered passthru interference... Since I
>> noticed one other _very_ strange passthru-effect:
>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=215740
> Hello,
>
> this time I caught a panic with a debugging kernel under 11.1-BETA1,
> which again occurred after shutting down a VM which had ppt in use:
> …
> Please, can any of the experts add a comment?

It turned out to be a problem with PCIe cards that don't support FLR
(Function Level Reset), and with cards that are not PCIe, even if they
have FLR capability. jhb@ helped me to diagnose this.
Unfortunately I once forgot to manually bring down the passthrough NICs
in question, which resulted in a completely destroyed ZFS pool.
That hurt (I had to recreate the complete system pool), so I won't rely
on manual intervention before shutting down anymore.
Unfortunately my skills don't allow me to help fix the root cause, so I
created a little rc(8) script, which should reliably protect against
this. Please see also
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=222937

Since it's quite small, I'll also attach it here (to be copied to
/etc/rc.d).

-harry
--------------090102070804030709070902
Content-Type: text/plain; name="pciptdetach"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="pciptdetach"

#!/bin/sh
#
# PROVIDE: pciptdetach
# REQUIRE: swap
# BEFORE: devd
# KEYWORD: shutdown

. /etc/rc.subr

name=pciptdetach
rcvar=pciptdetach_enable

load_rc_config ${name}

: ${pciptdetach_enable:="YES"}

# Nothing to do at start; all work happens at stop (shutdown).
start_cmd="true"
stop_cmd="${name}"

pciptdetach()
{
	# Do nothing unless the hypervisor vendor string reports bhyve.
	sysctl -n hw.hv_vendor | grep -q bhyve || return 0
	echo "Disabling passthrough adapters:"
	# Candidates: every device pciconf lists except hostb, virtio and
	# isab; keep only the selector that follows the '@'.
	pptcandidate=`pciconf -l | grep -v -E \
	    "^([[:blank:]]|hostb|virtio|isab)[^@]+" | sed -n -E \
	    's/^[[:blank:]]*(^[[:alnum:]]+)@([^[:blank:]]+)(:[[:blank:]]).*$/\2/p'`
	for pcidev in ${pptcandidate}; do
		# Driver name followed by device class, joined on one line.
		drv_class=`pciconf -lv | grep -A 3 "@${pcidev}" | sed -n -E -e \
		    's/^[[:blank:]]*class[[:blank:]]+=[[:blank:]]+([^[:blank:]].*)$/\1/p' \
		    -e 's/^([[:alnum:]]+)@.*$/\1/p' | tr '\n' ' '`
		# Don't disable mass storage devices, might be busy for shutdown
		[ X"${drv_class}" = X"${drv_class%mass storage*}" ] || continue
		# Make sure network adapters don't have active vlan(4) clones.
		if [ -z "${netstoped}" ] && [ X"${drv_class}" != X"${drv_class%network*}" ]
		then
			/etc/rc.d/netif stop >/dev/null 2>&1 && netstoped=y
		fi
		# Non-PCIe devices and PCIe devices without FLR support are
		# known to cause RAM corruption.
		if ! pciconf -lc ${pcidev} | grep -A 20 PCI-Express | grep -q "[[:blank:]]FLR"
		then
			devctl disable ${pcidev} >/dev/null 2>&1 || echo " ${drv_class%% *}:FAILED"
		fi
	done
}

run_rc_command "$1"
--------------090102070804030709070902--
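The FLR test that the attached script relies on can be tried in isolation. The following is only a minimal sketch, not part of the attachment: the function name has_pcie_flr and the two sample capability lines are hypothetical, and the samples merely imitate the general shape of `pciconf -lc` output rather than being captured from a real machine. Unlike the script's `grep -A 20`, this sketch looks at the PCI-Express capability line only, which is a deliberately stricter assumption.

```shell
#!/bin/sh
# Sketch (hypothetical helper): read `pciconf -lc`-style text on stdin and
# succeed only if a PCI-Express capability line carries the FLR token.
has_pcie_flr() {
	# Keep only PCI-Express capability lines, then look for " FLR".
	grep 'PCI-Express' | grep -q '[[:blank:]]FLR'
}

# Illustrative lines only (assumed format, not real captured output).
sample_with_flr='    cap 10[a0] = PCI-Express 2 endpoint max data 128(256) FLR NS'
sample_without_flr='    cap 10[e0] = PCI-Express 1 endpoint max data 128(256) NS'

for sample in "$sample_with_flr" "$sample_without_flr"; do
	if printf '%s\n' "$sample" | has_pcie_flr; then
		echo "FLR advertised"
	else
		echo "no FLR: disable before shutdown"
	fi
done
```

In a guest, the same function could presumably be fed real data with something like `pciconf -lc pci0:0:5:0 | has_pcie_flr`, where the selector pci0:0:5:0 is only a placeholder; a device for which it fails would be one to bring down with `devctl disable` before shutting the VM down.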