Date:      Wed, 11 Oct 2017 21:37:13 +0200
From:      Harry Schmalzbauer <freebsd@omnilan.de>
To:        freebsd-stable@freebsd.org, FreeBSD virtualization <freebsd-virtualization@freebsd.org>
Subject:   bhyve ppt usage can cause severe RAM corruption [Was: Re: panic: Memory modified after free in zio_create, passthru in use]
Message-ID:  <59DE72E9.1050006@omnilan.de>
In-Reply-To: <593D1D5C.907@omnilan.de>
References:  <59369A15.2010901@omnilan.de> <593D1D5C.907@omnilan.de>

This is a multi-part message in MIME format.
--------------090102070804030709070902
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 8bit

Regarding Harry Schmalzbauer's message of 11.06.2017 12:37 (localtime):
> Regarding Harry Schmalzbauer's message of 06.06.2017 14:03 (localtime):
>>  Hello,
>>
>> suddenly, I'm getting this error:
>> /lib/libc.so.7: Undefined symbol "xdr_accepted_reply"
>>
>> Very mysterious: it showed up on a running system that had worked
>> flawlessly for some hours. And that host has its root fs (/) mounted
>> read-only from a memory disk. So to my understanding, it should be
>> impossible for /lib/libc.so.7 to have been corrupted since the last boot.
>>
>> I'm completely out of ideas as to what could cause this strange error
>> during "normal" operation.
>>
>> Normal operation in this case is serving as a bhyve test machine.
>> I first noticed that error after one guest - with passthru device
>> attached - was shut down.
>>
>> My suspicion is some undiscovered passthru interference... Since I
>> noticed one other _very_ strange passthru-effect:
>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=215740
> Hello,
>
> this time I caught a panic with a debugging kernel under 11.1-BETA1,
> which again occurred after shutting down a VM which had ppt in use:
>
…
> Could one of the experts please add a comment?

It turned out to be a problem with PCIe cards that don't support FLR,
and with cards that are not PCIe, even if they have FLR capability.
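
For anyone who wants to check a card by hand, here is a minimal sketch. It
assumes, as the attached script does, that `pciconf -lc` prints "FLR" on or
shortly after the PCI-Express capability line. The function name `has_flr`
is made up for illustration.

```sh
#!/bin/sh
# has_flr (hypothetical name): read the output of `pciconf -lc <selector>`
# on stdin. Succeeds only if "FLR" shows up on, or within 20 lines after,
# a line mentioning the PCI-Express capability. Non-PCIe devices have no
# such line, so they are reported as lacking usable FLR.
has_flr()
{
	grep -A 20 'PCI-Express' | grep -q '[[:blank:]]FLR'
}
```

Possible use inside the guest, with a placeholder selector:
`pciconf -lc pci0:0:5:0 | has_flr || echo "device needs to be disabled before shutdown"`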

jhb@ helped me to diagnose this.

Unfortunately, I once forgot to manually bring down the passthrough NICs
in question, which resulted in a completely destroyed ZFS pool.
That hurt (I had to recreate the complete system pool), so I won't rely
on manual intervention before shutting down.
Unfortunately, my skills don't allow me to help fix the root cause, so
I created a small rc(8) script that should provide reliable protection.
Please see also https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=222937

Since it's quite small, I'm also attaching it here (to be copied
to /etc/rc.d).
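
A minimal sketch of the copy step, assuming the attachment was saved as a
local file. The helper name `install_pciptdetach` and its optional second
argument are hypothetical and only there for illustration. The default
target /etc/rc.d is the one named above, and mode 0555 is the usual mode
for rc.d scripts.

```sh
#!/bin/sh
# install_pciptdetach (hypothetical helper): copy the saved attachment into
# an rc.d directory and make it executable.
#   $1  path to the saved attachment
#   $2  target directory, defaults to /etc/rc.d
install_pciptdetach()
{
	src=$1
	rcdir=${2:-/etc/rc.d}
	install -m 0555 "${src}" "${rcdir}/pciptdetach"
}
```

Run as root in the guest, e.g. `install_pciptdetach ./pciptdetach`. The
script defaults to pciptdetach_enable="YES", so nothing else should be
needed. Setting pciptdetach_enable="NO" in /etc/rc.conf turns it off again.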

-harry


--------------090102070804030709070902
Content-Type: text/plain;
 name="pciptdetach"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="pciptdetach"

#!/bin/sh
#

# PROVIDE: pciptdetach
# REQUIRE: swap
# BEFORE: devd
# KEYWORD: shutdown

. /etc/rc.subr

name=pciptdetach
rcvar=pciptdetach_enable

load_rc_config ${name}

: ${pciptdetach_enable:="YES"}

start_cmd="true"
stop_cmd="${name}"

pciptdetach()
{
	sysctl -n hw.hv_vendor | grep -q bhyve || return 0

	echo "Disabling passthrough adapters:"

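	# Collect the PCI selectors of all candidate devices, skipping host
	# bridges, virtio devices and the ISA bridge.
	pptcandidate=$(pciconf -l | grep -v -E \
	    "^([[:blank:]]|hostb|virtio|isab)[^@]+" | sed -n -E \
	    's/^[[:blank:]]*([[:alnum:]]+)@([^[:blank:]]+)(:[[:blank:]]).*$/\2/p')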

	for pcidev in ${pptcandidate}; do

		# Driver name followed by the device class, space separated.
		drv_class=$(pciconf -lv | grep -A 3 "@${pcidev}" | sed -n -E \
		    -e 's/^[[:blank:]]*class[[:blank:]]+=[[:blank:]]+([^[:blank:]].*)$/\1/p' \
		    -e 's/^([[:alnum:]]+)@.*$/\1/p' | tr '\n' ' ')

		# Don't disable mass storage devices, might be busy for shutdown
		[ X"${drv_class}" = X"${drv_class%mass storage*}" ] || continue

		# Make sure network adapters don't have active vlan(4) clones.
		if [ -z "${netstopped}" ] &&
				[ X"${drv_class}" != X"${drv_class%network*}" ]
		then
			/etc/rc.d/netif stop >/dev/null 2>&1 && netstopped=y
		fi

		# Non-PCIe devices and PCIe devices without FLR support are
		# known to cause RAM corruption.
		if ! pciconf -lc "${pcidev}" | grep -A 20 PCI-Express |
		    grep -q "[[:blank:]]FLR"
		then
			devctl disable "${pcidev}" >/dev/null 2>&1 ||
			    echo " ${drv_class%% *}:FAILED"
		fi

	done
}

run_rc_command "$1"

--------------090102070804030709070902--


