Date: Wed, 9 Dec 2020 02:31:36 +0100
From: Peter <pmc@citylink.dinoex.sub.org>
To: Kristof Provost <kp@freebsd.org>
Cc: freebsd-stable@freebsd.org
Subject: Re: Panic: 12.2 fails to use VIMAGE jails
Message-ID: <X9Ao%2BBKDXADds36A@gate.oper.dinoex.org>
In-Reply-To: <1AAE98C9-ADF9-4869-B863-601542CEBB67@FreeBSD.org>
References: <20201207125451.GA11406@gate.oper.dinoex.org>
 <39DBEA53-960F-4D70-86D7-847E6DFA437D@FreeBSD.org>
 <20201207233449.GA11025@gate.oper.dinoex.org>
 <DDDE7802-1C8C-4EB7-AA0C-DFCD7E5D2BAB@FreeBSD.org>
 <X8/Kr0td1cxI%2BP%2BV@gate.oper.dinoex.org>
 <1AAE98C9-ADF9-4869-B863-601542CEBB67@FreeBSD.org>

On Tue, Dec 08, 2020 at 08:02:47PM +0100, Kristof Provost wrote:
! > Sorry for the bad news.
! >
! You appear to be triggering two or three different bugs there.

That is possible. Then there are two or three different bugs in the
production code.

In any case, my current workaround, i.e. delaying in the
exec.poststop

>    exec.poststop = "
>        sleep 6 ;
>        /usr/sbin/ngctl shutdown ${ifname1l}: ;
>        ";

helps with all of it and makes the system behave solidly. This is true
with and without Your patch.

! Can you reduce your netgraph use case to a small test case that can trigger
! the problem?

I'm sorry, I fear I don't get Your point.
Assuming there are actually two or three bugs here, You are asking me
to reduce the config so that it will trigger only one of them? Is that
correct?

Then let me put this differently: assume this is the OS for the life
support system of the manned Jupiter mission. Then, which one of the
bugs do You want to get fixed, and which would You prefer to keep and
have Your oxygen supply cut off?
https://www.youtube.com/watch?v=BEo2g-w545A

! I’m not likely to be able to do anything unless I can reproduce
! the problem(s).

I understand that.
From Your former mail I get the impression that You prefer to rely
on tests. I consider this a bad habit[1] and prefer logical thinking.

So let's try that:
We know that there is a problem with taking down an interface from a
VIMAGE, in the way it is done by "jail -r". We know this problem can
be reliably worked around by delaying the interface takedown for a
short time.
Now with Your patch, we do not get the typical crash at interface
takedown. Instead, all of a sudden, there are strange crashes from
various other places. And, interestingly, we get these also when
STARTING a jail.
I think this is not an additional problem; it is instead valuable
information (albeit not the one You might like to get).
Furthermore, we get these new crashes always invoked by "ifconfig",
and they seem to have in common that somebody tries to obtain
information about some interface configuration and receives bogus
data.

I might conclude, just from gut feeling and without looking into
details, that either
 - Your patch manages to garble some internal interface data, instead
   of what it is intended to do, or
 - the original problem manages to garble internal interface data
   (leading to the usual crash), and Your patch does not manage to
   solve this, but only protects from the immediate consequence.

It might also be worth considering that, while the problem may be
easier to reproduce with epair, this effect may or may not be a
netgraph-specific one[2].

Now let's keep in mind that a successful test means EXACTLY NOTHING.
By which other means can we confirm that Your patch fully achieves
what it is intended for? (E.g. something like dumping and verifying
the respective internal tables in-vivo)

(Background: It is not that I would be unwilling to create clean and
precisely reproducible scenarios. But one of my problems currently is
that I only have two machines available: the graphical one where I'm
just typing, and the backend server with the jails that does
practically everything. Therefore, experimenting on either of them
creates considerable pain.
I'm working on that issue, trying to get a real server board for the
backend so as to free the current one for testing - but what I would
like to use, e.g. ASUS Z10PE+cores+regECC, is not something one would
easily find at yard sales - and seldom for an acceptable price.)

cheerio,
PMc


[1] Rationale: a failing test tells us that either the test or the
application has a bug (50/50 chance). A succeeding test tells us that
1 equals 1, which we knew already before.
In fact, tests tell us *nothing at all* about the state of our code,
and specifically, 'successful' outcomes do NOT mean that things are
all correct.
The only true usefulness of tests is to protect against re-introducing
a fault that was already fixed before, i.e. regressions.

[2] My netgraph configuration consists of bringing up some bridges
and then attaching the jails to them.
Here is the bridge starter (only the respective component; there are
more of these populated, but they probably do not influence the
issue):

------------------------------------------------
#! /bin/sh

# PROVIDE: netgraphs
# REQUIRE: netwait
# BEFORE: NETWORKING

. /etc/rc.subr

name="netgraphs"
start_cmd="${name}_start"
stop_cmd="${name}_stop"
load_rc_config $name

netgraphs_graphs="svc"

netgraphs_svc_if1_name="nge_svc_1u"
netgraphs_svc_if1_mac="00:1d:92:01:02:01"
netgraphs_svc_if1_addr="***.***.***.***/29"

netgraphs_svc_start()
{
        local _ifname
        if ngctl info svcswitch: > /dev/null 2>&1; then
                netgraphs_svc_stop
        fi

        echo "Creating SVC Switch"
        ngctl -f - <<EOF
mkpeer bridge crhook link16
name .:crhook svcswitch
mkpeer svcswitch: eiface link0 ether
name svcswitch:link0 $netgraphs_svc_if1_name
EOF
        _ifname=`ngctl msg ${netgraphs_svc_if1_name}: getifname | \
                awk '$1 == "Args:" { print substr($2, 2, length($2)-2)}'`
        ifconfig $_ifname name $netgraphs_svc_if1_name
        ifconfig $netgraphs_svc_if1_name link $netgraphs_svc_if1_mac
        ifconfig $netgraphs_svc_if1_name inet $netgraphs_svc_if1_addr
}

netgraphs_svc_stop()
{
        echo "Shutting down SVC switch"
        ngctl shutdown svcswitch:
        ngctl shutdown ${netgraphs_svc_if1_name}:
}

netgraphs_start()
{
        local _cmd
        for i in "$@"; do
                eval _cmd=netgraphs_${i}_start
                if type $_cmd > /dev/null 2>&1; then
                        $_cmd
                else
                        echo "netgraphs-start: object $i not found" >&2
                fi
        done
}

netgraphs_stop()
{
        local _cmd
        for i in "$@"; do
                eval _cmd=netgraphs_${i}_stop
                if type $_cmd > /dev/null 2>&1; then
                        $_cmd
                else
                        echo "netgraphs-stop: object $i not found" >&2
                fi
        done
}

netgraphs_tasks=""
if test $# -eq 1; then
        if test "$1" = "stop"; then
                for i in $netgraphs_graphs; do
                        netgraphs_tasks="$i $netgraphs_tasks"
                done
        else
                for i in $netgraphs_graphs; do
                        netgraphs_tasks="$netgraphs_tasks $i"
                done
        fi
fi

run_rc_command "$@" "$netgraphs_tasks"
------------------------------------------------

And here is the full jail config (only the respective jail):

------------------------------------------------
allow.set_hostname = "false";
allow.mount.procfs = "false";
allow.mount.devfs = "false";
allow.raw_sockets = "false";
enforce_statfs = 1;
devfs_ruleset = 4;
securelevel = 2;
mount.devfs;
exec.start = "/bin/sh /etc/rc";
exec.stop = "/bin/sh /etc/rc.shutdown";
exec.consolelog = "/var/log/jail_${name}_console.log";
path = "/j/$name";
interface = "lo0";
ip4.saddrsel = "false";

rail {
        jid = 10;
        devfs_ruleset = 11;
        host.hostname = "rail.***********.org";
        vnet = "new";
        sysvshm;
        $ifname1l = nge_${name}_1l;
        $ifname1l_mac = 00:1d:92:01:01:0a;
        vnet.interface = "$ifname1l";
        exec.prestart = "
            echo -e \"mkpeer eiface crhook ether\nname .:crhook $ifname1l\" \
                | /usr/sbin/ngctl -f -
            /usr/sbin/ngctl connect ${ifname1l}: svcswitch: ether link2
            ifname=`/usr/sbin/ngctl msg ${ifname1l}: getifname | \
                awk '$1 == \"Args:\" { print substr($2, 2, length($2)-2)}'`
            /sbin/ifconfig \$ifname name $ifname1l
            /sbin/ifconfig $ifname1l link $ifname1l_mac
            ";
        exec.poststart = "
            /usr/sbin/jexec $name /sbin/sysctl kern.securelevel=3 ;
            ";
        exec.poststop = "
#           sleep 6 ;
            /usr/sbin/ngctl shutdown ${ifname1l}: ;
            ";
        exec.start = "/bin/sleep 4 &";
}
------------------------------------------------
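
Both the bridge starter and the jail's exec.prestart recover the
kernel-assigned interface name by piping the reply of
"ngctl msg <node>: getifname" through the same awk filter. A minimal
sketch of just that step, runnable without netgraph, might look as
follows. The function names extract_ifname and sample_reply are
hypothetical, and the sample reply text is an assumption modelled on
the usual two-line ngctl response, not output captured from this
system.

```sh
#!/bin/sh

# Hypothetical helper: reads an "ngctl msg <node>: getifname" reply on
# stdin and prints the interface name with the surrounding quotes
# removed (same awk expression as in the scripts above).
extract_ifname()
{
        awk '$1 == "Args:" { print substr($2, 2, length($2) - 2) }'
}

# Assumed sample reply, for illustration only.
sample_reply()
{
        cat <<'EOF'
Rec'd response "getifname" (1) from "[5]:":
Args:   "ngeth0"
EOF
}

# Only the line whose first field is "Args:" matches; its second field
# is the quoted name, and substr() drops the first and last character.
sample_reply | extract_ifname
```

On a live system the left-hand side of the pipe would be the ngctl
call itself, as in the two scripts above.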