From: Peter
Date: Wed, 9 Dec 2020 02:31:36 +0100
To: Kristof Provost
Cc: freebsd-stable@freebsd.org
Subject: Re: Panic: 12.2 fails to use VIMAGE jails

On Tue, Dec 08, 2020 at 08:02:47PM +0100, Kristof Provost wrote:
! > Sorry for the bad news.
! >
! You appear to be triggering two or three different bugs there.

That is possible. Then there are two or three different bugs in the
production code.

In any case, my current workaround, i.e. delaying in the
exec.poststop

>     exec.poststop = "
>        sleep 6 ;
>        /usr/sbin/ngctl shutdown ${ifname1l}: ;
>        ";

helps for it all and makes the system behave solid. This is true
with and without Your patch.

! Can you reduce your netgraph use case to a small test case that can trigger
! the problem?

I'm sorry, I fear I don't get Your point.
Assuming there are actually two or three bugs here, You are asking
me to reduce the config so that it will trigger only one of them? Is
that correct?

Then let me put this differently: assume this is the OS for the
life support system of the manned Jupiter mission. Then, which one of
the bugs do You want to get fixed, and which would You prefer to keep
and have Your oxygen supply cut off?
https://www.youtube.com/watch?v=BEo2g-w545A

! I’m not likely to be able to do anything unless I can reproduce
! the problem(s).

I understand that. From Your former mail I get the impression that
You prefer to rely on tests. I consider this a bad habit[1] and
prefer logical thinking.

So let's try that:

We know that there is a problem with taking down an interface from a
VIMAGE, in the way it is done by "jail -r". We know this problem can
be reliably worked around by delaying the interface takedown for a
short time.

Now with Your patch, we do not get the typical crash at interface
takedown. Instead, all of a sudden, there are strange crashes from
various other places. And, interestingly, we get these also when
STARTING a jail.
I think this is not an additional problem; it is instead valuable
information (albeit not the one You might like to get).

Furthermore, we get these new crashes always invoked by "ifconfig",
and they seem to have in common that somebody tries to obtain
information about some interface configuration and receives bogus
data.

I might conclude, just from the gut without looking into details,
that either
 - Your patch manages to garble some internal interface data,
   instead of doing what it is intended to do, or
 - the original problem manages to garble internal interface data
   (leading to the usual crash), and Your patch does not solve this,
   but only protects from the immediate consequence.

It might also be worth considering that, while the problem may be
easier to reproduce with epair, this effect may or may not be a
netgraph specific one[2].
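As a minimal sketch of the delayed takedown described above (a workaround only, not a fix for whatever garbles the interface data), the sleep-then-shutdown step could be wrapped in a small helper script. The function name delayed_shutdown, the NGCTL variable and the argument order are assumptions made for illustration; only the "ngctl info" and "ngctl shutdown" calls and the 6 second delay come from this mail.

```sh
#!/bin/sh
# Hypothetical poststop helper: wait, then shut down a netgraph node.
# delayed_shutdown, NGCTL and the argument order are assumptions of this
# sketch; "ngctl info" and "ngctl shutdown" are the commands already used
# in this mail.

: "${NGCTL:=/usr/sbin/ngctl}"

delayed_shutdown()
{
	node="$1"          # netgraph node name, without the trailing colon
	delay="${2:-6}"    # seconds to wait; 6 is the value from the workaround

	sleep "$delay"

	# Only shut the node down if it still exists after the delay.
	if "$NGCTL" info "${node}:" > /dev/null 2>&1; then
		"$NGCTL" shutdown "${node}:"
	fi
}

# Run only when called with arguments, e.g. from exec.poststop.
if [ "$#" -gt 0 ]; then
	delayed_shutdown "$@"
fi
```

Called from exec.poststop with ${ifname1l} as its argument, this would behave like the inline "sleep 6 ; ngctl shutdown" pair, except that it skips the shutdown when the node has already disappeared.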

Now let's keep in mind that a successful test means EXACTLY NOTHING.
By which other means can we confirm that Your patch fully achieves
what it is intended for? (E.g. something like dumping and verifying
the respective internal tables in-vivo)

(Background: It is not that I would be unwilling to create clean and
precisely reproducible scenarios, but one of my problems is that I
currently have only two machines available: the graphical one where
I'm just typing, and the backend server with the jails that does
practically everything. Therefore, experimenting on either of them
creates considerable pain.
I'm working on that issue, trying to get a real server board for the
backend so as to free the current one for testing - but what I would
like to use, e.g. ASUS Z10PE+cores+regECC, is not something one
would easily find at yard sales - and seldom for an acceptable price.)

cheerio,
PMc

[1] Rationale: a failing test tells us that either the test or the
application has a bug (50/50 chance). A succeeding test tells us
that 1 equals 1, which we already knew.
In fact, tests tell us *nothing at all* about the state of our code,
and specifically, 'successful' outcomes do NOT mean that things are
all correct.
The only true usefulness of tests is to protect against
re-introducing a fault that was already fixed before, i.e.
regressions.

[2] My netgraph configuration consists of bringing up some bridges
and then attaching the jails to them. Here is the bridge starter
(only the relevant component; more of these are populated, but they
probably do not influence the issue):

------------------------------------------------
#! /bin/sh

# PROVIDE: netgraphs
# REQUIRE: netwait
# BEFORE: NETWORKING

. /etc/rc.subr

name="netgraphs"
start_cmd="${name}_start"
stop_cmd="${name}_stop"

load_rc_config $name

netgraphs_graphs="svc"
netgraphs_svc_if1_name="nge_svc_1u"
netgraphs_svc_if1_mac="00:1d:92:01:02:01"
netgraphs_svc_if1_addr="***.***.***.***/29"

netgraphs_svc_start()
{
	local _ifname

	if ngctl info svcswitch: > /dev/null 2>&1; then
		netgraphs_svc_stop
	fi

	echo "Creating SVC Switch"
	ngctl -f - <