From: Scott Bennett
Message-Id: <201911091356.xA9DumXl007459@sdf.org>
Date: Sat, 09 Nov 2019 07:56:48 -0600
To: freebsd-stable@freebsd.org, eugen@grosbein.net
Subject: Re: kernel bug in 11.3-STABLE causes frequent crashes
References: <201911091245.xA9Cj1lo019826@sdf.org>
List-Id: Production branch of FreeBSD source code

Eugene,
     Thank you very much for the fast reply!

Eugene Grosbein wrote:
> 09.11.2019 19:45, Scott Bennett wrote:
> >      The rest of this message was posted a little while ago to the
> > freebsd-questions list by mistake.  It was intended for freebsd-stable,
> > so I am posting it here now after posting a brief apology on the other
> > list.
> >      I have had to waste a great deal of time lately in recovering my
> > system from crashes due to a kernel bug.  At present, my system is
> >
> > FreeBSD hellas 11.3-STABLE FreeBSD 11.3-STABLE #12 r352571: Sat Sep 21 11:39:52 CDT 2019 bennett@hellas:/usr/obj/usr/src/sys/hellas amd64
> >
> > There are actually at least two problems, but this particular one has been
> > causing a large portion of my forced reboots.  It usually fails to produce
> > a dump and freezes right after the panic and backtrace messages, as it did
> > earlier tonight, but Wednesday night it did create a dump, which I am
> > keeping in case it should prove helpful in getting the bug identified and
> > solved.  I copied the console messages to paper painstakingly by hand.
> > They appear to be identical each time, except, of course, for the messages
> > that a dump is produced when, indeed, it does produce one.  I am omitting
> > those fairly standard messages.
> >
> > Fatal trap 12: page fault while in kernel mode
> > cpuid = 2; apic id = 02
> > fault virtual address   = 0x3b8
> > fault code              = supervisor read data, page not present
> > instruction pointer     = 0x20:0xffffffff80a4b14c
> > stack pointer           = 0x0:0xfffffe012a60ea50
> > frame pointer           = 0x0:0xfffffe012a60eae0
> > code segment            = base 0x0, limit 0xfffff, type 0x1b
> >                         = DPL 0, pres 1, long 1, def32 0, gran 1
> > processor eflags        = interrupt enabled, resume, IOPL = 0
> > current process         = 28 (flowcleaner)
> > trap number             = 12
> > panic: page fault
> > cpuid = 2
> > KDB: stack backtrace:
> > #0 0xffffffff80a94707 at kdb_backtrace+0x67
> > #1 0xffffffff80a4fa2e at vpanic+0x17e
> > #2 0xffffffff80a4f8a3 at panic+0x43
> > #3 0xffffffff80f3a4d0 at trap_pfault+0
> > #4 0xffffffff80f3a519 at trap_pfault+0x49
> > #5 0xffffffff80f39bad at trap+0x29
> > #6 0xffffffff80f19f33 at calltrap+0x8
> > #7 0xffffffff80b3bb8d at flowtable_clean_vnet+0x43d
> > #8 0xffffffff80b3c758 at flowtable_cleaner+0xc8
> > #9 0xffffffff80a12ea2 at fork_exit+0x82
> > #10 0xffffffff80flaf4e at fork_trampoline+0xe
> >
> >      The machine is ancient.  The CPU is a QX9650 (last group of Core 2
> > Quads) with 8 GB of DDR3 memory.
> >      If this can be identified as a known bug and a clue provided to a
> > patch or a safer version to upgrade to, I would be grateful.  I am getting
> > very, very tired of these crashes.
> >      The other forced reboots I will describe in a separate message, but
> > that problem has existed since the time of 11.2-RELEASE and apparently was
> > never investigated, much less fixed, although people began complaining on
> > this list and possibly -questions within the first few days after the
> > release date.
> >      Thanks in advance for any help with this problem!
>
> It seems you have a custom kernel with options FLOWTABLE. The code it includes
> is known to be buggy; this option was removed from GENERIC many releases ago.
> Remove it from your kernel configuration, rebuild kernel and you will be fine.
>
     Wonderful.  I have a comment on that line, saying I added it for 8.x,
so I probably found it in 8.1's GENERIC configuration file when I was
preparing to upgrade from 7.3.  It is interesting that it only started
hitting me (hard enough to make me notice it, at least) in 11.3 and maybe a
bit earlier in 11.2.
     Anyway, that will be easy enough to fix, but will require rolling
/usr/src back to the revision I am running, which is probably also no big
deal.  I don't seem to be able to build it at the current source revision
because 11-STABLE's buildworld began failing during the libc build two or
three weeks ago.  I just tried "svn update /usr/src" again, followed by
"make -j6 buildworld", and it still fails with this ending.

--- libc_pic.a ---
ranlib -D libc_pic.a
--- libc.a ---
ranlib -D libc.a
--- libc.so.7.full ---
cc: error: unable to execute command: posix_spawn failed: Permission denied
cc: error: linker command failed with exit code 1 (use -v to see invocation)
*** [libc.so.7.full] Error code 1

make[4]: stopped in /usr/src/lib/libc
1 error

make[4]: stopped in /usr/src/lib/libc
*** [lib/libc__L] Error code 2

make[3]: stopped in /usr/src
1 error

make[3]: stopped in /usr/src
*** [libraries] Error code 2

make[2]: stopped in /usr/src
1 error

make[2]: stopped in /usr/src
*** [_libraries] Error code 2

make[1]: stopped in /usr/src
1 error

make[1]: stopped in /usr/src
*** [buildworld] Error code 2

make: stopped in /usr/src
1 error

make: stopped in /usr/src

     Oh, well.  During the intervening weeks, I haven't seen any src updates
that appear to have anything to do with fixing the virtual memory management
bug(s) that is/are the other thing wasting my time.  I'll start a separate
thread for that, but first I want to do the rollback and get the buildworld
started.  Oh, wait a minute...ah, yes!
I also have a snapshot of /usr/obj to the same revision, so I won't even
need the buildworld, only the buildkernel.  This should be quite quick then.
     Thanks a bundle for your help.


                                  Scott Bennett, Comm. ASMELG, CFIAG
**********************************************************************
* Internet:   bennett at sdf.org   *xor*   bennett at freeshell.org  *
*--------------------------------------------------------------------*
* "A well regulated and disciplined militia, is at all times a good  *
* objection to the introduction of that bane of all free governments *
* -- a standing army."                                               *
*    -- Gov. John Hancock, New York Journal, 28 January 1790         *
**********************************************************************
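
For anyone following the thread later, the fix discussed above (roll the sources back to the running revision, drop options FLOWTABLE, rebuild only the kernel) could be sketched roughly as follows. This is only an illustration, not what was actually run: the helper name disable_flowtable is made up, and the config path /usr/src/sys/amd64/conf/hellas and KERNCONF=hellas are assumptions inferred from the uname string in the report. The helper comments the option out rather than deleting it, so the old note on that line survives.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative): comment out the
# "options FLOWTABLE" line in a kernel configuration file.
disable_flowtable() {
    conf="$1"
    # Prefix the matching line with '#'.  The result goes to a temporary
    # copy first, so this does not depend on "sed -i", whose syntax
    # differs between BSD and GNU sed.
    sed 's/^\(options[[:space:]]\{1,\}FLOWTABLE\)/#\1/' "$conf" > "$conf.new" &&
        mv "$conf.new" "$conf"
}

# Intended sequence on the affected machine (shown as comments, not run
# here; the config path and KERNCONF name are assumptions, r352571 is
# the revision from the uname output in the report):
#   svn update -r 352571 /usr/src
#   disable_flowtable /usr/src/sys/amd64/conf/hellas
#   cd /usr/src && make buildkernel KERNCONF=hellas
#   make installkernel KERNCONF=hellas
```

With a /usr/obj snapshot matching the same revision, as mentioned above, only the buildkernel step should be needed before installing and rebooting.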