Date: Sat, 09 Nov 2019 07:56:48 -0600
From: Scott Bennett <bennett@sdf.org>
To: freebsd-stable@freebsd.org, eugen@grosbein.net
Subject: Re: kernel bug in 11.3-STABLE causes frequent crashes
Message-ID: <201911091356.xA9DumXl007459@sdf.org>
In-Reply-To: <b85c2ab4-608d-22c2-b075-215d5c6a6d36@grosbein.net>
References: <201911091245.xA9Cj1lo019826@sdf.org> <b85c2ab4-608d-22c2-b075-215d5c6a6d36@grosbein.net>
Eugene,
     Thank you very much for the fast reply!

Eugene Grosbein <eugen@grosbein.net> wrote:

> 09.11.2019 19:45, Scott Bennett wrote:
> >      The rest of this message was posted a little while ago to the
> > freebsd-questions list by mistake. It was intended for freebsd-stable,
> > so I am posting it here now after posting a brief apology on the other
> > list.
> >      I have had to waste a great deal of time lately in recovering my
> > system from crashes due to a kernel bug. At present, my system is
> >
> > FreeBSD hellas 11.3-STABLE FreeBSD 11.3-STABLE #12 r352571: Sat Sep 21 11:39:52 CDT 2019 bennett@hellas:/usr/obj/usr/src/sys/hellas amd64
> >
> >      There are actually at least two problems, but this particular one
> > has been causing a large portion of my forced reboots. It usually fails
> > to produce a dump and freezes right after the panic and backtrace
> > messages, as it did earlier tonight, but Wednesday night it did create a
> > dump, which I am keeping in case it should prove helpful in getting the
> > bug identified and solved. I copied the console messages to paper
> > painstakingly by hand. They appear to be identical each time, except, of
> > course, for the messages that a dump is produced when, indeed, it does
> > produce one. I am omitting those fairly standard messages.
> >
> > Fatal trap 12: page fault while in kernel mode
> > cpuid = 2; apic id = 02
> > fault virtual address   = 0x3b8
> > fault code              = supervisor read data, page not present
> > instruction pointer     = 0x20:0xffffffff80a4b14c
> > stack pointer           = 0x0:0xfffffe012a60ea50
> > frame pointer           = 0x0:0xfffffe012a60eae0
> > code segment            = base 0x0, limit 0xfffff, type 0x1b
> >                         = DPL 0, pres 1, long 1, def32 0, gran 1
> > processor eflags        = interrupt enabled, resume, IOPL = 0
> > current process         = 28 (flowcleaner)
> > trap number             = 12
> > panic: page fault
> > cpuid = 2
> > KDB: stack backtrace:
> > #0 0xffffffff80a94707 at kdb_backtrace+0x67
> > #1 0xffffffff80a4fa2e at vpanic+0x17e
> > #2 0xffffffff80a4f8a3 at panic+0x43
> > #3 0xffffffff80f3a4d0 at trap_pfault+0
> > #4 0xffffffff80f3a519 at trap_pfault+0x49
> > #5 0xffffffff80f39bad at trap+0x29
> > #6 0xffffffff80f19f33 at calltrap+0x8
> > #7 0xffffffff80b3bb8d at flowtable_clean_vnet+0x43d
> > #8 0xffffffff80b3c758 at flowtable_cleaner+0xc8
> > #9 0xffffffff80a12ea2 at fork_exit+0x82
> > #10 0xffffffff80f1af4e at fork_trampoline+0xe
> >
> >      The machine is ancient. The CPU is a QX9650 (last group of Core 2
> > Quads) with 8 GB of DDR3 memory.
> >      If this can be identified as a known bug and a clue provided to a
> > patch or a safer version to upgrade to, I would be grateful. I am getting
> > very, very tired of these crashes.
> >      The other forced reboots I will describe in a separate message, but
> > that problem has existed since the time of 11.2-RELEASE and apparently was
> > never investigated, much less fixed, although people began complaining on
> > this list and possibly -questions within the first few days after the
> > release date.
> >      Thanks in advance for any help with this problem!
>
> It seems you have a custom kernel with options FLOWTABLE. The code it includes
> is known to be buggy; this option was removed from GENERIC many releases ago.
> Remove it from your kernel configuration, rebuild the kernel and you will be fine.
>
     Wonderful. I have a comment on that line saying I added it for 8.x, so
I probably found it in 8.1's GENERIC configuration file when I was
preparing to upgrade from 7.3. It is interesting that it only started
hitting me (hard enough to make me notice it, at least) in 11.3 and maybe
a bit earlier in 11.2.
     Anyway, that will be easy enough to fix, but it will require rolling
/usr/src back to the revision I am running, which is probably also no big
deal. I don't seem to be able to build it at the current source revision
because 11-STABLE's buildworld began failing during the libc build two or
three weeks ago. I just tried "svn update /usr/src" again, followed by
"make -j6 buildworld", and it still fails with this ending:

--- libc_pic.a ---
ranlib -D libc_pic.a
--- libc.a ---
ranlib -D libc.a
--- libc.so.7.full ---
cc: error: unable to execute command: posix_spawn failed: Permission denied
cc: error: linker command failed with exit code 1 (use -v to see invocation)
*** [libc.so.7.full] Error code 1

make[4]: stopped in /usr/src/lib/libc
1 error

make[4]: stopped in /usr/src/lib/libc
*** [lib/libc__L] Error code 2

make[3]: stopped in /usr/src
1 error

make[3]: stopped in /usr/src
*** [libraries] Error code 2

make[2]: stopped in /usr/src
1 error

make[2]: stopped in /usr/src
*** [_libraries] Error code 2

make[1]: stopped in /usr/src
1 error

make[1]: stopped in /usr/src
*** [buildworld] Error code 2

make: stopped in /usr/src
1 error

make: stopped in /usr/src

     Oh, well. During the intervening weeks, I haven't seen any src updates
that appear to have anything to do with fixing the virtual memory
management bug(s) that is/are the other thing wasting my time. I'll start
a separate thread for that, but first I want to do the rollback and get
the buildworld started. Oh, wait a minute...ah, yes!
I also have a snapshot of /usr/obj at the same revision, so I won't even
need the buildworld, only the buildkernel. This should be quite quick then.
     Thanks a bundle for your help.

                                  Scott Bennett, Comm. ASMELG, CFIAG
**********************************************************************
* Internet:   bennett at sdf.org   *xor*   bennett at freeshell.org  *
*--------------------------------------------------------------------*
* "A well regulated and disciplined militia, is at all times a good  *
* objection to the introduction of that bane of all free governments *
* -- a standing army."                                               *
*     -- Gov. John Hancock, New York Journal, 28 January 1790        *
**********************************************************************
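For anyone following the same steps, here is a minimal sketch of the configuration change Eugene describes. It assumes the custom kernel configuration is named hellas and lives in /usr/src/sys/amd64/conf/. That name is inferred from the /usr/obj/usr/src/sys/hellas path in the uname string quoted above; the thread does not state it. The other lines of the real file are not shown.

```
# Excerpt from a custom kernel configuration file.
# ASSUMED path: /usr/src/sys/amd64/conf/hellas (inferred, not confirmed).
# Added for 8.x; the flowtable code is known to be buggy and the option
# was dropped from GENERIC long ago, so it is commented out here:
#options 	FLOWTABLE
```

After removing or commenting out that line, roll /usr/src back to the running revision, for example with svn update -r 352571 /usr/src. Then rebuild and install the kernel from /usr/src with make buildkernel KERNCONF=hellas followed by make installkernel KERNCONF=hellas, and reboot. The matching /usr/obj snapshot already contains the world build for that revision, so a buildworld should not be needed first.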