Date:      Tue, 8 Nov 2016 10:28:51 -0800
From:      Mark Johnston <markj@FreeBSD.org>
To:        "O. Hartmann" <ohartmann@walstatt.org>
Cc:        FreeBSD CURRENT <freebsd-current@freebsd.org>, glebius@FreeBSD.org, ae@FreeBSD.org
Subject:   Re: was: CURRENT [r308087] still crashing: Backtrace provided
Message-ID:  <20161108182829.GA62725@wkstn-mjohnston.west.isilon.com>
In-Reply-To: <20161106111356.39850d7e@thor.walstatt.dynvpn.de>
References:  <alpine.GSO.1.10.1610231515170.5272@multics.mit.edu> <20161029163336.46bb24c4.ohartman@zedat.fu-berlin.de> <20161030013345.GC67644@raichu> <20161030082525.6fb6d8a4.ohartman@zedat.fu-berlin.de> <20161030163934.GA49633@raichu> <20161030185500.64e57233.ohartman@zedat.fu-berlin.de> <20161030182509.GA1491@charmander> <20161105184509.28d162f1@thor.walstatt.dynvpn.de> <20161105203748.GD63972@wkstn-mjohnston.west.isilon.com> <20161106111356.39850d7e@thor.walstatt.dynvpn.de>

On Sun, Nov 06, 2016 at 11:13:56AM +0100, O. Hartmann wrote:
> Yesterday I ran r307233 for the whole day (> 9 hours) without the reported
> crash.
> 
> This morning I got brave and tried r307234 - and had a crash within an hour.

Thanks for confirming - I cc'ed glebius@ and ae@, who can provide more
insight than I can. I was just trying to narrow down the problem to a
specific commit.

> 
> > 
> > > 
> > > Attached, you'll find the backtrace report as last time. I had to type in "dump"
> > > blindly on the system as a dark screen or a stuck X11 screen blocked the console (I
> > > use vt() and nVidia BLOB with my nVidia GPUs - and this is still broken on FBSD).
> > > 
> > > Please let me know how I can assist further. I saved both the core AND this time the
> > > culprit kernel.  
> > 
> > Great, thank you. I would first like to confirm that r307234 is indeed
> > causing the crash - since it appears to be easy to trigger, that should
> > be faster. If not, the core will help track down the real problem.
> 
> Although I was under the impression the in-kernel-config option
> 
> makeoptions    DEBUG=-g
> 
> would make debugging symbols available, I was proven wrong.
> 
> I tried also on 
> 
> FreeBSD 12.0-CURRENT #15 r308329: Sat Nov  5 08:52:24 CET 2016
>  
> and crashed, from which I picked up kernel and vmcore as well as
> the text of the backtrace as provided in an earlier mail (see below at [core.txt.0]).
> If I perform the suggested command sequence, I get:
> 
> ohartmann@thor [kernel_debug]: kgdb ./kernel vmcore.0 
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd"...(no debugging symbols found)...
> Attempt to extract a component of a value that is not a structure pointer.
> Attempt to extract a component of a value that is not a structure pointer.
> #0  0xffffffff807b8d83 in doadump ()
> (kgdb) frame 12
> #12 0xffffffff80923a74 in ip_output ()
> (kgdb) p *ifp
> No symbol table is loaded.  Use the "file" command.
> (kgdb) p *ro
> No symbol table is loaded.  Use the "file" command.
> (kgdb)
> 
> Again, I'm doing this kind of debugging for the very first time and I am missing
> something here; apologies for that.

Hm, I'm not sure what the problem is. When a kernel is installed and
WITHOUT_KERNEL_SYMBOLS is not set in src.conf, debug symbols should
automatically be installed to /usr/lib/debug/boot/kernel.
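
As a rough check, something like the sketch below could confirm whether the
debug file is actually present and print a matching kgdb invocation. The
helper names find_kernel_debug and kgdb_command are made up for illustration,
and the file name kernel.debug inside that directory is an assumption on my
part; adjust it if your install differs.

```shell
#!/bin/sh
# Sketch only: locate the standalone kernel debug file and print a kgdb
# command line. The file name kernel.debug is an assumption.

find_kernel_debug() {
    # $1: debug directory (defaults to the install path mentioned above)
    dir="${1:-/usr/lib/debug/boot/kernel}"
    if [ -f "$dir/kernel.debug" ]; then
        printf '%s\n' "$dir/kernel.debug"
        return 0
    fi
    echo "no kernel.debug under $dir; was WITHOUT_KERNEL_SYMBOLS set?" >&2
    return 1
}

kgdb_command() {
    # $1: vmcore path, $2: optional debug directory
    debug=$(find_kernel_debug "$2") || return 1
    # Only print the command; running it is left to the user.
    printf 'kgdb %s %s\n' "$debug" "$1"
}
```

Calling "kgdb_command vmcore.0" prints a command line only when the debug
file exists, so a missing file shows up before kgdb reports "no debugging
symbols found".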

> 
> Sorry about the redundancy.
> 
> The curious thing to me is that this bug is triggered on systems with Intel CPU
> architectures older than or equal to Ivy Bridge. The very same /etc/make.conf
> and /etc/src.conf, as well as the very same kernel config apart from some local
> hardware-dependent modifications, are spread around my servers and workstations. In
> particular, my office box is a Haswell Xeon with almost the exact same config running
> CURRENT (recent as of Thursday) without problems, while the box I'm reporting this error
> from is crashing (i3-3220; the server, also crashing here, is an E3-1245 V2). Another
> crashing system is a 2009 C2D Xeon 5XXX two-socket server, crashing the same way, but
> with a different kernel config.
> I tried GENERIC on the crashing systems as well, with the same results.
> 
> I'm using IPFW as the firewall on all systems.
> 
> Please tell me if you revert some commits, I'll then checkout the sources up to recent
> CURRENT and try again.
> 
> This is just for completeness.
> 
> 
> Kind regards and thanks in advance,
> 
> Oliver
> 
> [...]
> [core.txt.0]
> ...
> Fatal trap 9: general protection fault while in kernel mode
> cpuid = 0; apic id = 00
> instruction pointer     = 0x20:0xffffffff807b44fb
> stack pointer           = 0x28:0xfffffe0238f7c290
> frame pointer           = 0x28:0xfffffe0238f7c310
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 521 (nslcd)
> 
> Reading symbols from /boot/modules/nvidia-modeset.ko...done.
> Loaded symbols for /boot/modules/nvidia-modeset.ko
> Reading symbols from /boot/modules/nvidia.ko...done.
> Loaded symbols for /boot/modules/nvidia.ko
> #0  doadump (textdump=0) at pcpu.h:222
> 222     pcpu.h: No such file or directory.
>         in pcpu.h
> (kgdb) #0  doadump (textdump=0) at pcpu.h:222
> #1  0xffffffff8049e1eb in db_dump (dummy=<value optimized out>, dummy2=false, 
>     dummy3=0, dummy4=0x0) at /usr/src/sys/ddb/db_command.c:546
> #2  0xffffffff8049dfe9 in db_command (cmd_table=<value optimized out>)
>     at /usr/src/sys/ddb/db_command.c:453
> #3  0xffffffff8049dd44 in db_command_loop ()
>     at /usr/src/sys/ddb/db_command.c:506
> #4  0xffffffff804a11af in db_trap (type=<value optimized out>, 
>     code=<value optimized out>) at /usr/src/sys/ddb/db_main.c:248
> #5  0xffffffff807fd3e3 in kdb_trap (type=<value optimized out>, 
>     code=<value optimized out>, tf=<value optimized out>)
>     at /usr/src/sys/kern/subr_kdb.c:654
> #6  0xffffffff80afeaf1 in trap_fatal (frame=0xfffffe0238f7c1d0, eva=0)
>     at /usr/src/sys/amd64/amd64/trap.c:796
> #7  0xffffffff80afe7df in trap (frame=0xfffffe0238f7c1d0)
>     at /usr/src/sys/amd64/amd64/trap.c:198
> #8  0xffffffff80adf4a1 in calltrap ()
>     at /usr/src/sys/amd64/amd64/exception.S:236
> #9  0xffffffff807b44fb in __rw_wlock_hard (c=<value optimized out>, 
>     tid=<value optimized out>, file=<value optimized out>, 
>     line=<value optimized out>) at /usr/src/sys/kern/kern_rwlock.c:830
> #10 0xffffffff807b437c in _rw_wlock_cookie (c=0xfffff80070538310, 
>     file=0xffffffff80ca31b2 "/usr/src/sys/net/if_ethersubr.c", line=304)
>     at /usr/src/sys/kern/kern_rwlock.c:296
> #11 0xffffffff808d1e07 in ether_output (ifp=0xfffff800036e7800, 
>     m=<value optimized out>, dst=0xfffff8003d980e60, ro=0xfffff8003d980e40)
>     at /usr/src/sys/net/if_ethersubr.c:304
> #12 0xffffffff80923a74 in ip_output (m=0xfffff8000a24a500, 
>     opt=<value optimized out>, ro=<value optimized out>, flags=0, imo=0x0, 
>     inp=<value optimized out>) at /usr/src/sys/netinet/ip_output.c:664
> #13 0xffffffff8099a7ee in tcp_output (tp=<value optimized out>)
>     at /usr/src/sys/netinet/tcp_output.c:1432
> #14 0xffffffff809a7c88 in tcp_usr_send (so=<value optimized out>, 
>     flags=<value optimized out>, m=0xfffff8003d837800, nam=0x0, 
>     control=<value optimized out>, td=0xfffff8000a24a500)
>     at /usr/src/sys/netinet/tcp_usrreq.c:956
> #15 0xffffffff808567b4 in sosend_generic (so=<value optimized out>, 
>     addr=<value optimized out>, uio=<value optimized out>, 
>     top=0xfffff8003d837800, control=<value optimized out>, 
>     flags=<value optimized out>, td=<value optimized out>)
>     at /usr/src/sys/kern/uipc_socket.c:1359
> #16 0xffffffff8082d672 in soo_write (fp=<value optimized out>, 
>     uio=0xfffffe0238f7c900, active_cred=<value optimized out>, 
>     flags=<value optimized out>, td=<value optimized out>)
>     at /usr/src/sys/kern/sys_socket.c:146
> #17 0xffffffff80823d84 in dofilewrite (td=0xfffff8000a24a500, fd=7, 
>     fp=0xfffff8000a0421e0, auio=0xfffffe0238f7c900, 
>     offset=<value optimized out>, flags=0) at file.h:311
> #18 0xffffffff80823ac8 in kern_writev (td=0xfffff8000a24a500, fd=7, 
>     auio=0xfffffe0238f7c900) at /usr/src/sys/kern/sys_generic.c:508
> #19 0xffffffff80823a54 in sys_write (td=0xfffff800705382f8, 
>     uap=<value optimized out>) at /usr/src/sys/kern/sys_generic.c:421
> #20 0xffffffff80aff33f in amd64_syscall (td=0xfffff8000a24a500, 
>     traced=<value optimized out>) at subr_syscall.c:135
> #21 0xffffffff80adf78b in Xfast_syscall ()
>     at /usr/src/sys/amd64/amd64/exception.S:396
> #22 0x0000000801261f5a in ?? ()
> Previous frame inner to this frame (corrupt stack?)
> Current language:  auto; currently minimal
> (kgdb) 
> [...]
> -- 
> O. Hartmann
> 
> I object to the use or transfer of my data for advertising
> purposes or for market or opinion research (§ 28 Abs. 4 BDSG).


