Date: Sun, 6 Nov 2016 11:13:56 +0100
From: "O. Hartmann" <ohartmann@walstatt.org>
To: Mark Johnston <markj@FreeBSD.org>
Cc: ohartmann@walstatt.org, FreeBSD CURRENT <freebsd-current@freebsd.org>
Subject: Re: was: CURRENT [r308087] still crashing: Backtrace provided
Message-ID: <20161106111356.39850d7e@thor.walstatt.dynvpn.de>
In-Reply-To: <20161105203748.GD63972@wkstn-mjohnston.west.isilon.com>
References: <20161015121321.25007de8.ohartman@zedat.fu-berlin.de>
    <20161023182436.4d3bac4f.ohartman@zedat.fu-berlin.de>
    <alpine.GSO.1.10.1610231515170.5272@multics.mit.edu>
    <20161029163336.46bb24c4.ohartman@zedat.fu-berlin.de>
    <20161030013345.GC67644@raichu>
    <20161030082525.6fb6d8a4.ohartman@zedat.fu-berlin.de>
    <20161030163934.GA49633@raichu>
    <20161030185500.64e57233.ohartman@zedat.fu-berlin.de>
    <20161030182509.GA1491@charmander>
    <20161105184509.28d162f1@thor.walstatt.dynvpn.de>
    <20161105203748.GD63972@wkstn-mjohnston.west.isilon.com>

On Sat, 5 Nov 2016 13:37:48 -0700, Mark Johnston <markj@FreeBSD.org> wrote:

> On Sat, Nov 05, 2016 at 06:45:09PM +0100, O. Hartmann wrote:
> > On Sun, 30 Oct 2016 11:25:09 -0700,
> > Mark Johnston <markj@FreeBSD.org> wrote:
> >
> > > On Sun, Oct 30, 2016 at 06:55:00PM +0100, O. Hartmann wrote:
> > > > On Sun, 30 Oct 2016 09:39:34 -0700,
> > > > Mark Johnston <markj@FreeBSD.org> wrote:
> > > > > Based on the stack trace and affected range of revisions, it may be that
> > > > > reverting r307887 or r307234 helps, but I have no specific evidence for
> > > > > this without the requested output.
> > > >
> > > > I had the crashing also with > r307300 until now, so that leaves me with
> > > > r307233 ... I will go further with that revision and report so far.
> > >
> > > Hm, I don't see why this excludes r307887? In any case, r307234 looks to
> > > be the more likely culprit.
> >
> > Here I am again.
> >
> > This time, it was r308329 or r308331. WITHOUT the debug stuff compiled into the
> > kernel, it took approximately 5 minutes to provoke the crash. WITH the debug options
> > set, it took more than 45 minutes to let the system dump the core. I really hope this
> > time we can fix the problem. At this moment, I have put the system back to r307233 to
> > see whether r307234 is causing the crash as you suspect.
>
> Sorry, I don't quite follow - are you able to provoke the crash at
> r307233? Or are you still testing that revision?

Yesterday I ran r307233 the whole day (more than 9 hours) without the reported crash.
This morning I got brave and tried r307234, and had a crash within an hour.

>
> >
> > Attached, you'll find the backtrace report as last time.
> > I had to type in "dump"
> > blindly on the system as a dark screen or a stuck X11 screen blocked the console (I
> > use vt() and nVidia BLOB with my nVidia GPUs - and this is still broken on FBSD).
> >
> > Please let me know how I can assist further. I saved both the core AND this time the
> > culprit kernel.
>
> Great, thank you. I would first like to confirm that r307234 is indeed
> causing the crash - since it appears to be easy to trigger, that should
> be faster. If not, the core will help track down the real problem.

Although I was under the impression that the kernel config option

makeoptions         DEBUG=-g

would make debugging symbols available, I have been proven wrong. I also tried

FreeBSD 12.0-CURRENT #15 r308329: Sat Nov  5 08:52:24 CET 2016

which crashed as well. From that crash I saved the kernel and the vmcore, as well as the
text of the backtrace provided in an earlier mail (see [core.txt.0] below). If I run the
suggested command sequence, I get:

ohartmann@thor [kernel_debug]: kgdb ./kernel vmcore.0
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...(no debugging symbols found)...
Attempt to extract a component of a value that is not a structure pointer.
Attempt to extract a component of a value that is not a structure pointer.
#0  0xffffffff807b8d83 in doadump ()
(kgdb) frame 12
#12 0xffffffff80923a74 in ip_output ()
(kgdb) p *ifp
No symbol table is loaded.  Use the "file" command.
(kgdb) p *ro
No symbol table is loaded.  Use the "file" command.
(kgdb)

Again, this is the very first time I am doing this kind of debugging and I am missing
something here; apologies for that. Sorry about the redundancy.

The curious thing to me is that this bug is triggered on systems with Intel CPUs of the
Ivy Bridge generation or older. The very same /etc/make.conf and /etc/src.conf, as well
as the very same kernel config apart from some local hardware-dependent modifications,
are spread around my servers and workstations. In particular, my office box is a Haswell
Xeon with almost exactly the same config, running CURRENT (recent as of Thursday)
without problems, while the box I am reporting this error from is crashing (an i3-3220;
the server, which also crashes here, is an E3-1245 V2). Another crashing system is a 2009
C2D XEON 5XXX two-socket server, crashing the same way but with a different kernel
config.

I also tried GENERIC on the crashing systems, with the same results. I'm using IPFW as
the firewall on all systems.

Please tell me if you revert some commits; I will then check out the sources up to recent
CURRENT and try again.

I mention this just for completeness.

Kind regards and thanks in advance,

Oliver

[...]

[core.txt.0]
...
Fatal trap 9: general protection fault while in kernel mode
cpuid = 0; apic id = 00
instruction pointer     = 0x20:0xffffffff807b44fb
stack pointer           = 0x28:0xfffffe0238f7c290
frame pointer           = 0x28:0xfffffe0238f7c310
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 521 (nslcd)

Reading symbols from /boot/modules/nvidia-modeset.ko...done.
Loaded symbols for /boot/modules/nvidia-modeset.ko
Reading symbols from /boot/modules/nvidia.ko...done.
Loaded symbols for /boot/modules/nvidia.ko
#0  doadump (textdump=0) at pcpu.h:222
222     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) #0  doadump (textdump=0) at pcpu.h:222
#1  0xffffffff8049e1eb in db_dump (dummy=<value optimized out>, dummy2=false,
    dummy3=0, dummy4=0x0) at /usr/src/sys/ddb/db_command.c:546
#2  0xffffffff8049dfe9 in db_command (cmd_table=<value optimized out>)
    at /usr/src/sys/ddb/db_command.c:453
#3  0xffffffff8049dd44 in db_command_loop ()
    at /usr/src/sys/ddb/db_command.c:506
#4  0xffffffff804a11af in db_trap (type=<value optimized out>,
    code=<value optimized out>) at /usr/src/sys/ddb/db_main.c:248
#5  0xffffffff807fd3e3 in kdb_trap (type=<value optimized out>,
    code=<value optimized out>, tf=<value optimized out>)
    at /usr/src/sys/kern/subr_kdb.c:654
#6  0xffffffff80afeaf1 in trap_fatal (frame=0xfffffe0238f7c1d0, eva=0)
    at /usr/src/sys/amd64/amd64/trap.c:796
#7  0xffffffff80afe7df in trap (frame=0xfffffe0238f7c1d0)
    at /usr/src/sys/amd64/amd64/trap.c:198
#8  0xffffffff80adf4a1 in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:236
#9  0xffffffff807b44fb in __rw_wlock_hard (c=<value optimized out>,
    tid=<value optimized out>, file=<value optimized out>,
    line=<value optimized out>) at /usr/src/sys/kern/kern_rwlock.c:830
#10 0xffffffff807b437c in _rw_wlock_cookie (c=0xfffff80070538310,
    file=0xffffffff80ca31b2 "/usr/src/sys/net/if_ethersubr.c", line=304)
    at /usr/src/sys/kern/kern_rwlock.c:296
#11 0xffffffff808d1e07 in ether_output (ifp=0xfffff800036e7800,
    m=<value optimized out>, dst=0xfffff8003d980e60, ro=0xfffff8003d980e40)
    at /usr/src/sys/net/if_ethersubr.c:304
#12 0xffffffff80923a74 in ip_output (m=0xfffff8000a24a500,
    opt=<value optimized out>, ro=<value optimized out>, flags=0, imo=0x0,
    inp=<value optimized out>) at /usr/src/sys/netinet/ip_output.c:664
#13 0xffffffff8099a7ee in tcp_output (tp=<value optimized out>)
    at /usr/src/sys/netinet/tcp_output.c:1432
#14 0xffffffff809a7c88 in tcp_usr_send (so=<value optimized out>,
    flags=<value optimized out>,
    m=0xfffff8003d837800, nam=0x0,
    control=<value optimized out>, td=0xfffff8000a24a500)
    at /usr/src/sys/netinet/tcp_usrreq.c:956
#15 0xffffffff808567b4 in sosend_generic (so=<value optimized out>,
    addr=<value optimized out>, uio=<value optimized out>,
    top=0xfffff8003d837800, control=<value optimized out>,
    flags=<value optimized out>, td=<value optimized out>)
    at /usr/src/sys/kern/uipc_socket.c:1359
#16 0xffffffff8082d672 in soo_write (fp=<value optimized out>,
    uio=0xfffffe0238f7c900, active_cred=<value optimized out>,
    flags=<value optimized out>, td=<value optimized out>)
    at /usr/src/sys/kern/sys_socket.c:146
#17 0xffffffff80823d84 in dofilewrite (td=0xfffff8000a24a500, fd=7,
    fp=0xfffff8000a0421e0, auio=0xfffffe0238f7c900,
    offset=<value optimized out>, flags=0) at file.h:311
#18 0xffffffff80823ac8 in kern_writev (td=0xfffff8000a24a500, fd=7,
    auio=0xfffffe0238f7c900) at /usr/src/sys/kern/sys_generic.c:508
#19 0xffffffff80823a54 in sys_write (td=0xfffff800705382f8,
    uap=<value optimized out>) at /usr/src/sys/kern/sys_generic.c:421
#20 0xffffffff80aff33f in amd64_syscall (td=0xfffff8000a24a500,
    traced=<value optimized out>) at subr_syscall.c:135
#21 0xffffffff80adf78b in Xfast_syscall ()
    at /usr/src/sys/amd64/amd64/exception.S:396
#22 0x0000000801261f5a in ?? ()
Previous frame inner to this frame (corrupt stack?)
Current language:  auto; currently minimal
(kgdb)

[...]

-- 
O. Hartmann

I object to the use or transfer of my data for advertising purposes or for
market or opinion research (§ 28 Abs. 4 BDSG).