Date: Sun, 6 Nov 2016 11:13:56 +0100
From: "O. Hartmann"
To: Mark Johnston
Cc: ohartmann@walstatt.org, FreeBSD CURRENT
Subject: Re: was: CURRENT [r308087] still crashing: Backtrace provided
Message-ID: <20161106111356.39850d7e@thor.walstatt.dynvpn.de>
In-Reply-To: <20161105203748.GD63972@wkstn-mjohnston.west.isilon.com>

On Sat, 5 Nov 2016 13:37:48 -0700, Mark Johnston wrote:

> On Sat, Nov 05, 2016 at 06:45:09PM +0100, O. Hartmann wrote:
> > On Sun, 30 Oct 2016 11:25:09 -0700, Mark Johnston wrote:
> >
> > > On Sun, Oct 30, 2016 at 06:55:00PM +0100, O. Hartmann wrote:
> > > > On Sun, 30 Oct 2016 09:39:34 -0700, Mark Johnston wrote:
> > > > > Based on the stack trace and affected range of revisions, it may be that
> > > > > reverting r307887 or r307234 helps, but I have no specific evidence for
> > > > > this without the requested output.
> > > >
> > > > I had the crashing also with > r307300 until now, so that leaves me with
> > > > r307233 ... I will go further with that revision and report so far.
> > >
> > > Hm, I don't see why this excludes r307887? In any case, r307234 looks to
> > > be the more likely culprit.
> >
> > Here I am again.
> >
> > This time, it was r308329 or r308331. WITHOUT the debug stuff compiled into the
> > kernel, it took approximately 5 minutes to provoke the crash. WITH the debug options
> > set, it took more than 45 minutes to let the system dump the core. I really hope this
> > time we can fix the problem. At this moment, I have put the system back to r307233 to
> > see whether r307234 is causing the crash as you suspect.
>
> Sorry, I don't quite follow - are you able to provoke the crash at
> r307233? Or are you still testing that revision?

Yesterday, I ran r307233 the whole day (> 9 hours) without the reported crash.
This morning I got brave and tried r307234 - and had a crash within an hour.

>
> >
> > Attached, you'll find the backtrace report as last time. I had to type in "dump"
> > blindly on the system as a dark screen or a stuck X11 screen blocked the console (I
> > use vt() and nVidia BLOB with my nVidia GPUs - and this is still broken on FBSD).
> >
> > Please let me know how I can assist further. I saved both the core AND this time the
> > culprit kernel.
>
> Great, thank you. I would first like to confirm that r307234 is indeed
> causing the crash - since it appears to be easy to trigger, that should
> be faster. If not, the core will help track down the real problem.

Although I was under the impression that the kernel config option makeoptions DEBUG=-g
would make debugging symbols available, I have been proved wrong. I also tried on

FreeBSD 12.0-CURRENT #15 r308329: Sat Nov 5 08:52:24 CET 2016

and it crashed. From that crash I picked up the kernel and vmcore as well as the text of
the backtrace provided in an earlier mail (see below at [core.txt.0]). If I perform the
suggested command sequence, I get:

ohartmann@thor [kernel_debug]: kgdb ./kernel vmcore.0
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...(no debugging symbols found)...
Attempt to extract a component of a value that is not a structure pointer.
Attempt to extract a component of a value that is not a structure pointer.
#0  0xffffffff807b8d83 in doadump ()
(kgdb) frame 12
#12 0xffffffff80923a74 in ip_output ()
(kgdb) p *ifp
No symbol table is loaded.  Use the "file" command.
(kgdb) p *ro
No symbol table is loaded.  Use the "file" command.
(kgdb)

Again, this is the very first time I am doing this kind of debugging and I am missing
something here; apologies for that. Sorry about the redundancy.

The curious thing to me is that this bug is triggered on systems with Intel CPU
architectures equal to or older than Ivy Bridge.
The very same /etc/make.conf and /etc/src.conf, as well as the very same kernel config
apart from some local hardware-dependent modifications, are spread across my servers and
workstations. In particular, my office box is a Haswell Xeon with almost exactly the same
config, running CURRENT (recent as of Thursday) without problems, while the box I'm
reporting this error from is crashing (i3-3220; the server, also crashing here, is an
E3-1245 V2). Another crashing system is a 2009 C2D XEON 5XXX two-socket server, crashing
the same way, but with a different kernel config.

I also tried GENERIC on the crashing systems, with the same results.

I'm using IPFW as the firewall on all systems.

Please tell me if you revert some commits; I'll then check out the sources up to recent
CURRENT and try again.

This is just for addition and completeness.

Kind regards and thanks in advance,

Oliver

[...]

[core.txt.0]
...
Fatal trap 9: general protection fault while in kernel mode
cpuid = 0; apic id = 00
instruction pointer     = 0x20:0xffffffff807b44fb
stack pointer           = 0x28:0xfffffe0238f7c290
frame pointer           = 0x28:0xfffffe0238f7c310
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 521 (nslcd)

Reading symbols from /boot/modules/nvidia-modeset.ko...done.
Loaded symbols for /boot/modules/nvidia-modeset.ko
Reading symbols from /boot/modules/nvidia.ko...done.
Loaded symbols for /boot/modules/nvidia.ko
#0  doadump (textdump=0) at pcpu.h:222
222     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) #0  doadump (textdump=0) at pcpu.h:222
#1  0xffffffff8049e1eb in db_dump (dummy=, dummy2=false,
    dummy3=0, dummy4=0x0) at /usr/src/sys/ddb/db_command.c:546
#2  0xffffffff8049dfe9 in db_command (cmd_table=)
    at /usr/src/sys/ddb/db_command.c:453
#3  0xffffffff8049dd44 in db_command_loop ()
    at /usr/src/sys/ddb/db_command.c:506
#4  0xffffffff804a11af in db_trap (type=,
    code=) at /usr/src/sys/ddb/db_main.c:248
#5  0xffffffff807fd3e3 in kdb_trap (type=,
    code=, tf=)
    at /usr/src/sys/kern/subr_kdb.c:654
#6  0xffffffff80afeaf1 in trap_fatal (frame=0xfffffe0238f7c1d0, eva=0)
    at /usr/src/sys/amd64/amd64/trap.c:796
#7  0xffffffff80afe7df in trap (frame=0xfffffe0238f7c1d0)
    at /usr/src/sys/amd64/amd64/trap.c:198
#8  0xffffffff80adf4a1 in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:236
#9  0xffffffff807b44fb in __rw_wlock_hard (c=,
    tid=, file=,
    line=) at /usr/src/sys/kern/kern_rwlock.c:830
#10 0xffffffff807b437c in _rw_wlock_cookie (c=0xfffff80070538310,
    file=0xffffffff80ca31b2 "/usr/src/sys/net/if_ethersubr.c", line=304)
    at /usr/src/sys/kern/kern_rwlock.c:296
#11 0xffffffff808d1e07 in ether_output (ifp=0xfffff800036e7800,
    m=, dst=0xfffff8003d980e60, ro=0xfffff8003d980e40)
    at /usr/src/sys/net/if_ethersubr.c:304
#12 0xffffffff80923a74 in ip_output (m=0xfffff8000a24a500,
    opt=, ro=, flags=0, imo=0x0,
    inp=) at /usr/src/sys/netinet/ip_output.c:664
#13 0xffffffff8099a7ee in tcp_output (tp=)
    at /usr/src/sys/netinet/tcp_output.c:1432
#14 0xffffffff809a7c88 in tcp_usr_send (so=,
    flags=, m=0xfffff8003d837800, nam=0x0,
    control=, td=0xfffff8000a24a500)
    at /usr/src/sys/netinet/tcp_usrreq.c:956
#15 0xffffffff808567b4 in sosend_generic (so=,
    addr=, uio=,
    top=0xfffff8003d837800, control=,
    flags=, td=)
    at /usr/src/sys/kern/uipc_socket.c:1359
#16 0xffffffff8082d672 in soo_write (fp=,
    uio=0xfffffe0238f7c900, active_cred=,
    flags=, td=)
    at /usr/src/sys/kern/sys_socket.c:146
#17 0xffffffff80823d84 in dofilewrite (td=0xfffff8000a24a500, fd=7,
    fp=0xfffff8000a0421e0, auio=0xfffffe0238f7c900,
    offset=, flags=0) at file.h:311
#18 0xffffffff80823ac8 in kern_writev (td=0xfffff8000a24a500, fd=7,
    auio=0xfffffe0238f7c900) at /usr/src/sys/kern/sys_generic.c:508
#19 0xffffffff80823a54 in sys_write (td=0xfffff800705382f8,
    uap=) at /usr/src/sys/kern/sys_generic.c:421
#20 0xffffffff80aff33f in amd64_syscall (td=0xfffff8000a24a500,
    traced=) at subr_syscall.c:135
#21 0xffffffff80adf78b in Xfast_syscall ()
    at /usr/src/sys/amd64/amd64/exception.S:396
#22 0x0000000801261f5a in ?? ()
Previous frame inner to this frame (corrupt stack?)
Current language:  auto; currently minimal
(kgdb)

[...]

-- 
O. Hartmann

I object to the use or transfer of my data for advertising purposes or
for market or opinion research (§ 28 Abs. 4 BDSG).
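
A note on the "(no debugging symbols found)" message in the kgdb session
above: it usually means kgdb was pointed at a stripped kernel image rather
than at a file carrying the debug information. The sketch below repeats the
same session against the kernel's debug file. The path
/usr/lib/debug/boot/kernel/kernel.debug is an assumption (it is where recent
FreeBSD installs the split kernel debug file by default); whichever file is
used has to come from exactly the build that produced vmcore.0. No output is
shown because none was captured in this thread.

```
# Assumed path; substitute the kernel.debug that matches the crashed kernel.
kgdb /usr/lib/debug/boot/kernel/kernel.debug vmcore.0
(kgdb) frame 12
(kgdb) p *ifp
(kgdb) p *ro
```

With matching symbols loaded, the two print commands should display structure
contents instead of "No symbol table is loaded", although arguments the
compiler optimized away may still be unavailable in a given frame.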