Date:      Sun, 6 Nov 2016 11:13:56 +0100
From:      "O. Hartmann" <ohartmann@walstatt.org>
To:        Mark Johnston <markj@FreeBSD.org>
Cc:        ohartmann@walstatt.org, FreeBSD CURRENT <freebsd-current@freebsd.org>
Subject:   Re: was: CURRENT [r308087] still crashing: Backtrace provided
Message-ID:  <20161106111356.39850d7e@thor.walstatt.dynvpn.de>
In-Reply-To: <20161105203748.GD63972@wkstn-mjohnston.west.isilon.com>
References:  <20161015121321.25007de8.ohartman@zedat.fu-berlin.de> <20161023182436.4d3bac4f.ohartman@zedat.fu-berlin.de> <alpine.GSO.1.10.1610231515170.5272@multics.mit.edu> <20161029163336.46bb24c4.ohartman@zedat.fu-berlin.de> <20161030013345.GC67644@raichu> <20161030082525.6fb6d8a4.ohartman@zedat.fu-berlin.de> <20161030163934.GA49633@raichu> <20161030185500.64e57233.ohartman@zedat.fu-berlin.de> <20161030182509.GA1491@charmander> <20161105184509.28d162f1@thor.walstatt.dynvpn.de> <20161105203748.GD63972@wkstn-mjohnston.west.isilon.com>


On Sat, 5 Nov 2016 13:37:48 -0700
Mark Johnston <markj@FreeBSD.org> wrote:

> On Sat, Nov 05, 2016 at 06:45:09PM +0100, O. Hartmann wrote:
> > On Sun, 30 Oct 2016 11:25:09 -0700
> > Mark Johnston <markj@FreeBSD.org> wrote:
> >
> > > On Sun, Oct 30, 2016 at 06:55:00PM +0100, O. Hartmann wrote:
> > > > On Sun, 30 Oct 2016 09:39:34 -0700
> > > > Mark Johnston <markj@FreeBSD.org> wrote:
> > > > > Based on the stack trace and affected range of revisions, it may be that
> > > > > reverting r307887 or r307234 helps, but I have no specific evidence for
> > > > > this without the requested output.
> > > >
> > > > I had the crashing also with > r307300 until now, so that leaves me with
> > > > r307233 ... I will go further with that revision and report so far.
> > >
> > > Hm, I don't see why this excludes r307887? In any case, r307234 looks to
> > > be the more likely culprit.
> >
> > Here I am again.
> >
> > This time, it was r308329 or r308331. WITHOUT the debug stuff compiled into the
> > kernel, it took approximately 5 minutes to provoke the crash. WITH the debug options
> > set, it took more than 45 minutes to let the system dump the core. I really hope this
> > time we can fix the problem. For the moment, I have put the system back to r307233 to
> > see whether r307234 is causing the crash as you suspect.
>
> Sorry, I don't quite follow - are you able to provoke the crash at
> r307233? Or are you still testing that revision?

Yesterday I ran r307233 the whole day (more than 9 hours) without the reported
crash.

This morning I got brave and tried r307234, and it crashed within an hour.

>
> >
> > Attached, you'll find the backtrace report as last time. I had to type in "dump"
> > blindly on the system as a dark screen or a stuck X11 screen blocked the console (I
> > use vt() and nVidia BLOB with my nVidia GPUs - and this is still broken on FBSD).
> >
> > Please let me know how I can assist further. I saved both the core AND this time the
> > culprit kernel.
>=20
> Great, thank you. I would first like to confirm that r307234 is indeed
> causing the crash - since it appears to be easy to trigger, that should
> be faster. If not, the core will help track down the real problem.

Although I was under the impression that the kernel config option

makeoptions    DEBUG=-g

would make debugging symbols available, I was proved wrong.

I also tried

FreeBSD 12.0-CURRENT #15 r308329: Sat Nov  5 08:52:24 CET 2016

which crashed as well. From that crash I saved the kernel and vmcore, as well as
the backtrace text as provided in an earlier mail (see [core.txt.0] below). When I
run the suggested command sequence, I get:

ohartmann@thor [kernel_debug]: kgdb ./kernel vmcore.0
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...(no debugging symbols found)...
Attempt to extract a component of a value that is not a structure pointer.
Attempt to extract a component of a value that is not a structure pointer.
#0  0xffffffff807b8d83 in doadump ()
(kgdb) frame 12
#12 0xffffffff80923a74 in ip_output ()
(kgdb) p *ifp
No symbol table is loaded.  Use the "file" command.
(kgdb) p *ro
No symbol table is loaded.  Use the "file" command.
(kgdb)

Again, this is the first time I am doing this kind of debugging, so I am probably
missing something here; apologies for that.
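The "(no debugging symbols found)" message above typically appears when kgdb is pointed at the installed, stripped kernel. With `makeoptions DEBUG=-g`, the build also produces a separate `kernel.debug` file that carries the symbols. Assuming the standard FreeBSD 11+ layout, `make installkernel` usually places that file under `/usr/lib/debug/boot/kernel/`, and a copy remains in the kernel build directory under `/usr/obj`.

The minimal sh sketch below returns the debug companion of a given kernel if that file exists, and otherwise falls back to the kernel itself. The function name `debug_kernel_for`, the `DEBUG_ROOT` override and the default path are assumptions for illustration, not an existing FreeBSD tool.

```shell
#!/bin/sh
# Hypothetical helper: print the path of the unstripped kernel.debug that
# matches KERNEL (e.g. /boot/kernel/kernel -> /usr/lib/debug/boot/kernel/kernel.debug),
# falling back to KERNEL itself when no such file exists.
# DEBUG_ROOT defaults to /usr/lib/debug (assumed FreeBSD 11+ layout).
debug_kernel_for() {
    kernel=$1
    root=${DEBUG_ROOT:-/usr/lib/debug}
    candidate="${root}${kernel}.debug"
    if [ -f "$candidate" ]; then
        printf '%s\n' "$candidate"
    else
        # No separate debug file found; kgdb will likely report
        # "no debugging symbols found" when given this path.
        printf '%s\n' "$kernel"
    fi
}
```

One would then open the dump with something like `kgdb "$(debug_kernel_for /boot/kernel/kernel)" /var/crash/vmcore.0`. The `kernel.debug` must come from exactly the same build as the crashed kernel. After a later `installkernel`, the matching debug file would therefore have to be saved alongside the crashed kernel.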

Sorry about the redundancy.

What strikes me as curious is that this bug is triggered on systems with Intel CPU
architectures of the Ivy Bridge generation or older. I use the very same /etc/make.conf
and /etc/src.conf, as well as the same kernel config (apart from some local
hardware-dependent modifications), across my servers and workstations. In particular, my
office box, a Haswell Xeon with almost exactly the same config, runs CURRENT (recent as
of Thursday) without problems, while the box I am reporting this error from (an i3-3220)
crashes. The server that also crashes here is an E3-1245 V2. Another crashing system, a
2009 two-socket C2D Xeon 5XXX server, crashes the same way, but with a different kernel
config.
I also tried GENERIC on the crashing systems, with the same results.

I'm using IPFW as the firewall on all systems.

Please let me know if you revert any commits; I will then check out the sources of the
most recent CURRENT and try again.

I add this just for completeness.


Kind regards and thanks in advance,

Oliver

[...]
[core.txt.0]
...
Fatal trap 9: general protection fault while in kernel mode
cpuid = 0; apic id = 00
instruction pointer     = 0x20:0xffffffff807b44fb
stack pointer           = 0x28:0xfffffe0238f7c290
frame pointer           = 0x28:0xfffffe0238f7c310
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 521 (nslcd)

Reading symbols from /boot/modules/nvidia-modeset.ko...done.
Loaded symbols for /boot/modules/nvidia-modeset.ko
Reading symbols from /boot/modules/nvidia.ko...done.
Loaded symbols for /boot/modules/nvidia.ko
#0  doadump (textdump=0) at pcpu.h:222
222     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) #0  doadump (textdump=0) at pcpu.h:222
#1  0xffffffff8049e1eb in db_dump (dummy=<value optimized out>, dummy2=false,
    dummy3=0, dummy4=0x0) at /usr/src/sys/ddb/db_command.c:546
#2  0xffffffff8049dfe9 in db_command (cmd_table=<value optimized out>)
    at /usr/src/sys/ddb/db_command.c:453
#3  0xffffffff8049dd44 in db_command_loop ()
    at /usr/src/sys/ddb/db_command.c:506
#4  0xffffffff804a11af in db_trap (type=<value optimized out>,
    code=<value optimized out>) at /usr/src/sys/ddb/db_main.c:248
#5  0xffffffff807fd3e3 in kdb_trap (type=<value optimized out>,
    code=<value optimized out>, tf=<value optimized out>)
    at /usr/src/sys/kern/subr_kdb.c:654
#6  0xffffffff80afeaf1 in trap_fatal (frame=0xfffffe0238f7c1d0, eva=0)
    at /usr/src/sys/amd64/amd64/trap.c:796
#7  0xffffffff80afe7df in trap (frame=0xfffffe0238f7c1d0)
    at /usr/src/sys/amd64/amd64/trap.c:198
#8  0xffffffff80adf4a1 in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:236
#9  0xffffffff807b44fb in __rw_wlock_hard (c=<value optimized out>,
    tid=<value optimized out>, file=<value optimized out>,
    line=<value optimized out>) at /usr/src/sys/kern/kern_rwlock.c:830
#10 0xffffffff807b437c in _rw_wlock_cookie (c=0xfffff80070538310,
    file=0xffffffff80ca31b2 "/usr/src/sys/net/if_ethersubr.c", line=304)
    at /usr/src/sys/kern/kern_rwlock.c:296
#11 0xffffffff808d1e07 in ether_output (ifp=0xfffff800036e7800,
    m=<value optimized out>, dst=0xfffff8003d980e60, ro=0xfffff8003d980e40)
    at /usr/src/sys/net/if_ethersubr.c:304
#12 0xffffffff80923a74 in ip_output (m=0xfffff8000a24a500,
    opt=<value optimized out>, ro=<value optimized out>, flags=0, imo=0x0,
    inp=<value optimized out>) at /usr/src/sys/netinet/ip_output.c:664
#13 0xffffffff8099a7ee in tcp_output (tp=<value optimized out>)
    at /usr/src/sys/netinet/tcp_output.c:1432
#14 0xffffffff809a7c88 in tcp_usr_send (so=<value optimized out>,
    flags=<value optimized out>, m=0xfffff8003d837800, nam=0x0,
    control=<value optimized out>, td=0xfffff8000a24a500)
    at /usr/src/sys/netinet/tcp_usrreq.c:956
#15 0xffffffff808567b4 in sosend_generic (so=<value optimized out>,
    addr=<value optimized out>, uio=<value optimized out>,
    top=0xfffff8003d837800, control=<value optimized out>,
    flags=<value optimized out>, td=<value optimized out>)
    at /usr/src/sys/kern/uipc_socket.c:1359
#16 0xffffffff8082d672 in soo_write (fp=<value optimized out>,
    uio=0xfffffe0238f7c900, active_cred=<value optimized out>,
    flags=<value optimized out>, td=<value optimized out>)
    at /usr/src/sys/kern/sys_socket.c:146
#17 0xffffffff80823d84 in dofilewrite (td=0xfffff8000a24a500, fd=7,
    fp=0xfffff8000a0421e0, auio=0xfffffe0238f7c900,
    offset=<value optimized out>, flags=0) at file.h:311
#18 0xffffffff80823ac8 in kern_writev (td=0xfffff8000a24a500, fd=7,
    auio=0xfffffe0238f7c900) at /usr/src/sys/kern/sys_generic.c:508
#19 0xffffffff80823a54 in sys_write (td=0xfffff800705382f8,
    uap=<value optimized out>) at /usr/src/sys/kern/sys_generic.c:421
#20 0xffffffff80aff33f in amd64_syscall (td=0xfffff8000a24a500,
    traced=<value optimized out>) at subr_syscall.c:135
#21 0xffffffff80adf78b in Xfast_syscall ()
    at /usr/src/sys/amd64/amd64/exception.S:396
#22 0x0000000801261f5a in ?? ()
Previous frame inner to this frame (corrupt stack?)
Current language:  auto; currently minimal
(kgdb)
[...]
-- 
O. Hartmann

I object to the use or transmission of my data for advertising purposes
or for market or opinion research (§ 28 para. 4 BDSG).



