Date:      Mon, 31 Oct 2005 19:54:53 +0100
From:      Max Laier <max@love2party.net>
To:        freebsd-current@freebsd.org
Subject:   Re: CURRENT + amd64 + user-ppp = panic
Message-ID:  <200510311955.13137.max@love2party.net>
In-Reply-To: <200510281404.33462.jhb@freebsd.org>
References:  <20051027022313.R675@kushnir1.kiev.ua> <43602F2F.7080500@samsco.org> <200510281404.33462.jhb@freebsd.org>

--nextPart1298973.9I7Ud9geTj
Content-Type: multipart/mixed;
  boundary="Boundary-01=_MimZDv6xdU755fY"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

--Boundary-01=_MimZDv6xdU755fY
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

On Friday 28 October 2005 20:04, John Baldwin wrote:
> On Wednesday 26 October 2005 09:36 pm, Scott Long wrote:
> > Vladimir Kushnir wrote:
> > > Hello,
> > > For a couple of days already my -CURRENT amd64 reliably panics
> > > whenever I'm trying to connect via ppp (nothing fancy - plain dialup,
> > > no firewall). It's 100% reproducible both with custom kernel and with
> > > GENERIC. A typescript of kgdb is attached.
> > >
> > > I'm running now on the kernel from Oct 19 which also panics, BTW, with
> > > "kmem_map too small" on an attempt to run something like Linux
> > > OpenOffice or Mathematica (neither kern.ipc.nmbclusters nor
> > > vm.kmem_size_max tweaking helps; besides, I've only 512 MB RAM)
> > >
> > > Regards,
> > > Vladimir
> >
> > I think that this is a result of the interrupt handler changes that John
> > Baldwin made yesterday.  Can you step your source back in time and see
> > > where it stops panicking?
>
> Actually, it can't be if softclock() is called directly from
> ithread_loop(). In the new code ithread_loop() calls
> ithread_execute_handlers() which would call softclock().
>
> > > #0  doadump () at pcpu.h:172
> > >
> > > 172	pcpu.h: No such file or directory.
> > >
> > > 	in pcpu.h
> > >
> > > (kgdb) where
> > >
> > > #0  doadump () at pcpu.h:172
> > > #1  0xffffffff803c65fc in boot (howto=260)
> > >     at /usr/src/sys/kern/kern_shutdown.c:399
> > > #2  0xffffffff803c609b in panic (fmt=0xffffffff805f2f46 "from
> > > debugger") at /usr/src/sys/kern/kern_shutdown.c:555
> > > #3  0xffffffff801a8a32 in db_panic (addr=0, have_addr=0, count=0,
> > > modif=0x0)
> > >     at /usr/src/sys/ddb/db_command.c:435
> > > #4  0xffffffff801a8f75 in db_command_loop ()
> > >     at /usr/src/sys/ddb/db_command.c:404
> > > #5  0xffffffff801aae83 in db_trap (type=-1794574032, code=0)
> > >     at /usr/src/sys/ddb/db_main.c:221
> > > #6  0xffffffff803e5279 in kdb_trap (type=9, code=0,
> > > tf=0xffffffff9508fb10)
> > >     at /usr/src/sys/kern/subr_kdb.c:445
> > > #7  0xffffffff8058d84e in trap_fatal (frame=0xffffffff9508fb10,
> > >     eva=18446742974715243568) at /usr/src/sys/amd64/amd64/trap.c:672
> > > #8  0xffffffff8058ddb1 in trap (frame=
> > >       {tf_rdi = 1, tf_rsi = 70876, tf_rdx = -2401050962867404578,
> > > tf_rcx = 70876, tf_r8 = 0, tf_r9 = 1, tf_rax = 5340, tf_rbx = 1, tf_rbp
> > > = -1794573296, tf_r10 = 1, tf_r11 = 4, tf_r12 = -1099511143680, tf_r13
> > > = -1099035903488, tf_r14 = -1964245152, tf_r15 = 2, tf_trapno = 9,
> > > tf_addr = 0, tf_flags = 0, tf_err = 0, tf_rip = -2143462195, tf_cs = 8,
> > > tf_rflags = 65538, tf_rsp = -1794573360, tf_ss = 16}) at
> > > /usr/src/sys/amd64/amd64/trap.c:488
> > > #9  0xffffffff8057b3bb in calltrap ()
> > >     at /usr/src/sys/amd64/amd64/exception.S:168
>
> This looks like a page fault rather than a 'kmem_map too small' panic.
>
> > > ---Type <return> to continue, or q <return> to quit---
> > >
> > > #10 0xffffffff803d5ccd in softclock (dummy=0x1)
> > >     at /usr/src/sys/kern/kern_timeout.c:220
>
> This is here:
> 		while (c) {
> 			depth++;
> 		==>	if (c->c_time != curticks) {
> 				c = TAILQ_NEXT(c, c_links.tqe);
>
> c can't be NULL due to the while loop.  Are any kernel modules being
> unloaded when this happens?

It isn't a NULL deref, as "eva" above is clearly non-NULL.  This makes me
think of a callout list inconsistency.  Most likely, judging from the rest
of the thread, it was introduced via "ln_timer_ch" in struct llinfo_nd6.
I am thinking of a double callout_stop() or something like that.  The
callout_stop()/callout_reset() calls on that callout are too deeply nested
to follow at a quick glance :-\
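
The "eva" value in frame #7 is printed in decimal, which hides where it
points.  A minimal sketch for checking it, assuming that "NULL-like" means
"within the first 4 KiB" (that threshold and the helper names are made up
for illustration and are not taken from the kernel):

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Fault address copied from frame #7 (trap_fatal) of the backtrace. */
#define EVA	UINT64_C(18446742974715243568)

/*
 * Assumption for illustration: call an address "NULL-like" if it falls
 * within the first 4 KiB, where a small offset from a NULL pointer lands.
 */
static int
is_null_like(uint64_t addr)
{
	return (addr < 4096);
}

/* Print the address in hex together with the verdict. */
static void
describe_eva(uint64_t addr)
{
	printf("eva = 0x%016" PRIx64 " (%s)\n", addr,
	    is_null_like(addr) ? "NULL-like" : "not NULL-like");
}
```

Calling describe_eva(EVA) from a small main() prints
"eva = 0xffffff001ed5ac30 (not NULL-like)", i.e. a high kernel address
rather than something near zero.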

The easiest approach seems to be to put some good old printf() debugging
in nd6_llinfo_settimer() and see what it does.  Vladimir, could you try
that?  "Patch" attached.

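To illustrate what such tracing can reveal, here is a userland model of a
timer list where a stop is only safe if the entry is really queued.  It is
a sketch under stated assumptions: every type and function name below is
hypothetical, and none of this is the kernel's struct callout or its
callout_stop()/callout_reset() code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Userland model only: hypothetical stand-ins, NOT the kernel's
 * struct callout or callout_stop()/callout_reset().
 */
struct model_callout {
	struct model_callout *prev, *next;
	bool pending;		/* true while linked on the list */
	long ticks;
};

struct model_list {
	struct model_callout head;	/* sentinel of a circular list */
};

static void
model_list_init(struct model_list *l)
{
	l->head.prev = l->head.next = &l->head;
	l->head.pending = false;
	l->head.ticks = 0;
}

/* Unlink c only if it is really queued; returns 1 if removed, 0 if not. */
static int
model_stop(struct model_callout *c)
{
	printf("stop  %p pending=%d\n", (void *)c, c->pending);
	if (!c->pending)
		return (0);	/* second stop in a row: nothing to unlink */
	c->prev->next = c->next;
	c->next->prev = c->prev;
	c->prev = c->next = NULL;
	c->pending = false;
	return (1);
}

/* (Re)arm c: drop any old entry first, then append at the tail. */
static void
model_reset(struct model_list *l, struct model_callout *c, long ticks)
{
	printf("reset %p %ld ticks pending=%d\n", (void *)c, ticks, c->pending);
	if (c->pending)
		model_stop(c);
	c->ticks = ticks;
	c->prev = l->head.prev;
	c->next = &l->head;
	l->head.prev->next = c;
	l->head.prev = c;
	c->pending = true;
}

/* Walk the list; return the entry count, or -1 if a link is inconsistent. */
static int
model_check(const struct model_list *l)
{
	const struct model_callout *c;
	int n = 0;

	for (c = l->head.next; c != &l->head; c = c->next) {
		if (c->next == NULL || c->next->prev != c || !c->pending)
			return (-1);
		n++;
	}
	return (n);
}
```

In this model, a stop that skipped the pending check would unlink the same
entry twice and leave the list broken; the trace lines show each call and
the pending state, so a repeated stop on the same pointer stands out in
the output.
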
> > > #11 0xffffffff803b05cc in ithread_loop (arg=0xffffff0000031780)
> > >     at /usr/src/sys/kern/kern_intr.c:662
> > > #12 0xffffffff803af3cb in fork_exit (
> > >     callout=0xffffffff803b0480 <ithread_loop>, arg=0xffffff0000031780,
> > >     frame=0xffffffff9508fc90) at /usr/src/sys/kern/kern_fork.c:789
> > > #13 0xffffffff8057b71e in fork_trampoline ()
> > >     at /usr/src/sys/amd64/amd64/exception.S:394
> > > #14 0x0000000000000000 in ?? ()

-- 
/"\  Best regards,                      | mlaier@freebsd.org
\ /  Max Laier                          | ICQ #67774661
 X   http://pf4freebsd.love2party.net/  | mlaier@EFnet
/ \  ASCII Ribbon Campaign              | Against HTML Mail and News

--Boundary-01=_MimZDv6xdU755fY
Content-Type: text/x-diff; charset="iso-8859-1";
	name="nd6_llinfo_settimer.printf.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
	filename="nd6_llinfo_settimer.printf.diff"

Index: nd6.c
===================================================================
RCS file: /usr/store/mlaier/fcvs/src/sys/netinet6/nd6.c,v
retrieving revision 1.62
diff -u -p -r1.62 nd6.c
--- nd6.c	22 Oct 2005 05:07:16 -0000	1.62
+++ nd6.c	31 Oct 2005 18:49:58 -0000
@@ -395,6 +395,7 @@ nd6_llinfo_settimer(ln, tick)
 	struct llinfo_nd6 *ln;
 	long tick;
 {
+	printf("For %p %ld ticks\n", ln, tick);
 	if (tick < 0) {
 		ln->ln_expire = 0;
 		ln->ln_ntick = 0;

--Boundary-01=_MimZDv6xdU755fY--

--nextPart1298973.9I7Ud9geTj
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (FreeBSD)

iD4DBQBDZmiRXyyEoT62BG0RAjIrAJjItg/4+0B3ox15ov2Xtf40Lf6GAJ4kCzFh
gs3UpibqAh3jo7KIqnoRkA==
=38Pu
-----END PGP SIGNATURE-----

--nextPart1298973.9I7Ud9geTj--


