Date:      Mon, 07 Oct 2002 03:48:45 -0700
From:      Terry Lambert <tlambert2@mindspring.com>
To:        Stefan Farfeleder <e0026813@stud3.tuwien.ac.at>
Cc:        John Baldwin <jhb@FreeBSD.ORG>, Juli Mallett <jmallett@FreeBSD.ORG>, current@FreeBSD.ORG
Subject:   Re: [PATCH] Re: Junior Kernel Hacker page updated...
Message-ID:  <3DA1668D.E8153F43@mindspring.com>
References:  <20021004132203.A78223@FreeBSD.org> <XFMail.20021004163317.jhb@FreeBSD.org> <20021005135504.GA254@frog.fafoe> <3D9F39BB.66126C35@mindspring.com> <3DA12642.28BB8E1@mindspring.com> <20021007095024.GA252@frog.fafoe>

Stefan Farfeleder wrote:
> On Sun, Oct 06, 2002 at 11:14:26PM -0700, Terry Lambert wrote:
> > Stefan: Did the patch fix it, or not?
> 
> Sorry for the long delay. No, it did not. But I now have a rather
> interesting core dump. I inserted a KASSERT, so that the code looks like
> this:
> 
>     TAILQ_INSERT_TAIL(&kq->kq_head, &marker, kn_tqe);
>     while (count) {
>         kn = TAILQ_FIRST(&kq->kq_head);
>         KASSERT(kn != NULL, ("TAILQ_FIRST returned NULL"));

[ ... ]

> panic: bremfree: bp 0xd2adf6f0 not locked

Second panic, during debugger sync.

> panic: TAILQ_FIRST returned NULL

See below...

> panic: from debugger

You, manually calling "panic" inside the debugger...

> syncing disks... panic: bremfree: bp 0xd2adf6f0 not locked

The second panic (again).

> #2  0xc01babe7 in panic () at /freebsd/current/src/sys/kern/kern_shutdown.c:508

2nd panic.

> #10 0xc01babe7 in panic () at /freebsd/current/src/sys/kern/kern_shutdown.c:508

Manual panic (no arguments).

> #18 0xc01babcf in panic (fmt=0x0)
>     at /freebsd/current/src/sys/kern/kern_shutdown.c:494
> #19 0xc01a1212 in kqueue_scan (fp=0x0, maxevents=4, ulistp=0xbfbfeb90,
>     tsp=0xc754f828, td=0xc6426b60)
>     at /freebsd/current/src/sys/kern/kern_event.c:717

*** OK, it's very hard to believe you didn't break into the
*** debugger and manually call "panic" to get this to happen.

Why?  Because "fmt" is 0x0, which indicates that you called
panic manually.  If the KASSERT had fired, "fmt" would be the
address of the string "TAILQ_FIRST returned NULL".
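
To illustrate the reasoning, here is a minimal userland sketch.
It assumes KASSERT boils down to "if the expression is false,
call panic with the parenthesized message"; mock_panic() and
MOCK_KASSERT() are hypothetical stand-ins, not the kernel's
definitions.  Under that assumption a failed assertion always
hands panic a non-NULL format pointer:

```c
#include <stddef.h>

/* Hypothetical stand-in for panic(): only records what it was given. */
static const char *last_panic_fmt;
static int panic_called;

static void
mock_panic(const char *fmt, ...)
{
	panic_called = 1;
	last_panic_fmt = fmt;
}

/* Simplified model of KASSERT (an assumption, not the kernel macro). */
#define MOCK_KASSERT(exp, msg) do {					\
	if (!(exp))							\
		mock_panic msg;						\
} while (0)

/*
 * Returns 1 if the assertion fired and passed a non-NULL format
 * string to mock_panic(), 0 if the assertion did not fire.
 */
static int
kassert_passes_fmt(void *kn)
{
	panic_called = 0;
	last_panic_fmt = NULL;
	MOCK_KASSERT(kn != NULL, ("TAILQ_FIRST returned NULL"));
	return (panic_called && last_panic_fmt != NULL);
}
```

So a frame showing panic (fmt=0x0) does not look like one that
came from the assertion, at least in this simplified model.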


> #19 0xc01a1212 in kqueue_scan (fp=0x0, maxevents=4, ulistp=0xbfbfeb90,
>     tsp=0xc754f828, td=0xc6426b60)
>     at /freebsd/current/src/sys/kern/kern_event.c:717
> 717                     KASSERT(kn != NULL, ("TAILQ_FIRST returned NULL"));
> (kgdb) info locals
> kq = (struct kqueue *) 0xc754f800
> kevp = (struct kevent *) 0xc754f828
> atv = {tv_sec = 0, tv_usec = 0}
> rtv = {tv_sec = 434, tv_usec = -1070420864}
> ttv = {tv_sec = 1, tv_usec = -1070411616}
> kn = (struct knote *) 0x0
> marker = {kn_link = {sle_next = 0xc01b0d37}, kn_selnext = {
>     sle_next = 0xc0368a20}, kn_tqe = {tqe_next = 0x0, tqe_prev = 0xc6650ac8},
>   kn_kq = 0xc6426bcc, kn_kevent = {ident = 3344374324, filter = -30080,
>     flags = 49206, fflags = 3224546432, data = 431, udata = 0xe2c9dca0},
>   kn_status = 16, kn_sfflags = -1070167424, kn_sdata = 8, kn_ptr = {
>     p_fp = 0xc032ac80, p_proc = 0xc032ac80}, kn_fop = 0x1af, kn_hook = 0x3}
> count = 4
> timeout = 0
> nkev = 0
> error = 0
> (kgdb) p *kq
> $2 = {kq_head = {tqh_first = 0x0, tqh_last = 0xc754f800}, kq_count = 1,
>   kq_sel = {si_thrlist = {tqe_next = 0x0, tqe_prev = 0x0}, si_thread = 0x0,
>     si_note = {slh_first = 0x0}, si_flags = 0}, kq_fdp = 0xc7571a00,
>   kq_state = 0, kq_kev = {{ident = 23, filter = -1, flags = 1, fflags = 0,
>       data = 69, udata = 0x80cd800}, {ident = 23, filter = -1, flags = 1,
>       fflags = 0, data = 164, udata = 0x80cd800}, {ident = 27, filter = -1,
>       flags = 1, fflags = 0, data = 218, udata = 0x80cf800}, {ident = 19,
>       filter = -1, flags = 1, fflags = 0, data = 182, udata = 0x80cc800}, {
>       ident = 0, filter = 0, flags = 0, fflags = 0, data = 0, udata = 0x0}, {
>       ident = 0, filter = 0, flags = 0, fflags = 0, data = 0, udata = 0x0}, {
>       ident = 0, filter = 0, flags = 0, fflags = 0, data = 0, udata = 0x0}, {
>       ident = 0, filter = 0, flags = 0, fflags = 0, data = 0, udata = 0x0}}}
> (kgdb) q
> frog# ^Dexit
>
> Script done on Mon Oct  7 11:32:50 2002
> 
> I'm confused why marker - if it was removed by TAILQ_REMOVE - hasn't
> kn_tqe.tqe_next and kn_tqe.tqe_prev set to (void *)-1.

OK, what this means is that the marker queue entry was removed
by something else going in there.

This shouldn't happen.
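
For reference, the dump shows tqh_first = 0x0 and tqh_last =
0xc754f800, which is the address of kq itself.  Assuming kq_head
is the first member of struct kqueue (the print order suggests
so), that is exactly what a TAILQ head looks like once its last
element has been removed, even though kq_count is still 1.  A
small userland sketch of that invariant follows; struct item and
tailq_looks_empty() are made-up names, and the macros are the
userland <sys/queue.h> ones, which may differ in detail from the
kernel's:

```c
#include <stddef.h>
#include <sys/queue.h>

struct item {
	TAILQ_ENTRY(item) link;
};
TAILQ_HEAD(item_list, item);

/*
 * Returns 1 if the head is in the "empty" state: no first element,
 * and tqh_last pointing back at tqh_first.
 */
static int
tailq_looks_empty(struct item_list *head)
{
	return (head->tqh_first == NULL &&
	    head->tqh_last == &head->tqh_first);
}

/* Insert one element, remove it again, and report the head state. */
static int
demo_insert_then_remove(void)
{
	struct item_list head = TAILQ_HEAD_INITIALIZER(head);
	struct item it;

	TAILQ_INSERT_TAIL(&head, &it, link);
	if (tailq_looks_empty(&head))
		return (0);	/* would mean the insert did nothing */
	TAILQ_REMOVE(&head, &it, link);
	return (tailq_looks_empty(&head));
}
```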

Try adding this before the initialization of the marker data:

	bzero(&marker, sizeof(marker));

That should keep it from matching any removal criteria.  The only
way this could keep crashing after this mod is if the queue is
being destroyed out from under you.
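
A rough userland sketch of the idea, for illustration only: the
marker lives on the stack, so without zeroing its fields hold
whatever was there before.  struct note, NOTE_DEAD, reap_flagged()
and scan() are hypothetical names, not the kern_event.c code;
reap_flagged() stands in for "something else" that walks the queue
and removes entries by looking at their fields.

```c
#include <stddef.h>
#include <string.h>
#include <sys/queue.h>

#define NOTE_DEAD	0x01	/* hypothetical removal criterion */

struct note {
	TAILQ_ENTRY(note) tqe;
	int status;
};
TAILQ_HEAD(note_list, note);

/* Stand-in for another code path that removes flagged entries. */
static void
reap_flagged(struct note_list *head)
{
	struct note *n, *next;

	for (n = TAILQ_FIRST(head); n != NULL; n = next) {
		next = TAILQ_NEXT(n, tqe);
		if (n->status & NOTE_DEAD)
			TAILQ_REMOVE(head, n, tqe);
	}
}

/*
 * Dequeue up to count entries, stopping at the marker.  Returns the
 * number of entries dequeued, or -1 if the queue ran dry before the
 * marker was seen (i.e. somebody else removed the marker).
 */
static int
scan(struct note_list *head, int count)
{
	struct note marker, *n;
	int done = 0;

	memset(&marker, 0, sizeof(marker));	/* userland bzero() */
	TAILQ_INSERT_TAIL(head, &marker, tqe);
	reap_flagged(head);		/* zeroed marker survives this */
	while (count > 0) {
		n = TAILQ_FIRST(head);
		if (n == NULL)
			return (-1);
		TAILQ_REMOVE(head, n, tqe);
		if (n == &marker)
			return (done);
		done++;
		count--;
	}
	TAILQ_REMOVE(head, &marker, tqe);
	return (done);
}

/* Two entries on the queue; the first one carries first_status. */
static int
demo_scan(int first_status)
{
	struct note_list head = TAILQ_HEAD_INITIALIZER(head);
	struct note a = { .status = first_status }, b = { .status = 0 };

	TAILQ_INSERT_TAIL(&head, &a, tqe);
	TAILQ_INSERT_TAIL(&head, &b, tqe);
	return (scan(&head, 4));
}
```

With the marker zeroed, the reaper in this sketch never touches it,
so the scan always ends by finding the marker rather than an empty
queue.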

The implication here is that the queue should be protected by the
lock of the object that holds the pointer to the queue instance.

Fixing this would be very hard (IMO).

The next step (assuming it still panics) is to add:

	#define KQ_FREE	0x80

...and set it into kq_state on a kqueue that has been freed and/or
deallocated somewhere (then check to see if it's set, after the
panic).  Ugly, but it will tell you whether or not that's what's
happening (scanning a dead queue).
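
Something along these lines, as a userland sketch only.  struct
fake_kqueue and both helpers are hypothetical, and it assumes bit
0x80 is not already used in kq_state.  In the kernel the flag would
be set just before the structure is released, and you'd inspect
kq_state from the debugger after the panic:

```c
#define KQ_FREE	0x80	/* debugging aid: queue has been torn down */

/* Hypothetical stand-in for the real struct kqueue. */
struct fake_kqueue {
	int kq_count;
	int kq_state;
};

/* Mark the queue as dead; real code would free it right after. */
static void
fake_kqueue_teardown(struct fake_kqueue *kq)
{
	kq->kq_state |= KQ_FREE;
}

/* What you would check by hand in the debugger after the panic. */
static int
fake_kqueue_is_dead(const struct fake_kqueue *kq)
{
	return ((kq->kq_state & KQ_FREE) != 0);
}

/* Returns 1 if the flag is clear before teardown and set after. */
static int
demo_dead_queue_detected(void)
{
	struct fake_kqueue kq = { 0, 0 };

	if (fake_kqueue_is_dead(&kq))
		return (0);
	fake_kqueue_teardown(&kq);
	return (fake_kqueue_is_dead(&kq));
}
```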

The worst case is scanning a dead queue whose memory has been
reused for some other purpose.  8-(.

I can't personally repeat the problem, so you're elected to do
the legwork on this one.  8-(.

-- Terry
