Date: Mon, 07 Oct 2002 03:48:45 -0700
From: Terry Lambert <tlambert2@mindspring.com>
To: Stefan Farfeleder <e0026813@stud3.tuwien.ac.at>
Cc: John Baldwin <jhb@FreeBSD.ORG>, Juli Mallett <jmallett@FreeBSD.ORG>, current@FreeBSD.ORG
Subject: Re: [PATCH] Re: Junior Kernel Hacker page updated...
Message-ID: <3DA1668D.E8153F43@mindspring.com>
References: <20021004132203.A78223@FreeBSD.org> <XFMail.20021004163317.jhb@FreeBSD.org> <20021005135504.GA254@frog.fafoe> <3D9F39BB.66126C35@mindspring.com> <3DA12642.28BB8E1@mindspring.com> <20021007095024.GA252@frog.fafoe>
Stefan Farfeleder wrote:
> On Sun, Oct 06, 2002 at 11:14:26PM -0700, Terry Lambert wrote:
> > Stefan: Did the patch fix it, or not?
>
> Sorry for the long delay. No, it did not. But I now have a rather
> interesting core dump. I inserted a KASSERT, so that the code looks like
> this:
>
>     TAILQ_INSERT_TAIL(&kq->kq_head, &marker, kn_tqe);
>     while (count) {
>         kn = TAILQ_FIRST(&kq->kq_head);
>         KASSERT(kn != NULL, ("TAILQ_FIRST returned NULL"));

[ ... ]

> panic: bremfree: bp 0xd2adf6f0 not locked

Second panic, during debugger sync.

> panic: TAILQ_FIRST returned NULL

See below...

> panic: from debugger

You, manually calling "panic" inside the debugger...

> syncing disks... panic: bremfree: bp 0xd2adf6f0 not locked

The second panic (again).

> #2 0xc01babe7 in panic () at /freebsd/current/src/sys/kern/kern_shutdown.c:508

2nd panic.

> #10 0xc01babe7 in panic () at /freebsd/current/src/sys/kern/kern_shutdown.c:508

Manual panic (no arguments).

> #18 0xc01babcf in panic (fmt=0x0)
> > at /freebsd/current/src/sys/kern/kern_shutdown.c:494
> > #19 0xc01a1212 in kqueue_scan (fp=0x0, maxevents=4, ulistp=0xbfbfeb90,
> > tsp=0xc754f828, td=0xc6426b60)
> > at /freebsd/current/src/sys/kern/kern_event.c:717

*** OK, it's very hard to believe you didn't break into the
*** debugger and manually call "panic" to get this to happen.

Why? Because the "fmt" argument is 0x0, which indicates that you
called panic manually. If the KASSERT had fired, "fmt" would be the
address of the string "TAILQ_FIRST returned NULL", like you'd expect.
> #19 0xc01a1212 in kqueue_scan (fp=0x0, maxevents=4, ulistp=0xbfbfeb90,
> > tsp=0xc754f828, td=0xc6426b60)
> > at /freebsd/current/src/sys/kern/kern_event.c:717
> > 717 KASSERT(kn != NULL, ("TAILQ_FIRST returned NULL"));
> > (kgdb) info locals
> > kq = (struct kqueue *) 0xc754f800
> > kevp = (struct kevent *) 0xc754f828
> > atv = {tv_sec = 0, tv_usec = 0}
> > rtv = {tv_sec = 434, tv_usec = -1070420864}
> > ttv = {tv_sec = 1, tv_usec = -1070411616}
> > kn = (struct knote *) 0x0
> > marker = {kn_link = {sle_next = 0xc01b0d37}, kn_selnext = {
> > sle_next = 0xc0368a20}, kn_tqe = {tqe_next = 0x0, tqe_prev = 0xc6650ac8},
> > kn_kq = 0xc6426bcc, kn_kevent = {ident = 3344374324, filter = -30080,
> > flags = 49206, fflags = 3224546432, data = 431, udata = 0xe2c9dca0},
> > kn_status = 16, kn_sfflags = -1070167424, kn_sdata = 8, kn_ptr = {
> > p_fp = 0xc032ac80, p_proc = 0xc032ac80}, kn_fop = 0x1af, kn_hook = 0x3}
> > count = 4
> > timeout = 0
> > nkev = 0
> > error = 0
> > (kgdb) p *kq
> > $2 = {kq_head = {tqh_first = 0x0, tqh_last = 0xc754f800}, kq_count = 1,
> > kq_sel = {si_thrlist = {tqe_next = 0x0, tqe_prev = 0x0}, si_thread = 0x0,
> > si_note = {slh_first = 0x0}, si_flags = 0}, kq_fdp = 0xc7571a00,
> > kq_state = 0, kq_kev = {{ident = 23, filter = -1, flags = 1, fflags = 0,
> > data = 69, udata = 0x80cd800}, {ident = 23, filter = -1, flags = 1,
> > fflags = 0, data = 164, udata = 0x80cd800}, {ident = 27, filter = -1,
> > flags = 1, fflags = 0, data = 218, udata = 0x80cf800}, {ident = 19,
> > filter = -1, flags = 1, fflags = 0, data = 182, udata = 0x80cc800}, {
> > ident = 0, filter = 0, flags = 0, fflags = 0, data = 0, udata = 0x0}, {
> > ident = 0, filter = 0, flags = 0, fflags = 0, data = 0, udata = 0x0}, {
> > ident = 0, filter = 0, flags = 0, fflags = 0, data = 0, udata = 0x0}, {
> > ident = 0, filter = 0, flags = 0, fflags = 0, data = 0, udata = 0x0}}}
> > (kgdb) q
> > frog# ^Dexit
> > Script done on Mon Oct 7 11:32:50 2002
>
> I'm confused why marker - if it was removed by TAILQ_REMOVE - hasn't
> kn_tqe.tqe_next and kn_tqe.tqe_prev set to (void *)-1.

OK, what this means is that the marker queue entry was removed
by something else going in there. This shouldn't happen.

Try adding this before the initialization of the marker data:

    bzero(&marker, sizeof(marker));

That should keep it from matching any removal criteria.

The only way this could keep crashing after this mod is if the
queue is being destroyed out from under you. The implication
here is that the queue should be protected by the lock of the
object that contains the pointer to the queue instance as a
member. Fixing this would be very hard (IMO).

The next step (assuming it still panics) is to add:

    #define KQ_FREE 0x80

...and set it into kq_state on a kqueue that has been freed
and/or deallocated somewhere (then check to see if it's set,
after the panic). Ugly, but it will tell you whether or not
that's what's happening (scanning a dead queue).

The worst case is scanning a dead queue whose memory has been
reused for some other purpose. 8-(.

I can't personally repeat the problem, so you're elected to do
the legwork on this one. 8-(.

-- Terry
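[Illustrative sketch, not part of the original message.] The marker-terminated scan and the KQ_FREE check discussed above can be modelled in userland, roughly as follows. This is not the kern_event.c code: `struct note`, `struct queue`, `queue_scan()`, `queue_mark_freed()` and `queue_is_dead()` are hypothetical names made up for the example, and it assumes a `<sys/queue.h>` that provides the TAILQ macros (the BSDs and glibc do). It is single-threaded, so the "marker vanished" branch can only trigger if something else modifies the list behind the scan's back, which is the situation the message is trying to detect.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/queue.h>

#define KQ_FREE 0x80            /* value from the message; its use here is illustrative */

struct note {                   /* hypothetical stand-in for struct knote */
    TAILQ_ENTRY(note) tqe;
    int id;
};
TAILQ_HEAD(note_list, note);

struct queue {                  /* hypothetical stand-in for struct kqueue */
    struct note_list head;
    int state;
};

void queue_init(struct queue *q) { TAILQ_INIT(&q->head); q->state = 0; }

/* Hypothetical teardown hook: flag the queue as dead before its memory goes away. */
void queue_mark_freed(struct queue *q) { q->state |= KQ_FREE; }

int queue_is_dead(const struct queue *q) { return (q->state & KQ_FREE) != 0; }

/*
 * Dequeue up to `count` notes, stopping early at our own tail marker.
 * Returns the number of ids written to out_ids, -1 if the list ran dry
 * before the marker was seen, or -2 if that happened on a queue flagged
 * KQ_FREE (i.e. we were scanning a dead queue).
 */
int queue_scan(struct queue *q, int count, int *out_ids)
{
    struct note marker, *n;
    int got = 0;

    assert(q != NULL && out_ids != NULL);
    memset(&marker, 0, sizeof(marker));     /* userland analogue of the bzero() suggestion */
    TAILQ_INSERT_TAIL(&q->head, &marker, tqe);

    while (count > 0) {
        n = TAILQ_FIRST(&q->head);
        if (n == NULL)                      /* marker gone: someone else emptied the list */
            return queue_is_dead(q) ? -2 : -1;
        TAILQ_REMOVE(&q->head, n, tqe);
        if (n == &marker)                   /* reached our marker: nothing more to scan */
            return got;
        out_ids[got++] = n->id;
        count--;
    }
    TAILQ_REMOVE(&q->head, &marker, tqe);   /* count exhausted: take the marker back out */
    return got;
}
```

In this model a -1 versus -2 return is the userland counterpart of checking kq_state for KQ_FREE after the panic: it distinguishes "the marker was removed from a live queue" from "the queue itself was torn down".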