Date: Mon, 30 Aug 1999 11:06:43 +0930 From: Greg Lehey <grog@lemis.com> To: Parag Patel <parag@cgt.com> Cc: Matthew Dillon <dillon@apollo.backplane.com>, Mike Smith <mike@smith.net.au>, Bernd Walter <ticso@cicely.de>, freebsd-current@FreeBSD.ORG Subject: Re: 4.0-CURRENT SMP crash with vinum raid-5 and softupdates Message-ID: <19990830110643.V13904@freebie.lemis.com> In-Reply-To: <78740.935975313@pinhead.parag.codegen.com>; from Parag Patel on Sun, Aug 29, 1999 at 06:08:33PM -0700 References: <dillon@apollo.backplane.com> <78740.935975313@pinhead.parag.codegen.com>
next in thread | previous in thread | raw e-mail | index | archive | help
On Sunday, 29 August 1999 at 18:08:33 -0700, Parag Patel wrote: > On Sun, 29 Aug 1999 16:48:32 PDT, Matthew Dillon wrote: > >> This looks like an indirect call through a NULL function pointer. > > Wow - I'm impressed! You really know your kernel debugging! :) It is a > null-pointer dereference that's crashing it. Here's the gdb stack > back-trace: > > (gdb) bt > #0 0x0 in ?? () > #1 0xc017afc3 in biodone (bp=0xc1550c18) at ../../kern/vfs_bio.c:2580 > #2 0xc0126c2a in dadone (periph=0xc13de980, done_ccb=0xc163a000) > at ../../cam/scsi/scsi_da.c:1295 > #3 0xc01228e7 in camisr (queue=0xc02bd854) at ../../cam/cam_xpt.c:6223 > #4 0xc01226f9 in swi_cambio () at ../../cam/cam_xpt.c:6130 > > The contents of the "bp" pointer are dumped below. Sure enough, > bp->b_iodone is indeed NULL, but I have no idea how or why. Perhaps it > is the flags that are incorrectly stating it is a B_CALL? > > I can print out other structs and such - I'll leave the system as-is for > now. Thanks! > >> The stack looks intact ... look at the sp verses the frame pointer. >> If the 'trace' command is resulting in a panic, perhaps it is because >> there is no new stack frame. Giving the trace command an argument >> will help. > > (gdb) up > #1 0xc017afc3 in biodone (bp=0xc1550c18) at ../../kern/vfs_bio.c:2580 > 2580 (*bp->b_iodone) (bp); > (gdb) l > 2575 } > 2576 > 2577 /* call optional completion function if requested */ > 2578 if (bp->b_flags & B_CALL) { > 2579 bp->b_flags &= ~B_CALL; > 2580 (*bp->b_iodone) (bp); > 2581 splx(s); > 2582 return; > 2583 } > 2584 if (LIST_FIRST(&bp->b_dep) != NULL && bioops.io_complete) > (gdb) p bp > $6 = (struct buf *) 0xc1550c18 > (gdb) p *bp > $7 = {b_hash = {le_next = 0x0, le_prev = 0x0}, b_vnbufs = {tqe_next = 0x0, > tqe_prev = 0x0}, b_freelist = {tqe_next = 0x0, tqe_prev = 0x0}, b_act = { > tqe_next = 0x0, tqe_prev = 0xc13e4400}, b_flags = 134218244, b_qindex = 0, > b_unused1 = 0 '\000', b_xflags = 0 '\000', b_lock = {lk_interlock = { > lock_data = 0}, lk_flags = 1024, lk_sharecount = 0, lk_waitcount = 0, > lk_exclusivecount = 1, lk_prio = 20, lk_wmesg = 0x0, lk_timo = 0, > lk_lockholder = 5}, b_error = 0, b_bufsize = 8192, b_bcount = 8192, > b_resid = 0, b_dev = 0xc1434d00, b_data = 0xc1954000 "ķA\003", > b_kvabase = 0x0, b_kvasize = 0, b_lblkno = 0, b_blkno = 0, b_offset = 0, > b_iodone = 0, b_iodone_chain = 0x0, b_vp = 0xd5725980, b_dirtyoff = 0, > b_dirtyend = 0, b_rcred = 0xffffffff, b_wcred = 0x0, b_pblkno = 1389417, > b_saveaddr = 0x0, b_driver1 = 0x0, b_driver2 = 0x0, b_caller1 = 0x0, > b_caller2 = 0x0, b_pager = {pg_spc = 0x0, pg_reqpage = 0}, b_cluster = { > cluster_head = {tqh_first = 0x0, tqh_last = 0x0}, cluster_entry = { > tqe_next = 0x0, tqe_prev = 0x0}}, b_pages = {0x0 <repeats 32 times>}, > b_npages = 0, b_dep = {lh_first = 0x0}, b_chain = {parent = 0x0, count = 0}} > (gdb) OK, this is indeed the same problem that Bernd has been seeing. B_CALL should be set, but so should b_iodone. In the header B_CALL isn't set, but that's not surprising in view of the code. What I don't understand is where b_iodone went. I *always* set b_iodone, and I've been through the entire source tree looking for where it might get reset. The only place is in geteblk, when allocating a fresh buffer. I can't make head or tail of it. Try these patches. They won't fix the problem, but they may help localize it: Index: kern/vfs_bio.c =================================================================== RCS file: /home/ncvs/src/sys/kern/vfs_bio.c,v retrieving revision 1.227 diff -w -u -r1.227 vfs_bio.c --- vfs_bio.c 1999/08/28 00:46:23 1.227 +++ vfs_bio.c 1999/08/30 01:31:00 @@ -2576,6 +2576,8 @@ /* call optional completion function if requested */ if (bp->b_flags & B_CALL) { + if (bp->b_iodone == NULL) + Debugger ("biodone"); bp->b_flags &= ~B_CALL; (*bp->b_iodone) (bp); splx(s); Index: dev/vinum/vinumrequest.c =================================================================== RCS file: /home/ncvs/src/sys/dev/vinum/vinumrequest.c,v retrieving revision 1.35 diff -w -u -r1.35 vinumrequest.c --- vinumrequest.c 1999/08/28 00:42:42 1.35 +++ vinumrequest.c 1999/08/30 01:26:58 @@ -396,6 +390,9 @@ if (debug & DEBUG_LASTREQS) logrq(loginfo_rqe, (union rqinfou) rqe, rq->bp); #endif + if ((rqe->b.b_flags & B_CALL) + && (rqe->b.b_iodone == NULL) ) + Debugger ("launch_requests"); /* fire off the request */ BUF_STRATEGY(&rqe->b, 0); } You could also enable some of Vinum's internal logging: # vinum debug 324 This will log all requests in an internal buffer. With the .gdbinit files in /sys/modules/vinum, you can display them with the gdb command 'rrqi'. Alternatively, if I can connect to the debug console, I'll look for myself. Greg - See complete headers for address, home page and phone numbers finger grog@lemis.com for PGP public key To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-current" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?19990830110643.V13904>