Date: Wed, 1 Sep 1999 07:24:37 +0930
From: Greg Lehey <grog@lemis.com>
To: Bernd Walter <ticso@cicely.de>
Cc: Matthew Dillon <dillon@apollo.backplane.com>, Mike Smith <mike@smith.net.au>, Parag Patel <parag@cgt.com>, freebsd-current@FreeBSD.ORG
Subject: Help needed with debugging (was: 4.0-CURRENT SMP crash with vinum raid-5 and softupdates)
Message-ID: <19990901072437.A86067@freebie.lemis.com>
In-Reply-To: <19990830075311.A30271@cicely8.cicely.de>; from Bernd Walter on Mon, Aug 30, 1999 at 07:53:12AM +0200
References: <199908292224.PAA15435@dingo.cdrom.com> <199908292348.QAA07774@apollo.backplane.com> <19990830075311.A30271@cicely8.cicely.de>

On Monday, 30 August 1999 at 7:53:12 +0200, Bernd Walter wrote:
> On Sun, Aug 29, 1999 at 04:48:32PM -0700, Matthew Dillon wrote:
>> :
>> :How similar?  The trap above is extremely bad; it looks like a return
>> :on a corrupted stack or a jump through a null function vector.
>> :
>> :Make very sure that your vinum kld is in sync with your kernel.
>>
>>     This looks like an indirect call through a NULL function pointer.
>
> In my case it is a call to bp->b_iodone in kern/vfs_bio.c:2580 which is 0 :(

We seem to have a specific problem here.  To summarize:

1.  It was first reported (by Bernd) on 15 August.

2.  The system crashes in biodone because the buffer header has the
    B_CALL flag set, but the value of bp->b_iodone is NULL.

3.  bp->b_iodone is only set to NULL (well, 0) in one place:
    getnewbuf, which I don't call.  It gets reset to a previous value
    in dsiodone, which is about the only place where things could
    conceivably go wrong.  I put some check code in at every
    conceivable place, and the only place which found the situation
    was in biodone.

4.  In every case, the fields just before b_iodone were also zeroed:

      b_dev = 0xc098d840,
      b_data = 0xc0befc00 "\2275\t",
      b_kvabase = 0x0,
      b_kvasize = 0x0,
      b_lblkno = 0x0,
      b_blkno = 0x0,
      b_offset = 0x0,
      b_iodone = 0,
      b_iodone_chain = 0x0,
      b_vp = 0xc5d4a700,

    The b_dev and b_vp fields are OK, and b_data looks OK as well.
    I'm guessing that something is overwriting some of the fields in
    the header, but it's always the same, and I can't find anything
    in the code that does that.

5.  In one case yesterday, there were two requests involved in the
    vinum request.  The other one had already completed (been through
    biodone), but the bp->b_iodone word was zeroed out in the same
    manner.  In addition, other fields have been set by Vinum's
    iodone function.  From this I deduce that the fields were zeroed
    after biodone, which makes it very unlikely that it was done by
    Vinum.

I don't know how to proceed at the moment.
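For readers following along, the kind of check described in item 3 could look roughly like the userland sketch below. It is an illustration only, not the check code actually used: struct buf_sketch, buf_iodone_ok, check_buf and noop_done are hypothetical names, the structure is a cut-down stand-in for the kernel's struct buf, and the B_CALL value is assumed for the example rather than taken from <sys/buf.h>.

```c
#include <stddef.h>
#include <stdio.h>

/* Assumed value for illustration; the real definition lives in <sys/buf.h>. */
#define B_CALL 0x00000040

/* Hypothetical, cut-down stand-in for the kernel's struct buf. */
struct buf_sketch {
	long b_flags;                           /* B_CALL etc. */
	void (*b_iodone)(struct buf_sketch *);  /* completion handler */
};

/* Placeholder completion handler so a consistent header can be built. */
void
noop_done(struct buf_sketch *bp)
{
	(void)bp;
}

/* Returns 1 if the header is consistent, 0 if B_CALL is set without a handler. */
int
buf_iodone_ok(const struct buf_sketch *bp)
{
	return !((bp->b_flags & B_CALL) && bp->b_iodone == NULL);
}

/* Same check, but reports where the inconsistency was noticed. */
int
check_buf(const struct buf_sketch *bp, const char *where)
{
	if (buf_iodone_ok(bp))
		return 1;
	fprintf(stderr, "%s: buf %p has B_CALL set but b_iodone is NULL\n",
	    where, (const void *)bp);
	return 0;
}
```

Calling something like check_buf(bp, "dsiodone") at each suspect point narrows down where the header is first seen in the bad state. It cannot show who did the overwriting, which is consistent with the observation above that only biodone ever caught it.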

Matt Dillon has suggested adding some dummy fields in the buffer
header and setting them to known values, but I expect this will drive
the problem into hiding.  Instead, I'm migrating the whole thing to
-STABLE to see if it happens there.  In view of the impending release
of 3.3, this makes sense anyway.

In the meantime, if anybody has any ideas, or if any of this rings a
bell, I'd be grateful for feedback.

Greg
--
See complete headers for address, home page and phone numbers
finger grog@lemis.com for PGP public key

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message