Date: Sat, 02 Aug 2003 17:56:49 -0700
From: Terry Lambert <tlambert2@mindspring.com>
To: Greg 'groggy' Lehey <grog@FreeBSD.org>
Cc: current@freebsd.org
Subject: Re: Yet another crash in FreeBSD 5.1
Message-ID: <3F2C5DD1.36570B38@mindspring.com>
References: <1079.192.168.0.3.1059811884.squirrel@webmail.aminor.no> <3F2B803C.21D38E0B@mindspring.com> <20030803000302.GE95375@wantadilla.lemis.com>

Greg 'groggy' Lehey wrote:
> > You don't actually need a crash dump to debug a stack traceback.
>
> Great!  So you know the answer?  Please submit a patch.
>
> Seriously, this is nonsense.  Yes, it's a null pointer dereference.
> What?

That is precisely what doing what I suggested discovers, Greg.

If you haven't seen his response posting:

    (kgdb) list *(g_dev_strategy+29)
    0xc02e812d is in g_dev_strategy (/usr/src/sys/geom/geom_dev.c:415).
    410             KASSERT(cp->acr || cp->acw,
    411                 ("Consumer with zero access count in g_dev_strategy"));
    412
    413             bp2 = g_clone_bio(bp);
    414             KASSERT(bp2 != NULL, ("XXX: ENOMEM in a bad place"));
    415             bp2->bio_offset = (off_t)bp->bio_blkno << DEV_BSHIFT;
    416             KASSERT(bp2->bio_offset >= 0,
    417                 ("Negative bio_offset (%jd) on bio %p",
    418                 (intmax_t)bp2->bio_offset, bp));
    419             bp2->bio_length = (off_t)bp->bio_bcount;

Clearly, bp2 or bp is NULL at the time of the dereference.

> Why?

Programmer error.  Either bp2 or bp is a NULL pointer.

> How do you fix it?

It depends on the root cause.

If the root cause is that the bp is NULL, then I'd hope that it would
have been caught higher up; if it wasn't, then I'd hope that
g_clone_bio(bp) would have returned NULL.

Is the KASSERT() active at the time of the problem?  I don't know; if
it isn't, it probably should be converted to an if()...panic().  If it
is, then I'd have to expect that the validity fell out from under it as
a result of an interrupt, preemption, reentrancy (if the locking didn't
prevent it) or SMP races (if the locking didn't prevent it).

I really can't answer it for the same reason that I couldn't locate the
line in the source code that was failing for him from his posting of
hex offsets into functions compiled from unknown source code: I don't
have his object set for the problem in question, nor his debug kernel.

> Finding the first step doesn't solve the problem.

No.  Finding the first step is *necessary* to solving the problem, but
you are entirely correct in pointing out that it's not in itself
*sufficient*.

But it's one step farther along than he was.  I didn't see anyone else
helping him take that first step, so I did.

-- Terry
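
The if()...panic() conversion suggested above can be sketched in plain
C as follows.  This is a userspace illustration, not the geom_dev.c
code: struct bio_sketch, SKETCH_KASSERT, SKETCH_INVARIANTS, SKETCH_BSHIFT,
sketch_panic() and both set_offset_*() functions are hypothetical names
invented for the example, and the macro takes a plain string rather
than the kernel's parenthesized printf arguments.  It assumes the
assertion macro compiles to nothing unless a debug option is defined,
which is how KASSERT() behaves in kernels built without
"options INVARIANTS".

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct bio; only two fields. */
struct bio_sketch {
    long long bio_blkno;
    long long bio_offset;
};

#define SKETCH_BSHIFT 9 /* assumption: 512-byte blocks */

/* Userspace stand-in for panic(): report the message and stop. */
void sketch_panic(const char *msg)
{
    fprintf(stderr, "panic: %s\n", msg);
    abort();
}

/* Debug-only assertion: expands to nothing unless SKETCH_INVARIANTS is set. */
#ifdef SKETCH_INVARIANTS
#define SKETCH_KASSERT(exp, msg) do { if (!(exp)) sketch_panic(msg); } while (0)
#else
#define SKETCH_KASSERT(exp, msg) do { } while (0)
#endif

/* Check vanishes in a non-debug build, so a NULL bp2 is dereferenced. */
void set_offset_debug_only(struct bio_sketch *bp2, const struct bio_sketch *bp)
{
    SKETCH_KASSERT(bp2 != NULL, "NULL clone");
    bp2->bio_offset = bp->bio_blkno << SKETCH_BSHIFT;
}

/* Check is always compiled in: a NULL pointer stops with a clear message. */
void set_offset_checked(struct bio_sketch *bp2, const struct bio_sketch *bp)
{
    if (bp2 == NULL || bp == NULL)
        sketch_panic("NULL bio in set_offset_checked");
    bp2->bio_offset = bp->bio_blkno << SKETCH_BSHIFT;
}
```

With valid pointers both functions behave identically.  The difference
only shows up on the failure path: the debug-only version faults at the
assignment in a build without the debug option, while the checked
version stops with a message naming the failed condition.  Neither
version addresses the root cause, which still has to be found.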