Date: Tue, 2 Mar 1999 13:14:51 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: Matthew Jacob <mjacob@feral.com>, freebsd-hackers@FreeBSD.ORG
Subject: Re: Panic in FFS/4.0 as of yesterday - update
Message-ID: <199903022114.NAA57990@apollo.backplane.com>
References: <Pine.LNX.4.04.9902260950020.985-100000@feral-gw>

Ok, we're making progress. I found a major bug ( that Julian is
committing now ). Kirk has already committed fixes to ffs_fsync() for
softupdates/NFS combinations and has some alternative code in softupdates
for a BMSAFEMAP related issue.
The bugfix is also in the queue to be committed into -3.x and hopefully
also 2.2.x after we resolve a minor issue that John has brought up.
It's a very serious bug, though thankfully it does not happen very often.
Basically the getblk() code in kern/vfs_bio.c has a section:

    /*
     * This code is used to make sure that a buffer is not
     * created while the getnewbuf routine is blocked.
     * Normally the vnode is locked so this isn't a problem.
     * VBLK type I/O requests, however, don't lock the vnode.
     */
    if (VOP_ISLOCKED(vp) != LK_EXCLUSIVE && gbincore(vp, blkno)) {
        bp->b_flags |= B_INVAL;
        brelse(bp);
        goto loop;
    }

Which really should be:

    if (gbincore(vp, blkno)) {
        bp->b_flags |= B_INVAL;
        brelse(bp);
        goto loop;
    }

The problem is that the original comment implies that getblk() might be
called without the vnode locked. This does, in fact, happen. Ok... but
that doesn't mean you can avoid checking gbincore() if you happen to
find the vnode locked!
Thus, the bogus VOP_ISLOCKED check can result in the system creating
duplicate buffers for the same block.
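
For anyone who wants to see the failure mode in isolation, here is a rough user-space sketch. It is not the kernel code: every toy_* name, the fixed-size array, and the racer_pending flag are made up for illustration, and the "blocked getnewbuf()" is modelled simply as a point where some other context instantiates the same block. The only thing it is meant to show is that skipping the incore re-check when the vnode is locked leaves two buffers for one block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_SLOTS 8

/* Hypothetical stand-in for struct buf: just "which block do I hold". */
struct toy_buf {
    bool in_use;
    long blkno;
};

static struct toy_buf toy_cache[TOY_SLOTS];
static bool racer_pending;  /* assumption: models "someone ran while we slept" */

/* Plays the role of gbincore(): existing buffer for blkno, or NULL. */
static struct toy_buf *toy_incore(long blkno)
{
    for (size_t i = 0; i < TOY_SLOTS; i++)
        if (toy_cache[i].in_use && toy_cache[i].blkno == blkno)
            return &toy_cache[i];
    return NULL;
}

/* Instantiate a new buffer for blkno in the first free slot. */
static struct toy_buf *toy_insert(long blkno)
{
    for (size_t i = 0; i < TOY_SLOTS; i++) {
        if (!toy_cache[i].in_use) {
            toy_cache[i].in_use = true;
            toy_cache[i].blkno = blkno;
            return &toy_cache[i];
        }
    }
    return NULL;
}

static int toy_count(long blkno)
{
    int n = 0;
    for (size_t i = 0; i < TOY_SLOTS; i++)
        if (toy_cache[i].in_use && toy_cache[i].blkno == blkno)
            n++;
    return n;
}

/*
 * Toy getblk().  With buggy == true the re-check is skipped whenever the
 * vnode is locked (the shape of the old test); with buggy == false the
 * re-check is unconditional (the shape of the fix).
 */
static struct toy_buf *toy_getblk(long blkno, bool vnode_locked, bool buggy)
{
    struct toy_buf *bp;

loop:
    if ((bp = toy_incore(blkno)) != NULL)
        return bp;

    /* "getnewbuf() blocked": another context creates the same block. */
    if (racer_pending) {
        racer_pending = false;
        bp = toy_insert(blkno);
        assert(bp != NULL);
    }

    /* Re-check before instantiating our own buffer. */
    if ((!buggy || !vnode_locked) && toy_incore(blkno) != NULL)
        goto loop;

    return toy_insert(blkno);
}

/* Run one race from a clean cache; return how many buffers hold block 42. */
int toy_simulate(bool vnode_locked, bool buggy)
{
    memset(toy_cache, 0, sizeof(toy_cache));
    racer_pending = true;
    toy_getblk(42, vnode_locked, buggy);
    return toy_count(42);
}
```

Calling toy_simulate(true, true) leaves two buffers for block 42, while toy_simulate(true, false) and toy_simulate(false, true) each leave one. The real code differs in that getnewbuf() has already handed back a buffer by the time of the re-check, which is why the kernel version has to mark it B_INVAL and brelse() it before looping.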
Needless to say, this can result in the complete destruction of
directories, bitmaps, and file data, as well as duplicate allocation
of blocks and other bad things. I believe this bug to be responsible
for the 5 or 6 times ( over 4.5 years and 40+ machines ) that BEST has
experienced severe filesystem corruption after a crash.
--
The new VFS BIO and NFS fixes are still in the commit queue and under
review, but I expect to get them committed into -4.x soon. Specifically,
the new getnewbuf() code solves a low memory lockup problem and a more
serious supervisor stack overflow problem ( on machines which have deep
VFS call stacks, such as when you use the VN device ). Fixes to NFS's
B_DONE handling are part of this mess too.
-Matt
