From owner-freebsd-bugs Wed Jul 19 05:24:15 1995
Return-Path: bugs-owner
Received: (from majordom@localhost) by freefall.cdrom.com (8.6.11/8.6.6) id FAA21640 for bugs-outgoing; Wed, 19 Jul 1995 05:24:15 -0700
Received: from blob.best.net (blob.best.net [204.156.128.88]) by freefall.cdrom.com (8.6.11/8.6.6) with ESMTP id FAA21634 for ; Wed, 19 Jul 1995 05:24:14 -0700
Received: (dillon@localhost) by blob.best.net (8.6.12/8.6.5) id FAA26031; Wed, 19 Jul 1995 05:24:07 -0700
Date: Wed, 19 Jul 1995 05:24:07 -0700
From: Matt Dillon
Message-Id: <199507191224.FAA26031@blob.best.net>
To: bugs@freebsd.org
Subject: probable race condition in ufs/ffs/ffs_vfsops.c:ffs_vget()
Sender: bugs-owner@freebsd.org
Precedence: bulk

We've been getting the following panic:

    panic: ffs_valloc: dup alloc

It took a long while, and I could find no *direct* cause of the panic.
Fortunately I had a debug kernel and a crash dump to work with. I still
have it in case this doesn't turn out to solve the problem.

It would appear that the inode that was allocated from the bitmap was
VERY much in use... a non-zero length REG file with very valid-looking
fields. The weird thing is that the latest access/modify/change
timestamp on the inode was several HUNDRED seconds earlier than the time
of the crash.

I believe I have found the problem... a race condition in ffs_vget().
Here's a synopsis:

    (1) lookup (dev,ino) in hash table, return on success
    (2) allocate new vnode and new inode structure
        MALLOC(..., M_WAITOK) for the inode
    (3) enter new inode into hash table.

The problem is that MALLOC() can block. If it does, you can potentially
have TWO processes attempt to look up an uncached inode simultaneously
in a low memory situation. The MALLOC() blocks until memory is
available, both processes unblock *AFTER* having determined that the
inode wasn't cached, and *both* processes allocate new vnode/inode
structures representing the *same* inode and enter both of them into
the hash table.
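
To make the interleaving concrete, here is a minimal user-space sketch
of steps (1)-(3) above, replayed by hand for two callers. It is not
kernel code: the toy_* names, the single-bucket list, and the use of
plain malloc() in place of MALLOC(..., M_WAITOK) are all assumptions
made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an in-core inode; not the real struct inode. */
struct toy_inode {
    int dev;
    unsigned long ino;
    struct toy_inode *next;
};

/* A single list plays the role of the inode hash table. */
struct toy_inode *toy_hash_head;

/* Step (1): look up (dev, ino); NULL means "not cached". */
struct toy_inode *toy_hash_get(int dev, unsigned long ino)
{
    struct toy_inode *ip;

    for (ip = toy_hash_head; ip != NULL; ip = ip->next)
        if (ip->dev == dev && ip->ino == ino)
            return ip;
    return NULL;
}

/* Step (2): allocate. In the kernel this is where the caller may sleep. */
struct toy_inode *toy_alloc(int dev, unsigned long ino)
{
    struct toy_inode *ip = malloc(sizeof(*ip));

    assert(ip != NULL);
    ip->dev = dev;
    ip->ino = ino;
    ip->next = NULL;
    return ip;
}

/* Step (3): enter into the table, with no duplicate check. */
void toy_hash_ins(struct toy_inode *ip)
{
    ip->next = toy_hash_head;
    toy_hash_head = ip;
}

/* Count cached entries for one (dev, ino) pair. */
int toy_hash_count(int dev, unsigned long ino)
{
    struct toy_inode *ip;
    int n = 0;

    for (ip = toy_hash_head; ip != NULL; ip = ip->next)
        if (ip->dev == dev && ip->ino == ino)
            n++;
    return n;
}

/* Replay the bad ordering: both callers finish step (1) before either
 * reaches step (3). Returns the number of entries for the inode. */
int toy_replay_race(void)
{
    int dev = 1;
    unsigned long ino = 42;
    struct toy_inode *a = NULL, *b = NULL;

    int a_missed = (toy_hash_get(dev, ino) == NULL);  /* A: step (1) */
    int b_missed = (toy_hash_get(dev, ino) == NULL);  /* B: step (1) */

    if (a_missed)
        a = toy_alloc(dev, ino);                      /* A: step (2) */
    if (b_missed)
        b = toy_alloc(dev, ino);                      /* B: step (2) */
    if (a != NULL)
        toy_hash_ins(a);                              /* A: step (3) */
    if (b != NULL)
        toy_hash_ins(b);                              /* B: step (3) */

    return toy_hash_count(dev, ino);
}
```

Calling toy_replay_race() on an empty table leaves two entries for the
same (dev, ino) pair, which is the duplicated in-core inode described
above.
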

At some point in the future the inode is deallocated and the bitmap for
it cleared, but this only removes one of the two cached inode
structures. sync() comes along and commits the other one... poof, you
now have an active inode on the platter whose bitmap entry is cleared.
At some later time someone tries to create a new file and BANG it hits
the screwed inode.

The solution, as far as I can tell, is to check the hash table after
MALLOC returns as well as before to determine if another process beat
us to it.

I put the following code just before the ufs_ihashins(). I do NOT know
whether this code fixes the problem yet or even if the code is valid in
terms of freeing the right stuff before returning... (I'll tell you in
a few days re: the crashes... I'll either get more panics or I will
not).

    #if 1
        if ((*vpp = ufs_ihashget(dev, ino)) != NULL) {
            vp->v_data = NULL;
            vput(vp);
            printf("INODE COLLISION: %d\n", ino);
            FREE(ip, type);
            return (0);
        }
    #endif
        ...
        ufs_ihashins(ip);
        ...
        etc...

                                        -Matt
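
The "check again after the allocation returns" idea can be sketched in
user space as follows. This is only a model of the pattern, not a
tested kernel patch: the demo_* names and the sleep hook (which stands
in for whatever another process does while the allocation blocks) are
hypothetical, and the sketch says nothing about whether the vnode
cleanup in the snippet above is correct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an in-core inode. */
struct demo_inode {
    int dev;
    unsigned long ino;
    struct demo_inode *next;
};

/* A single list plays the role of the inode hash table. */
struct demo_inode *demo_hash_head;

struct demo_inode *demo_hash_get(int dev, unsigned long ino)
{
    struct demo_inode *ip;

    for (ip = demo_hash_head; ip != NULL; ip = ip->next)
        if (ip->dev == dev && ip->ino == ino)
            return ip;
    return NULL;
}

void demo_hash_ins(struct demo_inode *ip)
{
    ip->next = demo_hash_head;
    demo_hash_head = ip;
}

int demo_hash_count(int dev, unsigned long ino)
{
    struct demo_inode *ip;
    int n = 0;

    for (ip = demo_hash_head; ip != NULL; ip = ip->next)
        if (ip->dev == dev && ip->ino == ino)
            n++;
    return n;
}

/* Hypothetical hook: runs at the point where a kernel allocation could
 * sleep, standing in for another process that gets scheduled meanwhile. */
typedef void (*demo_sleep_hook)(int dev, unsigned long ino);

struct demo_inode *demo_vget(int dev, unsigned long ino,
                             demo_sleep_hook while_asleep)
{
    struct demo_inode *ip, *other;

    /* First check: return the cached inode if there is one. */
    if ((ip = demo_hash_get(dev, ino)) != NULL)
        return ip;

    ip = malloc(sizeof(*ip));
    assert(ip != NULL);
    ip->dev = dev;
    ip->ino = ino;
    ip->next = NULL;

    if (while_asleep != NULL)
        while_asleep(dev, ino);   /* someone else may insert here */

    /* Second check: if another caller beat us to it, discard our copy. */
    if ((other = demo_hash_get(dev, ino)) != NULL) {
        free(ip);
        return other;
    }

    demo_hash_ins(ip);
    return ip;
}

/* A competing lookup that runs to completion while the first one sleeps. */
void demo_competitor(int dev, unsigned long ino)
{
    (void)demo_vget(dev, ino, NULL);
}
```

With demo_competitor passed as the hook, demo_vget(1, 42,
demo_competitor) returns the competitor's entry and the table ends up
with one entry for the inode rather than two.
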