From owner-freebsd-bugs Tue May 12 17:23:35 1998 Return-Path: Received: (from majordom@localhost) by hub.freebsd.org (8.8.8/8.8.8) id RAA04384 for freebsd-bugs-outgoing; Tue, 12 May 1998 17:23:35 -0700 (PDT) (envelope-from owner-freebsd-bugs@FreeBSD.ORG) Received: from freefall.freebsd.org (freefall.FreeBSD.ORG [204.216.27.21]) by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id RAA04373 for ; Tue, 12 May 1998 17:23:33 -0700 (PDT) (envelope-from gnats@FreeBSD.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.8.8/8.8.5) id RAA28544; Tue, 12 May 1998 17:20:00 -0700 (PDT) Received: from marker.cs.utah.edu (marker.cs.utah.edu [155.99.212.61]) by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id RAA03581 for ; Tue, 12 May 1998 17:19:24 -0700 (PDT) (envelope-from sclawson@marker.cs.utah.edu) Received: (from sclawson@localhost) by marker.cs.utah.edu (8.8.8/8.8.5) id SAA16560; Tue, 12 May 1998 18:19:25 -0600 (MDT) Message-Id: <199805130019.SAA16560@marker.cs.utah.edu> Date: Tue, 12 May 1998 18:19:25 -0600 (MDT) From: stephen clawson Reply-To: sclawson@marker.cs.utah.edu To: FreeBSD-gnats-submit@FreeBSD.ORG X-Send-Pr-Version: 3.2 Subject: kern/6611: nfs_inactive can write through a dangling pointer and corrupt memory. Sender: owner-freebsd-bugs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org >Number: 6611 >Category: kern >Synopsis: nfs_inactive can write through a dangling pointer and corrupt memory. >Confidential: no >Severity: critical >Priority: high >Responsible: freebsd-bugs >State: open >Quarter: >Keywords: >Date-Required: >Class: sw-bug >Submitter-Id: current-users >Arrival-Date: Tue May 12 17:20:00 PDT 1998 >Last-Modified: >Originator: stephen clawson >Organization: University of Utah, CSL >Release: FreeBSD 2.2.6-BETA i386 >Environment: FreeBSD marker.cs.utah.edu 2.2.6-BETA FreeBSD 2.2.6-BETA #4: Tue May 12 14:23:58 MDT 1998 sclawson@marker.cs.utah.edu:/usr/src/sys/compile/MARKER i386 >Description: On an nfs unlink, if there's a sillyrename to deal with, nfs_inactive can block while holding a copy of the vnode's v_data pointer (which points to it's nfsnode struct) because of it's calls to nfs_vinvalbuf and nfs_removeit. However, the vnode was already put on the head of the free list (assuming VAGE is set) by vrele before it called VOP_INACTIVE on the vnode. Since it's also unlocked, the next call to getnewvode after nfs_inactive blocks will grab this vnode off the free list and call vgone on it, leading to a VOP_RECLAIM, which frees the nfsnode struct associated with the vnode. If ffs_vget is called next, when it malloc's it's inode struct it will get the newly free'd nfsnode, since in-core inodes and nfsnodes are both 256 bytes and thus come from the same bucket. The inode will then be initialized by the ffs/ufs code. When control finally returns to nfs_inactive, it's pointer is now pointing to an inode struct. Before returning, last thing nfs_inactive does is clear some flag bits. It turns out that n_flag in an nfsnode coincides with the upper half of di_db[5] in an inode struct. This can cause bits to be cleared in the 6th direct block pointer, possibly making it point to either unallocated space, or (more destructively) into another file. If the corrupted direct block pointer points to unallocated space, you'll get a ``freeing free block'' panic when an attempt is made to remove it. In the case that it points to an already allocated block, you end up freeing another file's block, which will later cause a ``freeing free block'' panic when the second file is removed. The pattern of damage is such that the upper short of the 6th direct block pointer gets anded with 0x00e7 (~0xff18). >How-To-Repeat: My standard workload included: cvs checkout cycle (from an nfs mounted cvs root tree) large build (gdb-4.17) create/remove lots (100-1000) of 64k+ files (so they've actually got a 6th block pointer). These are done both on a local filesystem and an nfs mounted filesystem, so there are a total of 6 things going on. All of them just keep cycling (cvs checkout/rm, make/make clean, etc.). Usually it takes about 20-30 minutes for the problem to show up. What makes detecting it difficult is that there's nothing that directly breaks when the corruption occurs. You only notice it if you happen to remove a corrupt file and get a ``freeing free block'' panic, or you reboot and happen to fsck the filesystem with the problem (in which case fsc will tell you about the DUP allocation, assuming that the corrupt direct block is pointing into an allocated block and not free space). To notice when the corruption was occuring, I added code to the kernel to shadow the di_db[5] into di_spare[0] and periodically checked to see if di_db[5] had changed. >Fix: The simple fix is to grab an extra reference to the vnode if there's a possiblilty that we might block. It's pretty heavy-handed, since it vget dosen't just remove the vnode from the free list, it also allocates a vm_object for it that will just get destroyed when we do a vrele on it later. =( Alternately, the nfs code could actually do locking on it's nfsnode's, but since that code still isn't done in -current... =) Or, you could muck with the vnode freelist directly. Anything that prevents the nfsnode from being free'd before nfs_inactive is done with it. NetBSD incorporated the same fix (apparently given to them from BSDI) a while ago. They only grab a reference for the call to nfs_vinvalbuf though, in the case of -stable, nfs_removeit can also block, so I just grab the reference for the entire sillyrename code section. See NetBSD-current (1.3.1 should have it) sys/nfs/nfs_node.c:nfs_inactive. diff -c -r1.13.2.1 nfs_node.c *** nfs_node.c 1997/05/14 08:19:27 1.13.2.1 --- nfs_node.c 1998/05/11 17:59:21 *************** *** 202,207 **** --- 202,215 ---- } else sp = (struct sillyrename *)0; if (sp) { + /* + * We need a reference to keep the vnode from being + * recycled by getnewvnode while we do the I/O + * associated with discarding the buffers. + */ + if (vget(ap->a_vp, 0)) + panic("nfs_inactive: lost vnode"); + /* * Remove the silly file that was rename'd earlier */ *************** *** 210,215 **** --- 218,228 ---- crfree(sp->s_cred); vrele(sp->s_dvp); FREE((caddr_t)sp, M_NFSREQ); + + /* XXX Play it safe and release our reference + * after we're done. + */ + vrele(ap->a_vp); } np->n_flag &= (NMODIFIED | NFLUSHINPROG | NFLUSHWANT | NQNFSEVICTED | NQNFSNONCACHE | NQNFSWRITE); >Audit-Trail: >Unformatted: To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-bugs" in the body of the message