Date:      13 May 1998 12:01:51 +0100
From:      Simon Marlow <simonm@dcs.gla.ac.uk>
To:        freebsd-bugs@FreeBSD.ORG
Subject:   Re: kern/6611: nfs_inactive can write through a dangling pointer and  corrupt memory.
Message-ID:  <t6zpgm1ljz.fsf@solander.dcs.gla.ac.uk>
In-Reply-To: stephen clawson's message of Tue, 12 May 1998 18:19:25 -0600 (MDT)
References:  <199805130019.SAA16560@marker.cs.utah.edu>

stephen clawson <sclawson@marker.cs.utah.edu> writes:

>      The pattern of damage is such that the upper short of the 6th
> direct block pointer gets anded with 0x00e7 (~0xff18).

And this could be part of a symbolic link, if the link is stored in
the inode, couldn't it?  If so, I've seen this: it shows up quite
easily by doing a large lndir.  I quite often get corrupted links,
with the corruption matching what Stephen says above.
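To make the damage pattern concrete: ANDing the upper 16 bits with 0x00e7 is the same as clearing the bits in 0xff18, i.e. blk & 0x00e7ffff on a 32-bit block pointer. The C sketch below assumes 32-bit di_db[] entries and a little-endian (i386) layout. It also assumes, as suggested above, that a short symlink's text is stored over the di_db[] area starting at di_db[0]. Under those assumptions the upper short of di_db[5] is bytes 22 and 23 of the link text, so byte 23 is zeroed (truncating the link) and byte 22 loses the bits 0x18. The function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>   /* strlen/strcmp, for checking results */

/* Mask applied to the upper short: ~0xff18 truncated to 16 bits == 0x00e7. */
#define DAMAGE_MASK16 ((uint16_t)~0xff18u)

/* Apply the observed damage to a 32-bit block pointer value:
 * the upper 16 bits are ANDed with 0x00e7, the lower 16 bits survive. */
uint32_t damage_blkptr(uint32_t blk)
{
    uint32_t upper = (blk >> 16) & DAMAGE_MASK16;
    return (upper << 16) | (blk & 0xffffu);
}

/* Apply the same damage to a fast-symlink text buffer (at least 24 bytes),
 * assuming little-endian: the upper short of di_db[5] is bytes 22 (low
 * byte) and 23 (high byte) of the text. */
void damage_shortlink(char *text)
{
    unsigned char *p = (unsigned char *)text;

    assert(text != NULL);
    p[22] &= (unsigned char)(DAMAGE_MASK16 & 0xff);  /* &= 0xe7: clears 0x18 */
    p[23] &= (unsigned char)(DAMAGE_MASK16 >> 8);    /* &= 0x00: zeroed */
}
```

Under these assumptions damage_shortlink() turns a link to /usr/local/share/examples/x into /usr/local/share/exampd, and a block pointer such as 0x00180000 becomes 0.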

Thanks Stephen!



> 
> 
> >How-To-Repeat:
> 
>      My standard workload included:
> 
> 	cvs checkout cycle (from an nfs mounted cvs root tree)
> 	large build (gdb-4.17)
> 	create/remove lots (100-1000) of 64k+ files (so they've
> 	  actually got a 6th block pointer).
> 
>      These are done both on a local filesystem and an nfs mounted
> filesystem, so there are a total of 6 things going on.  All of them
> just keep cycling (cvs checkout/rm, make/make clean, etc.).  Usually
> it takes about 20-30 minutes for the problem to show up.
> 
>      What makes detecting it difficult is that there's nothing that
> directly breaks when the corruption occurs.  You only notice it if you
> happen to remove a corrupt file and get a ``freeing free block''
> panic, or you reboot and happen to fsck the filesystem with the
> problem (in which case fsck will tell you about the DUP allocation,
> assuming that the corrupt direct block is pointing into an allocated
> block and not free space).
> 
>      To notice when the corruption was occurring, I added code to the
> kernel to shadow di_db[5] into di_spare[0] and periodically
> checked to see if di_db[5] had changed.
> 
> 
> >Fix:
> 
>      The simple fix is to grab an extra reference to the vnode if
> there's a possibility that we might block.  It's pretty heavy-handed,
> since vget doesn't just remove the vnode from the free list, it
> also allocates a vm_object for it that will just get destroyed
> when we do a vrele on it later. =(
>  
>      Alternately, the nfs code could actually do locking on its
> nfsnodes, but since that code still isn't done in -current... =)
> Or, you could muck with the vnode freelist directly.  Anything that
> prevents the nfsnode from being freed before nfs_inactive is done
> with it.
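A userland toy model of why the extra reference helps. Nothing here is kernel code: toy_vget/toy_vrele only stand in for vget/vrele, and toy_recycle_attempt stands in for getnewvnode running while we sleep. The vnode arrives at the inactive routine with a use count of 0, so while the sillyrename path blocks in I/O it can be reused, and the later n_flag update then writes through a stale pointer. Holding a reference across the blocking section keeps the use count above 0 until the work is done.

```c
#include <assert.h>

/* Toy vnode: may be recycled only while its use count is 0. */
struct toy_vnode {
    int usecount;
    int recycled;   /* set once the "recycler" has reused this vnode */
};

static void toy_vget(struct toy_vnode *vp) { vp->usecount++; }

static void toy_vrele(struct toy_vnode *vp)
{
    assert(vp->usecount > 0);
    vp->usecount--;
}

/* Stands in for getnewvnode() in another process: reuse if unreferenced. */
static void toy_recycle_attempt(struct toy_vnode *vp)
{
    if (vp->usecount == 0)
        vp->recycled = 1;
}

/* Models blocking I/O (buffer flush, remove RPC): we sleep, others run. */
static void toy_blocking_io(struct toy_vnode *vp)
{
    toy_recycle_attempt(vp);
}

/* Toy inactive path; hold_ref = 1 models the fix, 0 the original code.
 * Returns 1 if the vnode was still ours when the flags were updated. */
int toy_inactive(struct toy_vnode *vp, int hold_ref)
{
    if (hold_ref)
        toy_vget(vp);
    toy_blocking_io(vp);
    int still_ours = !vp->recycled;   /* the n_flag update would happen here */
    if (hold_ref)
        toy_vrele(vp);
    return still_ours;
}
```

With hold_ref = 0, toy_inactive() reports that the vnode was recycled under it; with hold_ref = 1 it was not, and the use count is back to 0 afterwards.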
> 
>      NetBSD incorporated the same fix (apparently given to them 
> from BSDI) a while ago.  They only grab a reference for the call
> to nfs_vinvalbuf, though; in -stable, nfs_removeit can
> also block, so I grab the reference for the entire sillyrename
> code section.  See NetBSD-current (1.3.1 should have it) 
> sys/nfs/nfs_node.c:nfs_inactive.
> 
> 	
> diff -c -r1.13.2.1 nfs_node.c
> *** nfs_node.c  1997/05/14 08:19:27     1.13.2.1
> --- nfs_node.c  1998/05/11 17:59:21
> ***************
> *** 202,207 ****
> --- 202,215 ----
>         } else
>                 sp = (struct sillyrename *)0;
>         if (sp) {
> +                 /*
> +                  * We need a reference to keep the vnode from being
> +                  * recycled by getnewvnode while we do the I/O
> +                  * associated with discarding the buffers.
> +                  */
> +               if (vget(ap->a_vp, 0))
> +                       panic("nfs_inactive: lost vnode");
> +
>                 /*
>                  * Remove the silly file that was rename'd earlier
>                  */
> ***************
> *** 210,215 ****
> --- 218,228 ----
>                 crfree(sp->s_cred);
>                 vrele(sp->s_dvp);
>                 FREE((caddr_t)sp, M_NFSREQ);
> +
> +                 /* XXX Play it safe and release our reference
> +                  * after we're done.
> +                  */
> +                 vrele(ap->a_vp);
>         }
>         np->n_flag &= (NMODIFIED | NFLUSHINPROG | NFLUSHWANT | NQNFSEVICTED |
>                 NQNFSNONCACHE | NQNFSWRITE);
> 
> 
> >Audit-Trail:
> >Unformatted:
> 
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-bugs" in the body of the message
> 

-- 
Simon Marlow						 simonm@dcs.gla.ac.uk
University of Glasgow			    http://www.dcs.gla.ac.uk/~simonm/
finger for PGP public key
