Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 14 Jan 2016 11:29:34 +0200
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Rick Macklem <rmacklem@uoguelph.ca>
Cc:        FreeBSD Filesystems <freebsd-fs@freebsd.org>
Subject:   Re: panic ffs_truncate3 (maybe fuse being evil)
Message-ID:  <20160114092934.GL72455@kib.kiev.ua>
In-Reply-To: <1351730674.159022044.1452699617235.JavaMail.zimbra@uoguelph.ca>
References:  <1696608910.154845456.1452438117036.JavaMail.zimbra@uoguelph.ca> <20160110154518.GU3625@kib.kiev.ua> <1773157955.158922767.1452698181137.JavaMail.zimbra@uoguelph.ca> <1351730674.159022044.1452699617235.JavaMail.zimbra@uoguelph.ca>

next in thread | previous in thread | raw e-mail | index | archive | help
On Wed, Jan 13, 2016 at 10:40:17AM -0500, Rick Macklem wrote:
> > Well, although I have r174973 in the kernel that crashes, it looks like this
> > bug might have been around for a while.
> > Here's what I've figured out sofar.
> > 1 - The crashes only occur if soft updates are disabled. This isn't
> > surprising
> >     if you look at ffs_truncate(), since the test for the panic isn't done
> >     when soft updates are enabled.
> > Here's the snippet from ffs_truncate(), in case you are interested:
> >        if (DOINGSOFTDEP(vp)) {
> > 335 	                if (softdeptrunc == 0 && journaltrunc == 0) {
> > 336 	                        /*
> > 337 	                         * If a file is only partially truncated, then
> > 338 	                         * we have to clean up the data structures
> > 339 	                         * describing the allocation past the truncation
> > 340 	                         * point. Finding and deallocating those
> > structures
> > 341 	                         * is a lot of work. Since partial truncation
> > occurs
> > 342 	                         * rarely, we solve the problem by syncing the
> > file
> > 343 	                         * so that it will have no data structures left.
> > 344 	                         */
> > 345 	                        if ((error = ffs_syncvnode(vp, MNT_WAIT, 0)) !=
> > 0)
> > 346 	                                return (error);
> > 347 	                } else {
> > 348 	                        flags = IO_NORMAL | (needextclean ? IO_EXT: 0);
> > 349 	                        if (journaltrunc)
> > 350 	                                softdep_journal_freeblocks(ip, cred,
> > length,
> > 351 	                                    flags);
> > 352 	                        else
> > 353 	                                softdep_setup_freeblocks(ip, length,
> > flags);
> > 354 	                        ASSERT_VOP_LOCKED(vp, "ffs_truncate1");
> > 355 	                        if (journaltrunc == 0) {
> > 356 	                                ip->i_flag |= IN_CHANGE | IN_UPDATE;
> > 357 	                                error = ffs_update(vp, 0);
> > 358 	                        }
> > 359 	                        return (error);
> > 360 	                }
> > 361 	        }
> > You can see that it always returns once in this code block. The only way the
> > code can get
> > past this block if soft updates are enabled is a "goto extclean;", which
> > takes you past
> > the "panic()".
I see.  Indeed, 'new' SU(+J) code invalidates and frees buffers in
ffs_softdep.c:indir_trunc().  Due to the nature of the SU code, which
attaches dependencies to the buffers, it is somewhat problematic to
assert same condition, since buffers might survive longer due to
unsatisfied dependencies.

> > 
> > By adding a few printf()s, I have determined:
> > - The bo_clean.bv_cnt == 1 when the panic occurs and the b_lblkno of the
> > buffer is -ve.
What is the exact value of lblkno ?

There are two kinds of buffers with negative lblk which may exist on
the UFS vnode queues.  One is indirect block, and another is the metadata
block.  Metadata lblks are -1 and -2, indir blocks numbers are calculated
by formula you can see in calculation of indir_lbn[] in ffs_truncate().

> > 
> > If you look at vtruncbuf():
> >         trunclbn = (length + blksize - 1) / blksize;
> > 1726
> > 1727 	        ASSERT_VOP_LOCKED(vp, "vtruncbuf");
> > 1728 	restart:
> > 1729 	        bo = &vp->v_bufobj;
> > 1730 	        BO_LOCK(bo);
> > 1731 	        anyfreed = 1;
> > 1732 	        for (;anyfreed;) {
> > 1733 	                anyfreed = 0;
> > 1734 	                TAILQ_FOREACH_SAFE(bp, &bo->bo_clean.bv_hd, b_bobufs,
> > nbp) {
> > 1735 	                        if (bp->b_lblkno < trunclbn)
> > 1736 	                                continue;
> > When length == 0 --> trunclbn is 0, but the test at line#1735 will skip over
> > the b_lblkno
> > because it is negative.
> > 
> > That is as far as I've gotten. A couple of things I need help from others on:
> > - Is vtruncbuf() skipping over the cases where b_lblkno < 0 a feature or a
> > bug?
It is a feature, since vtruncbuf() intent is truncation of the normal
file data.  Negative lbns are used as a trick to manage metadata or other
data not normally accessible to the read/write syscalls, but which still
is reasonable to consider inode data.

> > - If it is a feature, then what needs to be done in the code after the
> > vtruncbuf()
> >   call in ffs_truncate() to ensure the buffer is gone by the time the panic
> >   check is
> >   done?
> >   --> I do see a bunch of code after the vtruncbuf() call related to indirect
> >   blocks
> >      (which I think use the -ve b_lblkno?), but I'll admit I don't understand
> >      it well
> >       enough to know if it expects vtruncbuf() to leave the -ve block on the
> >       bo_hd list?
Either the left blocks are extattrs, and then the assert should be improved,
or we trully leak buffers and the leak must be fixed.  We will see  by
the block number.

> > 
> > Obviously fixing vtruncbuf() to get rid of these -ve b_lblkno entries would
> > be easy,
> > but I don't know if that is a feature or a bug?
> > 
> > I did look at the commit logs and vtruncbuf() has been like this for at least
> > 10years.
> > (I can only guess very few run UFS without soft updates or others would see
> > these panic()s.)
> > 
> > I am now running with soft updates enabled to avoid the crashes, but I can
> > easily test any
> > patch if others can a patch to try.
> > 
> Oh, and one more thing.
> Maybe having the buffer for an indirect block hanging off the vnode at the
> end of ffs_truncate() to 0 length is ok. After all, this is happening in
> VOP_INACTIVE() and the vnode isn't being recycled yet?
> (ie. The panic() test is not needed?)
See above, depends on kind of the block.

But ufs_inactive() truncating the inode means that the nlinks went to zero,
and the inode is going to be freed.  So we better not to leak anything.



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20160114092934.GL72455>