Date:      Thu, 14 Jan 2016 22:14:18 -0500 (EST)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        FreeBSD Filesystems <freebsd-fs@freebsd.org>
Subject:   Re: panic ffs_truncate3 (maybe fuse being evil)
Message-ID:  <964333498.161527381.1452827658163.JavaMail.zimbra@uoguelph.ca>
In-Reply-To: <20160114092934.GL72455@kib.kiev.ua>
References:  <1696608910.154845456.1452438117036.JavaMail.zimbra@uoguelph.ca> <20160110154518.GU3625@kib.kiev.ua> <1773157955.158922767.1452698181137.JavaMail.zimbra@uoguelph.ca> <1351730674.159022044.1452699617235.JavaMail.zimbra@uoguelph.ca> <20160114092934.GL72455@kib.kiev.ua>

Kostik wrote:
> On Wed, Jan 13, 2016 at 10:40:17AM -0500, Rick Macklem wrote:
> > > Well, although I have r174973 in the kernel that crashes, it looks like
> > > this bug might have been around for a while.
> > > Here's what I've figured out so far.
> > > 1 - The crashes only occur if soft updates are disabled. This isn't
> > >     surprising if you look at ffs_truncate(), since the test for the
> > >     panic isn't done when soft updates are enabled.
> > > Here's the snippet from ffs_truncate(), in case you are interested:
> > >         if (DOINGSOFTDEP(vp)) {
> > >                 if (softdeptrunc == 0 && journaltrunc == 0) {
> > >                         /*
> > >                          * If a file is only partially truncated, then
> > >                          * we have to clean up the data structures
> > >                          * describing the allocation past the truncation
> > >                          * point. Finding and deallocating those structures
> > >                          * is a lot of work. Since partial truncation occurs
> > >                          * rarely, we solve the problem by syncing the file
> > >                          * so that it will have no data structures left.
> > >                          */
> > >                         if ((error = ffs_syncvnode(vp, MNT_WAIT, 0)) != 0)
> > >                                 return (error);
> > >                 } else {
> > >                         flags = IO_NORMAL | (needextclean ? IO_EXT: 0);
> > >                         if (journaltrunc)
> > >                                 softdep_journal_freeblocks(ip, cred,
> > >                                     length, flags);
> > >                         else
> > >                                 softdep_setup_freeblocks(ip, length,
> > >                                     flags);
> > >                         ASSERT_VOP_LOCKED(vp, "ffs_truncate1");
> > >                         if (journaltrunc == 0) {
> > >                                 ip->i_flag |= IN_CHANGE | IN_UPDATE;
> > >                                 error = ffs_update(vp, 0);
> > >                         }
> > >                         return (error);
> > >                 }
> > >         }
> > > You can see that it always returns in this code block. The only way the
> > > code can get past this block when soft updates are enabled is a
> > > "goto extclean;", which takes you past the "panic()".
> I see.  Indeed, 'new' SU(+J) code invalidates and frees buffers in
> ffs_softdep.c:indir_trunc().  Due to the nature of the SU code, which
> attaches dependencies to the buffers, it is somewhat problematic to
> assert same condition, since buffers might survive longer due to
> unsatisfied dependencies.
> 
> > > 
> > > By adding a few printf()s, I have determined:
> > > - The bo_clean.bv_cnt == 1 when the panic occurs and the b_lblkno of
> > >   the buffer is -ve.
> What is the exact value of lblkno ?
> 
It was -1 for the crash I looked at. (Since all are the same backtrace and
happen for the same test sequence, I suspect they all are.)

> There are two kinds of buffers with negative lblk which may exist on
> the UFS vnode queues.  One is indirect block, and another is the metadata
> block.  Metadata lblks are -1 and -2, indir blocks numbers are calculated
> by formula you can see in calculation of indir_lbn[] in ffs_truncate().
> 
Thanks. I knew indirect blocks were -ve block #s, but I didn't know about
the metadata ones.

> > > 
> > > If you look at vtruncbuf():
> > >         trunclbn = (length + blksize - 1) / blksize;
> > > 1726
> > > 1727	        ASSERT_VOP_LOCKED(vp, "vtruncbuf");
> > > 1728	restart:
> > > 1729	        bo = &vp->v_bufobj;
> > > 1730	        BO_LOCK(bo);
> > > 1731	        anyfreed = 1;
> > > 1732	        for (;anyfreed;) {
> > > 1733	                anyfreed = 0;
> > > 1734	                TAILQ_FOREACH_SAFE(bp, &bo->bo_clean.bv_hd, b_bobufs, nbp) {
> > > 1735	                        if (bp->b_lblkno < trunclbn)
> > > 1736	                                continue;
> > > When length == 0, trunclbn is 0, but the test at line#1735 will skip
> > > over the b_lblkno because it is negative.
> > > 
> > > That is as far as I've gotten. A couple of things I need help from
> > > others on:
> > > - Is vtruncbuf() skipping over the cases where b_lblkno < 0 a feature
> > >   or a bug?
> It is a feature, since vtruncbuf() intent is truncation of the normal
> file data.  Negative lbns are used as a trick to manage metadata or other
> data not normally accessible to the read/write syscalls, but which still
> is reasonable to consider inode data.
> 
> > > - If it is a feature, then what needs to be done in the code after the
> > >   vtruncbuf() call in ffs_truncate() to ensure the buffer is gone by
> > >   the time the panic check is done?
> > >   --> I do see a bunch of code after the vtruncbuf() call related to
> > >       indirect blocks (which I think use the -ve b_lblkno?), but I'll
> > >       admit I don't understand it well enough to know if it expects
> > >       vtruncbuf() to leave the -ve block on the bo_hd list?
> Either the leftover blocks are extattrs, and then the assert should be
> improved, or we truly leak buffers and the leak must be fixed.  We will
> see by the block number.
> 
So, if I've understood you correctly, a -1 b_lblkno is an extattr one and
the assert should allow it and not panic?

I can change the assert, although it will end up about 6 lines of code;-)
If you think I should do this, I will do so and test it.

> > > 
> > > Obviously fixing vtruncbuf() to get rid of these -ve b_lblkno entries
> > > would be easy, but I don't know if that is a feature or a bug?
> > > 
> > > I did look at the commit logs and vtruncbuf() has been like this for at
> > > least 10 years. (I can only guess very few run UFS without soft updates,
> > > or others would see these panic()s.)
> > > 
> > > I am now running with soft updates enabled to avoid the crashes, but I
> > > can easily test any patch if someone has a patch to try.
> > > 
> > Oh, and one more thing.
> > Maybe having the buffer for an indirect block hanging off the vnode at
> > the end of an ffs_truncate() to length 0 is ok. After all, this is
> > happening in VOP_INACTIVE() and the vnode isn't being recycled yet?
> > (ie. The panic() test is not needed?)
> See above, depends on kind of the block.
> 
> But ufs_inactive() truncating the inode means that the nlinks went to
> zero, and the inode is going to be freed.  So we had better not leak
> anything.
> 
Righto. Thanks for your help, rick


