Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 13 Jan 2016 10:16:21 -0500 (EST)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        FreeBSD Filesystems <freebsd-fs@freebsd.org>
Subject:   Re: panic ffs_truncate3 (maybe fuse being evil)
Message-ID:  <1773157955.158922767.1452698181137.JavaMail.zimbra@uoguelph.ca>
In-Reply-To: <20160110154518.GU3625@kib.kiev.ua>
References:  <1696608910.154845456.1452438117036.JavaMail.zimbra@uoguelph.ca> <20160110154518.GU3625@kib.kiev.ua>

next in thread | previous in thread | raw e-mail | index | archive | help
Kostik wrote:
> On Sun, Jan 10, 2016 at 10:01:57AM -0500, Rick Macklem wrote:
> > Hi,
> > 
> > When fooling around with GlusterFS, I can get this panic intermittently.
> > (I had a couple yesterday.) This happens on a Dec. 5, 2015 head kernel.
> > 
> > panic: ffs_truncate3
> > - backtrace without the numbers (I just scribbled it off the screen)
> > ffs_truncate()
> > ufs_inactive()
> > VOP_INACTIVE_APV()
> > vinactive()
> > vputx()
> > kern_unlinkat()
> > 
> > So, at a glance, it seems that either
> >    b_dirty.bv_cnt
> > or b_clean.bv_cnt
> > is non-zero. (There is another case for the panic, but I thought it
> >               was less likely?)
> > 
> > So, I'm wondering if this might be another side effect of r291460,
> > since after that a new vnode isn't completely zero'd out?
> > 
> > However, shouldn't bo_dirty.bv_cnt and bo_clean.bv_cnt be zero when
> > a vnode is recycled?
> > Does this make sense or do some fields of v_bufobj need to be zero'd
> > out by getnewvnode()?
> Look at the _vdrop().  When a vnode is freed to zone, it is asserted
> that bufobj queues are empty.  I very much doubt that it is possible
> to leak either buffers or counters by reuse.
> 
> > 
> > GlusterFS is using fuse and I suspect that fuse isn't cleaning out
> > the buffers under some circumstance (I already noticed that there
> > isn't any code in its fuse_vnop_reclaim() and I vaguely recall that
> > there are conditions where VOP_INACTIVE() gets skipped, so that
> > VOP_RECLAIM()
> > has to check for anything that would have been done by VOP_INACTIVE()
> > and do it, if it isn't already done.)
> But even if fuse leaves the buffers around, is it UFS which panics for
> you ? I would rather worry about dandling pointers and use after free in
> fuse, which is a known issue with it anyway. I.e. it could be that fuse
> operates on reclaimed and reused vnode as its own.
> 
> > 
> > Anyhow, if others have thoughts on this (or other hunches w.r.t. what
> > could cause this panic(), please let me know.
> 
> The ffs_truncate3 was deterministically triggered by a bug in ffs_balloc().
> The routine allocated buffers for indirect blocks, but if the blocks cannot
> be allocated, the buffers where left on queue.  See r174973, this was fixed
> very long time ago.
> 
Well, although I have r174973 in the kernel that crashes, it looks like this
bug might have been around for a while.
Here's what I've figured out sofar.
1 - The crashes only occur if soft updates are disabled. This isn't surprising
    if you look at ffs_truncate(), since the test for the panic isn't done
    when soft updates are enabled.
Here's the snippet from ffs_truncate(), in case you are interested:
       if (DOINGSOFTDEP(vp)) {
335 	                if (softdeptrunc == 0 && journaltrunc == 0) {
336 	                        /*
337 	                         * If a file is only partially truncated, then
338 	                         * we have to clean up the data structures
339 	                         * describing the allocation past the truncation
340 	                         * point. Finding and deallocating those structures
341 	                         * is a lot of work. Since partial truncation occurs
342 	                         * rarely, we solve the problem by syncing the file
343 	                         * so that it will have no data structures left.
344 	                         */
345 	                        if ((error = ffs_syncvnode(vp, MNT_WAIT, 0)) != 0)
346 	                                return (error);
347 	                } else {
348 	                        flags = IO_NORMAL | (needextclean ? IO_EXT: 0);
349 	                        if (journaltrunc)
350 	                                softdep_journal_freeblocks(ip, cred, length,
351 	                                    flags);
352 	                        else
353 	                                softdep_setup_freeblocks(ip, length, flags);
354 	                        ASSERT_VOP_LOCKED(vp, "ffs_truncate1");
355 	                        if (journaltrunc == 0) {
356 	                                ip->i_flag |= IN_CHANGE | IN_UPDATE;
357 	                                error = ffs_update(vp, 0);
358 	                        }
359 	                        return (error);
360 	                }
361 	        }
You can see that it always returns once in this code block. The only way the code can get
past this block if soft updates are enabled is a "goto extclean;", which takes you past
the "panic()".

By adding a few printf()s, I have determined:
- The bo_clean.bv_cnt == 1 when the panic occurs and the b_lblkno of the buffer is -ve.

If you look at vtruncbuf():
        trunclbn = (length + blksize - 1) / blksize;
1726 	
1727 	        ASSERT_VOP_LOCKED(vp, "vtruncbuf");
1728 	restart:
1729 	        bo = &vp->v_bufobj;
1730 	        BO_LOCK(bo);
1731 	        anyfreed = 1;
1732 	        for (;anyfreed;) {
1733 	                anyfreed = 0;
1734 	                TAILQ_FOREACH_SAFE(bp, &bo->bo_clean.bv_hd, b_bobufs, nbp) {
1735 	                        if (bp->b_lblkno < trunclbn)
1736 	                                continue;
When length == 0 --> trunclbn is 0, but the test at line#1735 will skip over the b_lblkno
because it is negative.

That is as far as I've gotten. A couple of things I need help from others on:
- Is vtruncbuf() skipping over the cases where b_lblkno < 0 a feature or a bug?
- If it is a feature, then what needs to be done in the code after the vtruncbuf()
  call in ffs_truncate() to ensure the buffer is gone by the time the panic check is
  done?
  --> I do see a bunch of code after the vtruncbuf() call related to indirect blocks
     (which I think use the -ve b_lblkno?), but I'll admit I don't understand it well
      enough to know if it expects vtruncbuf() to leave the -ve block on the bo_hd list?

Obviously fixing vtruncbuf() to get rid of these -ve b_lblkno entries would be easy,
but I don't know if that is a feature or a bug?

I did look at the commit logs and vtruncbuf() has been like this for at least 10years.
(I can only guess very few run UFS without soft updates or others would see these panic()s.)

I am now running with soft updates enabled to avoid the crashes, but I can easily test any
patch if others can a patch to try.

Thanks for your help with this, rick




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?1773157955.158922767.1452698181137.JavaMail.zimbra>