Date: Wed, 13 Jan 2016 10:40:17 -0500 (EST)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Konstantin Belousov
Cc: FreeBSD Filesystems
Subject: Re: panic ffs_truncate3 (maybe fuse being evil)

I wrote:
> Kostik wrote:
> > On Sun, Jan 10, 2016 at 10:01:57AM -0500, Rick Macklem wrote:
> > > Hi,
> > >
> > > When fooling around with GlusterFS, I can get this panic intermittently.
> > > (I had a couple yesterday.) This happens on a Dec. 5, 2015 head kernel.
> > >
> > > panic: ffs_truncate3
> > > - backtrace without the numbers (I just scribbled it off the screen)
> > >   ffs_truncate()
> > >   ufs_inactive()
> > >   VOP_INACTIVE_APV()
> > >   vinactive()
> > >   vputx()
> > >   kern_unlinkat()
> > >
> > > So, at a glance, it seems that either
> > >   bo_dirty.bv_cnt
> > > or
> > >   bo_clean.bv_cnt
> > > is non-zero. (There is another case for the panic, but I thought it
> > > was less likely?)
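> > > For reference, here is roughly what the check that fires looks like
> > > (the INVARIANTS block at the end of ffs_truncate() in
> > > sys/ufs/ffs/ffs_inode.c; quoted from memory, so treat it as a sketch
> > > rather than the exact code):
> > >
> > >     BO_LOCK(bo);
> > >     if (length == 0 &&
> > >         (fs->fs_flags & (FS_UNCLEAN | FS_NEEDSFSCK)) == 0 &&
> > >         (bo->bo_dirty.bv_cnt > 0 || bo->bo_clean.bv_cnt > 0))
> > >             panic("ffs_truncate3");
> > >     BO_UNLOCK(bo);
> > >
> > > So any buffer, clean or dirty, still hanging off the vnode's bufobj
> > > after a truncate to 0 length trips it.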
> > >
> > > So, I'm wondering if this might be another side effect of r291460,
> > > since after that a new vnode isn't completely zero'd out?
> > >
> > > However, shouldn't bo_dirty.bv_cnt and bo_clean.bv_cnt be zero when
> > > a vnode is recycled?
> > > Does this make sense or do some fields of v_bufobj need to be zero'd
> > > out by getnewvnode()?
> > Look at the _vdrop(). When a vnode is freed to the zone, it is asserted
> > that the bufobj queues are empty. I very much doubt that it is possible
> > to leak either buffers or counters by reuse.
> >
> > > GlusterFS is using fuse and I suspect that fuse isn't cleaning out
> > > the buffers under some circumstances. (I already noticed that there
> > > isn't any code in its fuse_vnop_reclaim() and I vaguely recall that
> > > there are conditions where VOP_INACTIVE() gets skipped, so
> > > VOP_RECLAIM() has to check for anything that would have been done by
> > > VOP_INACTIVE() and do it, if it isn't already done.)
> > But even if fuse leaves the buffers around, is it UFS which panics for
> > you? I would rather worry about dangling pointers and use-after-free in
> > fuse, which is a known issue with it anyway. I.e. it could be that fuse
> > operates on a reclaimed and reused vnode as its own.
> >
> > > Anyhow, if others have thoughts on this (or other hunches w.r.t. what
> > > could cause this panic()), please let me know.
> >
> > The ffs_truncate3 panic was deterministically triggered by a bug in
> > ffs_balloc(). The routine allocated buffers for indirect blocks, but if
> > the blocks could not be allocated, the buffers were left on the queue.
> > See r174973; this was fixed a very long time ago.
> >
> Well, although I have r174973 in the kernel that crashes, it looks like
> this bug might have been around for a while.
> Here's what I've figured out so far.
> 1 - The crashes only occur if soft updates are disabled. This isn't
>     surprising if you look at ffs_truncate(), since the test for the
>     panic isn't done when soft updates are enabled.
>     Here's the snippet from ffs_truncate(), in case you are interested:
>
>     if (DOINGSOFTDEP(vp)) {
>             if (softdeptrunc == 0 && journaltrunc == 0) {
>                     /*
>                      * If a file is only partially truncated, then
>                      * we have to clean up the data structures
>                      * describing the allocation past the truncation
>                      * point. Finding and deallocating those structures
>                      * is a lot of work. Since partial truncation occurs
>                      * rarely, we solve the problem by syncing the file
>                      * so that it will have no data structures left.
>                      */
>                     if ((error = ffs_syncvnode(vp, MNT_WAIT, 0)) != 0)
>                             return (error);
>             } else {
>                     flags = IO_NORMAL | (needextclean ? IO_EXT: 0);
>                     if (journaltrunc)
>                             softdep_journal_freeblocks(ip, cred, length,
>                                 flags);
>                     else
>                             softdep_setup_freeblocks(ip, length, flags);
>                     ASSERT_VOP_LOCKED(vp, "ffs_truncate1");
>                     if (journaltrunc == 0) {
>                             ip->i_flag |= IN_CHANGE | IN_UPDATE;
>                             error = ffs_update(vp, 0);
>                     }
>                     return (error);
>             }
>     }
>
> You can see that it always returns once it is in this code block. The
> only way the code can get past this block when soft updates are enabled
> is a "goto extclean;", which takes you past the panic().
>
> By adding a few printf()s, I have determined:
> - bo_clean.bv_cnt == 1 when the panic occurs, and the b_lblkno of the
>   buffer is negative.
>
> If you look at vtruncbuf():
>
>          trunclbn = (length + blksize - 1) / blksize;
> 1726
> 1727     ASSERT_VOP_LOCKED(vp, "vtruncbuf");
> 1728 restart:
> 1729     bo = &vp->v_bufobj;
> 1730     BO_LOCK(bo);
> 1731     anyfreed = 1;
> 1732     for (;anyfreed;) {
> 1733             anyfreed = 0;
> 1734             TAILQ_FOREACH_SAFE(bp, &bo->bo_clean.bv_hd, b_bobufs, nbp) {
> 1735                     if (bp->b_lblkno < trunclbn)
> 1736                             continue;
>
> When length == 0, trunclbn is 0, but the test at line #1735 will skip
> over the buffer because its b_lblkno is negative.
>
> That is as far as I've gotten. A couple of things I need help from
> others on:
> - Is vtruncbuf() skipping over the cases where b_lblkno < 0 a feature or
>   a bug?
> - If it is a feature, then what needs to be done in the code after the
>   vtruncbuf() call in ffs_truncate() to ensure the buffer is gone by the
>   time the panic check is done?
>   --> I do see a bunch of code after the vtruncbuf() call related to
>       indirect blocks (which I think use the negative b_lblkno?), but
>       I'll admit I don't understand it well enough to know if it expects
>       vtruncbuf() to leave the negative block on the bo_hd list?
>
> Obviously fixing vtruncbuf() to get rid of these negative b_lblkno
> entries would be easy (see the untested sketch below), but I don't know
> if that is a feature or a bug?
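> To be concrete, here is one way it could be done (untested and only a
> sketch; trunclbn and the buffer fields are as in the snippet above, and
> the bo_dirty loop would need the same change):
>
>     TAILQ_FOREACH_SAFE(bp, &bo->bo_clean.bv_hd, b_bobufs, nbp) {
>             /*
>              * Indirect block buffers have negative b_lblkno values,
>              * so "b_lblkno < trunclbn" skips them even when the file
>              * is being truncated to 0 length.  Only skip them when
>              * some of the file's data is being kept.
>              */
>             if (bp->b_lblkno < trunclbn &&
>                 (bp->b_lblkno >= 0 || trunclbn > 0))
>                     continue;
>             ...
>     }
>
> Whether throwing the indirect block buffers away here is actually safe
> is exactly the feature-vs-bug question above.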
> I did look at the commit logs and vtruncbuf() has been like this for at
> least 10 years.
> (I can only guess very few run UFS without soft updates, or others would
> see these panic()s.)
>
> I am now running with soft updates enabled to avoid the crashes, but I
> can easily test any patch if others can suggest one to try.
>
> Thanks for your help with this, rick

Oh, and one more thing. Maybe having the buffer for an indirect block
hanging off the vnode at the end of an ffs_truncate() to 0 length is ok.
After all, this is happening in VOP_INACTIVE() and the vnode isn't being
recycled yet? (ie. the panic() test is not needed?)

rick
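ps: One thing that makes me think the panic() test might be too strict:
when the vnode does eventually get recycled, vgonel() flushes the bufobj
without any trunclbn-style cutoff, so the negative b_lblkno buffer would
go away then anyhow. Roughly (from memory, so a sketch of the relevant
bit of vgonel() in sys/kern/vfs_subr.c, not the exact code):

    /*
     * Clean out any buffers still associated with the vnode.
     * If the V_SAVE flush fails, just toss them.  Note there is
     * no logical block number cutoff here, unlike vtruncbuf().
     */
    if (vinvalbuf(vp, V_SAVE, 0, 0) != 0) {
            while (vinvalbuf(vp, 0, 0, 0) != 0)
                    ;
    }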