Date: Wed, 13 Jan 2016 10:16:21 -0500 (EST)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: FreeBSD Filesystems <freebsd-fs@freebsd.org>
Subject: Re: panic ffs_truncate3 (maybe fuse being evil)
Message-ID: <1773157955.158922767.1452698181137.JavaMail.zimbra@uoguelph.ca>
In-Reply-To: <20160110154518.GU3625@kib.kiev.ua>
References: <1696608910.154845456.1452438117036.JavaMail.zimbra@uoguelph.ca>
            <20160110154518.GU3625@kib.kiev.ua>
Kostik wrote:
> On Sun, Jan 10, 2016 at 10:01:57AM -0500, Rick Macklem wrote:
> > Hi,
> >
> > When fooling around with GlusterFS, I can get this panic intermittently.
> > (I had a couple yesterday.) This happens on a Dec. 5, 2015 head kernel.
> >
> > panic: ffs_truncate3
> > - backtrace without the numbers (I just scribbled it off the screen)
> > ffs_truncate()
> > ufs_inactive()
> > VOP_INACTIVE_APV()
> > vinactive()
> > vputx()
> > kern_unlinkat()
> >
> > So, at a glance, it seems that either bo_dirty.bv_cnt or bo_clean.bv_cnt
> > is non-zero. (There is another case for the panic, but I thought it
> > was less likely?)
> >
> > So, I'm wondering if this might be another side effect of r291460,
> > since after that a new vnode isn't completely zero'd out?
> >
> > However, shouldn't bo_dirty.bv_cnt and bo_clean.bv_cnt be zero when
> > a vnode is recycled? Does this make sense, or do some fields of
> > v_bufobj need to be zero'd out by getnewvnode()?
> Look at _vdrop(). When a vnode is freed to the zone, it is asserted
> that the bufobj queues are empty. I very much doubt that it is possible
> to leak either buffers or counters by reuse.
> >
> > GlusterFS is using fuse and I suspect that fuse isn't cleaning out
> > the buffers under some circumstance. (I already noticed that there
> > isn't any code in its fuse_vnop_reclaim(), and I vaguely recall that
> > there are conditions where VOP_INACTIVE() gets skipped, so that
> > VOP_RECLAIM() has to check for anything that would have been done by
> > VOP_INACTIVE() and do it, if it isn't already done.)
> But even if fuse leaves the buffers around, is it UFS which panics for
> you? I would rather worry about dangling pointers and use after free in
> fuse, which is a known issue with it anyway. I.e. it could be that fuse
> operates on a reclaimed and reused vnode as its own.
> >
> > Anyhow, if others have thoughts on this (or other hunches w.r.t. what
> > could cause this panic()), please let me know.
> The ffs_truncate3 panic was deterministically triggered by a bug in
> ffs_balloc(). The routine allocated buffers for indirect blocks, but if
> the blocks could not be allocated, the buffers were left on the queue.
> See r174973; this was fixed a very long time ago.

Well, although I have r174973 in the kernel that crashes, it looks like
this bug might have been around for a while. Here's what I've figured
out so far.

1 - The crashes only occur if soft updates are disabled. This isn't
surprising if you look at ffs_truncate(), since the test for the panic
isn't done when soft updates are enabled. Here's the snippet from
ffs_truncate(), in case you are interested:

        if (DOINGSOFTDEP(vp)) {
                if (softdeptrunc == 0 && journaltrunc == 0) {
                        /*
                         * If a file is only partially truncated, then
                         * we have to clean up the data structures
                         * describing the allocation past the truncation
                         * point. Finding and deallocating those structures
                         * is a lot of work. Since partial truncation occurs
                         * rarely, we solve the problem by syncing the file
                         * so that it will have no data structures left.
                         */
                        if ((error = ffs_syncvnode(vp, MNT_WAIT, 0)) != 0)
                                return (error);
                } else {
                        flags = IO_NORMAL | (needextclean ? IO_EXT : 0);
                        if (journaltrunc)
                                softdep_journal_freeblocks(ip, cred, length,
                                    flags);
                        else
                                softdep_setup_freeblocks(ip, length, flags);
                        ASSERT_VOP_LOCKED(vp, "ffs_truncate1");
                        if (journaltrunc == 0) {
                                ip->i_flag |= IN_CHANGE | IN_UPDATE;
                                error = ffs_update(vp, 0);
                        }
                        return (error);
                }
        }

You can see that it always returns once it enters this code block. The
only way the code can get past this block when soft updates are enabled
is via a "goto extclean;", which jumps past the panic().
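In case anyone wants to see it, the test that fires is the INVARIANTS
assertion near the end of ffs_truncate(). This is roughly what it looks
like (typed from memory, so the exact condition may differ a little
from what is in head):

#ifdef INVARIANTS
        if (length == 0 &&
            (fs->fs_magic != FS_UFS2_MAGIC || ip->i_din2->di_extsize == 0) &&
            (bo->bo_dirty.bv_cnt > 0 || bo->bo_clean.bv_cnt > 0))
                panic("ffs_truncate3");
#endif

In other words, a truncation to length 0 is expected to leave both
bufobj queues empty, and the panic fires because a buffer is still
sitting on one of them.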
By adding a few printf()s, I have determined:
- The bo_clean.bv_cnt == 1 when the panic occurs, and the b_lblkno of
  the buffer is negative.

If you look at vtruncbuf():

             trunclbn = (length + blksize - 1) / blksize;
1726
1727         ASSERT_VOP_LOCKED(vp, "vtruncbuf");
1728 restart:
1729         bo = &vp->v_bufobj;
1730         BO_LOCK(bo);
1731         anyfreed = 1;
1732         for (;anyfreed;) {
1733                 anyfreed = 0;
1734                 TAILQ_FOREACH_SAFE(bp, &bo->bo_clean.bv_hd, b_bobufs, nbp) {
1735                         if (bp->b_lblkno < trunclbn)
1736                                 continue;

When length == 0, trunclbn is 0, but the test at line #1735 skips over
this buffer, since its negative b_lblkno is always less than trunclbn.

That is as far as I've gotten. A couple of things I need help from
others on:
- Is vtruncbuf() skipping over the cases where b_lblkno < 0 a feature
  or a bug?
- If it is a feature, then what needs to be done in the code after the
  vtruncbuf() call in ffs_truncate() to ensure the buffer is gone by
  the time the panic check is done?
  --> I do see a bunch of code after the vtruncbuf() call related to
      indirect blocks (which I think use the negative b_lblkno values?),
      but I'll admit I don't understand it well enough to know whether
      it expects vtruncbuf() to leave the negative block on the bo_hd
      list.

Obviously, fixing vtruncbuf() to get rid of these negative b_lblkno
entries would be easy (the ps below shows the sort of change I mean),
but I don't know whether that is a feature or a bug. I did look at the
commit logs, and vtruncbuf() has been like this for at least 10 years.
(I can only guess that very few people run UFS without soft updates, or
others would be seeing these panics.)

I am now running with soft updates enabled to avoid the crashes, but I
can easily test a patch if others come up with one to try.

Thanks for your help with this, rick
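ps: This untested sketch is the sort of trivial change to vtruncbuf() I
had in mind, assuming it is actually correct to flush the negative
b_lblkno buffers when truncating to zero, which is exactly the part I'm
not sure about. The same test would be needed in the bo_dirty loop as
well:

        TAILQ_FOREACH_SAFE(bp, &bo->bo_clean.bv_hd, b_bobufs, nbp) {
                /*
                 * Skip buffers below the truncation point, except
                 * that, for length == 0, buffers with a negative
                 * b_lblkno (indirect blocks) are no longer skipped,
                 * so they get flushed like everything else.
                 */
                if (bp->b_lblkno < trunclbn &&
                    (length != 0 || bp->b_lblkno >= 0))
                        continue;

with the rest of the loop left as it is. If the indirect block handling
after the vtruncbuf() call in ffs_truncate() expects those buffers to
still be on the list, this would obviously be wrong, which is why I'm
asking first.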