Date: Thu, 14 Jan 2016 11:29:34 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: Rick Macklem <rmacklem@uoguelph.ca>
Cc: FreeBSD Filesystems <freebsd-fs@freebsd.org>
Subject: Re: panic ffs_truncate3 (maybe fuse being evil)
Message-ID: <20160114092934.GL72455@kib.kiev.ua>
In-Reply-To: <1351730674.159022044.1452699617235.JavaMail.zimbra@uoguelph.ca>
References: <1696608910.154845456.1452438117036.JavaMail.zimbra@uoguelph.ca>
 <20160110154518.GU3625@kib.kiev.ua>
 <1773157955.158922767.1452698181137.JavaMail.zimbra@uoguelph.ca>
 <1351730674.159022044.1452699617235.JavaMail.zimbra@uoguelph.ca>
On Wed, Jan 13, 2016 at 10:40:17AM -0500, Rick Macklem wrote:
> > Well, although I have r174973 in the kernel that crashes, it looks like
> > this bug might have been around for a while.
> > Here's what I've figured out so far.
> > 1 - The crashes only occur if soft updates are disabled. This isn't
> >     surprising if you look at ffs_truncate(), since the test for the
> >     panic isn't done when soft updates are enabled.
> >     Here's the snippet from ffs_truncate(), in case you are interested:
> >          if (DOINGSOFTDEP(vp)) {
> > 335              if (softdeptrunc == 0 && journaltrunc == 0) {
> > 336                      /*
> > 337                       * If a file is only partially truncated, then
> > 338                       * we have to clean up the data structures
> > 339                       * describing the allocation past the truncation
> > 340                       * point. Finding and deallocating those structures
> > 341                       * is a lot of work. Since partial truncation occurs
> > 342                       * rarely, we solve the problem by syncing the file
> > 343                       * so that it will have no data structures left.
> > 344                       */
> > 345                      if ((error = ffs_syncvnode(vp, MNT_WAIT, 0)) != 0)
> > 346                              return (error);
> > 347              } else {
> > 348                      flags = IO_NORMAL | (needextclean ? IO_EXT: 0);
> > 349                      if (journaltrunc)
> > 350                              softdep_journal_freeblocks(ip, cred, length,
> > 351                                  flags);
> > 352                      else
> > 353                              softdep_setup_freeblocks(ip, length, flags);
> > 354                      ASSERT_VOP_LOCKED(vp, "ffs_truncate1");
> > 355                      if (journaltrunc == 0) {
> > 356                              ip->i_flag |= IN_CHANGE | IN_UPDATE;
> > 357                              error = ffs_update(vp, 0);
> > 358                      }
> > 359                      return (error);
> > 360              }
> > 361      }
> > You can see that it always returns once in this code block. The only way
> > the code can get past this block when soft updates are enabled is a
> > "goto extclean;", which takes you past the "panic()".
I see. Indeed, the 'new' SU(+J) code invalidates and frees buffers in
ffs_softdep.c:indir_trunc().
Due to the nature of the SU code, which attaches dependencies to the
buffers, it is somewhat problematic to assert the same condition, since
buffers might survive longer due to unsatisfied dependencies.

> >
> > By adding a few printf()s, I have determined:
> > - The bo_clean.bv_cnt == 1 when the panic occurs and the b_lblkno of
> >   the buffer is negative.
What is the exact value of lblkno?

There are two kinds of buffers with negative lblk which may exist on the
UFS vnode queues. One is an indirect block, and the other is a metadata
block. Metadata lblks are -1 and -2; indirect block numbers are calculated
by the formula you can see in the calculation of indir_lbn[] in
ffs_truncate().

> >
> > If you look at vtruncbuf():
> >          trunclbn = (length + blksize - 1) / blksize;
> > 1726
> > 1727     ASSERT_VOP_LOCKED(vp, "vtruncbuf");
> > 1728 restart:
> > 1729     bo = &vp->v_bufobj;
> > 1730     BO_LOCK(bo);
> > 1731     anyfreed = 1;
> > 1732     for (;anyfreed;) {
> > 1733             anyfreed = 0;
> > 1734             TAILQ_FOREACH_SAFE(bp, &bo->bo_clean.bv_hd, b_bobufs, nbp) {
> > 1735                     if (bp->b_lblkno < trunclbn)
> > 1736                             continue;
> > When length == 0 --> trunclbn is 0, but the test at line#1735 will skip
> > over the b_lblkno because it is negative.
> >
> > That is as far as I've gotten. A couple of things I need help from
> > others on:
> > - Is vtruncbuf() skipping over the cases where b_lblkno < 0 a feature
> >   or a bug?
It is a feature, since vtruncbuf()'s intent is truncation of the normal
file data. Negative lbns are used as a trick to manage metadata or other
data not normally accessible to the read/write syscalls, but which is
still reasonable to consider inode data.

> > - If it is a feature, then what needs to be done in the code after the
> >   vtruncbuf() call in ffs_truncate() to ensure the buffer is gone by
> >   the time the panic check is done?
> > --> I do see a bunch of code after the vtruncbuf() call related to
> >     indirect blocks (which I think use the negative b_lblkno?), but
> >     I'll admit I don't understand it well enough to know if it expects
> >     vtruncbuf() to leave the negative block on the bo_hd list?
Either the leftover blocks are extattrs, and then the assert should be
improved, or we truly leak buffers and the leak must be fixed. We will
see by the block number.

> >
> > Obviously fixing vtruncbuf() to get rid of these negative b_lblkno
> > entries would be easy, but I don't know if that is a feature or a bug?
> >
> > I did look at the commit logs and vtruncbuf() has been like this for
> > at least 10 years. (I can only guess very few run UFS without soft
> > updates, or others would see these panic()s.)
> >
> > I am now running with soft updates enabled to avoid the crashes, but I
> > can easily test any patch if others can send a patch to try.
> >
> Oh, and one more thing.
> Maybe having the buffer for an indirect block hanging off the vnode at
> the end of ffs_truncate() to 0 length is ok. After all, this is happening
> in VOP_INACTIVE() and the vnode isn't being recycled yet?
> (ie. The panic() test is not needed?)
See above, it depends on the kind of the block. But ufs_inactive()
truncating the inode means that the nlinks went to zero, and the inode is
going to be freed. So we had better not leak anything.