Date: Thu, 14 Jan 2016 22:14:18 -0500 (EST)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: FreeBSD Filesystems <freebsd-fs@freebsd.org>
Subject: Re: panic ffs_truncate3 (maybe fuse being evil)
Message-ID: <964333498.161527381.1452827658163.JavaMail.zimbra@uoguelph.ca>
In-Reply-To: <20160114092934.GL72455@kib.kiev.ua>
References: <1696608910.154845456.1452438117036.JavaMail.zimbra@uoguelph.ca>
 <20160110154518.GU3625@kib.kiev.ua>
 <1773157955.158922767.1452698181137.JavaMail.zimbra@uoguelph.ca>
 <1351730674.159022044.1452699617235.JavaMail.zimbra@uoguelph.ca>
 <20160114092934.GL72455@kib.kiev.ua>
Kostik wrote:
> On Wed, Jan 13, 2016 at 10:40:17AM -0500, Rick Macklem wrote:
> > > Well, although I have r174973 in the kernel that crashes, it looks
> > > like this bug might have been around for a while.
> > > Here's what I've figured out so far.
> > > 1 - The crashes only occur if soft updates are disabled. This isn't
> > >     surprising if you look at ffs_truncate(), since the test for the
> > >     panic isn't done when soft updates are enabled.
> > >     Here's the snippet from ffs_truncate(), in case you are interested:
> > >     if (DOINGSOFTDEP(vp)) {
> > >             if (softdeptrunc == 0 && journaltrunc == 0) {
> > >                     /*
> > >                      * If a file is only partially truncated, then
> > >                      * we have to clean up the data structures
> > >                      * describing the allocation past the truncation
> > >                      * point. Finding and deallocating those structures
> > >                      * is a lot of work. Since partial truncation occurs
> > >                      * rarely, we solve the problem by syncing the file
> > >                      * so that it will have no data structures left.
> > >                      */
> > >                     if ((error = ffs_syncvnode(vp, MNT_WAIT, 0)) != 0)
> > >                             return (error);
> > >             } else {
> > >                     flags = IO_NORMAL | (needextclean ? IO_EXT : 0);
> > >                     if (journaltrunc)
> > >                             softdep_journal_freeblocks(ip, cred, length,
> > >                                 flags);
> > >                     else
> > >                             softdep_setup_freeblocks(ip, length, flags);
> > >                     ASSERT_VOP_LOCKED(vp, "ffs_truncate1");
> > >                     if (journaltrunc == 0) {
> > >                             ip->i_flag |= IN_CHANGE | IN_UPDATE;
> > >                             error = ffs_update(vp, 0);
> > >                     }
> > >                     return (error);
> > >             }
> > >     }
> > > You can see that it always returns once in this code block. The only
> > > way the code can get past this block if soft updates are enabled is a
> > > "goto extclean;", which takes you past the "panic()".
> I see.
> Indeed, 'new' SU(+J) code invalidates and frees buffers in
> ffs_softdep.c:indir_trunc(). Due to the nature of the SU code, which
> attaches dependencies to the buffers, it is somewhat problematic to
> assert the same condition, since buffers might survive longer due to
> unsatisfied dependencies.
>
> > > By adding a few printf()s, I have determined:
> > > - The bo_clean.bv_cnt == 1 when the panic occurs and the b_lblkno of
> > >   the buffer is -ve.
> What is the exact value of lblkno?
>
It was -1 for the crash I looked at. (Since all are the same backtrace and
happen for the same test sequence, I suspect they all are.)

> There are two kinds of buffers with negative lblk which may exist on
> the UFS vnode queues. One is the indirect block, and the other is the
> metadata block. Metadata lblks are -1 and -2; indir block numbers are
> calculated by the formula you can see in the calculation of indir_lbn[]
> in ffs_truncate().
>
Thanks. I knew indirect blocks were -ve block #s, but I didn't know about
the metadata ones.

> > > If you look at vtruncbuf():
> > >      trunclbn = (length + blksize - 1) / blksize;
> > > 1726
> > > 1727 ASSERT_VOP_LOCKED(vp, "vtruncbuf");
> > > 1728 restart:
> > > 1729 bo = &vp->v_bufobj;
> > > 1730 BO_LOCK(bo);
> > > 1731 anyfreed = 1;
> > > 1732 for (;anyfreed;) {
> > > 1733     anyfreed = 0;
> > > 1734     TAILQ_FOREACH_SAFE(bp, &bo->bo_clean.bv_hd, b_bobufs, nbp) {
> > > 1735         if (bp->b_lblkno < trunclbn)
> > > 1736             continue;
> > > When length == 0 --> trunclbn is 0, but the test at line#1735 will
> > > skip over the buffer because its b_lblkno is negative.
> > >
> > > That is as far as I've gotten. A couple of things I need help from
> > > others on:
> > > - Is vtruncbuf() skipping over the cases where b_lblkno < 0 a feature
> > >   or a bug?
> It is a feature, since vtruncbuf()'s intent is truncation of the normal
> file data.
> Negative lbns are used as a trick to manage metadata or other data not
> normally accessible to the read/write syscalls, but which it still is
> reasonable to consider inode data.
>
> > > - If it is a feature, then what needs to be done in the code after
> > >   the vtruncbuf() call in ffs_truncate() to ensure the buffer is gone
> > >   by the time the panic check is done?
> > >   --> I do see a bunch of code after the vtruncbuf() call related to
> > >       indirect blocks (which I think use the -ve b_lblkno?), but I'll
> > >       admit I don't understand it well enough to know if it expects
> > >       vtruncbuf() to leave the -ve block on the bo_hd list?
> Either the leftover blocks are extattrs, and then the assert should be
> improved, or we truly leak buffers and the leak must be fixed. We will
> see by the block number.
>
So, if I've understood you correctly, a -1 b_lblkno is an extattr one and
the assert should allow it and not panic? I can change the assert, although
it will end up about 6 lines of code;-)
If you think I should do this, I will do so and test it.

> > > Obviously fixing vtruncbuf() to get rid of these -ve b_lblkno entries
> > > would be easy, but I don't know if that is a feature or a bug?
> > >
> > > I did look at the commit logs and vtruncbuf() has been like this for
> > > at least 10 years.
> > > (I can only guess very few run UFS without soft updates or others
> > > would see these panic()s.)
> > >
> > > I am now running with soft updates enabled to avoid the crashes, but
> > > I can easily test any patch if others can suggest a patch to try.
> > >
> > Oh, and one more thing.
> > Maybe having the buffer for an indirect block hanging off the vnode at
> > the end of ffs_truncate() to 0 length is ok. After all, this is
> > happening in VOP_INACTIVE() and the vnode isn't being recycled yet?
> > (ie. The panic() test is not needed?)
> See above, depends on the kind of block.
>
> But ufs_inactive() truncating the inode means that the nlinks went to
> zero, and the inode is going to be freed. So we had better not leak
> anything.
>
Righto. Thanks for your help, rick