From owner-freebsd-fs@freebsd.org Tue May 17 19:11:06 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 6E7C3B3E397 for ; Tue, 17 May 2016 19:11:06 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id 5B6101B79 for ; Tue, 17 May 2016 19:11:06 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: by mailman.ysv.freebsd.org (Postfix) id 571DDB3E396; Tue, 17 May 2016 19:11:06 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 54778B3E394 for ; Tue, 17 May 2016 19:11:06 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail110.syd.optusnet.com.au (mail110.syd.optusnet.com.au [211.29.132.97]) by mx1.freebsd.org (Postfix) with ESMTP id EF8D71B76 for ; Tue, 17 May 2016 19:11:05 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from c122-106-149-109.carlnfd1.nsw.optusnet.com.au (c122-106-149-109.carlnfd1.nsw.optusnet.com.au [122.106.149.109]) by mail110.syd.optusnet.com.au (Postfix) with ESMTPS id 9F912780BD4; Wed, 18 May 2016 05:11:01 +1000 (AEST) Date: Wed, 18 May 2016 05:11:01 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Konstantin Belousov cc: fs@freebsd.org Subject: Re: quick fix for slow directory shrinking in ffs In-Reply-To: <20160517111715.GC89104@kib.kiev.ua> Message-ID: <20160518035413.L4357@besplex.bde.org> References: <20160517072705.F2157@besplex.bde.org> <20160517082050.GX89104@kib.kiev.ua> <20160517192933.U4573@besplex.bde.org> <20160517111715.GC89104@kib.kiev.ua> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=c+ZWOkJl c=1 sm=1 tr=0 a=R/f3m204ZbWUO/0rwPSMPw==:117 a=L9H7d07YOLsA:10 a=9cW_t1CCXrUA:10 a=s5jvgZ67dGcA:10 a=kj9zAlcOel0A:10 a=m90GG2ySlDWwqfHBogYA:9 a=5Ij-lXQwDHgBn59Q:21 a=qLoGEIkQGsovWur8:21 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 17 May 2016 19:11:06 -0000 On Tue, 17 May 2016, Konstantin Belousov wrote: > On Tue, May 17, 2016 at 08:26:26PM +1000, Bruce Evans wrote: >> On Tue, 17 May 2016, Konstantin Belousov wrote: >> >>> On Tue, May 17, 2016 at 07:54:27AM +1000, Bruce Evans wrote: >>>> ffs does very slow shrinking of directories after removing some files >>>> leaves unused blocks at the end, by always doing synchronous truncation. >>>> ... >>>> X Index: ufs_lookup.c >>>> X =================================================================== >>>> X --- ufs_lookup.c (revision 299263) >>>> X +++ ufs_lookup.c (working copy) >>>> X @@ -1131,9 +1131,9 @@ >>>> X if (tvp != NULL) >>>> X VOP_UNLOCK(tvp, 0); >>>> X error = UFS_TRUNCATE(dvp, (off_t)dp->i_endoff, >>>> X - IO_NORMAL | IO_SYNC, cr); >>>> X + IO_NORMAL | (DOINGASYNC(dvp) ? 0 : IO_SYNC), cr); >>>> X if (error != 0) >>>> X - vprint("ufs_direnter: failted to truncate", dvp); >>>> X + vprint("ufs_direnter: failed to truncate", dvp); I keep looking at wrong versions. I checked the old version and now see another problem with this "failted" message (which you fixed). It is debugging code and shouldn't be printed at all. Old versions ignored errors from the truncation since the truncation is supposed to be optional but that was broken for dirhash so r262812 added error handling. If the error handling actually works, then this becomes a non-error. >> Some relevant code in ffs_truncate: This was from an old versions. Perhaps r181717. FreeBSD-8 is similar, but FreeBSD-9+ has most of my DOINGASYNC() additions and -current has just 2 more of them than FreeBSD-9. >> Y /* >> Y * Shorten the size of the file. If the file is not being >> Y * truncated to a block boundary, the contents of the >> Y * partial block following the end of the file must be >> Y * zero'ed in case it ever becomes accessible again because >> Y * of subsequent file growth. Directories however are not >> Y * zero'ed as they should grow back initialized to empty. >> Y */ >> Y offset = blkoff(fs, length); >> Y if (offset == 0) { >> Y ip->i_size = length; >> Y DIP_SET(ip, i_size, length); >> Y } else { >> Y lbn = lblkno(fs, length); >> Y flags |= BA_CLRBUF; >> Y error = UFS_BALLOC(vp, length - 1, 1, cred, flags, &bp); >> Y if (error) { >> Y return (error); >> Y } >> Y /* >> Y * When we are doing soft updates and the UFS_BALLOC >> Y * above fills in a direct block hole with a full sized >> Y * block that will be truncated down to a fragment below, >> Y * we must flush out the block dependency with an FSYNC >> Y * so that we do not get a soft updates inconsistency >> Y * when we create the fragment below. >> Y */ >> Y if (DOINGSOFTDEP(vp) && lbn < NDADDR && >> Y fragroundup(fs, blkoff(fs, length)) < fs->fs_bsize && >> Y (error = ffs_syncvnode(vp, MNT_WAIT)) != 0) >> Y return (error); >> Y ip->i_size = length; >> Y DIP_SET(ip, i_size, length); >> Y size = blksize(fs, ip, lbn); >> Y if (vp->v_type != VDIR) >> Y bzero((char *)bp->b_data + offset, >> Y (u_int)(size - offset)); >> Y /* Kirk's code has reallocbuf(bp, size, 1) here */ >> Y allocbuf(bp, size); >> Y if (bp->b_bufsize == fs->fs_bsize) >> Y bp->b_flags |= B_CLUSTEROK; >> Y if (flags & IO_SYNC) >> Y bwrite(bp); >> Y else >> Y bawrite(bp); FreeBSD-9+ already has my DOINGASYNC() fix here. However, an async write is still done when DOINGASYNC(). It is done by vtruncbuf() 50 lines after here. vtruncbuf() doesn't know about DOINGASYNC(). It turns delayed writes into unconditional async ones. >> Y } >> >> I think we usually arrive here and honor the IO_SYNC flag. This is correct. >> Otherwise, we always do an async write, but that is wrong for async mounts. >> Here is my old fix for this: >> >> Z diff -u2 ffs_inode.c~ ffs_inode.c >> Z --- ffs_inode.c~ Wed Apr 7 21:22:26 2004 >> Z +++ ffs_inode.c Sat Mar 23 01:23:16 2013 >> Z @@ -345,4 +431,6 @@ >> Z if (flags & IO_SYNC) >> Z bwrite(bp); >> Z + else if (DOINGASYNC(ovp)) >> Z + bdwrite(bp); >> Z else >> Z bawrite(bp); >> >> This fix must be sprinkled in most places where there is a bwrite()/ >> bawrite() decision. > No, I do not think that it would be correct for SU mounts. It is essential SU silently ignores the async mount flag (by silently killing it instead of ignoring it later), so the DOINGASYNC() checks don't affect it. > for the correct operation of e.g. ffs_indirtrunc() that writes for SU > case are synchronous, since no dependencies on the indirect block updates > are recorded. The fact that syncvnode() is done before is similarly > important, because no existing dependencies are cleared. > > On the other hand, I agree with the note that the final ffs_update() > must honour IO_SYNC requests. > > Anyway, my point was that your patch does not change the hardest source > of sync writes, only the write of the final block. I will commit the > following. Er, it fixes all cases of directory shrinking for async mounts. All cases should probably use watermarks and shrink at block or frag boundaries instead of 512-boundaries. E.g., for small directories, shrink if size - endoff >= fs_fsize && . WIth fs_fsize = 2K, this gives for example: - size <= 2K: never shrink - size nearly 4K but endoff between 1K and 2K: don't shrink, because shrinking would free a frag but not leave much space for expansion. > diff --git a/sys/ufs/ffs/ffs_inode.c b/sys/ufs/ffs/ffs_inode.c > index 0202820..50b456b 100644 > --- a/sys/ufs/ffs/ffs_inode.c > +++ b/sys/ufs/ffs/ffs_inode.c > @@ -610,7 +610,7 @@ extclean: > softdep_journal_freeblocks(ip, cred, length, IO_EXT); > else > softdep_setup_freeblocks(ip, length, IO_EXT); > - return (ffs_update(vp, !DOINGASYNC(vp))); > + return (ffs_update(vp, (flags & IO_SYNC) != 0 || !DOINGASYNC(vp))); > } > > /* Oops, this needs fixing in my version, but in -current the fix has little effect since in -current ffs_update() still dishonors the waitfor flag for its bwrite()/bdwrite() decision if DOINGASYNC(). This is essentially the same as dishonoring the IO_SYNC flag here. ffs_update() needs the same fix in 4 more places. > diff --git a/sys/ufs/ufs/ufs_lookup.c b/sys/ufs/ufs/ufs_lookup.c > index 43b4e5c..53536ff 100644 > --- a/sys/ufs/ufs/ufs_lookup.c > +++ b/sys/ufs/ufs/ufs_lookup.c > @@ -1131,7 +1131,7 @@ ufs_direnter(dvp, tvp, dirp, cnp, newdirbp, isrename) > if (tvp != NULL) > VOP_UNLOCK(tvp, 0); > error = UFS_TRUNCATE(dvp, (off_t)dp->i_endoff, > - IO_NORMAL | IO_SYNC, cr); > + IO_NORMAL | (DOINGASYNC(dvp) ? 0 : IO_SYNC), cr); > if (error != 0) > vprint("ufs_direnter: failed to truncate", dvp); > #ifdef UFS_DIRHASH > OK. I want this to avoid _any_ sync writes here for async mounts even after the excessive truncations are fixed. Perhaps vtruncbuf() should just check the async mount flag to avoid async writes (except possibly when buf_dirty_count_severe()). Bruce