Date:      Sat, 4 Apr 2015 02:15:30 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Artem Kuchin <artem@artem.ru>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: Little research how rm -rf and tar kill server
Message-ID:  <20150403231530.GH2379@kib.kiev.ua>
In-Reply-To: <551F0D4A.5040007@artem.ru>
References:  <1427730597.303984.247097389.165D5AAB@webmail.messagingengine.com> <5519716F.6060007@artem.ru> <1427731061.306961.247099633.0A421E90@webmail.messagingengine.com> <5519740A.1070902@artem.ru> <1427731759.309823.247107417.308CD298@webmail.messagingengine.com> <5519F74C.1040308@artem.ru> <20150331164202.GN2379@kib.kiev.ua> <551C6D9F.8010506@artem.ru> <20150402210241.GD2379@kib.kiev.ua> <551F0D4A.5040007@artem.ru>

On Sat, Apr 04, 2015 at 12:59:38AM +0300, Artem Kuchin wrote:
> 03.04.2015 0:02, Konstantin Belousov wrote:
> > On Thu, Apr 02, 2015 at 01:13:51AM +0300, Artem Kuchin wrote:
> >> 31.03.2015 19:42, Konstantin Belousov wrote:
> >>> The syncer and sync(2) perform different kinds of syncs. Take a snapshot of
> >>> sysctl debug.softdep before and after the situation occurs to get some
> >>> hints about what is going on.
> >>>
> >>>
> >> Okay. Here is the sysctl data.
> > Try this. It may not be enough; I will provide an update in that case.
> > No need to resend the sysctl data. Just test whether an explicit sync(2) is
> > still needed in your situation after the patch.
> >
> >
> 
> Okay, patched, recompiled and installed the new kernel.
> 
> The behaviour changed a bit.
> 
> Now when I start the untar, MySQL quickly rises to 40 queries in the queue
> in the "opening table" state (before, the rise was slower).
> BUT after a while (20-30 seconds) all queries are executed.
> This cycle repeated 4 times and then the situation aggravated quickly. It
> happened when the untar reached a big subtree with tons of small files.
> The queue grew to 70 queries, and processes went to 600 (from 450).
> I stopped the untar and waited 3 minutes. Everything was getting even worse
> (700 processes, over 100 queries). I issued sync. It executed for 3 seconds
> and voila! 20 idle connections, 450 processes.
> So, a manual sync is still needed.
> 
> Also, it seems the shell was less responsive during the untar than before.
> 
> Also, when the system managed to flush the query queue, systat -io showed
> over 1000 tps, but when the queries got stuck it showed only about 200 tps.

So there were I/O operations during the stall period? I.e., a situation
where there is a clogged queue and hung processes but no disk activity
does not occur, even temporarily?

In what state are the hung processes blocked? Look at the wchan name
in either top or ps output. Are there processes in the "suspfs" state?
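
To answer the "suspfs" question from a saved process listing, a small C sketch like the following could count matching wait channels. It assumes the output of `ps -axo wchan` (one wait-channel name per line, after a header line) has already been read into a single string; `count_wchan()` is a hypothetical helper for illustration, not part of any FreeBSD library.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a FreeBSD API): count the lines of captured
 * `ps -axo wchan` output whose wait channel equals `wchan`.
 * `ps_output` is the whole listing as one NUL-terminated string.
 */
size_t
count_wchan(const char *ps_output, const char *wchan)
{
	size_t count = 0;
	size_t len = strlen(wchan);
	const char *line = ps_output;

	while (*line != '\0') {
		const char *end = strchr(line, '\n');
		size_t linelen = (end != NULL) ?
		    (size_t)(end - line) : strlen(line);

		/* Strip any column padding ps may add around the name. */
		while (linelen > 0 && (*line == ' ' || *line == '\t')) {
			line++;
			linelen--;
		}
		while (linelen > 0 && (line[linelen - 1] == ' ' ||
		    line[linelen - 1] == '\t' || line[linelen - 1] == '\r'))
			linelen--;

		/* Exact match only, so the header line is never counted. */
		if (linelen == len && strncmp(line, wchan, len) == 0)
			count++;
		if (end == NULL)
			break;
		line = end + 1;
	}
	return (count);
}
```

For a quick check at the shell, `ps -axo wchan | grep -c suspfs` answers the same question without any code.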

Try the following patch.

diff --git a/sys/ufs/ffs/ffs_extern.h b/sys/ufs/ffs/ffs_extern.h
index c29e5d5..8494223 100644
--- a/sys/ufs/ffs/ffs_extern.h
+++ b/sys/ufs/ffs/ffs_extern.h
@@ -160,7 +160,7 @@ void	softdep_journal_fsync(struct inode *);
 void	softdep_buf_append(struct buf *, struct workhead *);
 void	softdep_inode_append(struct inode *, struct ucred *, struct workhead *);
 void	softdep_freework(struct workhead *);
-
+int	softdep_need_sbupdate(struct ufsmount *ump);
 
 /*
  * Things to request flushing in softdep_request_cleanup()
diff --git a/sys/ufs/ffs/ffs_softdep.c b/sys/ufs/ffs/ffs_softdep.c
index ab2bd41..da7a34f 100644
--- a/sys/ufs/ffs/ffs_softdep.c
+++ b/sys/ufs/ffs/ffs_softdep.c
@@ -612,6 +612,13 @@ softdep_freework(wkhd)
 	panic("softdep_freework called");
 }
 
+int
+softdep_need_sbupdate(ump)
+     struct ufsmount *ump;
+{
+	
+	panic("softdep_need_sbupdate called");
+}
 #else
 
 FEATURE(softupdates, "FFS soft-updates support");
@@ -3560,8 +3567,10 @@ softdep_process_journal(mp, needwk, flags)
 	 * unsuspend it if we already have.
 	 */
 	if (flags == 0 && jblocks->jb_suspended) {
+#if 0
 		if (journal_unsuspend(ump))
 			return;
+#endif
 		FREE_LOCK(ump);
 		VFS_SYNC(mp, MNT_NOWAIT);
 		ffs_sbupdate(ump, MNT_WAIT, 0);
@@ -9479,6 +9488,18 @@ first_unlinked_inodedep(ump)
 	return (inodedep);
 }
 
+int
+softdep_need_sbupdate(ump)
+     struct ufsmount *ump;
+{
+	struct inodedep *inodedep;
+
+	ACQUIRE_LOCK(ump);
+	inodedep = first_unlinked_inodedep(ump);
+	FREE_LOCK(ump);
+	return (inodedep != NULL);
+}
+
 /*
  * Set the sujfree unlinked head pointer prior to writing a superblock.
  */
diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c
index 6e2e556..274c0f9 100644
--- a/sys/ufs/ffs/ffs_vfsops.c
+++ b/sys/ufs/ffs/ffs_vfsops.c
@@ -1419,7 +1419,8 @@ static int
 ffs_sync_lazy(mp)
      struct mount *mp;
 {
-	struct vnode *mvp, *vp;
+	struct ufsmount *ump;
+	struct vnode *devvp, *mvp, *vp;
 	struct inode *ip;
 	struct thread *td;
 	int allerror, error;
@@ -1461,9 +1462,21 @@ qupdate:
 	qsync(mp);
 #endif
 
-	if (VFSTOUFS(mp)->um_fs->fs_fmod != 0 &&
-	    (error = ffs_sbupdate(VFSTOUFS(mp), MNT_LAZY, 0)) != 0)
-		allerror = error;
+	ump = VFSTOUFS(mp);
+	if (MOUNTEDSUJ(mp)) {
+		devvp = ump->um_devvp;
+		vn_lock(devvp, LK_EXCLUSIVE | LK_RETRY);
+		error = VOP_FSYNC(devvp, MNT_WAIT, td);
+		VOP_UNLOCK(devvp, 0);
+		if (error != 0)
+			allerror = error;
+	}
+	if (ump->um_fs->fs_fmod != 0 || (MOUNTEDSUJ(mp) &&
+	    softdep_need_sbupdate(ump))) {
+		error = ffs_sbupdate(ump, MNT_LAZY, 0);
+		if (error != 0)
+			allerror = error;
+	}
 	return (allerror);
 }
 
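
The decision made by the patched tail of ffs_sync_lazy() can be modeled in user space. For SU+J mounts the lazy sync now also fsyncs the device vnode, and it writes the superblock not only when fs_fmod is set but also when the softdep unlinked-inode list is non-empty, which is what softdep_need_sbupdate() checks via first_unlinked_inodedep(). Presumably this keeps the on-disk unlinked-list head (set before each superblock write, per the comment after the new function) current. The struct and functions below are hypothetical stand-ins, not kernel APIs, and the sketch ignores locking entirely.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical user-space stand-in for the state the patch consults;
 * this is not the real struct ufsmount / struct fs.
 */
struct model_mount {
	bool		 suj;		/* like MOUNTEDSUJ(mp) */
	int		 fs_fmod;	/* superblock-modified flag */
	const void	*first_unlinked; /* head of unlinked inodedep list */
};

/* Models softdep_need_sbupdate(): is any unlinked inode pending? */
bool
model_need_sbupdate(const struct model_mount *m)
{
	return (m->first_unlinked != NULL);
}

/* Models the condition guarding ffs_sbupdate() after the patch. */
bool
model_lazy_sync_writes_sb(const struct model_mount *m)
{
	return (m->fs_fmod != 0 || (m->suj && model_need_sbupdate(m)));
}

/* Models the condition before the patch: only fs_fmod was checked. */
bool
model_lazy_sync_writes_sb_old(const struct model_mount *m)
{
	return (m->fs_fmod != 0);
}
```

The interesting case is an SU+J mount with pending unlinked inodes but fs_fmod == 0: the old condition skips the superblock write during lazy syncs, while the new one performs it.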
