Message-Id: <201405011651.s41GphgX089174@chez.mckusick.com>
To: David Wolfskill
Subject: Re: SU+J: 185 processes in state "suspfs" for >8 hrs. ... not good, right?
In-reply-to: <20140501161856.GH1120@albert.catwhisker.org>
Date: Thu, 01 May 2014 09:51:43 -0700
From: Kirk McKusick
Cc: fs@freebsd.org

> Date: Thu, 1 May 2014 09:18:56 -0700
> From: David Wolfskill
> To: fs@freebsd.org
> Subject: SU+J: 185 processes in state "suspfs" for >8 hrs. .. not good, right?
>
> I'm probably abusing things somewhat, but limits are to be pushed,
> yeah...? :-}
>
> At work, we have some build servers, presently running FreeBSD/amd64
> stable/9 @r257221. They have 2 "packages" with 6 cores each (Xeon(R)
> CPU X5690 @ 3.47GHz); SMT is enabled, so the scheduler sees 24
> cores. The local "build space" is a RAID 5 array of 10 2TB drives
> with a single UFS2+SU file system on it (~15TB). The software
> builds are performed within a jail (that is intended to look like
> FreeBSD/i386 7.1-RELEASE).
>
> ...

The following fix for related problems was made to head and MFC'ed to
stable/10, but not to stable/9.

*** stable/9/sys/ufs/ffs/ffs_vnops.c    2014-03-05 08:51:48.000000000 -0800
--- stable/9/sys/ufs/ffs/ffs_vnops.c    2014-05-01 09:41:35.000000000 -0700
***************
*** 258,266 ****
              continue;
          if (bp->b_lblkno > lbn)
              panic("ffs_syncvnode: syncing truncated data.");
!         if (BUF_LOCK(bp, LK_EXCLUSIVE | LK_NOWAIT, NULL))
              continue;
-         BO_UNLOCK(bo);
          if ((bp->b_flags & B_DELWRI) == 0)
              panic("ffs_fsync: not dirty");
          /*
--- 258,274 ----
              continue;
          if (bp->b_lblkno > lbn)
              panic("ffs_syncvnode: syncing truncated data.");
!         if (BUF_LOCK(bp, LK_EXCLUSIVE | LK_NOWAIT, NULL) == 0) {
!             BO_UNLOCK(bo);
!         } else if (wait != 0) {
!             if (BUF_LOCK(bp,
!                 LK_EXCLUSIVE | LK_SLEEPFAIL | LK_INTERLOCK,
!                 BO_LOCKPTR(bo)) != 0) {
!                 bp->b_vflags &= ~BV_SCANNED;
!                 goto next;
!             }
!         } else
              continue;
          if ((bp->b_flags & B_DELWRI) == 0)
              panic("ffs_fsync: not dirty");
          /*

The associated commit comment is:

    If we fail to do a non-blocking acquire of a buf lock while doing a
    waiting sync pass we need to do a blocking acquire and restart.
    Another thread, typically the buf daemon, may have this buf locked
    and if we don't wait we can fail to sync the file.  This led to a
    great variety of softdep panics and deadlocks because we rely on
    all dependencies being flushed before proceeding in several cases.

Let me know if it helps your problem. If it does, I will MFC it to 9.
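For readers less familiar with the buffer-locking code, the patched logic
in ffs_syncvnode() boils down to the retry pattern below. This is a
simplified, annotated sketch with the surrounding dirty-buffer scan loop
elided; the identifiers come from the diff above, but it is illustrative
rather than the complete function:

    /* First try a cheap, non-blocking acquire of the buf lock. */
    if (BUF_LOCK(bp, LK_EXCLUSIVE | LK_NOWAIT, NULL) == 0) {
        /* Got the lock without sleeping; drop the bufobj lock. */
        BO_UNLOCK(bo);
    } else if (wait != 0) {
        /*
         * Waiting sync pass: another thread (typically the buf
         * daemon) holds the lock, so do a blocking acquire instead.
         * LK_INTERLOCK releases the bufobj lock while we sleep, and
         * LK_SLEEPFAIL makes the acquire fail if it had to sleep,
         * since the buffer may have changed; in that case mark the
         * buf as not yet scanned and restart the scan so it is not
         * silently skipped.
         */
        if (BUF_LOCK(bp, LK_EXCLUSIVE | LK_SLEEPFAIL | LK_INTERLOCK,
            BO_LOCKPTR(bo)) != 0) {
            bp->b_vflags &= ~BV_SCANNED;
            goto next;
        }
    } else {
        /* Non-waiting pass: it is acceptable to skip a busy buf. */
        continue;
    }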
There have been several other fixes made to SU+J that are more likely
to be the cause of your problem, but they are not easily back-ported
to stable/9. So if this does not fix your problem, my only suggestions
are to turn off journaling or to move to running stable/10.

	Kirk McKusick
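For reference, turning off soft-updates journaling on an existing UFS
file system is done with tunefs(8) while the file system is unmounted
(or mounted read-only). A minimal example, with the mount point and
device name as placeholders only:

    # umount /build
    # tunefs -j disable /dev/<build-array-device>
    # mount /build

Soft updates themselves remain enabled; only the journal is removed.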