Message-Id: <201405011651.s41GphgX089174@chez.mckusick.com>
To: David Wolfskill
Subject: Re: SU+J: 185 processes in state "suspfs" for >8 hrs. ... not good, right?
In-reply-to: <20140501161856.GH1120@albert.catwhisker.org>
Date: Thu, 01 May 2014 09:51:43 -0700
From: Kirk McKusick
Cc: fs@freebsd.org

> Date: Thu, 1 May 2014 09:18:56 -0700
> From: David Wolfskill
> To: fs@freebsd.org
> Subject: SU+J: 185 processes in state "suspfs" for >8 hrs. .. not good, right?
>
> I'm probably abusing things somewhat, but limits are to be pushed,
> yeah...? :-}
>
> At work, we have some build servers, presently running FreeBSD/amd64
> stable/9 @r257221. They have 2 "packages" with 6 cores each (Xeon(R)
> CPU X5690 @ 3.47GHz); SMT is enabled, so the scheduler sees 24
> cores. The local "build space" is a RAID 5 array of 10 2TB drives
> with a single UFS2+SU file system on it (~15TB). The software
> builds are performed within a jail (that is intended to look like
> FreeBSD/i386 7.1-RELEASE).
>
> ...

The following fix for related problems was made to head and MFC'ed to
stable/10, but not to stable/9.

*** stable/9/sys/ufs/ffs/ffs_vnops.c    2014-03-05 08:51:48.000000000 -0800
--- stable/9/sys/ufs/ffs/ffs_vnops.c    2014-05-01 09:41:35.000000000 -0700
***************
*** 258,266 ****
              continue;
          if (bp->b_lblkno > lbn)
              panic("ffs_syncvnode: syncing truncated data.");
!         if (BUF_LOCK(bp, LK_EXCLUSIVE | LK_NOWAIT, NULL))
              continue;
-         BO_UNLOCK(bo);
          if ((bp->b_flags & B_DELWRI) == 0)
              panic("ffs_fsync: not dirty");
          /*
--- 258,274 ----
              continue;
          if (bp->b_lblkno > lbn)
              panic("ffs_syncvnode: syncing truncated data.");
!         if (BUF_LOCK(bp, LK_EXCLUSIVE | LK_NOWAIT, NULL) == 0) {
!             BO_UNLOCK(bo);
!         } else if (wait != 0) {
!             if (BUF_LOCK(bp,
!                 LK_EXCLUSIVE | LK_SLEEPFAIL | LK_INTERLOCK,
!                 BO_LOCKPTR(bo)) != 0) {
!                 bp->b_vflags &= ~BV_SCANNED;
!                 goto next;
!             }
!         } else
              continue;
          if ((bp->b_flags & B_DELWRI) == 0)
              panic("ffs_fsync: not dirty");
          /*

The associated commit comment is:

    If we fail to do a non-blocking acquire of a buf lock while doing a
    waiting sync pass we need to do a blocking acquire and restart.
    Another thread, typically the buf daemon, may have this buf locked
    and if we don't wait we can fail to sync the file.  This led to a
    great variety of softdep panics and deadlocks because we rely on
    all dependencies being flushed before proceeding in several cases.

Let me know if it helps your problem. If it does, I will MFC it to 9.
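For readers less familiar with the buffer-locking code, the patched logic
in ffs_syncvnode() boils down to the retry pattern below. This is a
simplified, annotated sketch with the surrounding dirty-buffer scan loop
elided; the identifiers come from the diff above, but it is illustrative
rather than the complete function:

    /* First try a cheap, non-blocking acquire of the buf lock. */
    if (BUF_LOCK(bp, LK_EXCLUSIVE | LK_NOWAIT, NULL) == 0) {
        /* Got the lock without sleeping; drop the bufobj lock. */
        BO_UNLOCK(bo);
    } else if (wait != 0) {
        /*
         * Waiting sync pass: another thread (typically the buf
         * daemon) holds the lock, so do a blocking acquire instead.
         * LK_INTERLOCK releases the bufobj lock while we sleep, and
         * LK_SLEEPFAIL makes the acquire fail if it had to sleep,
         * since the buffer may have changed; in that case mark the
         * buf as not yet scanned and restart the scan so it is not
         * silently skipped.
         */
        if (BUF_LOCK(bp, LK_EXCLUSIVE | LK_SLEEPFAIL | LK_INTERLOCK,
            BO_LOCKPTR(bo)) != 0) {
            bp->b_vflags &= ~BV_SCANNED;
            goto next;
        }
    } else {
        /* Non-waiting pass: it is acceptable to skip a busy buf. */
        continue;
    }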
There have been several other fixes made to SU+J that are more likely
to be the cause of your problem, but they are not easily back-ported
to stable/9. So if this does not fix your problem, my only suggestions
are to turn off journaling or to move to running stable/10.

	Kirk McKusick
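For reference, turning off soft-updates journaling on an existing UFS
file system is done with tunefs(8) while the file system is unmounted
(or mounted read-only). A minimal example, with the mount point and
device name as placeholders only:

    # umount /build
    # tunefs -j disable /dev/<build-array-device>
    # mount /build

Soft updates themselves remain enabled; only the journal is removed.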