From: Kirk McKusick
To: fs@freebsd.org, David Wolfskill
Subject: Re: SU+J: 185 processes in state "suspfs" for >8 hrs. ... not good, right?
In-reply-to: <20140501182057.GJ1120@albert.catwhisker.org>
Date: Thu, 01 May 2014 11:27:28 -0700
Message-Id: <201405011827.s41IRSpS010249@chez.mckusick.com>

> Date: Thu, 1 May 2014 11:20:57 -0700
> From: David Wolfskill
> To: Kirk McKusick
> Cc: fs@freebsd.org
> Subject: Re: SU+J: 185 processes in state "suspfs" for >8 hrs. ... not good,
>  right?
>
> On Thu, May 01, 2014 at 09:51:43AM -0700, Kirk McKusick wrote:
>
>> Let me know if it helps your problem. If it does, I will MFC it to 9.
>> There have been several other fixes made to SU+J that are more likely
>> to be the cause of your problem, but they are not easily back-ported
>> to stable/9. So if this does not fix your problem my only suggestions
>> are to turn off journaling or move to running on stable/10.
>> ...
>
> Hrrrmmm... Looks as if the above reflects stable/10's r251171 (in
> particular, "Convert the bufobj lock to rwlock.") -- stable/9 doesn't
> seem to know about BO_LOCKPTR(), and gcc makes some assumptions. That
> doesn't turn out well.
>
> I think that migrating to stable/10 might make more sense than figuring
> out how to fix this, especially if there are other causes of the
> observed failure that are fixed in stable/10.
>
> Thanks....
>
> Peace,
> david
> --
> David H. Wolfskill				david@catwhisker.org
> Taliban: Evil cowards with guns afraid of truth from a 14-year old girl.
>
> See http://www.catwhisker.org/~david/publickey.gpg for my public key.

I think that you have now discovered why Jeff did not MFC to stable/9.
You are correct that putting in this fix requires seriously more work.
Sorry about sending you down that path.

	Kirk McKusick