From owner-freebsd-fs@FreeBSD.ORG  Tue Apr  1 20:19:10 2008
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 737431065673;
	Tue,  1 Apr 2008 20:19:10 +0000 (UTC)
	(envelope-from kris@FreeBSD.org)
Received: from weak.local (freefall.freebsd.org [IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id D18048FC1C;
	Tue,  1 Apr 2008 20:19:08 +0000 (UTC)
	(envelope-from kris@FreeBSD.org)
Message-ID: <47F298C2.7040606@FreeBSD.org>
Date: Tue, 01 Apr 2008 22:19:14 +0200
From: Kris Kennaway <kris@FreeBSD.org>
User-Agent: Thunderbird 2.0.0.12 (Macintosh/20080213)
MIME-Version: 1.0
To: Cyrus Rahman <crahman@gmail.com>
References: <9e77bdb50804011251q65eca371kc6bc9a60ac0c248@mail.gmail.com>
In-Reply-To: <9e77bdb50804011251q65eca371kc6bc9a60ac0c248@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-fs@freebsd.org, Pawel Jakub Dawidek <pjd@FreeBSD.org>
Subject: Re: Trouble with snapshots

Cyrus Rahman wrote:
> I'm seeing serious problems with snapshot deadlocks on 7.0-RELEASE
> right now.  I haven't been able to set up a test environment to really
> determine precise details, but this much I know:  Filesystem I/O will
> eventually lock up, requiring a hard reset, after the snapshot mount
> sleeps permanently on suspfs.  Eventually there's a cascade and
> everything ends up waiting on suspfs.  Running a 'sync' after mount
> hangs is a sure way to propagate the problem.  This happens very often,
> with roughly a 15% chance per snapshot on the server running 7.0.
> It's bad enough that it's not realistic to use snapshots there.
> Other strange things have been observed: an entire day's worth
> of work vanished.  After the reset/reboot the filesystems were consistent,
> but in the state they were in many hours before, at the time the snapshot
> hung.  The snapshot had been observed hanging, but everything else seemed
> to work so a decision was made to reboot at the end of the day - with
> disastrous effect!  During the day nothing unusual except for the hung
> snapshot was noticed.  I'm guessing everything just got cached (for
> hours!) and the cache never got flushed.
> 
> This is happening on a system set up with journaled UFS filesystems,
> so that may be part of the problem.  The system is running amd64 with
> an Intel Q6600.

I thought gjournal and soft updates were supposed to be mutually 
exclusive (the latter is required for UFS snapshots).  Anyway, even if 
they are supposed to work together, this interaction is almost certainly 
the cause.

Kris