From: Yamagi Burmeister
Date: Wed, 11 Jan 2012 10:30:39 +0100
To: mckusick@mckusick.com
Cc: freebsd-current@freebsd.org, bryce@bryce.net
Subject: Re: FS hang when creating snapshots on a UFS SU+J setup
Message-Id: <20120111103039.d342aef4.lists@yamagi.org>
In-Reply-To: <201201101830.q0AIUDP7062707@chez.mckusick.com>
List-Id: Discussions about the use of FreeBSD-current

Hello,

I've done some tests to verify that the problem only occurs when SU+J
is used, and not with SU without J.
In fact, I ran the following two loops in parallel on different TTYs:

    while 1
        cp -r /usr/src /root
        rm -Rf /root/src
    end

    while 1
        mksnap_ffs / /.snap/snap
        rm -f /.snap/snap
    end

With SU without J the system survives this for at least 1 hour. But as
soon as SU+J is used, it most likely deadlocks or even panics within
the first 1 or 2 minutes. What exactly happens seems to vary. In most
cases the system just deadlocks, sometimes as alain@bsdgate.org
describes, and sometimes it is completely unresponsive to any input.

I've seen kernel messages like "fsync: giving up on dirty". Several
times the system panicked, in most cases printing the generic
"panic: page fault while in kernel mode" and one time printing
"panic: snapacct_ufs2: bad block". I've never seen the same backtrace
twice. One time the system suddenly rebooted, as if a triple fault or
something similar had happened.

Since the problems described above are much more likely to arise when
the filesystem is under load (for example from the first loop) while
the snapshot is being taken, this looks like some kind of race
condition.

Some more information from an older debug session can be found at:
http://deponie.yamagi.org/freebsd/debug/snapshots_panic/

On Tue, 10 Jan 2012 10:30:13 -0800
Kirk McKusick wrote:

> > Date: Mon, 9 Jan 2012 18:30:51 +0100
> > From: Yamagi Burmeister
> > To: jeff@freebsd.org, mckusick@freebsd.org
> > Cc: freebsd-current@freebsd.org, bryce@bryce.net
> > Subject: Re: FS hang when creating snapshots on a UFS SU+J setup
> >
> > Hello,
> >
> > I'm sorry to bother you, but you may not be aware of this thread and
> > this problem. We are several people experiencing deadlocks, kernel
> > panics and other problems when creating snapshots on file systems
> > with SU+J. It would be nice to get some feedback, e.g. how can we
> > help debugging and / or fixing this problem.
> >
> > Thank you,
> > Yamagi
>
> First step in debugging is to find out if the problem is SU+J
> specific. To find out, turn off SU+J but leave SU. This change
> is done by running:
>
>     umount
>     tunefs -j disable
>     mount
>     cd
>     rm .sujournal
>
> You may want to run `fsck -f' on the filesystem while you have
> it unmounted just to be sure that it is clean. Then run your
> snapshot request to see if it still fails. If it works, then
> we have narrowed the problem down to something related to SU+J.
> If it fails then we have a broader issue to deal with.
>
> If you wish to go back to using SU+J after the test, you can
> reenable SU+J by running:
>
>     umount
>     tunefs -j enable
>     mount
>
> When responding to me, it is best to use my
> email as I tend to read it more regularly.
>
> Kirk McKusick
>

-- 
Homepage: www.yamagi.org
XMPP: yamagi@yamagi.org
GnuPG/GPG: 0xEFBCCBCB