From: Kostik Belousov
To: Jeremy Chadwick
Date: Thu, 13 Nov 2008 15:21:09 +0200
Message-ID: <20081113132109.GT47073@deviant.kiev.zoral.com.ua>
Cc: Tim Bishop, freebsd-stable@freebsd.org
Subject: Re: System deadlock when using mksnap_ffs
In-Reply-To: <20081113104514.GA17589@icarus.home.lan>

On Thu, Nov 13, 2008 at 02:45:14AM -0800, Jeremy Chadwick wrote:
> On Thu, Nov 13, 2008 at 12:26:42PM +0200, Kostik Belousov wrote:
> > On Wed, Nov 12, 2008 at 08:42:00PM -0800, Jeremy Chadwick wrote:
> > > On Thu, Nov 13, 2008 at 12:41:02AM +0000, Tim Bishop wrote:
> > > > On Wed, Nov 12, 2008 at 09:47:35PM +0200, Kostik Belousov wrote:
> > > > > On Wed, Nov 12, 2008 at 05:58:26PM +0000, Tim Bishop wrote:
> > > > > > I've been playing around with snapshots lately but I've got a problem on
> > > > > > one of my servers running 7-STABLE amd64:
> > > > > >
> > > > > > FreeBSD paladin 7.1-PRERELEASE FreeBSD 7.1-PRERELEASE #8: Mon Nov 10 20:49:51 GMT 2008 tdb@paladin:/usr/obj/usr/src/sys/PALADIN amd64
> > > > > >
> > > > > > I run the mksnap_ffs command to take the snapshot and some time later
> > > > > > the system completely freezes up:
> > > > > >
> > > > > > paladin# cd /u2/.snap/
> > > > > > paladin# mksnap_ffs /u2 test.1
> > > > > >
> > > > > > It only happens on this one filesystem, though, which might be to do
> > > > > > with its size. It's not over the 2TB marker, but it's pretty close. It's
> > > > > > also backed by a hardware RAID system, although a smaller filesystem on
> > > > > > the same RAID has no issues.
> > > > > >
> > > > > > Filesystem    1K-blocks      Used     Avail Capacity  Mounted on
> > > > > > /dev/da0s1a  2078881084 921821396 990749202    48%    /u2
> > > > > >
> > > > > > To clarify "completely freezes up": unresponsive to all services over
> > > > > > the network, except ping. On the console I can switch between the ttys,
> > > > > > but none of them respond. The only way out is to hit the reset button.
> > > > >
> > > > > You need to provide information described in the
> > > > > http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html
> > > > > and especially
> > > > > http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
> > > >
> > > > Ok, I've done that, and removed the patch that seemed to fix things.
> > > >
> > > > The first thing I notice after doing this on the console is that I can
> > > > still ctrl+t the process:
> > > >
> > > > load: 0.14 cmd: mksnap_ffs 2603 [newbuf] 0.00u 10.75s 0% 1160k
> > > >
> > > > But the top and ps I left running on other ttys have all stopped
> > > > responding.
> > >
> > > Then in my book, the patch didn't fix anything. :-) The system is
> > > still "deadlocking"; snapshot generation **should not** wedge the system
> > > hard like this.
> > You systematically mix two completely different issues:
> > - first one is the _deadlock_ experienced by Tim;
>
> Re-read what he wrote. Quote:
>
> "Ok, I've done that, and removed the patch that seemed to fix things.
>
> The first thing I notice after doing this on the console is that I can
> still ctrl+t the process:
>
> load: 0.14 cmd: mksnap_ffs 2603 [newbuf] 0.00u 10.75s 0% 1160k
>
> But the top and ps I left running on other ttys have all stopped
> responding."
>
> If he can press Control-T, it means SIGINFO can be sent to the
> mksnap_ffs process, and the process responds with that information. So,
> the system is not deadlocked -- meaning, I believe what he experiences
> is what others experience (the system becomes completely unusable during
> mksnap_ffs running, but DOES NOT hang or lock up, it just becomes so
> god-awful slow that processes on the machine literally sit and spin for
> minutes at a time).

Unless NOKERNINFO is set in the local flags (c_lflag) of the controlling
terminal's termios, the kernel itself prints a one-line summary like the
one shown above. This is done from the tty line discipline input handler
(or whatever its equivalent is in the new tty code). No cooperation from
the process is required, so that line by itself does not show that the
system is not deadlocked.

On the other hand, actually delivering SIGINFO and getting output from a
process-installed handler does require the process to be either executing
in user mode or sleeping interruptibly.
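The distinction drawn above (kernel-printed status line versus output from the process's own SIGINFO handler) can be captured in a small Python model. This is a sketch under stated assumptions, not FreeBSD code: `ProcState` and `ctrl_t_outcome` are hypothetical names invented for illustration, the three process states are a simplification, and the NOKERNINFO bit is passed in as a parameter rather than taken from any particular `<termios.h>`. The optional demo looks up `signal.SIGINFO` and `termios.NOKERNINFO` with `getattr`, since SIGINFO exists only on BSD-derived systems and it is not assumed that Python's `termios` module exports NOKERNINFO.

```python
"""Toy model of what typing Ctrl+T (the VSTATUS character) produces.

Hypothetical names (ProcState, ctrl_t_outcome) are for illustration only;
they are not FreeBSD or Python APIs. The NOKERNINFO bit is a parameter.
"""
import signal
import sys
import termios
from enum import Enum


class ProcState(Enum):
    # Simplified, hypothetical states of the process attached to the tty.
    RUNNING_USER = "executing in user mode"
    SLEEP_INTERRUPTIBLE = "sleeping interruptibly"
    SLEEP_UNINTERRUPTIBLE = "sleeping uninterruptibly"


def ctrl_t_outcome(lflag: int, nokerninfo_bit: int,
                   state: ProcState) -> tuple[bool, bool]:
    """Return (kernel_prints_summary, handler_output_now).

    The kernel summary comes from the tty input path and needs nothing from
    the process; only NOKERNINFO in c_lflag suppresses it. A SIGINFO handler
    installed by the process can only run promptly if the process is in
    user mode or in an interruptible sleep; otherwise the signal stays
    pending.
    """
    kernel_prints_summary = not (lflag & nokerninfo_bit)
    handler_output_now = state in (ProcState.RUNNING_USER,
                                   ProcState.SLEEP_INTERRUPTIBLE)
    return kernel_prints_summary, handler_output_now


def _on_siginfo(signum: int, frame) -> None:
    # Runs in user mode, and only once the kernel actually delivers SIGINFO.
    print("SIGINFO handler: process is making progress", file=sys.stderr)


if __name__ == "__main__":
    siginfo = getattr(signal, "SIGINFO", None)          # BSD/macOS only
    nokerninfo = getattr(termios, "NOKERNINFO", None)   # may be absent
    if siginfo is not None:
        signal.signal(siginfo, _on_siginfo)
        print("installed a SIGINFO handler")
    else:
        print("SIGINFO is not defined on this platform")
    if nokerninfo is not None and sys.stdin.isatty():
        lflag = termios.tcgetattr(sys.stdin.fileno())[3]  # index 3 = lflag
        print("kernel Ctrl+T summary enabled:", not (lflag & nokerninfo))
    else:
        print("NOKERNINFO unavailable or stdin is not a tty")
```

For a process stuck in an uninterruptible sleep, `ctrl_t_outcome(0, bit, ProcState.SLEEP_UNINTERRUPTIBLE)` returns `(True, False)`: the kernel's summary line still appears, but the process's own handler cannot run yet, which is the situation the reply describes.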