Date:      Thu, 13 Nov 2008 15:21:09 +0200
From:      Kostik Belousov <kostikbel@gmail.com>
To:        Jeremy Chadwick <koitsu@freebsd.org>
Cc:        Tim Bishop <tim@bishnet.net>, freebsd-stable@freebsd.org
Subject:   Re: System deadlock when using mksnap_ffs
Message-ID:  <20081113132109.GT47073@deviant.kiev.zoral.com.ua>
In-Reply-To: <20081113104514.GA17589@icarus.home.lan>
References:  <20081112175826.GD26195@carrick.bishnet.net> <20081112194735.GK47073@deviant.kiev.zoral.com.ua> <20081113004102.GD24360@carrick.bishnet.net> <20081113044200.GA10419@icarus.home.lan> <20081113102642.GQ47073@deviant.kiev.zoral.com.ua> <20081113104514.GA17589@icarus.home.lan>

On Thu, Nov 13, 2008 at 02:45:14AM -0800, Jeremy Chadwick wrote:
> On Thu, Nov 13, 2008 at 12:26:42PM +0200, Kostik Belousov wrote:
> > On Wed, Nov 12, 2008 at 08:42:00PM -0800, Jeremy Chadwick wrote:
> > > On Thu, Nov 13, 2008 at 12:41:02AM +0000, Tim Bishop wrote:
> > > > On Wed, Nov 12, 2008 at 09:47:35PM +0200, Kostik Belousov wrote:
> > > > > On Wed, Nov 12, 2008 at 05:58:26PM +0000, Tim Bishop wrote:
> > > > > > > I've been playing around with snapshots lately but I've got a problem on
> > > > > > > one of my servers running 7-STABLE amd64:
> > > > > > >
> > > > > > > FreeBSD paladin 7.1-PRERELEASE FreeBSD 7.1-PRERELEASE #8: Mon Nov 10 20:49:51 GMT 2008 tdb@paladin:/usr/obj/usr/src/sys/PALADIN  amd64
> > > > > > >
> > > > > > > I run the mksnap_ffs command to take the snapshot and some time later
> > > > > > > the system completely freezes up:
> > > > > > >
> > > > > > > paladin# cd /u2/.snap/
> > > > > > > paladin# mksnap_ffs /u2 test.1
> > > > > > >
> > > > > > > It only happens on this one filesystem, though, which might be to do
> > > > > > > with its size. It's not over the 2TB marker, but it's pretty close. It's
> > > > > > > also backed by a hardware RAID system, although a smaller filesystem on
> > > > > > > the same RAID has no issues.
> > > > > > >
> > > > > > > Filesystem  1K-blocks       Used     Avail Capacity  Mounted on
> > > > > > > /dev/da0s1a 2078881084 921821396 990749202    48%    /u2
> > > > > > >
> > > > > > > To clarify "completely freezes up": unresponsive to all services over
> > > > > > > the network, except ping. On the console I can switch between the ttys,
> > > > > > > but none of them respond. The only way out is to hit the reset button.
> > > > > >
> > > > > You need to provide information described in the
> > > > > > http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html
> > > > > > and especially
> > > > > > http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
> > > > >
> > > > Ok, I've done that, and removed the patch that seemed to fix things.
> > > >=20
> > > > The first thing I notice after doing this on the console is that I =
can
> > > > still ctrl+t the process:
> > > >=20
> > > > load: 0.14  cmd: mksnap_ffs 2603 [newbuf] 0.00u 10.75s 0% 1160k
> > > >=20
> > > > But the top and ps I left running on other ttys have all stopped
> > > > responding.
> > >=20
> > > Then in my book, the patch didn't fix anything.  :-)  The system is
> > > still "deadlocking"; snapshot generation **should not** wedge the sys=
tem
> > > hard like this.
> > You systematically mix two completely different issues:
> > - first one is the _deadlock_ experienced by Tim;
>
> Re-read what he wrote.  Quote:
>
> "Ok, I've done that, and removed the patch that seemed to fix things.
>
> The first thing I notice after doing this on the console is that I can
> still ctrl+t the process:
>
> load: 0.14  cmd: mksnap_ffs 2603 [newbuf] 0.00u 10.75s 0% 1160k
>
> But the top and ps I left running on other ttys have all stopped
> responding."
>
> If he can press Control-T, it means SIGINFO can be sent to the
> mksnap_ffs process, and the process responds with that information.  So,
> the system is not deadlocked -- meaning, I believe what he experiences
> is what others experience (the system becomes completely unusable during
> mksnap_ffs running, but DOES NOT hang or lock up, it just becomes so
> god-awful slow that processes on the machine literally sit and spin for
> minutes at a time).

Unless NOKERNINFO is set in the local flags of the controlling
terminal's termios, the kernel prints a one-line summary like the one
shown above. This is done from the tty discipline input handler (or
whatever its equivalent is in the new tty code); no process cooperation
is required. On the other hand, actually delivering SIGINFO and getting
output from a process-installed handler does require the process to be
either executing in usermode or sleeping interruptibly.



