Date:      Mon, 21 Nov 2005 21:12:24 -0500
From:      Kris Kennaway <kris@obsecurity.org>
To:        Greg Rivers <gcr+freebsd-stable@tharned.org>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: Recurring problem: processes block accessing UFS file system
Message-ID:  <20051122021224.GA12402@xor.obsecurity.org>
In-Reply-To: <20051121164139.T48994@w10.sac.fedex.com>
References:  <20051121164139.T48994@w10.sac.fedex.com>

On Mon, Nov 21, 2005 at 05:54:09PM -0600, Greg Rivers wrote:
> I've recently put up three busy email relay hosts running 6.0-STABLE.
> Performance is excellent except for a nagging critical issue that keeps
> cropping up.
>
> /var/spool is its own file system mounted on a geom stripe of four BSD
> partitions (details below).  Once every two or three days all the
> processes accessing /var/spool block forever in disk wait.  All three
> machines suffer this problem.  No diagnostic messages are generated and
> the machines continue running fine otherwise, but a reboot is required to
> clear the condition.  This problem occurs during normal operation, but is
> particularly likely to occur during a backup when dump makes a snapshot.
>
> There doesn't appear to be a problem with gstripe, as gstripe status is
> "UP" and I can read the raw device just fine while processes continue to
> block on the file system.  I tried running a kernel with WITNESS and
> DIAGNOSTIC, but these options shed no light.
>
> If I catch the problem early enough I can break successfully into kdb;
> otherwise, if too many processes stack up, the machine hangs going into
> kdb and must be power-cycled.

Make sure you have KDB_STOP_NMI in your kernel.
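
For reference, a minimal sketch of the kernel configuration entry this
refers to.  It assumes a custom config file (MYKERNEL is a hypothetical
name) and that your branch spells the option this way; check the NOTES
file for your architecture before relying on it.  The kernel has to be
rebuilt and installed for the option to take effect.

```
# Fragment of a hypothetical kernel config, e.g. sys/i386/conf/MYKERNEL.
# KDB and DDB are presumably already present, since breaking into kdb
# works on these machines when the problem is caught early.
options 	KDB
options 	DDB

# The option named above.  As far as I understand it, this makes the
# debugger stop the other CPUs with an NMI rather than an ordinary IPI.
options 	KDB_STOP_NMI
```

That should make it more likely that you can still get into kdb once
many processes have stacked up, though I can't promise it covers every
hang.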

> I obtained the following process listing and traces from kdb.  I traced
> mksnap_ffs which was blocked in "ufs", and two random sendmail processes
> that were blocked in "ufs" and "suspfs" respectively.

Looks like a UFS snapshot deadlock.  Are you running something like
dump -L on this filesystem, or making other use of snapshots?  fsck -B
also uses them, but shouldn't be running except at boot time.

You should take this up with Kirk McKusick <freebsd@McKusick.COM> - in
the meantime you can work around it by not making use of UFS
snapshots.

Kris