Date: Mon, 21 Nov 2005 21:12:24 -0500
From: Kris Kennaway <kris@obsecurity.org>
To: Greg Rivers <gcr+freebsd-stable@tharned.org>
Cc: freebsd-stable@freebsd.org
Subject: Re: Recurring problem: processes block accessing UFS file system
Message-ID: <20051122021224.GA12402@xor.obsecurity.org>
In-Reply-To: <20051121164139.T48994@w10.sac.fedex.com>
References: <20051121164139.T48994@w10.sac.fedex.com>
On Mon, Nov 21, 2005 at 05:54:09PM -0600, Greg Rivers wrote:

> I've recently put up three busy email relay hosts running 6.0-STABLE.
> Performance is excellent except for a nagging critical issue that keeps
> cropping up.
>
> /var/spool is its own file system mounted on a geom stripe of four BSD
> partitions (details below). Once every two or three days all the
> processes accessing /var/spool block forever in disk wait. All three
> machines suffer this problem. No diagnostic messages are generated and
> the machines continue running fine otherwise, but a reboot is required to
> clear the condition. This problem occurs during normal operation, but is
> particularly likely to occur during a backup when dump makes a snapshot.
>
> There doesn't appear to be a problem with gstripe, as gstripe status is
> "UP" and I can read the raw device just fine while processes continue to
> block on the file system. I tried running a kernel with WITNESS and
> DIAGNOSTIC, but these options shed no light.
>
> If I catch the problem early enough I can break successfully into kdb;
> otherwise, if too many processes stack up, the machine hangs going into
> kdb and must be power-cycled.

Make sure you have KDB_STOP_NMI in your kernel.

> I obtained the following process listing and traces from kdb. I traced
> mksnap_ffs which was blocked in "ufs", and two random sendmail processes
> that were blocked in "ufs" and "suspfs" respectively.

Looks like a UFS snapshot deadlock. Are you running something like
dump -L on this filesystem, or making other use of snapshots? fsck -B
also uses them, but shouldn't be running except at boot time.

You should take this up with Kirk McKusick <freebsd@McKusick.COM> - in
the meantime you can work around it by not making use of UFS
snapshots.
Kris