Date: Tue, 9 Mar 2010 13:58:15 +0100
From: Pawel Jakub Dawidek <pjd@FreeBSD.org>
To: Borja Marcos <borjam@sarenet.es>
Cc: FreeBSD Stable <freebsd-stable@freebsd.org>, Stefan Bethke <stb@lassitu.de>
Subject: Re: Many processes stuck in zfs
Message-ID: <20100309125815.GF3155@garage.freebsd.pl>
In-Reply-To: <EC9BC6B4-8D0E-4FE3-852F-0E3A24569D33@sarenet.es>
References: <864468D4-DCE9-493B-9280-00E5FAB2A05C@lassitu.de> <20100309122954.GE3155@garage.freebsd.pl> <EC9BC6B4-8D0E-4FE3-852F-0E3A24569D33@sarenet.es>

On Tue, Mar 09, 2010 at 01:57:07PM +0100, Borja Marcos wrote:
>
> On Mar 9, 2010, at 1:29 PM, Pawel Jakub Dawidek wrote:
>
> > On Tue, Mar 09, 2010 at 10:15:53AM +0100, Stefan Bethke wrote:
> >> Over the past couple of months, I've more or less regularly observed
> >> machines having more and more processes stuck in the zfs wchan. The
> >> processes never recover from that, and trying to reboot only gets the
> >> entire system stuck, without any console messages. I can enter the
> >> debugger, and I have saved a couple of dumps.
> >>
> >> The situation seems to be triggered by zfs receive'ing snapshots from
> >> the sister machine (both synchronize their active ZFS filesystems to
> >> each other, using zfs send and zfs receive). It appears it's the
> >> receiving causing trouble.
> >>
> >> Both machines run 8-stable from mid-February, with a single-disk ZFS
> >> pool, with ARC limited to 512M, prefetch and ZIL disabled via
> >> loader.conf.
> >>
> >> What should I be looking at to further diagnose?
> >
> > What kind of hardware do you have there? There is 3-way deadlock I've a
> > fix for which would be hard to trigger on single or dual core machines.
> >
> > Feel free to try the fix:
> >
> > http://people.freebsd.org/~pjd/patches/zfs_3way_deadlock.patch
>
> Maybe related to the deadlock I reported when I was receiving an
> incremental snapshot while the target dataset was being read?

Could be. This deadlock is in general related to zfs recv functionality.

-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!