Date: Tue, 8 Jan 2013 02:12:31 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: Dmitry Morozovsky <marck@rinet.ru>
Cc: freebsd-fs@freebsd.org
Subject: Re: zfs -> ufs rsync: livelock in wdrain state
Message-ID: <20130108001231.GB82219@kib.kiev.ua>
In-Reply-To: <alpine.BSF.2.00.1301080013520.7949@woozle.rinet.ru>
References: <alpine.BSF.2.00.1301080013520.7949@woozle.rinet.ru>

On Tue, Jan 08, 2013 at 12:19:15AM +0400, Dmitry Morozovsky wrote:
> Dear colleagues,
>
> I have archive server with pretty large ZFS (24*2T in single raidz2 raidgroup)
>
> Sometimes we moved really old archives to external SATA drives, which are
> formatted with UFS2/SU. Files are copied via rsync
>
> The system in question is stable/8; upgrade to stable/9 is planned, but not yet
> completed.
>
> Now, during last rsync, the process is stuck as
>
> dump.2012062219.bin.gz
> 3208015437 100% 102.42MB/s 0:00:29 (xfer#66, to-check=196/721)
> dump.2012062220.bin.gz
> load: 0.01 cmd: rsync 47543 [wdrain] 1904.69r 443.01u 241.12s 0% 1736k
> ^C
> rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at rsync.c(645)
> [sender=3.0.9]
>
> As we can see, rsync writer stops in wdrain state.
>
> I terminated it by ^C in terminal session, as it was not autogenerated
> backup.
>
> Now, zfs and other system is working seemingly well, but trying to sync
> manually stucks console forever:
>
> root@moose:/ar# sync
> load: 0.00 cmd: sync 67229 [wdrain] 468.17r 0.00u 0.00s 0% 596k
>
> Any hints? Quick searching through freebsd mailing lists and/or open PRs does
> not reveal much.
>

Are there any kernel messages about the disk system?

The 'wdrain' state means that the amount of dirty buffers accumulated
exceeds the allowed maximum. A transient 'wdrain' state is normal on
a machine doing a lot of writes to a filesystem that uses the buffer
cache, such as UFS. Failure to clean the dirty buffers is usually
related to disk I/O stalling.

A bug causing a stuck 'wdrain' state cannot be ruled out, but in the
last five or so years all the cases I investigated were due to disks.
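
As a rough illustration of the threshold described above, a small helper can compare the kernel's write counters against their limits. This is only a sketch and not part of the original reply: the sysctl names in PAIRS (vfs.runningbufspace, vfs.hirunningspace, vfs.numdirtybuffers, vfs.hidirtybuffers) are assumptions that should be checked with `sysctl vfs` on the system in question, and the function names are hypothetical.

```python
import subprocess

# Assumed (counter, limit) sysctl pairs; verify the names on the target system.
PAIRS = [
    ("vfs.runningbufspace", "vfs.hirunningspace"),
    ("vfs.numdirtybuffers", "vfs.hidirtybuffers"),
]


def parse_sysctl(text):
    """Parse 'name: value' lines, as printed by sysctl, into a dict of ints."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'name: value' line
        try:
            values[name.strip()] = int(value.strip())
        except ValueError:
            continue  # skip non-numeric values
    return values


def over_limit(values, pairs=PAIRS):
    """Return the (counter, limit) pairs whose counter exceeds its limit."""
    return [
        (counter, limit)
        for counter, limit in pairs
        if counter in values
        and limit in values
        and values[counter] > values[limit]
    ]


def read_sysctl(pairs=PAIRS):
    """Run sysctl for the assumed names and return the parsed values."""
    names = [name for pair in pairs for name in pair]
    result = subprocess.run(
        ["sysctl"] + names, capture_output=True, text=True, check=True
    )
    return parse_sysctl(result.stdout)
```

Calling over_limit(read_sysctl()) a few times, several seconds apart, gives a hint about which case applies: a counter that briefly exceeds its limit and then drops is the normal transient situation, while one that stays above the limit and never decreases suggests the writes are not completing, which points at the disk.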
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20130108001231.GB82219>