Date: Mon, 24 Apr 2006 14:15:31 -0400
From: Kris Kennaway <kris@obsecurity.org>
To: Dmitry Morozovsky <marck@rinet.ru>
Cc: stable@freebsd.org, Kris Kennaway <kris@obsecurity.org>
Subject: Re: fsck_ufs locked in snaplk
Message-ID: <20060424181531.GA13774@xor.obsecurity.org>
In-Reply-To: <20060424215650.P36233@woozle.rinet.ru>
References: <20060423193208.N1187@woozle.rinet.ru> <20060423201732.GA74905@xor.obsecurity.org> <20060424091803.L20593@woozle.rinet.ru> <20060424215650.P36233@woozle.rinet.ru>
On Mon, Apr 24, 2006 at 10:04:57PM +0400, Dmitry Morozovsky wrote:
> On Mon, 24 Apr 2006, Dmitry Morozovsky wrote:
>
> DM> KK> > one of my servers had to be rebooted uncleanly and then I have backgrounded
> DM> KK> > fsck locked for more than an hour in snaplk:
> DM> KK> >
> DM> KK> > 742 root 1 -4 4 1320K 688K snaplk 0:02 0.00% fsck_ufs
> DM> KK> >
> DM> KK> > File system in question is 200G gmirror on SATA. Usually making a snapshot
> DM> KK> > (e.g., for making dumps) consumes 3-4 minutes for that fs, so it seems to me
> DM> KK> > that filesystem is in a deadlock.
> DM> KK>
> DM> KK> Is the process performing I/O? Background fsck deliberately runs at a
> DM> KK> slow rate so it does not destroy I/O performance on the rest of the
> DM> KK> system.
> DM>
> DM> Nope. For that case, 50+ smbds had been locked in 'ufs' state, so I've been
> DM> urged to revive the machine and reboot, turning off bgfsck.
> DM>
> DM> This night, dump -L locks in the same position on the same filesystem:
> DM>
> DM> 0 2887 2886 0 -4 0 1260 692 snaplk D ?? 0:01.28
> DM> /sbin/mksnap_ffs root 0.0 0.1 5:19AM
> DM>
> DM> it has been started at 5:19am, and now is 9:20 - no disk activity
> DM>
> DM>
> DM> For the reference: it's fresh RELENG_6_1/i386.
>
> Just rechecked it: did mksnap_ffs on an otherwise idle file system:
>
> marck@office:/> mksnap_ffs /st /st/.snap/test_snapshot
> load: 0.02 cmd: mksnap_ffs 4012 [biord] 0.00u 0.04s 0% 696k
> load: 0.04 cmd: mksnap_ffs 4012 [biord] 0.00u 0.44s 0% 696k
> load: 0.21 cmd: mksnap_ffs 4012 [snaprdb] 0.00u 1.17s 0% 696k
> load: 0.20 cmd: mksnap_ffs 4012 [snaprdb] 0.00u 1.23s 0% 696k
> load: 0.13 cmd: mksnap_ffs 4012 [snaplk] 0.00u 1.30s 0% 696k
> load: 0.08 cmd: mksnap_ffs 4012 [snaplk] 0.00u 1.30s 0% 696k
> load: 0.01 cmd: mksnap_ffs 4012 [snaplk] 0.00u 1.30s 0% 696k
>
> (I hit ^T several times)
>
> biord phase consumes about 1.5-2 mins,
> snaprdb phase - about 30-40 secs, and then process died. Most disk requests
> succeed; however, accessing /st/.snap locks process in ufs state forever.
>
> What bothers me most is that it is the only machine that reproducibly hangs in
> snapshots, and it did not hang before RELENG_5 -> RELENG_6 upgrade. Other
> RELENG_6 machines do snapshot backups flawlessly (knock-on-wood!)

Are you quite certain it's running up-to-date RELENG_6_1? All known
snapshot deadlock issues were believed to have been fixed a few weeks
ago. If so, we might need you to enable extra debugging to track this
down.

Kris