Date:      Mon, 24 Apr 2006 14:15:31 -0400
From:      Kris Kennaway <kris@obsecurity.org>
To:        Dmitry Morozovsky <marck@rinet.ru>
Cc:        stable@freebsd.org, Kris Kennaway <kris@obsecurity.org>
Subject:   Re: fsck_ufs locked in snaplk
Message-ID:  <20060424181531.GA13774@xor.obsecurity.org>
In-Reply-To: <20060424215650.P36233@woozle.rinet.ru>
References:  <20060423193208.N1187@woozle.rinet.ru> <20060423201732.GA74905@xor.obsecurity.org> <20060424091803.L20593@woozle.rinet.ru> <20060424215650.P36233@woozle.rinet.ru>

--DocE+STaALJfprDB
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Mon, Apr 24, 2006 at 10:04:57PM +0400, Dmitry Morozovsky wrote:
> On Mon, 24 Apr 2006, Dmitry Morozovsky wrote:
>
> DM> KK> > one of my servers had to be rebooted uncleanly, and since then the
> DM> KK> > background fsck has been locked for more than an hour in snaplk:
> DM> KK> >
> DM> KK> > 742 root         1  -4    4  1320K   688K snaplk   0:02  0.00% fsck_ufs
> DM> KK> >
> DM> KK> > File system in question is 200G gmirror on SATA. Usually making a snapshot
> DM> KK> > (e.g., for making dumps) consumes 3-4 minutes for that fs, so it seems to me
> DM> KK> > that the filesystem is in a deadlock.
> DM> KK>
> DM> KK> Is the process performing I/O?  Background fsck deliberately runs at a
> DM> KK> slow rate so it does not destroy I/O performance on the rest of the
> DM> KK> system.
> DM>
> DM> Nope. In that case, 50+ smbds had been locked in 'ufs' state, so I was
> DM> forced to revive the machine by rebooting with bgfsck turned off.
> DM>
> DM> Last night, dump -L locked up in the same state on the same filesystem:
> DM>
> DM> 0  2887  2886   0  -4  0  1260   692 snaplk D     ??    0:01.28
> DM> /sbin/mksnap_ffs root    0.0  0.1  5:19AM
> DM>
> DM> it was started at 5:19am, and it is now 9:20 - no disk activity
> DM>
> DM>
> DM> For reference: it's a fresh RELENG_6_1/i386.
>
> Just rechecked it: did mksnap_ffs on an otherwise idle file system:
>
> marck@office:/> mksnap_ffs /st /st/.snap/test_snapshot
> load: 0.02  cmd: mksnap_ffs 4012 [biord] 0.00u 0.04s 0% 696k
> load: 0.04  cmd: mksnap_ffs 4012 [biord] 0.00u 0.44s 0% 696k
> load: 0.21  cmd: mksnap_ffs 4012 [snaprdb] 0.00u 1.17s 0% 696k
> load: 0.20  cmd: mksnap_ffs 4012 [snaprdb] 0.00u 1.23s 0% 696k
> load: 0.13  cmd: mksnap_ffs 4012 [snaplk] 0.00u 1.30s 0% 696k
> load: 0.08  cmd: mksnap_ffs 4012 [snaplk] 0.00u 1.30s 0% 696k
> load: 0.01  cmd: mksnap_ffs 4012 [snaplk] 0.00u 1.30s 0% 696k
>
> (I hit ^T several times)
>
> biord phase consumes about 1.5-2 mins,
> snaprdb phase - about 30-40 secs, and then the process died. Most disk requests
> succeed; however, accessing /st/.snap locks the process in ufs state forever.
>
> What bothers me most is that it is the only machine that reproducibly hangs in
> snapshots, and it did not hang before the RELENG_5 -> RELENG_6 upgrade. Other
> RELENG_6 machines do snapshot backups flawlessly (knock-on-wood!)

Are you quite certain it's running up-to-date RELENG_6_1?  All known
snapshot deadlock issues were believed to have been fixed a few weeks
ago.  If so, we might need you to enable extra debugging to track this
down.

Kris
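
For anyone reproducing this, here is a minimal sketch of how the two
things discussed above could be gathered: the running kernel version, and
the processes asleep on a given wait channel (snaplk or ufs in this thread).
It is not from the original message. The helper name stuck_in is made up
for illustration, and it assumes ps output in the column order
"PID STATE WCHAN COMMAND" with a header line.

```shell
#!/bin/sh
# Hypothetical helper (not part of the thread): read ps-style lines
# "PID STATE WCHAN COMMAND" on stdin, skip the header line, and print the
# PID and the first word of the command for every process whose wait
# channel equals the name given as $1.
stuck_in() {
    awk -v chan="$1" 'NR > 1 && $3 == chan { print $1, $4 }'
}

# Assumed usage on the affected machine:
#   uname -a                                          # confirm the running kernel
#   ps -axo pid,state,wchan,command | stuck_in snaplk
#   ps -axo pid,state,wchan,command | stuck_in ufs
```

Only the first word of each command is printed, which is enough to tell
fsck_ufs, mksnap_ffs and smbd apart.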

--DocE+STaALJfprDB
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (FreeBSD)

iD8DBQFETRXDWry0BWjoQKURAiHAAJ9Dlg3ehtWg06XT6ERDLL2iwDR63QCgrWcO
MynIxdzcBJxfC6iWGfzMsDg=
=m4+C
-----END PGP SIGNATURE-----

--DocE+STaALJfprDB--


