Date:      Wed, 19 Jul 2006 14:24:24 +0300
From:      Kostik Belousov <kostikbel@gmail.com>
To:        User Freebsd <freebsd@hub.org>
Cc:        freebsd-stable@freebsd.org, Robert Watson <rwatson@freebsd.org>
Subject:   Re: file system deadlock - the whole story?
Message-ID:  <20060719112424.GK1464@deviant.kiev.zoral.com.ua>
In-Reply-To: <20060718074804.W1799@ganymede.hub.org>
References:  <E1FxzUU-000MMw-5m@cs1.cs.huji.ac.il> <20060705100403.Y80381@fledge.watson.org> <cone.1152136419.991036.72616.1000@zoraida.natserv.net> <20060705234514.I70011@fledge.watson.org> <20060715000351.U1799@ganymede.hub.org> <20060715035308.GJ32624@deviant.kiev.zoral.com.ua> <20060718074804.W1799@ganymede.hub.org>

On Tue, Jul 18, 2006 at 07:51:52AM -0300, User Freebsd wrote:
>
> 'k, had a bunch of fun tonight, but one of the results is that I was able
> to achieve file system deadlock, or so it appears ...
>
> Using the following from DDB:
>
> set $lines=0
> show pcpu
> show allpcpu
> ps
> trace
> alltrace
> show locks
> show alllocks
> show uma
> show malloc
> show lockedvnods
> call doadump
>
> I've been able to produce the attached output, as well as have a core dump
> that can hopefully be used to gather any that I may have missed this time
> *cross fingers*

Marc,
I seriously doubt that the problem the machine is experiencing is a deadlock.

The graph of the locking dependencies for the vnode locks is at
http://people.freebsd.org/~kib/e1.gif. An edge from process a to process b
means that process a holds a lock and process b is waiting for that lock.
A black edge denotes a dependency through a vnode lock, a red edge a
dependency through a buffer lock.

As you can see, the graph is acyclic. Basically, there are two groups of
blocked processes: one hierarchy is rooted in pid 66575 and includes
shell 806; the second is rooted in process 32.
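
The acyclicity argument can be checked mechanically. Below is a minimal
sketch, not the tool used to draw the graph: it runs a depth-first search
over (holder, waiter) edges and reports a cycle if one exists. The function
name find_cycle and the edge list are illustrative assumptions. Only the
edges from pid 32 to pids 33 and 73338 are stated in this message; the
direct edge from 66575 to 806 is assumed, since 806 may sit deeper in that
hierarchy.

```python
from collections import defaultdict


def find_cycle(edges):
    """Return a list of pids forming a cycle in the wait-for graph, or None.

    Each edge is (holder, waiter): holder owns a lock that waiter sleeps on.
    """
    graph = defaultdict(list)
    for holder, waiter in edges:
        graph[holder].append(waiter)

    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the DFS stack, finished
    color = defaultdict(int)

    def visit(node, path):
        color[node] = GREY
        path.append(node)
        for nxt in graph[node]:
            if color[nxt] == GREY:
                # Back edge to a node still on the stack: a real deadlock.
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt, path)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in list(graph):
        if color[node] == WHITE:
            cycle = visit(node, [])
            if cycle:
                return cycle
    return None


# Illustrative edges only (see the assumptions stated above).
edges = [
    (66575, 806),    # assumed direct edge inside the first hierarchy
    (32, 33),        # syncer holds the block softdepflush waits for
    (32, 73338),     # syncer holds the block umount waits for
]

print(find_cycle(edges))
```

A result of None means every chain of waiters ends at a root process that
is itself not waiting on any lock in the graph, which is why the next
question is what those roots are doing.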

What are they doing? Pid 66575:

Tracing command smtpd pid 66575 tid 101396 td 0xceb0a180
sched_switch(ceb0a180,0,1) at sched_switch+0x177
mi_switch(1,0) at mi_switch+0x270
sleepq_switch(dc5b5b20,c0661b60,0,c05fd078,20c) at sleepq_switch+0xc1
sleepq_wait(dc5b5b20,0,c0601d10,e59,8) at sleepq_wait+0x46
msleep(dc5b5b20,c06afde0,44,c061021d,0) at msleep+0x279
bwait(dc5b5b20,44,c061021d) at bwait+0x47
vnode_pager_generic_getpages(c8e85000,ed347c80,1000,0,c8e22000) at vnode_pager_generic_getpages+0x777
ffs_getpages(ed347bbc,c8e85000,0,ed347be8,c0597c41) at ffs_getpages+0x100
VOP_GETPAGES_APV(c063c100,ed347bbc) at VOP_GETPAGES_APV+0xa9
vnode_pager_getpages(c8e22000,ed347c80,1,0) at vnode_pager_getpages+0xa5
vm_fault(c88da4a0,280bb000,1,0,ceb0a180) at vm_fault+0x980
trap_pfault(ed347d38,1,280bb000,280bb000,0) at trap_pfault+0xce
trap(3b,3b,3b,8078d1c,807952c) at trap+0x1eb
calltrap() at calltrap+0x5
--- trap 0xc, eip = 0x280baffd, esp = 0xbfbfe894, ebp = 0xbfbfe8d8 ---

This process is waiting for data to be paged in.

Pid 32 (syncer)

Tracing command syncer pid 32 tid 100033 td 0xc8544780
sched_switch(c8544780,0,1) at sched_switch+0x177
mi_switch(1,0) at mi_switch+0x270
sleepq_switch(dc79fe68,c0661b60,0,c05fd078,20c) at sleepq_switch+0xc1
sleepq_wait(dc79fe68,0,c0601d10,e59,c06039a0) at sleepq_wait+0x46
msleep(dc79fe68,c06afde0,4c,c06024dc,0) at msleep+0x279
bwait(dc79fe68,4c,c06024dc) at bwait+0x47
bufwait(dc79fe68,1,0,0,0) at bufwait+0x1a
breadn(c8a0b414,6537700,0,4000,0) at breadn+0x266
bread(c8a0b414,6537700,0,4000,0) at bread+0x20
ffs_update(c9992000,0,6,0,0) at ffs_update+0x228
ffs_syncvnode(c9992000,3) at ffs_syncvnode+0x3be
ffs_sync(c8831400,3,c8544780,c8831400,2) at ffs_sync+0x209
sync_fsync(e817fcbc,c8a11ae0,c8a11bec,e817fcd8,c04ed586) at sync_fsync+0x126
VOP_FSYNC_APV(c0634220,e817fcbc) at VOP_FSYNC_APV+0x9b
sync_vnode(c8a11bec,c8544780) at sync_vnode+0x106
sched_sync(0,e817fd38,0,c04ed614,0) at sched_sync+0x1ed
fork_exit(c04ed614,0,e817fd38) at fork_exit+0xa0
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xe817fd6c, ebp = 0 ---

The syncer is also waiting for data.

What is happening with the blocks?
The syncer (pid 32) has locked block 0xc8a0b414 and is waiting for data
(as shown above). Processes 33 (softdepflush) and 73338 (umount) are
waiting for this block.

You did not provide the output of "show lockedbufs",
but even without that data, I doubt that the buf subsystem deadlocked by
itself.

My conjecture is that the problem is either with your disk hardware (i.e.,
the actual hard drive or disk controller) or in the controller driver.

At the very least, you could show us the dmesg output.
