Date: Fri, 3 Sep 2010 16:30:44 -0700
From: "David O'Brien" <obrien@freebsd.org>
To: Jeff Roberson <jroberson@jroberson.net>
Cc: freebsd-current@freebsd.org
Subject: Re: SUJ deadlock
Message-ID: <20100903233038.GA1383@dragon.NUXI.org>
In-Reply-To: <alpine.BSF.2.00.1005051253480.1398@desktop>
References: <B9090D36-D0E7-48D9-9FE2-FD0C7A486AC3@netasq.com> <4BDF2A4D.3030706@gmail.com> <C5565B6E-11C7-46C1-97A1-81AE1D5A7C78@netasq.com> <alpine.BSF.2.00.1005051253480.1398@desktop>

On Wed, May 05, 2010 at 12:54:07PM -1000, Jeff Roberson wrote:
> On Mon, 3 May 2010, Fabien Thomas wrote:
>>>> I'm with r207548 now and since some days i've system deadlock.
>>>> It seems related to SUJ with process waiting on suspfs or ppwait.
>>>
>>> I've also seen it stalled in suspfs, but this information is way better
>>> than what I was able to garner. I was only able to tell via ctrl-t on
>>> a stalled 'ls' process in a terminal before hard booting.
[..]
> Can anyone who has experienced this hang test this patch:
>
> Thanks,
> Jeff
> Index: ffs_softdep.c
> ===================================================================
> --- ffs_softdep.c (revision 207480)
> +++ ffs_softdep.c (working copy)
> @@ -9301,7 +9301,7 @@
> hadchanges = 1;
> }
> /* Leave this inodeblock dirty until it's in the list. */
> - if ((inodedep->id_state & (UNLINKED | DEPCOMPLETE)) == UNLINKED)
> + if ((inodedep->id_state & (UNLINKED | UNLINKONLIST)) == UNLINKED)
Hi Jeff,
I didn't seem to experience this problem back in May, but I'm now
experiencing it on a regular basis.
I seem to trigger it every second or third day during the daily run.
I wind up with cvsup or svnsync stalled and any 'ls' of my sources
partition waiting on suspfs.
(Note: I am also running diskcheckd from ports.)
My kernel sources are at:
Last Changed Author: davidxu
Last Changed Rev: 211534
Last Changed Date: 2010-08-20 16:51:34 -0700 (Fri, 20 Aug 2010)
I have also experienced it back to at least:
Last Changed Author: yongari
Last Changed Rev: 210152
Last Changed Date: 2010-07-15 16:34:58 -0700 (Thu, 15 Jul 2010)
Weird thing is - I can still access this partition across NFS without
problems.
dragon$ cd /src/fbsd
dragon$ df .
Filesystem Size Used Avail Capacity Mounted on
/dev/da31s1f 271G 119G 130G 48% /src
dragon$ ls
load: 0.12 cmd: ls 77901 [suspfs] 2.26r 0.00u 0.00s 0% 1212k
quynh$ cd /src/fbsd
quynh$ df .
Filesystem Size Used Avail Capacity Mounted on
dragon:/src 271G 119G 130G 48% /src
quynh$ ls
.svn/ lib/
COPYRIGHT libexec/
..snip..
Processes also have a tendency to complete quite slowly at times - waiting
in vlruwk.
When I reboot, usually / and /src (but not 3 other partitions) give a
"Bad cg number {negative number}" error from fsck; so a full fsck is run.
This results in what seems like tens of thousands of iterations of:
UNREF FILE I=[..snip..]
RECONNECT? yes
SORRY no space in lost+found directory
unexpected soft update inconsistency
CLEAR? yes
Thoughts?
--
-- David (obrien@FreeBSD.org)
