Date: Fri, 3 Sep 2010 16:30:44 -0700
From: "David O'Brien" <obrien@freebsd.org>
To: Jeff Roberson <jroberson@jroberson.net>
Cc: freebsd-current@freebsd.org
Subject: Re: SUJ deadlock
Message-ID: <20100903233038.GA1383@dragon.NUXI.org>
In-Reply-To: <alpine.BSF.2.00.1005051253480.1398@desktop>
References: <B9090D36-D0E7-48D9-9FE2-FD0C7A486AC3@netasq.com> <4BDF2A4D.3030706@gmail.com> <C5565B6E-11C7-46C1-97A1-81AE1D5A7C78@netasq.com> <alpine.BSF.2.00.1005051253480.1398@desktop>

On Wed, May 05, 2010 at 12:54:07PM -1000, Jeff Roberson wrote:
> On Mon, 3 May 2010, Fabien Thomas wrote:
>>>> I'm with r207548 now and since some days i've system deadlock.
>>>> It seems related to SUJ with process waiting on suspfs or ppwait.
>>>
>>> I've also seen it stalled in suspfs, but this information is way better
>>> than what I was able to garner. I was only able to tell via ctrl-t on
>>> a stalled 'ls' process in a terminal before hard booting.
[..]
> Can anyone who has experienced this hang test this patch:
>
> Thanks,
> Jeff
> Index: ffs_softdep.c
> ===================================================================
> --- ffs_softdep.c	(revision 207480)
> +++ ffs_softdep.c	(working copy)
> @@ -9301,7 +9301,7 @@
>  			hadchanges = 1;
>  		}
>  		/* Leave this inodeblock dirty until it's in the list. */
> -		if ((inodedep->id_state & (UNLINKED | DEPCOMPLETE)) == UNLINKED)
> +		if ((inodedep->id_state & (UNLINKED | UNLINKONLIST)) == UNLINKED)

Hi Jeff,
I didn't seem to experience this problem back in May, but I'm now
experiencing it on a regular basis. I seem to trigger it almost every
other or 3rd day during the daily run. I wind up with cvsup or svnsync
stalled and any 'ls' of my sources partition waiting on suspfs.
(note, I am also running diskcheckd from ports.)

My kernel sources are at:
    Last Changed Author: davidxu
    Last Changed Rev: 211534
    Last Changed Date: 2010-08-20 16:51:34 -0700 (Fri, 20 Aug 2010)

I have also experienced it back to at least:
    Last Changed Author: yongari
    Last Changed Rev: 210152
    Last Changed Date: 2010-07-15 16:34:58 -0700 (Thu, 15 Jul 2010)

Weird thing is - I can still access this partition across NFS without
problems.

    dragon$ cd /src/fbsd
    Filesystem     Size    Used   Avail Capacity  Mounted on
    /dev/da31s1f   271G    119G    130G    48%    /src
    dragon$ ls
    load: 0.12  cmd: ls 77901 [suspfs] 2.26r 0.00u 0.00s 0% 1212k

    quynh$ cd /src/fbsd
    quynh$ df .
    Filesystem     Size    Used   Avail Capacity  Mounted on
    dragon:/src    271G    119G    130G    48%    /src
    quynh$ ls
    .svn/ lib/ COPYRIGHT libexec/
    ..snip..

Processes also have a tendency to complete quite slowly at times -
waiting in vlruwk.

When I reboot, usually / and /src (but not 3 other partitions) give a
"Bad cg number {negative number}" error from fsck; so a full fsck is run.
This results in what seems tens of thousands iterations of:

    UNREF FILE I=[..snip..]
    RECONNECT? yes
    SORRY no space in lost+found directory
    unexpected soft update inconsistency
    CLEAR? yes

thoughts?
-- 
-- David  (obrien@FreeBSD.org)