Date:      Thu, 29 Jul 2010 21:47:20 +0100 (BST)
From:      Gavin Atkinson <gavin@FreeBSD.org>
To:        Kostik Belousov <kostikbel@gmail.com>
Cc:        David Xu <davidxu@FreeBSD.org>, FreeBSD Current <current@FreeBSD.org>
Subject:   Re: firefox is stuck in getbuf()
Message-ID:  <alpine.LNX.2.00.1007292140570.372@ury.york.ac.uk>
In-Reply-To: <1279726551.25909.17.camel@buffy.york.ac.uk>
References:  <4C4510B8.6090105@freebsd.org> <20100720132931.GI2381@deviant.kiev.zoral.com.ua> <1279726551.25909.17.camel@buffy.york.ac.uk>

On Wed, 21 Jul 2010, Gavin Atkinson wrote:
> On Tue, 2010-07-20 at 16:29 +0300, Kostik Belousov wrote:
> > On Tue, Jul 20, 2010 at 10:58:00AM +0800, David Xu wrote:
> > > With newest -HEAD code, firefox is stuck in getbuf().
> > > 
> > > top
> > > 
> > > last pid:  1814;  load averages:  0.00,  0.05,  0.07 
> > > 
> > >                                         up 0+00:37:11  10:54:01
> > > 135 processes: 1 running, 134 sleeping
> > > CPU:  3.7% user,  0.0% nice,  0.6% system,  0.0% interrupt, 95.7% idle
> > > Mem: 259M Active, 393M Inact, 151M Wired, 1484K Cache, 111M Buf, 186M Free
> > > Swap: 2020M Total, 2020M Free
> > > 
> > >   PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU 
> > > COMMAND
> > >  1427 davidxu       1  45    0   114M   101M select  0   1:24  0.29% Xorg
> > >  1588 davidxu      10  44    0   279M   145M getbuf  0   2:15  0.00% 
> > > firefox-bin
> > > 
> > > 
> > > procstat  -k 1588
> > >   PID    TID COMM             TDNAME           KSTACK 
> > > 
> > >  1588 100200 firefox-bin      initial thread   mi_switch sleepq_switch 
> > > sleepq_wait _sleep getdirtybuf flush_deplist softdep_sync_metadata 
> > > ffs_syncvnode ffs_fsync VOP_FSYNC_APV fsync syscallenter syscall 
> > > Xint0x80_syscall
> > >  1588 100207 firefox-bin      -                mi_switch sleepq_switch 
> > > sleepq_catch_signals sleepq_wait_sig _cv_wait_sig seltdwait poll 
> > > syscallenter syscall Xint0x80_syscall
> > >  1588 100208 firefox-bin      -                mi_switch sleepq_switch 
> > > sleepq_catch_signals sleepq_wait_sig _sleep __umtx_op_cv_wait _umtx_op 
> > > syscallenter syscall Xint0x80_syscall
> > >  1588 100209 firefox-bin      -                mi_switch sleepq_switch 
> > > sleepq_catch_signals sleepq_timedwait_sig _sleep __umtx_op_cv_wait 
> > > _umtx_op syscallenter syscall Xint0x80_syscall
> > >  1588 100210 firefox-bin      -                mi_switch sleepq_switch 
> > > sleepq_catch_signals sleepq_timedwait_sig _sleep __umtx_op_cv_wait 
> > > _umtx_op syscallenter syscall Xint0x80_syscall
> > >  1588 100216 firefox-bin      -                mi_switch sleepq_switch 
> > > sleepq_catch_signals sleepq_wait_sig _sleep __umtx_op_cv_wait _umtx_op 
> > > syscallenter syscall Xint0x80_syscall
> > >  1588 100220 firefox-bin      -                mi_switch sleepq_switch 
> > > sleepq_wait _sleep getdirtybuf flush_deplist softdep_sync_metadata 
> > > ffs_syncvnode ffs_fsync VOP_FSYNC_APV fsync syscallenter syscall 
> > > Xint0x80_syscall
> > >  1588 100238 firefox-bin      -                mi_switch sleepq_switch 
> > > sleepq_catch_signals sleepq_wait_sig _sleep __umtx_op_cv_wait _umtx_op 
> > > syscallenter syscall Xint0x80_syscall
> > >  1588 100239 firefox-bin      -                mi_switch sleepq_switch 
> > > sleepq_catch_signals sleepq_wait_sig _sleep __umtx_op_cv_wait _umtx_op 
> > > syscallenter syscall Xint0x80_syscall
> > >  1588 100240 firefox-bin      -                mi_switch sleepq_switch 
> > > sleepq_catch_signals sleepq_wait_sig _sleep __umtx_op_cv_wait _umtx_op 
> > > syscallenter syscall Xint0x80_syscall
> > 
> > Can you, please, do the following:
> > show the backtraces for the system processes, in particular, syncer,
> > bufdaemon, softdepflush daemon, pagedaemon and vm ?
> > for the stuck firefox thread, find the address of the buffer
> > supplied as an argument to getdirtybuf, and print the *(struct buf *)addr ?
> > This can be done on the live/stuck system using kgdb on /dev/mem.
> 
> I can relatively easily recreate this, see my thread on -current on the
> 17th July ("Filesystem wedge, SUJ-related?"), which (and the followup
> emails) contain additional info.  I'm currently trying to find the
> commit responsible for introducing this, and have established that a

OK, sorry for the delay.  I have the information requested.

Please see http://people.freebsd.org/~gavin/rho-fs-hang.txt

I've started narrowing down when exactly the hangs were introduced:

r208700 - June 1st  - seems to work fine
r209425 - June 22nd - hangs occur
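
Those two data points bound the regression to somewhere after r208700 and
at or before r209425. As a rough sketch of how the remaining range could
be bisected (not a description of any existing tool), something like the
following would do. The names first_bad_revision and is_good are made up
for illustration. In practice is_good would mean building and booting a
kernel at that revision and trying to reproduce the hang by hand. The
sketch also assumes the hang reproduces reliably and was introduced by a
single commit, neither of which is confirmed yet.

```python
def first_bad_revision(good, bad, is_good):
    """Binary-search the revision range (good, bad] for the first bad one.

    good    -- a revision known to work (here: 208700)
    bad     -- a revision known to hang (here: 209425)
    is_good -- hypothetical callback: returns True if the given revision
               does not show the hang (in reality a manual build + test)

    Returns (first_bad, steps), where steps is the number of revisions
    that had to be tested.
    """
    if good >= bad:
        raise ValueError("good must be lower than bad")
    steps = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_good(mid):
            good = mid   # regression is later than mid
        else:
            bad = mid    # regression is at or before mid
        steps += 1
    return bad, steps


if __name__ == "__main__":
    # Purely illustrative: pretend the culprit is r209000. The real
    # culprit is unknown at this point.
    pretend_culprit = 209000
    rev, steps = first_bad_revision(208700, 209425,
                                    lambda r: r < pretend_culprit)
    print("first bad revision: r%d after %d tests" % (rev, steps))
```

With 725 revisions between the two points, this needs at most 10
build-and-test cycles, whichever revision turns out to be responsible.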

If you need any more info, let me know.

Thanks,

Gavin


