Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 12 Aug 2010 15:35:33 +0300
From:      Kostik Belousov <kostikbel@gmail.com>
To:        Ren? Ladan <rene@freebsd.org>
Cc:        freebsd-current@freebsd.org
Subject:   Re: Filesystem wedge, SUJ-related?
Message-ID:  <20100812123533.GU2396@deviant.kiev.zoral.com.ua>
In-Reply-To: <AANLkTimowSpdMSCfRKn2BV3GNO_MxeBBEyjbnRiacQ-0@mail.gmail.com>
References:  <alpine.LNX.2.00.1007171747130.19876@ury.york.ac.uk> <alpine.LNX.2.00.1007181352110.19876@ury.york.ac.uk> <4C5E7F8A.7030800@freebsd.org> <AANLkTimowSpdMSCfRKn2BV3GNO_MxeBBEyjbnRiacQ-0@mail.gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help

--QFliEIXSSz7hGqqc
Content-Type: text/plain; charset=koi8-r
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Thu, Aug 12, 2010 at 02:01:57PM +0200, Ren? Ladan wrote:
> Hi,
>=20
> if I understand correctly, this is what r211212 and r211213 fixes?
r211213. r211212 fixes different issue, and there is one more known
problem.

>=20
> Rene
>=20
> 2010/8/8 Rene Ladan <rene@freebsd.org>:
> > On 18-07-2010 15:02, Gavin Atkinson wrote:
> >>
> >> On Sat, 17 Jul 2010, Gavin Atkinson wrote:
> >>>
> >>> Semi-regularly (every two-three days) I'm seeing what appears to be s=
ome
> >>> sort of filesystem wedge. =9AI usually see it initially with web brow=
sers,
> >>> but it's possible that's only because it's what produces most disk
> >>> activity on this machine. =9AI've seen it with both Opera and Firefox.
> >>>
> > I've been seeing this too. It still happens with kernel r211000.
> >
> >>> What happens is that the process will just wedge. =9AA "procstat -kk"=
 on it
> >>> shows the following stack backtrace:
> >>>
> >>> =9A9012 100243 firefox-bin =9A =9A =9Ainitial thread =9A mi_switch+0x=
21d
> >>> sleepq_switch+0x123 sleepq_wait+0x4d _sleep+0x357 getdirtybuf+0x21e
> >>> flush_deplist+0x6f softdep_sync_metadata+0x153 ffs_syncvnode+0x213
> >>> ffs_fsync+0x43 fsync+0x148 syscallenter+0x1b5 syscall+0x4c
> >>> Xfast_syscall+0xe2
> >>
> > Firefox is the usual first sign:
> > acer % ps ax|grep firefox
> > 82117 =9Av0 =9AT =9A =9A 16:24,08 /usr/local/lib/firefox3/firefox-bin
> > 13416 =9A 3 =9AS+ =9A =9A 0:00,00 grep firefox
> > acer % procstat -kk 82117
> > =9APID =9A =9ATID COMM =9A =9A =9A =9A =9A =9A TDNAME =9A =9A =9A =9A =
=9A KSTACK
> > 82117 100195 firefox-bin =9A =9A =9A- =9A =9A =9A =9A =9A =9A =9A =9Ami=
_switch+0x219
> > thread_suspend_switch+0x103 thread_single+0x25c exit1+0x81 sigexit+0x84
> > cursig+0 ast+0x1aa doreti_ast+0x1f
> > 82117 100221 firefox-bin =9A =9A =9Ainitial thread =9A mi_switch+0x219
> > sleepq_switch+0xfa sleepq_wait+0x46 _sleep+0x256 getdirtybuf+0x1af
> > flush_deplist+0x6a softdep_sync_metadata+0x153 ffs_syncvnode+0x22d
> > ffs_fsync+0x43 fsync+0x13d syscallenter+0x194 syscall+0x41
> > Xfast_syscall+0xe2
> > acer %
> >
> >> A bit more detail: it does look like whatever is supposed to periodica=
lly
> >> flush the journal just stops doing it's job. =9APresumably this is als=
o the
> >> root cause of the "softdep: Out of journal space!" messages I have been
> >> seeing in the past, which I had assumed may have been fixed by r209717.
> >>
> > I haven't seen any "softdep: Out of journal space!" messages since June=
 24,
> > but I've indeed seen it once before (somewhere after June 11).
> >
> >> (I'm running r209723 at the moment)
> >>
> >> While processes are starting to hang, "sh ffs" from ddb shows:
> >>
> >> db> =9Ash ffs
> >> mp 0xffffff0002c45be0 / devvp 0xffffff0002c51000 fs 0xffffff0002c67000
> >> su_wl 0 su_wl_in 0 su_deps 0 su_req 0
> >> mp 0xffffff0002d705f0 /tmp devvp 0xffffff0002d48780 fs 0xffffff0002c64=
800
> >> su_wl 0 su_wl_in 0 su_deps 0 su_req 0
> >> mp 0xffffff0002c458e8 /usr devvp 0xffffff0002d485a0 fs 0xffffff0002c66=
000
> >> su_wl 0 su_wl_in 0 su_deps 17345 su_req 0
> >> mp 0xffffff0002c455f0 /var devvp 0xffffff0002d483c0 fs 0xffffff0002c66=
800
> >> su_wl 0 su_wl_in 0 su_deps 55 su_req 0
> >>
> >> Leaving it another couple of hours, I then see:
> >>
> >> db> =9Ash ffs
> >> mp 0xffffff0002c45be0 / devvp 0xffffff0002c51000 fs 0xffffff0002c67000
> >> su_wl 0 su_wl_in 0 su_deps 0 su_req 0
> >> mp 0xffffff0002d705f0 /tmp devvp 0xffffff0002d48780 fs 0xffffff0002c64=
800
> >> su_wl 0 su_wl_in 0 su_deps 36 su_req 0
> >> mp 0xffffff0002c458e8 /usr devvp 0xffffff0002d485a0 fs 0xffffff0002c66=
000
> >> su_wl 0 su_wl_in 0 su_deps 31899 su_req 0
> >> mp 0xffffff0002c455f0 /var devvp 0xffffff0002d483c0 fs 0xffffff0002c66=
800
> >> su_wl 0 su_wl_in 0 su_deps 95 su_req 0
> >>
> >> so, su_deps is increasing significantly.
> >>
> >> During reboot, vnlru failed to stop within 60 seconds, and gave up on
> >> syncing 125 vnodes and 140 buffers (no idea if these are related). =9A=
On
> >> reboot, SU+J fsck shows for /usr:
> >>
> >> ** SU+J Recovering /dev/ad4s1f
> >> ** Reading 33554432 byte journal from inode 150.
> >> ** Building recovery table.
> >> ** Resolving unreferenced inode list.
> >> ** Processing journal entries.
> >> ** 405991 journal records in 18194944 bytes for 71.40% utilization
> >> ** Freed 3872 inodes (0 dirs) 48157 blocks, and 8744 frags.
> >>
> > Similar here.
> >
> >> So it seems clear that somehow the journal is filling up, and never be=
ing
> >> written.
> >>
> >> Any other suggestions as to where I should go from here?
> >>
> > Disabling the journal would be a "solution", but not desirable.
> > Maybe any lock order reversals to look out for (most are ufs-related) ?
> >
> > I don't know if it is related, but yesterday a full fsck on /usr cleare=
d up
> > two unallocated files in /usr/ports/editors/openoffice-3/work/ (they we=
re in
> > userland as having a bad file descriptor), which the journal didn't cat=
ch.
> >
> _______________________________________________
> freebsd-current@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"

--QFliEIXSSz7hGqqc
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (FreeBSD)

iEYEARECAAYFAkxj6pUACgkQC3+MBN1Mb4gBlwCcDtQbLzTgcum0MJSENjhwzHGK
seYAoIpeWbDWvnyBrXxD5kwqKnjtR8N9
=vU9L
-----END PGP SIGNATURE-----

--QFliEIXSSz7hGqqc--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20100812123533.GU2396>