From owner-freebsd-current@FreeBSD.ORG Thu Aug 12 12:35:52 2010 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0F9D81065693 for ; Thu, 12 Aug 2010 12:35:52 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 502768FC08 for ; Thu, 12 Aug 2010 12:35:50 +0000 (UTC) Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id o7CCZXQP023758 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 12 Aug 2010 15:35:33 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4) with ESMTP id o7CCZXRu079958; Thu, 12 Aug 2010 15:35:33 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4/Submit) id o7CCZXgR079957; Thu, 12 Aug 2010 15:35:33 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Thu, 12 Aug 2010 15:35:33 +0300 From: Kostik Belousov To: Ren? Ladan Message-ID: <20100812123533.GU2396@deviant.kiev.zoral.com.ua> References: <4C5E7F8A.7030800@freebsd.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="QFliEIXSSz7hGqqc" Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-2.2 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_50, DNS_FROM_OPENWHOIS autolearn=no version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: freebsd-current@freebsd.org Subject: Re: Filesystem wedge, SUJ-related? X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 Aug 2010 12:35:52 -0000 --QFliEIXSSz7hGqqc Content-Type: text/plain; charset=koi8-r Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Aug 12, 2010 at 02:01:57PM +0200, Ren? Ladan wrote: > Hi, >=20 > if I understand correctly, this is what r211212 and r211213 fixes? r211213. r211212 fixes different issue, and there is one more known problem. >=20 > Rene >=20 > 2010/8/8 Rene Ladan : > > On 18-07-2010 15:02, Gavin Atkinson wrote: > >> > >> On Sat, 17 Jul 2010, Gavin Atkinson wrote: > >>> > >>> Semi-regularly (every two-three days) I'm seeing what appears to be s= ome > >>> sort of filesystem wedge. =9AI usually see it initially with web brow= sers, > >>> but it's possible that's only because it's what produces most disk > >>> activity on this machine. =9AI've seen it with both Opera and Firefox. > >>> > > I've been seeing this too. It still happens with kernel r211000. > > > >>> What happens is that the process will just wedge. =9AA "procstat -kk"= on it > >>> shows the following stack backtrace: > >>> > >>> =9A9012 100243 firefox-bin =9A =9A =9Ainitial thread =9A mi_switch+0x= 21d > >>> sleepq_switch+0x123 sleepq_wait+0x4d _sleep+0x357 getdirtybuf+0x21e > >>> flush_deplist+0x6f softdep_sync_metadata+0x153 ffs_syncvnode+0x213 > >>> ffs_fsync+0x43 fsync+0x148 syscallenter+0x1b5 syscall+0x4c > >>> Xfast_syscall+0xe2 > >> > > Firefox is the usual first sign: > > acer % ps ax|grep firefox > > 82117 =9Av0 =9AT =9A =9A 16:24,08 /usr/local/lib/firefox3/firefox-bin > > 13416 =9A 3 =9AS+ =9A =9A 0:00,00 grep firefox > > acer % procstat -kk 82117 > > =9APID =9A =9ATID COMM =9A =9A =9A =9A =9A =9A TDNAME =9A =9A =9A =9A = =9A KSTACK > > 82117 100195 firefox-bin =9A =9A =9A- =9A =9A =9A =9A =9A =9A =9A =9Ami= _switch+0x219 > > thread_suspend_switch+0x103 thread_single+0x25c exit1+0x81 sigexit+0x84 > > cursig+0 ast+0x1aa doreti_ast+0x1f > > 82117 100221 firefox-bin =9A =9A =9Ainitial thread =9A mi_switch+0x219 > > sleepq_switch+0xfa sleepq_wait+0x46 _sleep+0x256 getdirtybuf+0x1af > > flush_deplist+0x6a softdep_sync_metadata+0x153 ffs_syncvnode+0x22d > > ffs_fsync+0x43 fsync+0x13d syscallenter+0x194 syscall+0x41 > > Xfast_syscall+0xe2 > > acer % > > > >> A bit more detail: it does look like whatever is supposed to periodica= lly > >> flush the journal just stops doing it's job. =9APresumably this is als= o the > >> root cause of the "softdep: Out of journal space!" messages I have been > >> seeing in the past, which I had assumed may have been fixed by r209717. > >> > > I haven't seen any "softdep: Out of journal space!" messages since June= 24, > > but I've indeed seen it once before (somewhere after June 11). > > > >> (I'm running r209723 at the moment) > >> > >> While processes are starting to hang, "sh ffs" from ddb shows: > >> > >> db> =9Ash ffs > >> mp 0xffffff0002c45be0 / devvp 0xffffff0002c51000 fs 0xffffff0002c67000 > >> su_wl 0 su_wl_in 0 su_deps 0 su_req 0 > >> mp 0xffffff0002d705f0 /tmp devvp 0xffffff0002d48780 fs 0xffffff0002c64= 800 > >> su_wl 0 su_wl_in 0 su_deps 0 su_req 0 > >> mp 0xffffff0002c458e8 /usr devvp 0xffffff0002d485a0 fs 0xffffff0002c66= 000 > >> su_wl 0 su_wl_in 0 su_deps 17345 su_req 0 > >> mp 0xffffff0002c455f0 /var devvp 0xffffff0002d483c0 fs 0xffffff0002c66= 800 > >> su_wl 0 su_wl_in 0 su_deps 55 su_req 0 > >> > >> Leaving it another couple of hours, I then see: > >> > >> db> =9Ash ffs > >> mp 0xffffff0002c45be0 / devvp 0xffffff0002c51000 fs 0xffffff0002c67000 > >> su_wl 0 su_wl_in 0 su_deps 0 su_req 0 > >> mp 0xffffff0002d705f0 /tmp devvp 0xffffff0002d48780 fs 0xffffff0002c64= 800 > >> su_wl 0 su_wl_in 0 su_deps 36 su_req 0 > >> mp 0xffffff0002c458e8 /usr devvp 0xffffff0002d485a0 fs 0xffffff0002c66= 000 > >> su_wl 0 su_wl_in 0 su_deps 31899 su_req 0 > >> mp 0xffffff0002c455f0 /var devvp 0xffffff0002d483c0 fs 0xffffff0002c66= 800 > >> su_wl 0 su_wl_in 0 su_deps 95 su_req 0 > >> > >> so, su_deps is increasing significantly. > >> > >> During reboot, vnlru failed to stop within 60 seconds, and gave up on > >> syncing 125 vnodes and 140 buffers (no idea if these are related). =9A= On > >> reboot, SU+J fsck shows for /usr: > >> > >> ** SU+J Recovering /dev/ad4s1f > >> ** Reading 33554432 byte journal from inode 150. > >> ** Building recovery table. > >> ** Resolving unreferenced inode list. > >> ** Processing journal entries. > >> ** 405991 journal records in 18194944 bytes for 71.40% utilization > >> ** Freed 3872 inodes (0 dirs) 48157 blocks, and 8744 frags. > >> > > Similar here. > > > >> So it seems clear that somehow the journal is filling up, and never be= ing > >> written. > >> > >> Any other suggestions as to where I should go from here? > >> > > Disabling the journal would be a "solution", but not desirable. > > Maybe any lock order reversals to look out for (most are ufs-related) ? > > > > I don't know if it is related, but yesterday a full fsck on /usr cleare= d up > > two unallocated files in /usr/ports/editors/openoffice-3/work/ (they we= re in > > userland as having a bad file descriptor), which the journal didn't cat= ch. > > > _______________________________________________ > freebsd-current@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-current > To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org" --QFliEIXSSz7hGqqc Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (FreeBSD) iEYEARECAAYFAkxj6pUACgkQC3+MBN1Mb4gBlwCcDtQbLzTgcum0MJSENjhwzHGK seYAoIpeWbDWvnyBrXxD5kwqKnjtR8N9 =vU9L -----END PGP SIGNATURE----- --QFliEIXSSz7hGqqc--