From owner-freebsd-current@FreeBSD.ORG Thu Aug 12 12:01:59 2010 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6D06F10656A3 for ; Thu, 12 Aug 2010 12:01:59 +0000 (UTC) (envelope-from r.c.ladan@gmail.com) Received: from mail-wy0-f182.google.com (mail-wy0-f182.google.com [74.125.82.182]) by mx1.freebsd.org (Postfix) with ESMTP id F1EDE8FC2C for ; Thu, 12 Aug 2010 12:01:58 +0000 (UTC) Received: by wyj26 with SMTP id 26so1644595wyj.13 for ; Thu, 12 Aug 2010 05:01:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:sender:received :in-reply-to:references:date:x-google-sender-auth:message-id:subject :from:to:content-type:content-transfer-encoding; bh=uDAh8mBrNdwvrVVMHFwIhu32K2kY+nKjcd1Pm4ecr0I=; b=KSAAmQ3ICq7rrNVCgEtR6SDdamtmBwZPKRqX/ukxvp20vEeqj+LuC1kV7NmgNpJruQ W91Cy9N2uAx6WhQvWAivW7p8M9jf45CLTUPjvzgOkvua1L2WRd6Jcr4WX+WM/fNzhCEt hVCgGIhCoQ6H1YZs1kxVmCrE8pOwWtGWUFj20= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:content-type :content-transfer-encoding; b=IlnFgQ3HjsZB1nOph3G7RnD2n99v/mTAXAmt3N4eopxyY16ISJvzNAeShglqABaS2X EVmbWRSOd8a0mQaNzGBd+sSZRlq3zA94DAX10L2bLYxPlXm4BCWmfADZE9Lpbu5bvafT lTf015fzzXQgoU1xBZb5unpRQHTV7w+UBvm1I= MIME-Version: 1.0 Received: by 10.227.141.84 with SMTP id l20mr18103335wbu.119.1281614517691; Thu, 12 Aug 2010 05:01:57 -0700 (PDT) Sender: r.c.ladan@gmail.com Received: by 10.227.36.193 with HTTP; Thu, 12 Aug 2010 05:01:57 -0700 (PDT) In-Reply-To: <4C5E7F8A.7030800@freebsd.org> References: <4C5E7F8A.7030800@freebsd.org> Date: Thu, 12 Aug 2010 14:01:57 +0200 X-Google-Sender-Auth: mKoxKj3F_Tm2eOyTAImSW3G-R9k Message-ID: From: =?ISO-8859-1?Q?Ren=E9_Ladan?= To: freebsd-current@freebsd.org, kib@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: Subject: Re: Filesystem wedge, SUJ-related? X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 Aug 2010 12:01:59 -0000 Hi, if I understand correctly, this is what r211212 and r211213 fixes? Rene 2010/8/8 Rene Ladan : > On 18-07-2010 15:02, Gavin Atkinson wrote: >> >> On Sat, 17 Jul 2010, Gavin Atkinson wrote: >>> >>> Semi-regularly (every two-three days) I'm seeing what appears to be som= e >>> sort of filesystem wedge. =A0I usually see it initially with web browse= rs, >>> but it's possible that's only because it's what produces most disk >>> activity on this machine. =A0I've seen it with both Opera and Firefox. >>> > I've been seeing this too. It still happens with kernel r211000. > >>> What happens is that the process will just wedge. =A0A "procstat -kk" o= n it >>> shows the following stack backtrace: >>> >>> =A09012 100243 firefox-bin =A0 =A0 =A0initial thread =A0 mi_switch+0x21= d >>> sleepq_switch+0x123 sleepq_wait+0x4d _sleep+0x357 getdirtybuf+0x21e >>> flush_deplist+0x6f softdep_sync_metadata+0x153 ffs_syncvnode+0x213 >>> ffs_fsync+0x43 fsync+0x148 syscallenter+0x1b5 syscall+0x4c >>> Xfast_syscall+0xe2 >> > Firefox is the usual first sign: > acer % ps ax|grep firefox > 82117 =A0v0 =A0T =A0 =A0 16:24,08 /usr/local/lib/firefox3/firefox-bin > 13416 =A0 3 =A0S+ =A0 =A0 0:00,00 grep firefox > acer % procstat -kk 82117 > =A0PID =A0 =A0TID COMM =A0 =A0 =A0 =A0 =A0 =A0 TDNAME =A0 =A0 =A0 =A0 =A0= KSTACK > 82117 100195 firefox-bin =A0 =A0 =A0- =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0mi_s= witch+0x219 > thread_suspend_switch+0x103 thread_single+0x25c exit1+0x81 sigexit+0x84 > cursig+0 ast+0x1aa doreti_ast+0x1f > 82117 100221 firefox-bin =A0 =A0 =A0initial thread =A0 mi_switch+0x219 > sleepq_switch+0xfa sleepq_wait+0x46 _sleep+0x256 getdirtybuf+0x1af > flush_deplist+0x6a softdep_sync_metadata+0x153 ffs_syncvnode+0x22d > ffs_fsync+0x43 fsync+0x13d syscallenter+0x194 syscall+0x41 > Xfast_syscall+0xe2 > acer % > >> A bit more detail: it does look like whatever is supposed to periodicall= y >> flush the journal just stops doing it's job. =A0Presumably this is also = the >> root cause of the "softdep: Out of journal space!" messages I have been >> seeing in the past, which I had assumed may have been fixed by r209717. >> > I haven't seen any "softdep: Out of journal space!" messages since June 2= 4, > but I've indeed seen it once before (somewhere after June 11). > >> (I'm running r209723 at the moment) >> >> While processes are starting to hang, "sh ffs" from ddb shows: >> >> db> =A0sh ffs >> mp 0xffffff0002c45be0 / devvp 0xffffff0002c51000 fs 0xffffff0002c67000 >> su_wl 0 su_wl_in 0 su_deps 0 su_req 0 >> mp 0xffffff0002d705f0 /tmp devvp 0xffffff0002d48780 fs 0xffffff0002c6480= 0 >> su_wl 0 su_wl_in 0 su_deps 0 su_req 0 >> mp 0xffffff0002c458e8 /usr devvp 0xffffff0002d485a0 fs 0xffffff0002c6600= 0 >> su_wl 0 su_wl_in 0 su_deps 17345 su_req 0 >> mp 0xffffff0002c455f0 /var devvp 0xffffff0002d483c0 fs 0xffffff0002c6680= 0 >> su_wl 0 su_wl_in 0 su_deps 55 su_req 0 >> >> Leaving it another couple of hours, I then see: >> >> db> =A0sh ffs >> mp 0xffffff0002c45be0 / devvp 0xffffff0002c51000 fs 0xffffff0002c67000 >> su_wl 0 su_wl_in 0 su_deps 0 su_req 0 >> mp 0xffffff0002d705f0 /tmp devvp 0xffffff0002d48780 fs 0xffffff0002c6480= 0 >> su_wl 0 su_wl_in 0 su_deps 36 su_req 0 >> mp 0xffffff0002c458e8 /usr devvp 0xffffff0002d485a0 fs 0xffffff0002c6600= 0 >> su_wl 0 su_wl_in 0 su_deps 31899 su_req 0 >> mp 0xffffff0002c455f0 /var devvp 0xffffff0002d483c0 fs 0xffffff0002c6680= 0 >> su_wl 0 su_wl_in 0 su_deps 95 su_req 0 >> >> so, su_deps is increasing significantly. >> >> During reboot, vnlru failed to stop within 60 seconds, and gave up on >> syncing 125 vnodes and 140 buffers (no idea if these are related). =A0On >> reboot, SU+J fsck shows for /usr: >> >> ** SU+J Recovering /dev/ad4s1f >> ** Reading 33554432 byte journal from inode 150. >> ** Building recovery table. >> ** Resolving unreferenced inode list. >> ** Processing journal entries. >> ** 405991 journal records in 18194944 bytes for 71.40% utilization >> ** Freed 3872 inodes (0 dirs) 48157 blocks, and 8744 frags. >> > Similar here. > >> So it seems clear that somehow the journal is filling up, and never bein= g >> written. >> >> Any other suggestions as to where I should go from here? >> > Disabling the journal would be a "solution", but not desirable. > Maybe any lock order reversals to look out for (most are ufs-related) ? > > I don't know if it is related, but yesterday a full fsck on /usr cleared = up > two unallocated files in /usr/ports/editors/openoffice-3/work/ (they were= in > userland as having a bad file descriptor), which the journal didn't catch= . >