From owner-freebsd-current@FreeBSD.ORG Sun Aug 8 09:57:35 2010 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 133ED1065674 for ; Sun, 8 Aug 2010 09:57:35 +0000 (UTC) (envelope-from r.c.ladan@gmail.com) Received: from mail-ew0-f54.google.com (mail-ew0-f54.google.com [209.85.215.54]) by mx1.freebsd.org (Postfix) with ESMTP id 91CE38FC1C for ; Sun, 8 Aug 2010 09:57:34 +0000 (UTC) Received: by ewy26 with SMTP id 26so3710132ewy.13 for ; Sun, 08 Aug 2010 02:57:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:sender:message-id:date:from :organization:user-agent:mime-version:to:subject:references :in-reply-to:content-type:content-transfer-encoding; bh=nnTQDF6mf3HZsIk99XUkDYpg0ukWtP/A45Xr984Euvc=; b=Y/XxZlWm/QIBsPCtFu7AlvFlF3QHpuh3NdonnGxhpwiwrz4yTmsHEpK85OSNhi80/g 3ANJQsHgkPljttYzEhqsJ8kUQ7/AXBH0K0fzJvXdHurdqdL+mbvljzPN957A+uIN4PCS QbOWNjNVKmgsil4pTZghEMb0+XDaZ7NCNSvmE= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=sender:message-id:date:from:organization:user-agent:mime-version:to :subject:references:in-reply-to:content-type :content-transfer-encoding; b=L8iH67YEiXR/hLq+b2dZkVNNyYvBbIueplsW17t7sj9WIVCNoM0C+m2S4Av75M07Vn oYLlXmiqLgtSBKvoQQfKpVxc4aPiP0fKWrdw757LXN0BrRJh3G4SLRgEq/FIAcoLbkoc yXcgrcAkmBsfkIbrEhtAVMOKDbVAYCknlhglY= Received: by 10.213.45.194 with SMTP id g2mr1512099ebf.0.1281261453433; Sun, 08 Aug 2010 02:57:33 -0700 (PDT) Received: from [192.168.1.70] (ip4da3ae31.direct-adsl.nl [77.163.174.49]) by mx.google.com with ESMTPS id u9sm5740640eeh.23.2010.08.08.02.57.31 (version=TLSv1/SSLv3 cipher=RC4-MD5); Sun, 08 Aug 2010 02:57:31 -0700 (PDT) Sender: =?UTF-8?Q?Ren=C3=A9_Ladan?= Message-ID: <4C5E7F8A.7030800@freebsd.org> Date: Sun, 08 Aug 2010 11:57:30 +0200 From: Rene Ladan Organization: The FreeBSD Project User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; nl-NL; rv:1.9.2.8) Gecko/20100806 Thunderbird/3.1.2 MIME-Version: 1.0 To: freebsd-current@freebsd.org References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: Filesystem wedge, SUJ-related? X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 08 Aug 2010 09:57:35 -0000 On 18-07-2010 15:02, Gavin Atkinson wrote: > On Sat, 17 Jul 2010, Gavin Atkinson wrote: >> Semi-regularly (every two-three days) I'm seeing what appears to be some >> sort of filesystem wedge. I usually see it initially with web browsers, >> but it's possible that's only because it's what produces most disk >> activity on this machine. I've seen it with both Opera and Firefox. >> I've been seeing this too. It still happens with kernel r211000. >> What happens is that the process will just wedge. A "procstat -kk" on it >> shows the following stack backtrace: >> >> 9012 100243 firefox-bin initial thread mi_switch+0x21d >> sleepq_switch+0x123 sleepq_wait+0x4d _sleep+0x357 getdirtybuf+0x21e >> flush_deplist+0x6f softdep_sync_metadata+0x153 ffs_syncvnode+0x213 >> ffs_fsync+0x43 fsync+0x148 syscallenter+0x1b5 syscall+0x4c >> Xfast_syscall+0xe2 > Firefox is the usual first sign: acer % ps ax|grep firefox 82117 v0 T 16:24,08 /usr/local/lib/firefox3/firefox-bin 13416 3 S+ 0:00,00 grep firefox acer % procstat -kk 82117 PID TID COMM TDNAME KSTACK 82117 100195 firefox-bin - mi_switch+0x219 thread_suspend_switch+0x103 thread_single+0x25c exit1+0x81 sigexit+0x84 cursig+0 ast+0x1aa doreti_ast+0x1f 82117 100221 firefox-bin initial thread mi_switch+0x219 sleepq_switch+0xfa sleepq_wait+0x46 _sleep+0x256 getdirtybuf+0x1af flush_deplist+0x6a softdep_sync_metadata+0x153 ffs_syncvnode+0x22d ffs_fsync+0x43 fsync+0x13d syscallenter+0x194 syscall+0x41 Xfast_syscall+0xe2 acer % > A bit more detail: it does look like whatever is supposed to periodically > flush the journal just stops doing it's job. Presumably this is also the > root cause of the "softdep: Out of journal space!" messages I have been > seeing in the past, which I had assumed may have been fixed by r209717. > I haven't seen any "softdep: Out of journal space!" messages since June 24, but I've indeed seen it once before (somewhere after June 11). > (I'm running r209723 at the moment) > > While processes are starting to hang, "sh ffs" from ddb shows: > > db> sh ffs > mp 0xffffff0002c45be0 / devvp 0xffffff0002c51000 fs 0xffffff0002c67000 su_wl 0 su_wl_in 0 su_deps 0 su_req 0 > mp 0xffffff0002d705f0 /tmp devvp 0xffffff0002d48780 fs 0xffffff0002c64800 su_wl 0 su_wl_in 0 su_deps 0 su_req 0 > mp 0xffffff0002c458e8 /usr devvp 0xffffff0002d485a0 fs 0xffffff0002c66000 su_wl 0 su_wl_in 0 su_deps 17345 su_req 0 > mp 0xffffff0002c455f0 /var devvp 0xffffff0002d483c0 fs 0xffffff0002c66800 su_wl 0 su_wl_in 0 su_deps 55 su_req 0 > > Leaving it another couple of hours, I then see: > > db> sh ffs > mp 0xffffff0002c45be0 / devvp 0xffffff0002c51000 fs 0xffffff0002c67000 su_wl 0 su_wl_in 0 su_deps 0 su_req 0 > mp 0xffffff0002d705f0 /tmp devvp 0xffffff0002d48780 fs 0xffffff0002c64800 su_wl 0 su_wl_in 0 su_deps 36 su_req 0 > mp 0xffffff0002c458e8 /usr devvp 0xffffff0002d485a0 fs 0xffffff0002c66000 su_wl 0 su_wl_in 0 su_deps 31899 su_req 0 > mp 0xffffff0002c455f0 /var devvp 0xffffff0002d483c0 fs 0xffffff0002c66800 su_wl 0 su_wl_in 0 su_deps 95 su_req 0 > > so, su_deps is increasing significantly. > > During reboot, vnlru failed to stop within 60 seconds, and gave up on > syncing 125 vnodes and 140 buffers (no idea if these are related). On > reboot, SU+J fsck shows for /usr: > > ** SU+J Recovering /dev/ad4s1f > ** Reading 33554432 byte journal from inode 150. > ** Building recovery table. > ** Resolving unreferenced inode list. > ** Processing journal entries. > ** 405991 journal records in 18194944 bytes for 71.40% utilization > ** Freed 3872 inodes (0 dirs) 48157 blocks, and 8744 frags. > Similar here. > So it seems clear that somehow the journal is filling up, and never being > written. > > Any other suggestions as to where I should go from here? > Disabling the journal would be a "solution", but not desirable. Maybe any lock order reversals to look out for (most are ufs-related) ? I don't know if it is related, but yesterday a full fsck on /usr cleared up two unallocated files in /usr/ports/editors/openoffice-3/work/ (they were in userland as having a bad file descriptor), which the journal didn't catch. Regards, Rene -- http://www.rene-ladan.nl/ GPG fingerprint = ADBC ECCD EB5F A6B4 549F 600D 8C9E 647A E564 2BFC (subkeys.pgp.net)