From owner-freebsd-current@FreeBSD.ORG Tue Jan  3 02:35:46 2012
Message-Id: <201201030235.q032ZY4V006462@gw.catspoiler.org>
Date: Mon, 2 Jan 2012 18:35:33 -0800 (PST)
From: Don Lewis <truckman@FreeBSD.org>
To: flo@FreeBSD.org
Cc: mckusick@mckusick.com, current@FreeBSD.org, attilio@FreeBSD.org, phk@phk.freebsd.dk, kib@FreeBSD.org, seanbru@yahoo-inc.com
In-Reply-To: <201201022047.q02Kl3IM005792@gw.catspoiler.org>
Subject: Re: dogfooding over in clusteradm land

On 2 Jan, Don Lewis wrote:
> On 2 Jan, Florian Smeets wrote:
>> This does not make a difference. I tried on 32K/4K with/without
>> journal and on 16K/2K; all exhibit the same problem. At some point
>> during the cvs2svn conversion the syncer starts to use 100% CPU. The
>> whole process hangs at that point, sometimes for hours; from time to
>> time it does continue doing some work, but really, really slowly.
>> It's usually between revision 210000 and 220000, when the resulting
>> svn file gets bigger than about 11-12GB. At that point an ls in the
>> target dir hangs in state ufs.
>>
>> I broke into ddb and ran all the commands which I thought could be
>> useful. The output is at http://tb.smeets.im/~flo/giant-ape_syncer.txt
>
> Tracing command syncer pid 9 tid 100183 td 0xfffffe00120e9000
> cpustop_handler() at cpustop_handler+0x2b
> ipi_nmi_handler() at ipi_nmi_handler+0x50
> trap() at trap+0x1a8
> nmi_calltrap() at nmi_calltrap+0x8
> --- trap 0x13, rip = 0xffffffff8082ba43, rsp = 0xffffff8000270fe0, rbp = 0xffffff88c97829a0 ---
> _mtx_assert() at _mtx_assert+0x13
> pmap_remove_write() at pmap_remove_write+0x38
> vm_object_page_remove_write() at vm_object_page_remove_write+0x1f
> vm_object_page_clean() at vm_object_page_clean+0x14d
> vfs_msync() at vfs_msync+0xf1
> sync_fsync() at sync_fsync+0x12a
> sync_vnode() at sync_vnode+0x157
> sched_sync() at sched_sync+0x1d1
> fork_exit() at fork_exit+0x135
> fork_trampoline() at fork_trampoline+0xe
> --- trap 0, rip = 0, rsp = 0xffffff88c9782d00, rbp = 0 ---
>
> I think this explains why the r228838 patch seems to help the problem.
> Instead of an application call to msync(), you're getting bitten by
> the syncer doing the equivalent. I don't know why the syncer is CPU
> bound, though. From my understanding of the patch, it only optimizes
> the I/O. Without the patch, I would expect that the syncer would just
> spend a lot of time waiting on I/O. My guess is that this is actually
> a vm problem. There are nested loops in vm_object_page_clean() and
> vm_object_page_remove_write(), so you could be doing something that's
> causing lots of looping in that code.
Does the machine recover if you suspend cvs2svn? I think what is happening is that cvs2svn is continuing to dirty pages while the syncer is trying to sync the file. From my limited understanding of this code, it looks to me like every time cvs2svn dirties a page, it will trigger a call to vm_object_set_writeable_dirty(), which will increment object->generation. Whenever vm_object_page_clean() detects a change in the generation count, it restarts its scan of the pages associated with the object. This is probably not optimal ...
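
To make that restart behaviour concrete, here is a minimal user-space sketch of the scan/restart pattern described above. It is not the real vm_object_page_clean() code: struct toy_object, writer_touch(), and toy_page_clean() are invented stand-ins, the "writer" is simulated inline instead of being a separate process like cvs2svn, and the numbers are arbitrary. It only illustrates that if something keeps bumping the object's generation count while the scan is in progress, the scan keeps starting over from the first page, so the total work grows well beyond the number of pages in the object.

/*
 * Toy user-space model (NOT the real FreeBSD vm code) of a clean pass
 * that restarts whenever the object's generation count changes.
 */
#include <stdio.h>
#include <stdbool.h>

#define NPAGES 64

struct toy_object {
	bool	dirty[NPAGES];
	int	generation;	/* bumped each time a page is (re)dirtied */
};

/* Model of an application dirtying a page while the clean pass runs. */
static void
writer_touch(struct toy_object *obj, int idx)
{
	obj->dirty[idx % NPAGES] = true;
	obj->generation++;	/* cf. vm_object_set_writeable_dirty() */
}

/* Model of the clean pass: restart whenever the generation changes. */
static long
toy_page_clean(struct toy_object *obj)
{
	long visited = 0;
	int curgeneration, i, touches = 0;

restart:
	curgeneration = obj->generation;
	for (i = 0; i < NPAGES; i++) {
		visited++;
		/* Pretend the writer dirties a page every 16 pages cleaned,
		 * then eventually goes quiet so the pass can finish. */
		if (++touches % 16 == 0 && touches < 500)
			writer_touch(obj, touches);
		if (obj->generation != curgeneration)
			goto restart;	/* scan starts over from page 0 */
		obj->dirty[i] = false;	/* "flush" the page */
	}
	return (visited);
}

int
main(void)
{
	struct toy_object obj = { .generation = 0 };
	int i;

	for (i = 0; i < NPAGES; i++)
		obj.dirty[i] = true;
	printf("pages visited to finish one clean pass: %ld (object has %d pages)\n",
	    toy_page_clean(&obj), NPAGES);
	return (0);
}

Run as-is, the toy cleaner ends up visiting several hundred pages to finish a single pass over a 64-page object; if the writer never went quiet, the pass would never complete at all, which would look a lot like a syncer spinning at 100% CPU while the file stays dirty.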