Date: Mon, 2 Jan 2012 12:47:03 -0800 (PST)
From: Don Lewis <truckman@FreeBSD.org>
To: flo@FreeBSD.org
Cc: attilio@FreeBSD.org, current@FreeBSD.org, mckusick@mckusick.com, phk@phk.freebsd.dk, kib@FreeBSD.org, seanbru@yahoo-inc.com
Subject: Re: dogfooding over in clusteradm land
Message-ID: <201201022047.q02Kl3IM005792@gw.catspoiler.org>
In-Reply-To: <4F01F8FD.4020901@FreeBSD.org>
On 2 Jan, Florian Smeets wrote:
> On 29.12.11 01:04, Kirk McKusick wrote:
>> Rather than changing BKVASIZE, I would try running the cvs2svn
>> conversion on a 16K/2K filesystem and see if that sorts out the
>> problem. If it does, it tells us that doubling the main block
>> size and reducing the number of buffers by half is the problem.
>> If that is the problem, then we will have to increase the KVM
>> allocated to the buffer cache.
>>
>
> This does not make a difference. I tried on 32K/4K with/without journal
> and on 16K/2K; all exhibit the same problem. At some point during the
> cvs2svn conversion the syncer starts to use 100% CPU. The whole process
> hangs at that point, sometimes for hours; from time to time it does
> continue doing some work, but really, really slowly. It's usually between
> revision 210000 and 220000, when the resulting svn file gets bigger than
> about 11-12GB. At that point an ls in the target dir hangs in state ufs.
>
> I broke into ddb and ran all the commands which I thought could be useful.
> The output is at http://tb.smeets.im/~flo/giant-ape_syncer.txt

Tracing command syncer pid 9 tid 100183 td 0xfffffe00120e9000
cpustop_handler() at cpustop_handler+0x2b
ipi_nmi_handler() at ipi_nmi_handler+0x50
trap() at trap+0x1a8
nmi_calltrap() at nmi_calltrap+0x8
--- trap 0x13, rip = 0xffffffff8082ba43, rsp = 0xffffff8000270fe0, rbp = 0xffffff88c97829a0 ---
_mtx_assert() at _mtx_assert+0x13
pmap_remove_write() at pmap_remove_write+0x38
vm_object_page_remove_write() at vm_object_page_remove_write+0x1f
vm_object_page_clean() at vm_object_page_clean+0x14d
vfs_msync() at vfs_msync+0xf1
sync_fsync() at sync_fsync+0x12a
sync_vnode() at sync_vnode+0x157
sched_sync() at sched_sync+0x1d1
fork_exit() at fork_exit+0x135
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff88c9782d00, rbp = 0 ---

I think this explains why the r228838 patch seems to help the problem.
Instead of an application call to msync(), you're getting bitten by the
syncer doing the equivalent.

I don't know why the syncer is CPU bound, though. From my understanding
of the patch, it only optimizes the I/O. Without the patch, I would
expect the syncer to just spend a lot of time waiting on I/O. My guess
is that this is actually a vm problem. There are nested loops in
vm_object_page_clean() and vm_object_page_remove_write(), so you could
be doing something that's causing lots of looping in that code.

I think that ls is hanging because it's stumbling across the vnode that
the syncer has locked.
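If anyone wants to poke at this from userland, a minimal sketch along these
lines should drive the same vm_object_page_clean() path through msync(2)
that the syncer reaches through vfs_msync(); it is only an illustration of
the equivalence described above, not the kernel code itself, and the file
name and mapping size are arbitrary placeholders:

    /*
     * Userland sketch (illustrative only): dirty a large shared file
     * mapping and flush it synchronously with msync(2), which walks the
     * object's dirty pages much like the syncer does via vfs_msync().
     * "msync_test.dat" and the 256 MB size are placeholder values.
     */
    #include <sys/mman.h>
    #include <sys/time.h>
    #include <err.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    #define MAPSIZE (256UL * 1024 * 1024)   /* 256 MB mapping */

    int
    main(void)
    {
            struct timeval t0, t1;
            char *p;
            int fd;

            fd = open("msync_test.dat", O_RDWR | O_CREAT | O_TRUNC, 0644);
            if (fd == -1)
                    err(1, "open");
            if (ftruncate(fd, MAPSIZE) == -1)
                    err(1, "ftruncate");

            p = mmap(NULL, MAPSIZE, PROT_READ | PROT_WRITE, MAP_SHARED,
                fd, 0);
            if (p == MAP_FAILED)
                    err(1, "mmap");

            /* Dirty every page so there is plenty for the clean pass. */
            for (size_t off = 0; off < MAPSIZE; off += 4096)
                    p[off] = 1;

            /* Synchronous flush of the whole mapping. */
            gettimeofday(&t0, NULL);
            if (msync(p, MAPSIZE, MS_SYNC) == -1)
                    err(1, "msync");
            gettimeofday(&t1, NULL);

            printf("msync took %.3f s\n",
                (t1.tv_sec - t0.tv_sec) +
                (t1.tv_usec - t0.tv_usec) / 1e6);

            munmap(p, MAPSIZE);
            close(fd);
            return (0);
    }

If the slowdown really is in the page-clean loops rather than in the I/O,
a test like this against the affected filesystem might show the same
CPU-bound behavior without waiting for the syncer to come around.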