From owner-freebsd-current@FreeBSD.ORG Mon Jan 2 21:45:49 2012
Return-Path:
Delivered-To: current@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id D3783106566C;
	Mon, 2 Jan 2012 21:45:49 +0000 (UTC)
	(envelope-from kostikbel@gmail.com)
Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200])
	by mx1.freebsd.org (Postfix) with ESMTP id 52E388FC27;
	Mon, 2 Jan 2012 21:45:48 +0000 (UTC)
Received: from alf.home (alf.kiev.zoral.com.ua [10.1.1.177])
	by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q02LjRG6030143
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Mon, 2 Jan 2012 23:45:28 +0200 (EET)
	(envelope-from kostikbel@gmail.com)
Received: from alf.home (kostik@localhost [127.0.0.1])
	by alf.home (8.14.5/8.14.5) with ESMTP id q02LjR9K037665;
	Mon, 2 Jan 2012 23:45:27 +0200 (EET)
	(envelope-from kostikbel@gmail.com)
Received: (from kostik@localhost)
	by alf.home (8.14.5/8.14.5/Submit) id q02LjRFp037664;
	Mon, 2 Jan 2012 23:45:27 +0200 (EET)
	(envelope-from kostikbel@gmail.com)
X-Authentication-Warning: alf.home: kostik set sender to kostikbel@gmail.com using -f
Date: Mon, 2 Jan 2012 23:45:27 +0200
From: Kostik Belousov
To: Don Lewis
Message-ID: <20120102214527.GJ50300@deviant.kiev.zoral.com.ua>
References: <4F01F8FD.4020901@FreeBSD.org> <201201022047.q02Kl3IM005792@gw.catspoiler.org>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="KV5J30b1nBuIoEpe"
Content-Disposition: inline
In-Reply-To: <201201022047.q02Kl3IM005792@gw.catspoiler.org>
User-Agent: Mutt/1.4.2.3i
X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua
X-Virus-Status: Clean
X-Spam-Status: No, score=-3.9 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00
	autolearn=ham version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua
Cc: mckusick@mckusick.com, flo@freebsd.org, current@freebsd.org,
	attilio@freebsd.org, phk@phk.freebsd.dk, kib@freebsd.org, seanbru@yahoo-inc.com
Subject: Re: dogfooding over in clusteradm land
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 02 Jan 2012 21:45:49 -0000

--KV5J30b1nBuIoEpe
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Mon, Jan 02, 2012 at 12:47:03PM -0800, Don Lewis wrote:
> On 2 Jan, Florian Smeets wrote:
> > On 29.12.11 01:04, Kirk McKusick wrote:
> >> Rather than changing BKVASIZE, I would try running the cvs2svn
> >> conversion on a 16K/2K filesystem and see if that sorts out the
> >> problem. If it does, it tells us that doubling the main block
> >> size and reducing the number of buffers by half is the problem.
> >> If that is the problem, then we will have to increase the KVM
> >> allocated to the buffer cache.
> >>
> >
> > This does not make a difference. I tried on 32K/4K with/without journal
> > and on 16K/2K; all exhibit the same problem. At some point during the
> > cvs2svn conversion the syncer starts to use 100% CPU. The whole process
> > hangs at that point, sometimes for hours; from time to time it does
> > continue doing some work, but really, really slowly. It's usually between
> > revision 210000 and 220000, when the resulting svn file gets bigger than
> > about 11-12GB. At that point an ls in the target dir hangs in state ufs.
> >
> > I broke into ddb and ran all commands which I thought could be useful.
> > The output is at http://tb.smeets.im/~flo/giant-ape_syncer.txt
>
> Tracing command syncer pid 9 tid 100183 td 0xfffffe00120e9000
> cpustop_handler() at cpustop_handler+0x2b
> ipi_nmi_handler() at ipi_nmi_handler+0x50
> trap() at trap+0x1a8
> nmi_calltrap() at nmi_calltrap+0x8
> --- trap 0x13, rip = 0xffffffff8082ba43, rsp = 0xffffff8000270fe0, rbp = 0xffffff88c97829a0 ---
> _mtx_assert() at _mtx_assert+0x13
> pmap_remove_write() at pmap_remove_write+0x38
> vm_object_page_remove_write() at vm_object_page_remove_write+0x1f
> vm_object_page_clean() at vm_object_page_clean+0x14d
> vfs_msync() at vfs_msync+0xf1
> sync_fsync() at sync_fsync+0x12a
> sync_vnode() at sync_vnode+0x157
> sched_sync() at sched_sync+0x1d1
> fork_exit() at fork_exit+0x135
> fork_trampoline() at fork_trampoline+0xe
> --- trap 0, rip = 0, rsp = 0xffffff88c9782d00, rbp = 0 ---
>
> I think this explains why the r228838 patch seems to help the problem.
> Instead of an application call to msync(), you're getting bitten by the
> syncer doing the equivalent. I don't know why the syncer is CPU bound,
> though. From my understanding of the patch it only optimizes the I/O.
> Without the patch, I would expect that the syncer would just spend a lot
> of time waiting on I/O. My guess is that this is actually a vm problem.
> There are nested loops in vm_object_page_clean() and
> vm_object_page_remove_write(), so you could be doing something that's
> causing lots of looping in that code.
r228838 allows the system to skip 50-70% of the code when initiating a
write of a UFS file page, due to async clustering. The system has to
maintain 75% fewer writes in progress.

> I think that ls is hanging because it's stumbling across the vnode that
> the syncer has locked.
This is the only reasonable explanation.

A low-tech way to profile is to periodically break into ddb and take a
backtrace of the syncer thread. More advanced techniques are to use
dtrace or normal profiling.
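
As a rough illustration of the low-tech approach, the backtraces collected
from ddb could be tallied offline to see where the syncer spends its time.
The sketch below is only an assumption about how one might do that: the
helper names (frames, tally, innermost) and the NOISE set are hypothetical,
and the only sample is the trace quoted above. Dropping the NMI entry frames
by name is a heuristic, not something ddb does.

```python
import re
from collections import Counter

# Matches a ddb frame line such as "vfs_msync() at vfs_msync+0xf1".
FRAME_RE = re.compile(r"(\w+)\(\) at \1\+0x[0-9a-f]+")

# Assumption: these frames are the debugger/NMI entry path, not syncer work.
NOISE = {"cpustop_handler", "ipi_nmi_handler", "trap", "nmi_calltrap"}

# The syncer backtrace quoted in this message, used as the only sample.
SAMPLE = """
cpustop_handler() at cpustop_handler+0x2b
ipi_nmi_handler() at ipi_nmi_handler+0x50
trap() at trap+0x1a8
nmi_calltrap() at nmi_calltrap+0x8
--- trap 0x13, rip = 0xffffffff8082ba43, rsp = 0xffffff8000270fe0, rbp = 0xffffff88c97829a0 ---
_mtx_assert() at _mtx_assert+0x13
pmap_remove_write() at pmap_remove_write+0x38
vm_object_page_remove_write() at vm_object_page_remove_write+0x1f
vm_object_page_clean() at vm_object_page_clean+0x14d
vfs_msync() at vfs_msync+0xf1
sync_fsync() at sync_fsync+0x12a
sync_vnode() at sync_vnode+0x157
sched_sync() at sched_sync+0x1d1
fork_exit() at fork_exit+0x135
fork_trampoline() at fork_trampoline+0xe
"""


def frames(backtrace):
    """Return function names from innermost to outermost, minus NOISE."""
    return [name for name in FRAME_RE.findall(backtrace) if name not in NOISE]


def tally(samples):
    """Count in how many samples each function appears anywhere on the stack."""
    counts = Counter()
    for backtrace in samples:
        counts.update(set(frames(backtrace)))
    return counts


def innermost(samples):
    """Count which function was executing when each sample was taken."""
    counts = Counter()
    for backtrace in samples:
        stack = frames(backtrace)
        if stack:
            counts[stack[0]] += 1
    return counts


if __name__ == "__main__":
    samples = [SAMPLE]  # append further pasted backtraces here
    for name, hits in tally(samples).most_common():
        print(f"{hits:4d}  {name}")
```

With enough samples, functions that appear in nearly every backtrace (for
example inside vm_object_page_clean()) would point at where the looping
happens; a single sample says very little on its own.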
--KV5J30b1nBuIoEpe Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (FreeBSD) iEYEARECAAYFAk8CJXYACgkQC3+MBN1Mb4hAhwCgnXR4RBhr8tclLUzeF3NYg/OX PRkAnjqHmH2duLg7tqvS/llmmjzaI2nb =VUIS -----END PGP SIGNATURE----- --KV5J30b1nBuIoEpe--