From owner-freebsd-hackers@FreeBSD.ORG Fri Oct 19 11:45:49 2012
Date: Fri, 19 Oct 2012 14:45:44 +0300
From: Konstantin Belousov
To: John Baldwin
Cc: freebsd-hackers@freebsd.org, Tristan Verniquet
Subject: Re: syncing large mmaped files
Message-ID: <20121019114544.GX35915@deviant.kiev.zoral.com.ua>
References: <201210180939.34861.jhb@freebsd.org> <20121018164218.GR35915@deviant.kiev.zoral.com.ua> <201210181543.25191.jhb@freebsd.org>
In-Reply-To: <201210181543.25191.jhb@freebsd.org>
List-Id: Technical Discussions relating to FreeBSD
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="/1aF9qoWKhphZS4n"
Content-Disposition: inline

--/1aF9qoWKhphZS4n
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Thu, Oct 18, 2012 at 03:43:25PM -0400, John Baldwin wrote:
> On Thursday, October 18, 2012 12:42:18 pm Konstantin Belousov wrote:
> > On Thu, Oct 18, 2012 at 09:39:34AM -0400, John Baldwin wrote:
> > > On Thursday, October 18, 2012 4:35:37 am Konstantin Belousov wrote:
> > > > On Thu, Oct 18, 2012 at 10:08:22AM +1000, Tristan Verniquet wrote:
> > > > >
> > > > > I want to work with large (1-10G) files in memory but eventually sync
> > > > > them back out to disk. The problem is that the sync process appears to
> > > > > lock the file in kernel for the duration of the sync, which can run
> > > > > into minutes. This prevents other processes from reading from the file
> > > > > (unless they already have it mapped) for this whole time. Is there
> > > > > any way to prevent this? I think I read in a post somewhere about
> > > > > openbsd implementing partial-writes when it hits a file with lots of
> > > > > dirty pages in order to prevent this. Is there anything available for
> > > > > FreeBSD or is there another way around it?
> > > > >
> > > > No, currently the vnode lock is held exclusive for the whole duration
> > > > of the msync(2) syscall or its analog from the syncer.
> > > >
> > > > Making a change to periodically drop the vnode lock in
> > > > vm_object_page_clean() might be possible, but requires the benchmarking
> > > > to make sure that we do not pessimize the common case. Also, this opens
> > > > a possibility for the vnode reclamation meantime.
> > >
> > > You can simulate this in userland by breaking up your msync() into multiple
> > > msync() calls where each call just syncs a portion of the file.
> > Be aware that this is much-much slower than msyncing the whole file, even
> > if file is very large. The reason is that pager initiates asynchronous
> > _immediate_ clustered write for such situations. Async writes (AKA
> > bdwrite()) are only specified for full range msyncing.
>
> Ugh.  It would seem to me that msync(MS_ASYNC) should be doing delayed
> writes.

According to my reading of the code, vm_pager_putpages() is called with
the VM_PAGER_CLUSTER_OK flag for MS_ASYNC. As a result, neither the
IO_SYNC nor the IO_ASYNC flag is passed to VOP_WRITE() from
vnode_pager_generic_putpages(). Since the mapped regions are typically
large enough to cover whole fs blocks, the code in
ffs_vnops.c:ffs_write() ends up in cluster_write(). Usually, a fully
populated cluster is written asynchronously.

>
> > > > Anyway, note that you cannot 'work with large files in memory', even if
> > > > you have enough RAM and no pressure to hold all the file pages resident.
> > > > The syncer will do a writeback periodically regardless of the application
> > > > calling msync(2) or not, with the interval of approximately 30 seconds.
> > >
> > > You can mmap with MAP_NOSYNC to prevent the syncer from writing the file out
> > > every 30 seconds.
> >
> > This also prevents msync(2) from syncing the region. The flag is fine
> > for throw-away data, but not for the scenario that was described, I
> > think.
>
> Oof.  I could see that in certain situations you might want to control this
> behavior from an application (similar to how I now make use of fadvise() at
> work).  Having a way to disable syncer but having msync(MS_ASYNC) do
> something useful would be good.

I was wrong there, sorry. Only the syncer and fsync(2) ignore
VPO_NOSYNC pages; msync(2) still syncs them.
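
For reference, the userland workaround John describes above (several
msync() calls, each covering only part of the mapping) could look roughly
like the sketch below. The names msync_chunked() and map_file_shared() are
made up for illustration and do not come from this thread or from any
FreeBSD API; only mmap(2), msync(2), ftruncate(2) and sysconf(3) are real
calls. The sketch assumes addr is page-aligned, as returned by mmap().

```c
#define _POSIX_C_SOURCE 200809L	/* expose POSIX declarations under strict -std modes */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map len bytes of path read/write and shared.
 * The file is created if missing and resized to exactly len bytes.
 * Returns MAP_FAILED on error.
 */
void *
map_file_shared(const char *path, size_t len)
{
	void *p;
	int fd = open(path, O_RDWR | O_CREAT, 0600);

	if (fd == -1)
		return (MAP_FAILED);
	if (ftruncate(fd, (off_t)len) == -1) {
		close(fd);
		return (MAP_FAILED);
	}
	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);	/* the mapping stays valid after close() */
	return (p);
}

/*
 * Hypothetical helper: flush [addr, addr + len) with one msync() call
 * per chunk instead of a single call over the whole range.
 * chunk is rounded down to a multiple of the page size (minimum one page)
 * so that every call starts on a page boundary.
 * Returns 0 on success, -1 with errno set on failure.
 */
int
msync_chunked(void *addr, size_t len, size_t chunk, int flags)
{
	long pagesize = sysconf(_SC_PAGESIZE);
	char *base = addr;
	size_t done, n, ps;

	if (pagesize <= 0 || chunk == 0) {
		errno = EINVAL;
		return (-1);
	}
	ps = (size_t)pagesize;
	chunk -= chunk % ps;
	if (chunk == 0)
		chunk = ps;

	for (done = 0; done < len; done += n) {
		/* The last chunk may be shorter than the others. */
		n = (len - done < chunk) ? len - done : chunk;
		assert(n > 0 && n <= len - done);
		if (msync(base + done, n, flags) == -1)
			return (-1);
	}
	return (0);
}
```

A caller would map the file once, modify it in place, and then call
something like msync_chunked(p, len, 64UL << 20, MS_SYNC); the 64 MB chunk
size is an arbitrary value, not a recommendation. Each call only covers its
own range, so other processes get a chance to acquire the vnode lock
between chunks, but as noted above this is expected to be much slower
overall than one msync() over the whole file.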

--/1aF9qoWKhphZS4n
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (FreeBSD)

iEUEARECAAYFAlCBPWcACgkQC3+MBN1Mb4jXnACeJAiNxO9S+ZVcJnKBzcxgwDT0
MfAAl1QgedvFLssA2kWLONoF7QJgX4o=
=cxYS
-----END PGP SIGNATURE-----

--/1aF9qoWKhphZS4n--