From: Tristan Verniquet
Subject: RE: syncing large mmaped files
Date: Fri, 19 Oct 2012 10:11:35 +1000
In-Reply-To: <201210181543.25191.jhb@freebsd.org>
Cc: freebsd hackers
List-Id: Technical Discussions relating to FreeBSD

> From: jhb@freebsd.org
> To: kostikbel@gmail.com
> Subject: Re: syncing large mmaped files
> Date: Thu, 18 Oct 2012 15:43:25 -0400
> CC: freebsd-hackers@freebsd.org; tris_vern@hotmail.com
>
> On Thursday, October 18, 2012 12:42:18 pm Konstantin Belousov wrote:
> > On Thu, Oct 18, 2012 at 09:39:34AM -0400, John Baldwin wrote:
> > > On Thursday, October 18, 2012 4:35:37 am Konstantin Belousov wrote:
> > > > On Thu, Oct 18, 2012 at 10:08:22AM +1000, Tristan Verniquet wrote:
> > > > >
> > > > > I want to work with large (1-10G) files in memory but eventually sync
> > > > > them back out to disk. The problem is that the sync process appears to
> > > > > lock the file in kernel for the duration of the sync, which can run
> > > > > into minutes. This prevents other processes from reading from the file
> > > > > (unless they already have it mapped) for this whole time. Is there
> > > > > any way to prevent this? I think I read in a post somewhere about
> > > > > openbsd implementing partial-writes when it hits a file with lots of
> > > > > dirty pages in order to prevent this. Is there anything available for
> > > > > FreeBSD or is there another way around it?
> > > > >
> > > > No, currently the vnode lock is held exclusive for the whole duration
> > > > of the msync(2) syscall or its analog from the syncer.
> > > >
> > > > Making a change to periodically drop the vnode lock in
> > > > vm_object_page_clean() might be possible, but requires the benchmarking
> > > > to make sure that we do not pessimize the common case. Also, this opens
> > > > a possibility for the vnode reclamation meantime.
> > >
> > > You can simulate this in userland by breaking up your msync() into multiple
> > > msync() calls where each call just syncs a portion of the file.
> > Be aware that this is much-much slower than msyncing the whole file, even
> > if file is very large. The reason is that pager initiates asynchronous
> > _immediate_ clustered write for such situations. Async writes (AKA
> > bdwrite()) are only specified for full range msyncing.
>
> Ugh. It would seem to me that msync(MS_ASYNC) should be doing delayed
> writes.

Ahh, using MS_ASYNC seems to get me the behaviour I was looking for. It is
just as fast as fsync for cases when all the pages are dirtied but it
releases the lock allowing other programs to open and read the file.
So it seems to be doing what I would expect.

> > > > Anyway, note that you cannot 'work with large files in memory', even if
> > > > you have enough RAM and no pressure to hold all the file pages resident.
> > > > The syncer will do a writeback periodically regardless of the application
> > > > calling msync(2) or not, with the interval of approximately 30 seconds.
> > >
> > > You can mmap with MAP_NOSYNC to prevent the syncer from writing the file out
> > > every 30 seconds.
> >
> > This also prevents msync(2) from syncing the region. The flag is fine
> > for throw-away data, but not for the scenario that was described, I
> > think.
>
> Oof. I could see that in certain situations you might want to control this
> behavior from an application (similar to how I now make use of fadvise() at
> work). Having a way to disable syncer but having msync(MS_ASYNC) do
> something useful would be good.
>

When I map using MAP_NOSYNC I still seem to be able to msync(2) the
regions? I see memory move from Wired/Active to Invalid and the disk is
busy.

The madvise man page has a MADV_AUTOSYNC section which says that pages
that are already dirtied can be guaranteed to be reverted using msync(2)
or fsync(2). This is FreeBSD 8.3. So even if there is something wrong with
sync'ing MAP_NOSYNC pages, I guess I could always madvise MADV_AUTOSYNC
them first.

> --
> John Baldwin
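
For reference, the userland approach discussed above (several msync() calls
over portions of the mapping, here combined with MS_ASYNC) might look
roughly like the sketch below. It is only an illustration: the function
names sync_in_chunks() and write_and_flush(), the 64-page chunk size, and
the fill pattern are assumptions made for the example, not anything from
FreeBSD itself. It uses only POSIX calls, so MAP_NOSYNC and MADV_AUTOSYNC
are left out. Note the warning earlier in the thread that partial-range
msync() can be much slower than syncing the whole file, so this would need
measuring before being relied on.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Flush a shared file mapping in page-aligned chunks instead of with one
 * large msync() call. `base` must be the address returned by mmap().
 * Returns the number of msync() calls made, or -1 if one of them failed.
 * (Hypothetical helper, named for this example only.)
 */
long
sync_in_chunks(void *base, size_t len, size_t chunk, int flags)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    long calls = 0;

    /* The chunk size must be a non-zero multiple of the page size. */
    assert(pagesize > 0 && chunk > 0 && chunk % (size_t)pagesize == 0);

    for (size_t off = 0; off < len; off += chunk) {
        size_t n = (len - off < chunk) ? len - off : chunk;

        if (msync((char *)base + off, n, flags) == -1)
            return -1;
        calls++;
    }
    return calls;
}

/*
 * Demo (also hypothetical): create `path`, map `len` bytes of it, dirty
 * every page, then flush with MS_ASYNC in 64-page chunks.
 * Returns 0 on success, -1 on failure.
 */
int
write_and_flush(const char *path, size_t len)
{
    int rc = 0;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd == -1)
        return -1;
    if (ftruncate(fd, (off_t)len) == -1) {
        close(fd);
        return -1;
    }

    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    memset(p, 0xAB, len);               /* dirty every page */

    size_t chunk = 64 * (size_t)sysconf(_SC_PAGESIZE);
    if (sync_in_chunks(p, len, chunk, MS_ASYNC) == -1)
        rc = -1;

    /* MS_ASYNC does not wait for the writes; fsync() blocks until done. */
    if (rc == 0 && fsync(fd) == -1)
        rc = -1;

    munmap(p, len);
    close(fd);
    return rc;
}
```

Calling write_and_flush("/tmp/example.bin", 1 << 20) would write a 1 MB
file this way. Passing MS_SYNC instead of MS_ASYNC to sync_in_chunks()
makes each chunk block until it is written, which is the variant the
thread warns may be slow.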