Date:      Fri, 19 Oct 2012 10:11:35 +1000
From:      Tristan Verniquet <tris_vern@hotmail.com>
To:        <jhb@freebsd.org>, <kostikbel@gmail.com>
Cc:        freebsd hackers <freebsd-hackers@freebsd.org>
Subject:   RE: syncing large mmaped files
Message-ID:  <SNT124-W23A8A38DF1467ECDA41DD883750@phx.gbl>
In-Reply-To: <201210181543.25191.jhb@freebsd.org>
References:  <SNT124-W20F26CF7B468F7F09B9B4983760@phx.gbl>, <201210180939.34861.jhb@freebsd.org>, <20121018164218.GR35915@deviant.kiev.zoral.com.ua>, <201210181543.25191.jhb@freebsd.org>




> From: jhb@freebsd.org
> To: kostikbel@gmail.com
> Subject: Re: syncing large mmaped files
> Date: Thu, 18 Oct 2012 15:43:25 -0400
> CC: freebsd-hackers@freebsd.org; tris_vern@hotmail.com
>
> On Thursday, October 18, 2012 12:42:18 pm Konstantin Belousov wrote:
> > On Thu, Oct 18, 2012 at 09:39:34AM -0400, John Baldwin wrote:
> > > On Thursday, October 18, 2012 4:35:37 am Konstantin Belousov wrote:
> > > > On Thu, Oct 18, 2012 at 10:08:22AM +1000, Tristan Verniquet wrote:
> > > > >
> > > > > I want to work with large (1-10G) files in memory but eventually sync
> > > > > them back out to disk. The problem is that the sync process appears to
> > > > > lock the file in the kernel for the duration of the sync, which can run
> > > > > into minutes. This prevents other processes from reading from the file
> > > > > (unless they already have it mapped) for this whole time. Is there
> > > > > any way to prevent this? I think I read in a post somewhere about
> > > > > OpenBSD implementing partial writes when it hits a file with lots of
> > > > > dirty pages in order to prevent this. Is there anything available for
> > > > > FreeBSD or is there another way around it?
> > > > >
> > > > No, currently the vnode lock is held exclusively for the whole duration
> > > > of the msync(2) syscall or its analog from the syncer.
> > > >
> > > > Making a change to periodically drop the vnode lock in
> > > > vm_object_page_clean() might be possible, but requires benchmarking
> > > > to make sure that we do not pessimize the common case. Also, this opens
> > > > the possibility of the vnode being reclaimed in the meantime.
> > >
> > > You can simulate this in userland by breaking up your msync() into multiple
> > > msync() calls where each call just syncs a portion of the file.
> > Be aware that this is much, much slower than msyncing the whole file, even
> > if the file is very large. The reason is that the pager initiates an
> > asynchronous _immediate_ clustered write in such situations. Async writes
> > (AKA bdwrite()) are only requested for full-range msyncing.
>
> Ugh.  It would seem to me that msync(MS_ASYNC) should be doing delayed
> writes.

Ahh, using MS_ASYNC seems to get me the behaviour I was looking for. It is just as fast as fsync for cases when all the pages are dirtied, but it releases the lock, allowing other programs to open and read the file. So it seems to be doing what I would expect.

> > > > Anyway, note that you cannot 'work with large files in memory', even if
> > > > you have enough RAM and no pressure to hold all the file pages resident.
> > > > The syncer will do a writeback periodically, regardless of whether the
> > > > application calls msync(2), at an interval of approximately 30 seconds.
> > >
> > > You can mmap with MAP_NOSYNC to prevent the syncer from writing the file out
> > > every 30 seconds.
> >
> > This also prevents msync(2) from syncing the region. The flag is fine
> > for throw-away data, but not for the scenario that was described, I
> > think.
>
> Oof.  I could see that in certain situations you might want to control this
> behavior from an application (similar to how I now make use of fadvise() at
> work).  Having a way to disable the syncer while still having msync(MS_ASYNC)
> do something useful would be good.
>
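For reference, a mapping with MAP_NOSYNC as John suggests could be set up roughly as follows. This is a sketch assuming a FreeBSD system: MAP_NOSYNC is a FreeBSD extension, so the code falls back to a plain shared mapping where the constant is not defined. The names map_file_nosync and NOSYNC_FLAG are hypothetical. Whether a later msync(2) still writes such pages back is exactly the point disputed in this thread.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* MAP_NOSYNC is FreeBSD-specific; elsewhere (e.g. Linux) this sketch
 * falls back to an ordinary shared mapping. */
#ifdef MAP_NOSYNC
#define NOSYNC_FLAG MAP_NOSYNC
#else
#define NOSYNC_FLAG 0
#endif

/* Hypothetical helper: open (creating if needed) the file at path, set
 * its size to exactly len bytes (extending or truncating it), and map it
 * shared and writable with MAP_NOSYNC so the periodic syncer should leave
 * its dirty pages alone. Returns MAP_FAILED on error. */
void *map_file_nosync(const char *path, size_t len, int *fd_out)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd == -1)
        return MAP_FAILED;
    if (ftruncate(fd, (off_t)len) == -1) {
        close(fd);
        return MAP_FAILED;
    }
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED | NOSYNC_FLAG, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return MAP_FAILED;
    }
    *fd_out = fd;   /* keep the descriptor for fsync(2)/close(2) later */
    return p;
}
```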

When I map using MAP_NOSYNC, I still seem to be able to msync(2) the regions? I see memory move from Wired/Active to Invalid and the disk is busy.

The madvise man page has a MADV_AUTOSYNC section which says that, for pages that are already dirtied, reversion can be guaranteed by calling msync(2) or fsync(2). This is FreeBSD 8.3. So even if there is something wrong with syncing MAP_NOSYNC pages, I guess I could always call madvise(MADV_AUTOSYNC) on them first.

> --
> John Baldwin


