Date:      Mon, 9 Apr 2012 15:22:24 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Andrey Zonov <andrey@zonov.org>
Cc:        alc@freebsd.org, freebsd-hackers@freebsd.org, Alan Cox <alc@rice.edu>
Subject:   Re: problems with mmap() and disk caching
Message-ID:  <20120409122224.GN2358@deviant.kiev.zoral.com.ua>
In-Reply-To: <CANU_PUEMJcV_ysPhhi3+MkGmM60bSTsRKgEH5Rj4KkrRPTw+9Q@mail.gmail.com>
References:  <4F7B495D.3010402@zonov.org> <20120404071746.GJ2358@deviant.kiev.zoral.com.ua> <4F7DC037.9060803@rice.edu> <4F7DF39A.3000500@zonov.org> <20120405194122.GC2358@deviant.kiev.zoral.com.ua> <4F7DF88D.2050907@zonov.org> <20120406081349.GE2358@deviant.kiev.zoral.com.ua> <4F828D15.8080604@zonov.org> <20120409091839.GH2358@deviant.kiev.zoral.com.ua> <CANU_PUEMJcV_ysPhhi3+MkGmM60bSTsRKgEH5Rj4KkrRPTw+9Q@mail.gmail.com>

On Mon, Apr 09, 2012 at 03:35:30PM +0400, Andrey Zonov wrote:
> On Mon, Apr 9, 2012 at 1:18 PM, Konstantin Belousov <kostikbel@gmail.com> wrote:
> > On Mon, Apr 09, 2012 at 11:17:41AM +0400, Andrey Zonov wrote:
> >> On 06.04.2012 12:13, Konstantin Belousov wrote:
> >> >On Thu, Apr 05, 2012 at 11:54:53PM +0400, Andrey Zonov wrote:
> [snip]
> >> >>I always thought that active memory is the sum of the resident memory
> >> >>of all processes, that inactive shows the disk cache, and that wired
> >> >>shows the kernel itself.
> >> >So you are wrong. Both active and inactive memory can be mapped and
> >> >not mapped, and both can belong to a vnode or to anonymous objects, etc.
> >> >The active/inactive distinction reflects only the number of references
> >> >noted by the pagedaemon, or some other page history, such as the way
> >> >the page was unwired.
> >> >
> >> >Wired does not necessarily mean kernel-used pages; user processes can
> >> >wire their pages as well.
> >>
> >> Let's talk about that in detail.
> >>
> >> My understanding is the following:
> >>
> >> Active memory: the memory which is referenced by an application.  An
> > Assuming the part 'by an application' is removed, this sentence is
> > almost right.  Any managed mapping of the page participates in the
> > active references.
> >
> >> application may get memory only through mmap() (the allocator doesn't
> >> use brk()/sbrk() any more).  The resident memory of an application is
> >> the sum of its physically used memory.  So, the sum of RSS is the
> >> active memory.
> > First, brk/sbrk is still used. Second, there is no requirement that
> > resident pages are referenced. E.g., a page could have participated in
> > a buffer, and unwiring on the buffer dissolve put it into the inactive
> > state. Or the pagedaemon cleared the reference and moved the page to
> > the inactive queue. Or the page was prefaulted by various optimizations.
> >
> > Moreover, there is a subtle difference between 'resident' and 'not
> > causing a fault on access'. A page may be resident, but the pte was not
> > preinstalled, or the pte was flushed, etc.
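
To illustrate the distinction, here is a minimal sketch that counts the
resident pages of a mapping with mincore(2); /mnt/somefile is a
placeholder path and the file is assumed to be at least 1MB.  Residency
alone says nothing about installed ptes or recorded references:

#include <sys/mman.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(void)
{
    long psz = sysconf(_SC_PAGESIZE);
    size_t len = 1024 * 1024;
    size_t i, npages, resident = 0;
    char *vec;
    void *p;
    int fd;

    if ((fd = open("/mnt/somefile", O_RDONLY)) == -1)
        err(1, "open");
    if ((p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0)) == MAP_FAILED)
        err(1, "mmap");
    npages = (len + psz - 1) / psz;
    if ((vec = malloc(npages)) == NULL)
        err(1, "malloc");
    /* One status byte per page; MINCORE_INCORE means resident. */
    if (mincore(p, len, vec) == -1)
        err(1, "mincore");
    for (i = 0; i < npages; i++)
        if (vec[i] & MINCORE_INCORE)
            resident++;
    printf("%zu of %zu pages resident\n", resident, npages);
    return (0);
}
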
>
> From the user point of view: how can the memory be active if no one (I
> mean the application) uses it?
>
> What I didn't notice immediately is that a program which had worked with
> a big mmap()'ed file for a long time couldn't work well (many page
> faults) with a new version of the file, until I manually flushed the
> active memory by re-mounting the FS.  The new version couldn't force out
> the old one.  In my opinion, if the VM moved cached objects to the
> inactive queue after program termination, I wouldn't see this problem.
Moving pages to inactive just because some mapping was destroyed is plain
silly. The pages migrate between active/inactive/cache/free according to
the pagedaemon algorithms.

BTW, you do not need to actually remount the filesystem to flush the pages
of its vnodes. It is enough to try to unmount it while cd'ed into the
filesystem root.
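
A minimal sketch of that trick, with /mnt as a placeholder mount point.
unmount(2) needs the appropriate privilege, and the call is expected to
fail because the cwd keeps the root vnode busy:

#include <sys/param.h>
#include <sys/mount.h>
#include <err.h>
#include <unistd.h>

int
main(void)
{
    /* Stay inside the filesystem so the unmount cannot succeed. */
    if (chdir("/mnt") == -1)
        err(1, "chdir");
    /* The failed attempt still flushes cached pages of the fs vnodes. */
    if (unmount("/mnt", 0) == -1)
        warn("unmount (EBUSY expected)");
    return (0);
}
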
>
> >>
> >> Inactive memory: the memory which has no references.  Once we call
> >> read() on a file, the file is in inactive memory, because we have no
> >> references to this object; we just read it.  This is also the memory
> >> released by free().
> > On buffer dissolve, the buffer cache explicitly puts the pages
> > constituting the buffer into the inactive queue. In fact, this is not
> > quite right, e.g. if the same pages are mapped and actively referenced,
> > then the pagedaemon now has slightly more work to move the page from
> > inactive to active.
> >
>
> Yes, sure, if someone else uses the object it should be active, and it
> would be even better to introduce a new "SHARED" counter, like the one in
> MacOSX and Linux.
A counter for what? There is already the ref counter on the vm object.

>
> > And free(3) operates at so much higher a level than the vm subsystem
> > that describing the interaction between the two in any definitive way
> > is impossible.  Old naive mallocs put the block description at the
> > beginning of the block, actually causing free() to reference at least
> > the first page of the block.  Jemalloc often does madvise(MADV_FREE)
> > for large freed allocations.  MADV_FREE moves pages between queues
> > probabilistically.
> >
>
> That's exactly what I meant by free().  We drop act_count to 0 and
> move the page to the inactive queue via vm_page_dontneed().
>
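
As an illustration, a minimal sketch of that pattern for a single large
allocation (not jemalloc's actual code):

#include <sys/mman.h>
#include <err.h>
#include <string.h>

int
main(void)
{
    size_t len = 4 * 1024 * 1024;    /* a "large" allocation */
    char *p;

    /* Large allocations come from anonymous mmap, not brk/sbrk. */
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE,
        -1, 0);
    if (p == MAP_FAILED)
        err(1, "mmap");
    memset(p, 0xa5, len);            /* dirty the pages */
    /*
     * "Free" the run: the mapping stays intact, but the VM may now
     * reclaim the pages without writing them to swap.
     */
    if (madvise(p, len, MADV_FREE) == -1)
        err(1, "madvise");
    return (0);
}
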
> >>
> >> Cache memory: I don't know what it is.  It's always small enough not
> >> to think about it.
> > This was the bug you reported, and which Alan fixed on Sunday.
> >
>
> I've tested this patch under 9.0-STABLE and have to say that it
> introduces interactivity problems on machines under heavy disk load.
> With the patch I tested before, I didn't observe such problems.
>
> >>
> >> Wired memory: kernel memory, and yes, an application may get wired
> >> memory through mlock()/mlockall(), but I haven't seen any real
> >> application which calls mlock().
> > ntpd and amd from the base system do.  gpg and similar programs try to
> > mlock the key store to avoid leaking sensitive material to swap.
> > cdrecord(8) tried to mlock itself to avoid indefinite stalls during a
> > write.
> >
>
> Nice catch ;-)
>
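
A minimal sketch of the gpg-style pattern, with a placeholder buffer
standing in for real key material (subject to RLIMIT_MEMLOCK and
privilege checks):

#include <sys/mman.h>
#include <err.h>
#include <string.h>

int
main(void)
{
    static char key[4096];    /* placeholder for sensitive data */

    /* Wire the pages: counted as wired, never paged out to swap. */
    if (mlock(key, sizeof(key)) == -1)
        err(1, "mlock");
    /* ... generate and use the secret ... */
    memset(key, 0, sizeof(key));    /* scrub before unwiring */
    if (munlock(key, sizeof(key)) == -1)
        err(1, "munlock");
    return (0);
}
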
> >
> >>
> >> >>
> >> >>>>
> >> >>>>Read the file:
> >> >>>>$ cat /mnt/random > /dev/null
> >> >>>>
> >> >>>>Mem: 79M Active, 55M Inact, 1746M Wired, 1240M Buf, 45G Free
> >> >>>>
> >> >>>>Now the file is in wired memory.  I do not understand why.
> >> >>>You do use UFS, right?
> >> >>
> >> >>Yes.
> >> >>
> >> >>>There are enough buffer headers and enough buffer KVA
> >> >>>to have buffers allocated for the whole file content.  Since buffers
> >> >>>wire the corresponding pages, you get the pages migrated to wired.
> >> >>>
> >> >>>When buffer pressure appears (i.e., any other i/o is started),
> >> >>>the buffers will be repurposed and the pages moved to inactive.
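
One way to watch this from userland: a minimal sketch that samples
vm.stats.vm.v_wire_count (a u_int, in pages) before and after reading a
file, using the /mnt/random example from above:

#include <sys/types.h>
#include <sys/sysctl.h>
#include <err.h>
#include <stdio.h>
#include <stdlib.h>

static u_int
wired(void)
{
    u_int v;
    size_t len = sizeof(v);

    if (sysctlbyname("vm.stats.vm.v_wire_count", &v, &len, NULL, 0) == -1)
        err(1, "sysctl");
    return (v);
}

int
main(void)
{
    u_int before = wired();

    /* Read through the buffer cache; on UFS this wires the pages. */
    system("cat /mnt/random > /dev/null");
    printf("wired pages: %u -> %u\n", before, wired());
    return (0);
}
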
> >> >>>
> >> >>
> >> >>OK, how can I get the amount of disk cache?
> >> >You cannot.  At least I am not aware of any counter that keeps track
> >> >of the resident pages belonging to the vnode pager.
> >> >
> >> >Buffers should not be thought of as the disk cache; pages cache disk
> >> >content.  Instead, VMIO buffers only provide a bread()/bwrite()-compatible
> >> >interface to the page cache (*) for filesystems.
> >> >(*) - The cache term is used in its generic sense, not to be confused
> >> >with the cached pages counter from top etc.
> >> >
> >>
> >> Yes, I know that.  Let me try once again to ask my question about
> >> buffers.  Is it reasonable to use 10% of the physical memory for them,
> >> or could we set a rational upper limit automatically?
> >>
>
> This question is still without an answer :)
What is the rational upper limit? Buffers do not consume separate
physical memory; the amount of memory reported as used by buffers is
KVA. Mostly, the amount of buffer space limits the amount of outstanding
i/o, including delayed write requests. Buffers map the cached vnode pages
into the kernel address space for filesystems.
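
The current and maximum values can be inspected with sysctl; a minimal
sketch reading vfs.bufspace and vfs.maxbufspace (both exported as longs
on this vintage of FreeBSD):

#include <sys/types.h>
#include <sys/sysctl.h>
#include <err.h>
#include <stdio.h>

int
main(void)
{
    long bufspace, maxbufspace;
    size_t len;

    len = sizeof(bufspace);
    if (sysctlbyname("vfs.bufspace", &bufspace, &len, NULL, 0) == -1)
        err(1, "vfs.bufspace");
    len = sizeof(maxbufspace);
    if (sysctlbyname("vfs.maxbufspace", &maxbufspace, &len, NULL, 0) == -1)
        err(1, "vfs.maxbufspace");
    /* This is KVA used for buffer mappings, not extra physical memory. */
    printf("bufspace: %ld of %ld bytes\n", bufspace, maxbufspace);
    return (0);
}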

>
> >> >>
> >> >>>>
> >> >>>>Could you please give me an explanation of active/inactive/wired
> >> >>>>memory?
> >> >>>>
> >> >>>>
> >> >>>>>because I suspect that the current code does more harm than good. In
> >> >>>>>theory, it saves activations of the page daemon. However, more often
> >> >>>>>than not, I suspect that we are spending more on page reactivations
> >> >>>>>than we are saving on page daemon activations. The sequential access
> >> >>>>>detection heuristic is just too easily triggered. For example, I've
> >> >>>>>seen it triggered by demand paging of the gcc text segment. Also, I
> >> >>>>>think that pmap_remove_all() and especially vm_page_cache() are too
> >> >>>>>severe for a detection heuristic that is so easily triggered.
> >> >>>>>
> >> >>>>[snip]
> >> >>>>
> >> >>>>--
> >> >>>>Andrey Zonov
> >> >>
> >> >>--
> >> >>Andrey Zonov
> >>
> >> --
> >> Andrey Zonov
>
>
>
> --
> Andrey Zonov
