Date:      Mon, 9 Apr 2012 15:35:30 +0400
From:      Andrey Zonov <andrey@zonov.org>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        alc@freebsd.org, freebsd-hackers@freebsd.org, Alan Cox <alc@rice.edu>
Subject:   Re: problems with mmap() and disk caching
Message-ID:  <CANU_PUEMJcV_ysPhhi3%2BMkGmM60bSTsRKgEH5Rj4KkrRPTw%2B9Q@mail.gmail.com>
In-Reply-To: <20120409091839.GH2358@deviant.kiev.zoral.com.ua>
References:  <4F7B495D.3010402@zonov.org> <20120404071746.GJ2358@deviant.kiev.zoral.com.ua> <4F7DC037.9060803@rice.edu> <4F7DF39A.3000500@zonov.org> <20120405194122.GC2358@deviant.kiev.zoral.com.ua> <4F7DF88D.2050907@zonov.org> <20120406081349.GE2358@deviant.kiev.zoral.com.ua> <4F828D15.8080604@zonov.org> <20120409091839.GH2358@deviant.kiev.zoral.com.ua>

On Mon, Apr 9, 2012 at 1:18 PM, Konstantin Belousov <kostikbel@gmail.com> wrote:
> On Mon, Apr 09, 2012 at 11:17:41AM +0400, Andrey Zonov wrote:
>> On 06.04.2012 12:13, Konstantin Belousov wrote:
>> >On Thu, Apr 05, 2012 at 11:54:53PM +0400, Andrey Zonov wrote:
[snip]
>> >>I always thought that active memory this is a sum of resident memory of
>> >>all processes, inactive shows disk cache and wired shows kernel itself.
>> >So you are wrong. Both active and inactive memory can be mapped and
>> >not mapped, both can belong to vnode or to anonymous objects etc.
>> >Active/inactive distinction is only the amount of references that was
>> >noted by pagedaemon, or some other page history like the way it was
>> >unwired.
>> >
>> >Wired does not necessarily mean kernel-used pages, user processes can
>> >wire their pages as well.
>>
>> Let's talk about that in details.
>>
>> My understanding is the following:
>>
>> Active memory: the memory which is referenced by application.  An
> Assuming the part 'by application' is removed, this sentence is almost right.
> Any managed mapping of the page participates in the active references.
>
>> application may get memory only through mmap() (allocator don't use
>> brk()/sbrk() any more).  The resident memory of an application is the
>> sum of physical used memory.  So, sum of RSS is active memory.
> First, brk/sbrk is still used. Second, there is no requirement that
> resident pages are referenced. E.g. page could have participated in the
> buffer, and unwiring on the buffer dissolve put it into inactive state.
> Or pagedaemon cleared the reference and moved the page to inactive queue.
> Or the page was prefaulted by different optimizations.
>
> More, there is subtle difference between 'resident' and 'not causing fault
> on access'. Page may be resident, but pte was not preinstalled, or pte
> was flushed etc.

From the user's point of view: how can memory be active if no one (I
mean no application) uses it?

What I actually saw (though not right away) is that a program which had
worked for a long time with a big mmap()'ed file could not work well
(many page faults) with a new version of the file until I manually
flushed active memory by re-mounting the FS.  The new version couldn't
force out the old one.  In my opinion, if the VM moved cached objects to
the inactive queue after program termination, I wouldn't see this
problem.
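
One way to observe this from userland is mincore(2), which reports which
pages of a mapping are resident.  The following is only a minimal sketch:
the helper name count_resident_pages() is hypothetical, and residency as
reported here is not the same as "will not fault on access", as noted
above.  It says nothing about which queue (active or inactive) a page is on.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: count the pages of a file that mincore(2) reports
 * as resident.  Returns that count, or -1 on error (including an empty
 * file).  On success *total_pages receives the file size in pages.
 */
long
count_resident_pages(const char *path, long *total_pages)
{
	struct stat st;
	long pagesize, resident;
	size_t len, npages, i;
	unsigned char *vec;
	void *addr;
	int fd;

	assert(path != NULL && total_pages != NULL);
	pagesize = sysconf(_SC_PAGESIZE);
	if ((fd = open(path, O_RDONLY)) == -1)
		return (-1);
	if (fstat(fd, &st) == -1 || st.st_size <= 0) {
		close(fd);
		return (-1);
	}
	len = (size_t)st.st_size;
	npages = (len + (size_t)pagesize - 1) / (size_t)pagesize;

	addr = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	close(fd);		/* the mapping outlives the descriptor */
	if (addr == MAP_FAILED)
		return (-1);

	resident = -1;
	vec = malloc(npages);
	/* One status byte per page; the vec pointer type differs by OS. */
	if (vec != NULL && mincore(addr, len, (void *)vec) == 0) {
		resident = 0;
		for (i = 0; i < npages; i++)
			if (vec[i] & 1)	/* bit 0: page is in core */
				resident++;
	}
	free(vec);
	munmap(addr, len);
	*total_pages = (long)npages;
	return (resident);
}
```

Calling it on both the old and the new version of the file would show how
much of each is still held in memory.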

>>
>> Inactive memory: the memory which has no references.  Once we call
>> read() on the file, the file is in inactive memory, because we have no
>> references to this object, we just read it.  This is also released
>> memory by free().
> On buffers dissolve, buffer cache explicitly puts pages constituting
> the buffer, into the inactive queue. In fact, this is not quite right,
> e.g. if the same pages are mapped and actively referenced, then
> pagedaemon has slightly more work now to move the page from inactive
> to active.
>

Yes, sure, if someone else uses the object it should be active, and it
would be even better to introduce a new "SHARED" counter, like the one
in MacOSX and Linux.

> And, free(3) operates at so much higher level than vm subsystem that
> describing the interaction between these two is impossible in any
> definitive mood. Old naive mallocs put block description at the beginning
> of the block, actually causing free() to reference at least the first
> page of the block. Jemalloc often does madvise(MADV_FREE) for large
> freed allocations. MADV_FREE moves pages between queues probabilistically.
>

That's exactly what I meant by free().  We drop act_count to 0 and
move the page to the inactive queue via vm_page_dontneed().

>>
>> Cache memory: I don't know what is it. It's always small enough to not
>> think about it.
> This was the bug you reported, and which Alan fixed on Sunday.
>

I've tested this patch under 9.0-STABLE, and I have to say that it
introduces interactivity problems on machines under heavy disk load.
With the patch that I tested before, I didn't observe such problems.

>>
>> Wired memory: kernel memory and yes, application may get wired memory
>> through mlock()/mlockall(), but I haven't seen any real application
>> which calls mlock().
> ntpd, amd from the base system. gpg and similar programs try to mlock
> key store to avoid sensitive material leakage to the swap. cdrecord(8)
> tried to mlock itself to avoid indefinite stalls during write.
>

Nice catch ;-)

>
>>
>> >>
>> >>>>
>> >>>>Read the file:
>> >>>>$ cat /mnt/random > /dev/null
>> >>>>
>> >>>>Mem: 79M Active, 55M Inact, 1746M Wired, 1240M Buf, 45G Free
>> >>>>
>> >>>>Now the file is in wired memory.  I do not understand why so.
>> >>>You do use UFS, right ?
>> >>
>> >>Yes.
>> >>
>> >>>There is enough buffer headers and buffer KVA
>> >>>to have buffers allocated for the whole file content. Since buffers wire
>> >>>corresponding pages, you get pages migrated to wired.
>> >>>
>> >>>When there appears a buffer pressure (i.e., any other i/o started),
>> >>>the buffers will be repurposed and pages moved to inactive.
>> >>>
>> >>
>> >>OK, how can I get amount of disk cache?
>> >You cannot. At least I am not aware of any counter that keeps track
>> >of the resident pages belonging to vnode pager.
>> >
>> >Buffers should not be thought as disk cache, pages cache disk content.
>> >Instead, VMIO buffers only provide bread()/bwrite() compatible interface
>> >to the page cache (*) for filesystems.
>> >(*) - The cache term is used in generic term, not to confuse with
>> >cached pages counter from top etc.
>> >
>>
>> Yes, I know that.  I try once again to ask my question about buffers.
>> Is this reasonable to use for them 10% of the physical memory or we may
>> set rational upper limit automatically?
>>

This question is still without answer :)

>> >>
>> >>>>
>> >>>>Could you please give me explanation about active/inactive/wired memory?
>> >>>>
>> >>>>
>> >>>>>because I suspect that the current code does more harm than good. In
>> >>>>>theory, it saves activations of the page daemon. However, more often
>> >>>>>than not, I suspect that we are spending more on page reactivations than
>> >>>>>we are saving on page daemon activations. The sequential access
>> >>>>>detection heuristic is just too easily triggered. For example, I've seen
>> >>>>>it triggered by demand paging of the gcc text segment. Also, I think
>> >>>>>that pmap_remove_all() and especially vm_page_cache() are too severe for
>> >>>>>a detection heuristic that is so easily triggered.
>> >>>>>
>> >>>>[snip]
>> >>>>
>> >>>>--
>> >>>>Andrey Zonov
>> >>
>> >>--
>> >>Andrey Zonov
>>
>> --
>> Andrey Zonov



-- 
Andrey Zonov


