Date: Mon, 9 Apr 2012 15:35:30 +0400
From: Andrey Zonov
To: Konstantin Belousov
Cc: alc@freebsd.org, freebsd-hackers@freebsd.org, Alan Cox
Subject: Re: problems with mmap() and disk caching

On Mon, Apr 9, 2012 at 1:18 PM, Konstantin Belousov wrote:
> On Mon, Apr 09, 2012 at 11:17:41AM +0400, Andrey Zonov wrote:
>> On 06.04.2012 12:13, Konstantin Belousov wrote:
>> >On Thu, Apr 05, 2012 at 11:54:53PM +0400, Andrey Zonov wrote:
[snip]
>> >>I always thought that active memory is the sum of the resident memory
>> >>of all processes, inactive shows the disk cache, and wired shows the
>> >>kernel itself.
>> >So you are wrong. Both active and inactive memory can be mapped or
>> >unmapped, and both can belong to vnode or to anonymous objects, etc.
>> >The active/inactive distinction reflects only the references that were
>> >noted by the pagedaemon, or some other page history, such as the way
>> >the page was unwired.
>> >
>> >Wired does not necessarily mean kernel-used pages; user processes can
>> >wire their pages as well.
>>
>> Let's talk about that in detail.
>>
>> My understanding is the following:
>>
>> Active memory: the memory which is referenced by an application.  An
> Assuming the part 'by an application' is removed, this sentence is almost
> right. Any managed mapping of the page participates in the active
> references.
>
>> application may get memory only through mmap() (the allocator doesn't
>> use brk()/sbrk() any more).  The resident memory of an application is
>> the physical memory it uses.  So, the sum of RSS is active memory.
> First, brk/sbrk is still used. Second, there is no requirement that
> resident pages are referenced. E.g. a page could have participated in a
> buffer, and unwiring on the buffer dissolve put it into the inactive
> state. Or the pagedaemon cleared the reference and moved the page to the
> inactive queue. Or the page was prefaulted by various optimizations.
>
> Moreover, there is a subtle difference between 'resident' and 'not
> causing a fault on access'. A page may be resident, but the pte was not
> preinstalled, or the pte was flushed, etc.

From the user's point of view: how can memory be active if no one (I mean
no application) uses it?

What I noticed, though not right away, was that a program which had worked
with a big mmap()'ed file for a long time couldn't work well (many page
faults) with a new version of the file until I manually flushed active
memory by re-mounting the FS.  The new version couldn't force out the old
one.  In my opinion, if the VM moved cached objects to the inactive queue
after program termination, I wouldn't see this problem.

>>
>> Inactive memory: the memory which has no references.  Once we call
>> read() on the file, the file is in inactive memory, because we have no
>> references to this object, we just read it.  Memory released by free()
>> is also inactive.
> On buffer dissolve, the buffer cache explicitly puts the pages
> constituting the buffer into the inactive queue. In fact, this is not
> quite right: e.g. if the same pages are mapped and actively referenced,
> then the pagedaemon now has slightly more work to move the page from
> inactive to active.
>

Yes, sure, if someone else uses the object it should be active, and it
would be even better to introduce a new "SHARED" counter, like the one in
Mac OS X and Linux.

> And, free(3) operates at so much higher a level than the vm subsystem
> that describing the interaction between the two in any definitive way is
> impossible. Old naive mallocs put the block description at the beginning
> of the block, actually causing free() to reference at least the first
> page of the block. Jemalloc often does madvise(MADV_FREE) for large
> freed allocations.
> MADV_FREE moves pages between queues probabilistically.
>

That's exactly what I meant by free().  We drop act_count to 0 and move
the page to the inactive queue with vm_page_dontneed().

>>
>> Cache memory: I don't know what it is.  It's always small enough not to
>> think about it.
> This was the bug you reported, and which Alan fixed on Sunday.
>

I've tested this patch on 9.0-STABLE and must say that it introduces
interactivity problems on machines under heavy disk load.  With the patch
I tested before, I didn't observe such problems.

>>
>> Wired memory: kernel memory, and yes, an application may get wired
>> memory through mlock()/mlockall(), but I haven't seen any real
>> application which calls mlock().
> ntpd and amd from the base system. gpg and similar programs try to mlock
> the key store to avoid leaking sensitive material to swap. cdrecord(8)
> tried to mlock itself to avoid indefinite stalls during writes.
>

Nice catch ;-)

>
>>
>> >>
>> >>>>
>> >>>>Read the file:
>> >>>>$ cat /mnt/random > /dev/null
>> >>>>
>> >>>>Mem: 79M Active, 55M Inact, 1746M Wired, 1240M Buf, 45G Free
>> >>>>
>> >>>>Now the file is in wired memory.  I do not understand why.
>> >>>You do use UFS, right?
>> >>
>> >>Yes.
>> >>
>> >>>There are enough buffer headers and buffer KVA
>> >>>to have buffers allocated for the whole file content. Since buffers
>> >>>wire the corresponding pages, you get pages migrated to wired.
>> >>>
>> >>>When buffer pressure appears (i.e., any other i/o is started),
>> >>>the buffers will be repurposed and the pages moved to inactive.
>> >>>
>> >>
>> >>OK, how can I get the amount of disk cache?
>> >You cannot. At least I am not aware of any counter that keeps track
>> >of the resident pages belonging to the vnode pager.
>> >
>> >Buffers should not be thought of as disk cache; pages cache disk
>> >content. Instead, VMIO buffers only provide a bread()/bwrite()
>> >compatible interface to the page cache (*) for filesystems.
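Konstantin is not aware of a global counter for vnode-pager pages, but the residency of one particular file can still be sampled from user space by mapping it and asking mincore(2) which of its pages are in core. The sketch below does that; the helper name file_resident_pages() is hypothetical, and the code assumes a system that provides mincore() (FreeBSD and Linux both do). In line with the 'resident' vs. 'not causing a fault on access' distinction earlier in the thread, it reports whether each page is resident, not whether this process has a pte for it. On recent Linux kernels, results for files the caller neither owns nor can write may be limited to pages the caller itself has mapped.

```c
#define _DEFAULT_SOURCE /* exposes mincore() in glibc; harmless on FreeBSD */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: return how many pages of the file at `path` are
 * resident in memory, or -1 on error.  If `total_pages` is not NULL it
 * receives the file size in pages.
 */
long file_resident_pages(const char *path, long *total_pages)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    struct stat st;
    if (fstat(fd, &st) == -1) {
        close(fd);
        return -1;
    }

    size_t len = (size_t)st.st_size;
    size_t pagesize = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages = (len + pagesize - 1) / pagesize;
    if (total_pages != NULL)
        *total_pages = (long)npages;
    if (len == 0) {
        close(fd);
        return 0; /* mmap() of zero bytes fails; an empty file has no pages */
    }

    /* Mapping alone does not read the file; mincore() only inspects it. */
    void *addr = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); /* the mapping holds its own reference to the file */
    if (addr == MAP_FAILED)
        return -1;

    /*
     * One status byte per page.  FreeBSD declares the vector as char *,
     * glibc as unsigned char *, so hand it over as void *.
     */
    unsigned char *vec = malloc(npages);
    long resident = -1;
    if (vec != NULL && mincore(addr, len, (void *)vec) == 0) {
        resident = 0;
        for (size_t i = 0; i < npages; i++)
            if (vec[i] & 1) /* bit 0: page in core (MINCORE_INCORE on FreeBSD) */
                resident++;
    }
    free(vec);
    munmap(addr, len);
    return resident;
}
```

Run against the file from the `cat /mnt/random > /dev/null` experiment, this would count its resident pages whichever queue (active, inactive, wired) they sit in, something the per-queue counters shown by top do not separate out.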
>> >(*) - The term cache is used in the generic sense, not to be confused
>> >with the cached pages counter from top etc.
>> >
>>
>> Yes, I know that.  Let me try once again to ask my question about
>> buffers.  Is it reasonable to use 10% of physical memory for them, or
>> could we set a rational upper limit automatically?
>>

This question is still unanswered :)

>> >>
>> >>>>
>> >>>>Could you please give me an explanation of active/inactive/wired
>> >>>>memory?
>> >>>>
>> >>>>
>> >>>>>because I suspect that the current code does more harm than good. In
>> >>>>>theory, it saves activations of the page daemon. However, more often
>> >>>>>than not, I suspect that we are spending more on page reactivations
>> >>>>>than we are saving on page daemon activations. The sequential access
>> >>>>>detection heuristic is just too easily triggered. For example, I've
>> >>>>>seen it triggered by demand paging of the gcc text segment. Also, I
>> >>>>>think that pmap_remove_all() and especially vm_page_cache() are too
>> >>>>>severe for a detection heuristic that is so easily triggered.
>> >>>>>
>> >>>>[snip]
>> >>>>
>> >>>>--
>> >>>>Andrey Zonov
>> >>
>> >>--
>> >>Andrey Zonov
>>
>> --
>> Andrey Zonov

--
Andrey Zonov
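For the scenario described earlier in this message, where pages of an old mmap()'ed file stayed active and crowded out the new version, one thing a program can do itself is tell the VM, before unmapping, that it no longer needs those pages. The sketch below is a hypothetical helper, scan_file_then_dontneed(), that maps a file read-only, does stand-in work (summing the bytes), then calls madvise(MADV_DONTNEED) before munmap(). Our assumption, not verified against 9.0-STABLE, is that on FreeBSD this advice ends up in vm_page_dontneed(), the function mentioned above in connection with free(); on Linux, MADV_DONTNEED on a shared file mapping only drops this process's page table entries and leaves the page cache alone. Either way it is advisory only.

```c
#define _DEFAULT_SOURCE /* exposes madvise() and MADV_DONTNEED in glibc */
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: map the file at `path` read-only, sum its bytes
 * (a stand-in for whatever work a long-running program does with its big
 * mapped file), then advise the VM that the pages are no longer needed
 * and unmap them.  Returns the byte sum, or -1 on error.
 */
long long scan_file_then_dontneed(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    struct stat st;
    if (fstat(fd, &st) == -1) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;
    if (len == 0) {
        close(fd);
        return 0; /* nothing to map; mmap() of zero bytes would fail */
    }

    unsigned char *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); /* the mapping keeps its own reference to the file */
    if (p == MAP_FAILED)
        return -1;

    long long sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += p[i];

    /*
     * Done with this version of the file.  MADV_DONTNEED is only advice:
     * the kernel may act on it or ignore it, so its result is not checked.
     */
    (void)madvise(p, len, MADV_DONTNEED);
    if (munmap(p, len) == -1)
        return -1;
    return sum;
}
```

A real program would do this for the old version of its data file just before switching to the new one; whether it removes the need for the re-mounting workaround is untested here.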