From owner-freebsd-hackers@FreeBSD.ORG Fri Oct 7 00:46:52 2011
Date: Fri, 7 Oct 2011 02:46:51 +0200 (CEST)
From: Wojciech Puchar <wojtek@wojtek.tensor.gdynia.pl>
To: Kostik Belousov
Cc: hackers@freebsd.org, Grzegorz Kulewski
Subject: Re: mmap performance and memory use
In-Reply-To: <20111006160159.GQ1511@deviant.kiev.zoral.com.ua>

>> page. how much memory is used to manage this?

> I am not sure how deep the enumeration you want to know, but the first
> approximation will be:
> one struct vm_map_entry
> one struct vm_object
> one pv_entry

Actually I don't need a precise answer, just the algorithms.

> Page table structures need four pages for directories and page table proper.

>> 2) suppose we have 1TB file on disk without holes and 100000 processes
>> mmap this file to its address space. are just pages shared or can
>> pagetables be shared too? how much memory is used to manage such a
>> situation?

> Only pages are shared. Pagetables are not.

This is what I really asked, thank you for the answer. My example was
rather extreme, but datasets of tens of gigabytes would be used.

> superpages are due to more efficient use of TLB.

Actually this was not really working when I tested it a while ago (already
on FreeBSD 8): even with a 1GB squid process and no swapping at all, it was
rarely allocating them. And even when it does work, it probably will not
help much here unless absolutely all data is in RAM; the following explains
why.

> accurate tracking of the accesses and writes, which can result in better
> pageout performance.
>
> For the situation 1TB/100000 processes, you will probably need to tune
> the amount of pv entries, see sysctl vm.pmap.pv*.

So there is a workaround, but it would cause lots of soft page faults, as
there would be no more than a few hundred or so instructions between
touches of different pages.
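To make it concrete, each per-user process would do roughly something like
this (just a sketch, not tested: "table.dbf" is a made-up name, and
MAP_ALIGNED_SUPER is only an alignment hint, assuming the kernel you run
even has it):

/*
 * Sketch: every per-user process maps the same table file MAP_SHARED,
 * so the physical pages are shared, while each process still keeps its
 * own page tables and pv entries.  MAP_ALIGNED_SUPER (if present) only
 * asks for superpage-aligned placement; actual promotion to superpages
 * is still up to the pmap layer.
 */
#include <sys/mman.h>
#include <sys/stat.h>
#include <err.h>
#include <fcntl.h>
#include <unistd.h>

int
main(void)
{
	int fd = open("table.dbf", O_RDWR);	/* made-up file name */
	if (fd == -1)
		err(1, "open");

	struct stat st;
	if (fstat(fd, &st) == -1)
		err(1, "fstat");

	int flags = MAP_SHARED;
#ifdef MAP_ALIGNED_SUPER
	flags |= MAP_ALIGNED_SUPER;	/* hint: superpage-aligned placement */
#endif
	void *base = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
	    flags, fd, 0);
	if (base == MAP_FAILED)
		err(1, "mmap");

	/* Index lookups touch pages in no predictable order. */
	(void)madvise(base, (size_t)st.st_size, MADV_RANDOM);

	/* ... index lookups and record access go through *base here ... */

	munmap(base, (size_t)st.st_size);
	close(fd);
	return (0);
}

The point is that the physical pages behind such a mapping are shared
between all these processes, while each process still pays for its own page
tables and pv entries - which is exactly the overhead I am worried about.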
What I want to do is a database library (but no SQL!). It will be something
like CA-Clipper/Harbour (but definitely not the same and NOT compatible),
with higher performance, and intended for heavy cases too. With this system
one user is one process, one thread. If used for WWW or something similar,
it will be this plus some other program doing the WWW interface, but still
one logged-in user = exactly one process.

As properly planned database tables should not be huge, I assume most of
them (possibly excluding the parts that are mostly unused) will be kept in
memory by the VM subsystem. So hard faults and disk I/O will not be a
deciding factor. To avoid system calls I just want to mmap the tables and
indexes. All semaphores can be done from userspace too, and I already know
how to avoid lock contention well.

Using indexes means lots of memory reads from different pages, but each
process will usually touch only a small subset of the pages, not all of
them. So it MAY work well this way, or it may end with 95% system CPU time
spent mostly on soft faults.

A question for the future: is something planned in FreeBSD for that case?
I don't think I am the only one with that need; not everyone uses computers
for a few processes or personal use, and there are IMHO many cases where
programs need to share a huge dataset through mmap while doing heavy
timesharing.

I understand that mmap works the way it does because a file may be mapped
at different addresses, even with parts of a single file in different
places - that is what mmap allows. But would it be possible to add a
different mmap-like call in the kernel, say mmap_fullfile(fd, maxsize),
which (taking the amd64 case) maps the file at a 2MB boundary if
maxsize <= 2MB, a 1GB boundary if maxsize <= 1GB, and a 512GB boundary
otherwise (using subsequent multiple 512GB address blocks if needed),
sharing everything? (A rough sketch of what I mean is at the end of this
mail.) It is completely fine that things like madvise from one process
would clobber the madvise setting of another process, or other such
problems - only one type of program, aware of this, would use it. This way
there would be practically no page table mapping overhead, and actually
simpler/faster OS duties.

I don't really know exactly how the VM subsystem works under FreeBSD, but
if it is not hard I may do this with some help from you. And no - I don't
want to use any popular database system, for good reasons.
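To be clear about what I mean by mmap_fullfile, here is a purely
hypothetical sketch - nothing like this exists, the names and macros are
made up, and the numbers are the amd64 ones from above:

/*
 * HYPOTHETICAL interface -- no such call exists in FreeBSD today.
 * It only restates the proposal above as a prototype: map the whole
 * file at an alignment chosen from its maximum size, so that the page
 * table pages below that boundary could in principle be shared between
 * all processes mapping the same file.  amd64 figures.
 */
#include <sys/types.h>
#include <stddef.h>

#define	MFF_ALIGN_2M	(2UL * 1024 * 1024)		/* maxsize <= 2MB  */
#define	MFF_ALIGN_1G	(1024UL * 1024 * 1024)		/* maxsize <= 1GB  */
#define	MFF_ALIGN_512G	(512UL * 1024 * 1024 * 1024)	/* anything larger */

/*
 * Would map the whole of 'fd' (up to 'maxsize' bytes) at a 2MB, 1GB or
 * 512GB boundary depending on maxsize, using as many consecutive 512GB
 * address blocks as needed, and share everything between mappers.
 */
void	*mmap_fullfile(int fd, size_t maxsize);

/* Illustrative alignment choice, exactly as described in the text. */
static inline size_t
mff_alignment(size_t maxsize)
{
	if (maxsize <= MFF_ALIGN_2M)
		return (MFF_ALIGN_2M);
	if (maxsize <= MFF_ALIGN_1G)
		return (MFF_ALIGN_1G);
	return (MFF_ALIGN_512G);
}

The interesting part would of course be the kernel side: sharing the page
table pages below the chosen boundary between all processes that map the
same file.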