From owner-freebsd-stable@FreeBSD.ORG Sat Mar 25 18:29:36 2006
X-Original-To: stable@freebsd.org
Delivered-To: freebsd-stable@FreeBSD.ORG
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 98C8216A41F;
	Sat, 25 Mar 2006 18:29:36 +0000 (UTC)
	(envelope-from dillon@apollo.backplane.com)
Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 31C8043D45;
	Sat, 25 Mar 2006 18:29:36 +0000 (GMT)
	(envelope-from dillon@apollo.backplane.com)
Received: from apollo.backplane.com (localhost [127.0.0.1])
	by apollo.backplane.com (8.13.4.20060308/8.13.4) with ESMTP id k2PITM5w014735;
	Sat, 25 Mar 2006 10:29:22 -0800 (PST)
Received: (from dillon@localhost)
	by apollo.backplane.com (8.13.4.20060308/8.13.4/Submit) id k2PITH5D014732;
	Sat, 25 Mar 2006 10:29:17 -0800 (PST)
Date: Sat, 25 Mar 2006 10:29:17 -0800 (PST)
From: Matthew Dillon
Message-Id: <200603251829.k2PITH5D014732@apollo.backplane.com>
To: Peter Jeremy
References: <200603211607.30372.mi+mx@aldan.algebra.com>
	<200603231403.36136.mi+mx@aldan.algebra.com>
	<200603232048.k2NKm4QL067644@apollo.backplane.com>
	<200603231626.19102.mi+mx@aldan.algebra.com>
	<200603232316.k2NNGBka068754@apollo.backplane.com>
	<20060324084940.GA703@turion.vk2pj.dyndns.org>
	<200603241800.k2OI0KF8005579@apollo.backplane.com>
	<20060325094207.GD703@turion.vk2pj.dyndns.org>
Cc: alc@freebsd.org, Mikhail Teterin, stable@freebsd.org
Subject: Re: Reading via mmap stinks (Re: weird bugs with mmap-ing via NFS)
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code
X-List-Received-Date: Sat, 25 Mar 2006 18:29:36 -0000

:The results here are weird.
:With 1GB RAM and a 2GB dataset, the
:timings seem to depend on the sequence of operations: reading is
:significantly faster, but only when the data was mmap'd previously.
:There's one outlier that I can't easily explain.
:...
:Peter Jeremy

    Really odd.  Note that if your disk can only do 25 MBytes/sec, the
    calculation is:

	2052167894 / 25MB = ~80 seconds

    not ~60 seconds as you would expect from your numbers.  So that would
    imply that the 80-second numbers represent read-ahead, and the
    60-second numbers indicate that some of the data was retained from a
    prior run (and not blown out by the sequential reading in the later
    run).

    This type of situation *IS* possible as a side effect of other
    heuristics.  It is particularly possible when you combine read() with
    mmap(), because read() uses a different heuristic than mmap() to
    implement the read-ahead.  There is also code in there which depresses
    the page priority of 'old' already-read pages in the sequential case.
    So, for example, if you do a linear grep of 2GB you might end up with
    a cache state that looks like this:

	l = low priority page
	m = medium priority page
	h = high priority page

	FILE: [---------------------------mmmmmmmmmmmmm]

    Then when you rescan using mmap,

	FILE: [lllllllll------------------mmmmmmmmmmmmm]
	      [------lllllllll------------mmmmmmmmmmmmm]
	      [---------lllllllll---------mmmmmmmmmmmmm]
	      [------------lllllllll------mmmmmmmmmmmmm]
	      [---------------lllllllll---mmmmmmmmmmmmm]
	      [------------------lllllllllmmmmmmmmmmmmm]
	      [---------------------llllllHHHmmmmmmmmmm]
	      [------------------------lllHHHHHHmmmmmmm]
	      [---------------------------HHHHHHHHHmmmm]
	      [---------------------------mmmHHHHHHHHHm]

    The low priority pages don't bump out the medium priority pages from
    the previous scan, so the grep winds up doing read-ahead until it hits
    the large swath of pages already cached from the previous scan,
    without bumping out those pages.

    There is also a heuristic in the system (FreeBSD and DragonFly) which
    tries to randomly retain pages.
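    As a rough sketch of the priority interaction diagrammed above, here is
    a toy model (hypothetical illustration only -- the names PriorityCache,
    touch(), and sequential_scan() are invented for this example and are
    not the actual FreeBSD/DragonFly VM code):

```python
# Toy model of priority-based page caching.  A sequential scan inserts
# pages at LOW priority, and a LOW-priority insertion may only evict
# another LOW-priority page -- so a single linear pass cannot blow out
# the MEDIUM-priority swath left behind by an earlier run.
LOW, MEDIUM = 0, 1

class PriorityCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = {}          # page number -> priority

    def touch(self, page, prio):
        if page in self.pages:
            # Re-touching a cached page can only raise its priority.
            self.pages[page] = max(self.pages[page], prio)
            return True          # cache hit
        if len(self.pages) >= self.capacity:
            # Evict only a page whose priority is <= the incoming one.
            victims = [p for p, pr in self.pages.items() if pr <= prio]
            if not victims:
                return False     # nothing evictable; page stays uncached
            del self.pages[victims[0]]
        self.pages[page] = prio
        return False             # cache miss

def sequential_scan(cache, npages):
    """Linear pass over the file at LOW priority; returns cache hits."""
    return sum(cache.touch(page, LOW) for page in range(npages))

cache = PriorityCache(capacity=13)
# A previous run left the tail of the file cached at MEDIUM priority,
# like the 'm' swath in the diagram above.
for page in range(27, 40):
    cache.touch(page, MEDIUM)

hits = sequential_scan(cache, 40)
print("hits on second scan:", hits)  # the medium swath is still cached
```

    The low-priority pages from the new scan fail to evict the medium
    swath, so the rescan still hits on every page the first run retained.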
    That random-retention heuristic clearly isn't working :-)  I need to
    change it to randomly retain swaths of pages, the idea being that it
    should take repeated runs to rebalance the VM cache, rather than
    allowing a single run to blow it out or allowing a static set of pages
    to be retained indefinitely, which is what your tests seem to show is
    occurring.

					-Matt
					Matthew Dillon