From owner-freebsd-stable@FreeBSD.ORG Thu Jan 14 21:06:58 2010
Delivered-To: stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 1430F106566B;
	Thu, 14 Jan 2010 21:06:58 +0000 (UTC)
	(envelope-from mi+thun@aldan.algebra.com)
Received: from mail2.timeinc.net (mail2.timeinc.net [64.236.74.30])
	by mx1.freebsd.org (Postfix) with ESMTP id 6F9C88FC0A;
	Thu, 14 Jan 2010 21:06:57 +0000 (UTC)
Received: from mail.timeinc.net (mail.timeinc.net [64.12.55.166])
	by mail2.timeinc.net (8.13.8/8.13.8) with ESMTP id o0EKMFFH021420
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Thu, 14 Jan 2010 15:22:16 -0500
Received: from ws-mteterin.dev.pathfinder.com
	(ws-mteterin.dev.pathfinder.com [209.251.223.173])
	by mail.timeinc.net (8.13.8/8.13.8) with SMTP id o0EKMEvo001575;
	Thu, 14 Jan 2010 15:22:14 -0500
Message-ID: <4B4F7CF5.4040307@aldan.algebra.com>
Date: Thu, 14 Jan 2010 15:22:13 -0500
From: "Mikhail T."
Organization: Virtual Estates, Inc.
User-Agent: Mozilla/5.0 (X11; U; Linux i686; uk; rv:1.9.1.5)
	Gecko/20091204 Thunderbird/3.0
MIME-Version: 1.0
To: Peter Jeremy, alc@freebsd.org, stable@freebsd.org
References: <200603232352.k2NNqPS8018729@gate.bitblocks.com>
	<200603241518.01027.mi+mx@aldan.algebra.com>
	<20060325103927.GE703@turion.vk2pj.dyndns.org>
	<200603250920.14208@aldan>
	<20060325190333.GD7001@funkthat.com>
In-Reply-To: <20060325190333.GD7001@funkthat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: dillon@backplane.com
Subject: An old gripe: Reading via mmap stinks
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code
X-List-Received-Date: Thu, 14 Jan 2010 21:06:58 -0000

03/25/06 14:03, John-Mark Gurney wrote:
> The other useful/interesting number would be to compare system time
> between the mmap case and the read case to see how much work the
> kernel is doing in each case...

After adding begin- and end-offset options to md5(1) -- implemented
using mmap (see bin/142814) -- I am, once again, upset over the slowness
of pagefaulting-in compared to reading-in.

(To reproduce my results, patch your /usr/src/sbin/md5/ with

	http://aldan.algebra.com/~mi/tmp/md5-offsets.patch

Then use plain ``md5 LARGE_FILE'' to use read and
``md5 -b 0 LARGE_FILE'' to use the mmap method.)

The times for processing an 8GB file residing on a reasonable IDE drive
(on a recent FreeBSD-7.2-STABLE/i386) are thus:

	mmap: 43.400u 9.439s 2:35.19 34.0% 16+184k 0+0io 106994pf+0w
	read: 41.358u 23.799s 2:12.04 49.3% 16+177k 67677+0io 0pf+0w

Observe that, even though read-ing is quite taxing on the kernel (high
sys-time), mmap-ing loses overall -- at least on an otherwise idle
system -- because read gets the full throughput of the drive (systat -vm
shows 100% disk utilization), while pagefaulting gets only about 69%.

When I last brought this up in 2006, it was "revealed" that read(2)
uses heuristics to perform a read-ahead. Why the pagefaulting-in
implementation can't use the same or similar "trickery" was never
explained...

Now, without a clue about how these things are implemented, I'll concede
that it may /sometimes/ be difficult for the VM to predict where the
next pagefault will strike. But in the cases when the process:

	a) mmaps up to 1GB at a time;
	b) issues an madvise MADV_SEQUENTIAL over the entire mmap-ed
	   region

mmap-ing ought to offer the same -- or better -- performance than read.
For example, a hit on a page inside a region marked as SEQUENTIAL ought
to bring in the next page or two.

The VM has all the information and the hints, it just does not use
them... A shame, is it not?

	-mi

P.S. If it is any consolation, on Linux things seem to be even worse.
Processing a 9GB file on kernel 2.6.18/i386:

	mmap: 26.222u 6.336s 6:01.75 8.9% 0+0k 0+0io 61032pf+0w
	read: 25.991u 7.686s 3:43.70 15.0% 0+0k 0+0io 23pf+0w

Although the absolute times can't be compared with ours due to hardware
differences, mmap being roughly 1.6 times slower is a shame...
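
The slowdown factors implied by the elapsed-time columns quoted in this
message can be worked out with a few lines of C. This is only a sketch
of the arithmetic: the helper names elapsed_seconds(), slowdown() and
report() are made up for the example, and the inputs are the m:ss.cc
values copied from the four timing lines.

```c
#include <stdio.h>

/* Convert a csh-style "m:ss.cc" elapsed time, given in parts, to seconds. */
static double
elapsed_seconds(int minutes, double seconds)
{
	return (minutes * 60.0 + seconds);
}

/* How many times longer the mmap run took than the read run. */
static double
slowdown(double mmap_elapsed, double read_elapsed)
{
	return (mmap_elapsed / read_elapsed);
}

/* Print the factor for each pair of timing lines in the message. */
void
report(void)
{
	/* FreeBSD 7.2-STABLE/i386: 2:35.19 (mmap) vs 2:12.04 (read) */
	double fbsd = slowdown(elapsed_seconds(2, 35.19),
	    elapsed_seconds(2, 12.04));
	/* Linux 2.6.18/i386: 6:01.75 (mmap) vs 3:43.70 (read) */
	double lnx = slowdown(elapsed_seconds(6, 1.75),
	    elapsed_seconds(3, 43.70));

	printf("FreeBSD: mmap took %.2fx the elapsed time of read\n", fbsd);
	printf("Linux:   mmap took %.2fx the elapsed time of read\n", lnx);
}
```

On these figures the mmap run takes roughly 1.18 times as long as read
on FreeBSD and roughly 1.62 times as long on Linux. The two machines
differ, so only the ratios within each pair are meaningful.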