From owner-freebsd-performance@FreeBSD.ORG Sun Jan 15 23:32:56 2012
Date: Sun, 15 Jan 2012 18:32:53 -0500
From: "Dieter BSD"
To: freebsd-performance@freebsd.org
Message-ID: <20120115233255.218250@gmx.com>
Subject: Re: cmp(1) has a bottleneck, but where?

> posix_fadvise() should probably be used for large files to tell the
> system not to cache the data. Its man page reminded me of the O_DIRECT
> flag. Certainly if the combined size exceeds the size of main memory,
> O_DIRECT would be good (even for benchmarks that cmp the same files :-).
> But cmp and cp are too old to use it.

8.2 says:
man -k posix_fadvise
posix_fadvise: nothing appropriate

The FreeBSD man pages web page says it is not in 9.0 either.

google found:
http://lists.freebsd.org/pipermail/freebsd-hackers/2011-May/035333.html

So what is this posix_fadvise() man page you mention?
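[Illustrative sketch, not part of the original message.] The quoted suggestion is to use posix_fadvise() to tell the system not to cache data that will be read only once. On a system that provides the call (the message above notes that 8.2 does not), that might look roughly like the following. The helper name open_for_streaming() is hypothetical, and the choice of POSIX_FADV_SEQUENTIAL plus POSIX_FADV_NOREUSE is an assumption about which hints suit cmp; the advice is only a hint and the system may ignore it.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * open_for_streaming() is a hypothetical helper, not code from cmp(1).
 * It opens a file read-only and hints that the file will be read once,
 * front to back, so the system need not keep the data cached.
 * Returns the descriptor, or -1 if open(2) fails.
 */
static int
open_for_streaming(const char *path)
{
	int fd, error;

	assert(path != NULL);
	fd = open(path, O_RDONLY);
	if (fd == -1)
		return (-1);

	/* Offset 0 with length 0 covers the whole file. */
	error = posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
	if (error == 0)
		error = posix_fadvise(fd, 0, 0, POSIX_FADV_NOREUSE);

	/*
	 * posix_fadvise() returns an error number instead of setting
	 * errno. A failed hint is not fatal here: report it and go on.
	 */
	if (error != 0)
		fprintf(stderr, "posix_fadvise: %s\n", strerror(error));
	return (fd);
}
```

A caller would use the returned descriptor exactly like one from a plain open(2); whether the hints change the timings is something only a measurement on the target system can show.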
O_DIRECT looked interesting, but I haven't found an explanation of
exactly what it does, and
find /usr/src/sys | xargs grep O_DIRECT | wc -l
188
was a bit much to wade through, so I didn't try O_DIRECT.

>> I wrote a prototype no-features cmp using read(2) and memcmp(3).
>> For large files it is faster than the base cmp and uses less cpu.
>> It is I/O bound rather than CPU bound.
>
> What about using mmap() and memcmp()? mmap() shouldn't be inherently
> much worse than read(). I think it shouldn't and doesn't read
> ahead the whole mmap()ed size (8MB here), since that would be bad for
> latency. So it must page it in when it is accessed, and read ahead
> for that.

cmp 4GB 4GB
       52.06 real        14.68 user         5.26 sys
cmp 4GB - < 4GB
       44.37 real        33.87 user         5.53 sys
my_cmp 4GB 4GB
       41.22 real         5.26 user         5.09 sys

> there is another thread about how bad mmap() and sendfile() are with
> zfs, because zfs is not merged with the buffer cache so using mmap()
> with it wastes about a factor of 2 of memory; sendfile() uses mmap()
> so using it with zfs is bad too. Apparently no one uses cp or cmp
> with zfs :-), or they would notice its slowness there too.

I recently read somewhere that zfs needs 5 GB memory for each 1 TB of disk.
People that run zfs obviously don't care about using lots of memory.
I only noticed the problem because cmp wasn't reading as fast as expected,
but wasn't cpu bound either.

> I think memcmp() instead of byte comparison for cmp -lx is not very
> complex. More interesting is memcmp() for the general case. For
> small files (<= mmap()ed size), mmap() followed by memcmp(), then
> go back to a byte comp to count the line number when memcmp() fails
> seems good. Going back is messier and slower for large files. In
> the worst case of files larger than memory with a difference at the
> end, it involves reading everything twice, so it is twice as slow
> if it is i/o bound.

Studying the cmp man page, it is... unfortunate.
The default prints the byte and line number if the files differ, so it needs
that info. The -l and -x options just keep going after the first
difference. If you want the first byte to be indexed 0 or 1 you can't
choose the radix independently.

If we only needed the byte count it wouldn't be so bad, but needing
the line count really throws a wrench in the works if we want to use
memcmp(). The only way to avoid needing the line count is -s.

From owner-freebsd-performance@FreeBSD.ORG Mon Jan 16 01:50:59 2012
Date: Mon, 16 Jan 2012 12:50:51 +1100 (EST)
From: Bruce Evans
To: Dieter BSD
Cc: freebsd-performance@freebsd.org
In-Reply-To: <20120115233255.218250@gmx.com>
Message-ID: <20120116115800.R1541@besplex.bde.org>
References: <20120115233255.218250@gmx.com>
Subject: Re: cmp(1) has a bottleneck, but where?

On Sun, 15 Jan 2012, Dieter BSD wrote:

>> posix_fadvise() should probably be used for large files to tell the
>> system not to cache the data. Its man page reminded me of the O_DIRECT
>> flag. Certainly if the combined size exceeds the size of main memory,
>> O_DIRECT would be good (even for benchmarks that cmp the same files :-).
>> But cmp and cp are too old to use it.
>
> 8.2 says:
> man -k posix_fadvise
> posix_fadvise: nothing appropriate
>
> The FreeBSD man pages web page says it is not in 9.0 either.
>
> google found:
> http://lists.freebsd.org/pipermail/freebsd-hackers/2011-May/035333.html
>
> So what is this posix_fadvise() man page you mention?

Standard in 10.0-current. Not that I normally run that. I thought I
remembered an older feature that gave this, and didn't notice that the
man page was so new. Now I remember that the older feature is madvise(),
which is spelled posix_madvise() in POSIX-speak. So mmap() may be good
for large files after all, but only with use of madvise() for large
files and complications to determine what is a large file.

Recent mail about this was about whether the primary syscall for the new
API should be spelled correctly (as fadvise(), corresponding to
madvise()). Currently, there is only the verbose posix_fadvise(). The
options for posix_fadvise() are a large subset of the ones for madvise(),
but spelled with F instead of M and a verbose POSIX prefix (e.g.,
MADV_NORMAL for madvise() and even for posix_madvise() becomes
POSIX_FADV_NORMAL for posix_fadvise()).

> O_DIRECT looked interesting, but I haven't found an explanation of
> exactly what it does, and
> find /usr/src/sys | xargs grep O_DIRECT | wc -l
> 188
> was a bit much to wade through, so I didn't try O_DIRECT.

I have no experience using it, but think it is safe to try to see if it
helps.

>> I think memcmp() instead of byte comparison for cmp -lx is not very
>> complex. More interesting is memcmp() for the general case. For
>> small files (<= mmap()ed size), mmap() followed by memcmp(), then
>> go back to a byte comp to count the line number when memcmp() fails
>> seems good. Going back is messier and slower for large files. In
>> the worst case of files larger than memory with a difference at the
>> end, it involves reading everything twice, so it is twice as slow
>> if it is i/o bound.
>
> Studying the cmp man page, it is... unfortunate. The default
> prints the byte and line number if the files differ, so it needs
> that info. The -l and -x options just keep going after the first
> difference. If you want the first byte to be indexed 0 or 1 you can't
> choose the radix independently.
>
> If we only needed the byte count it wouldn't be so bad, but needing
> the line count really throws a wrench in the works if we want to use
> memcmp(). The only way to avoid needing the line count is -s.

-l or -x also. The FreeBSD man page isn't clear about when the line
number is printed. It doesn't say that -l and -x cancel the general
requirement of printing the line number, but they do in practice.
POSIX doesn't have -x, at least in 2001, but it gives the precise
format for -l and there is no line number in it.

Maybe line counting is supposed to be pessimized further by supporting
wide characters. wc is already fully pessimized for this, but it has a
not-quite-so-slow mode in which it doesn't call mbrtowc() and checks for
'\n' instead of L'\n'. It also has an extremely fast mode for wc -c and
wc -m, in which for regular files, it just stats the file. This is
another indication that cmp is completely unsuitable for comparing files
for equality.
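[Illustrative sketch, not part of the original message.] The read(2)-plus-memcmp(3) approach discussed above still has to produce cmp's default byte and line numbers. One way to avoid "going back" over the file is to count plain '\n' bytes in each matching block as it goes by, and to scan byte by byte only inside the single block where memcmp() reports a difference. This is an alternative to the re-read scheme quoted above, not the prototype from the thread; the names sketch_cmp, read_full, count_newlines and struct cmp_result, and the 64 KB block size, are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK 65536	/* assumed block size, not a tuned value */

struct cmp_result {
	unsigned long long byte;	/* 1-based offset of first difference */
	unsigned long long line;	/* 1-based line of first difference */
};

/* Read until len bytes arrive or EOF; returns the byte count, or -1. */
static ssize_t
read_full(int fd, unsigned char *buf, size_t len)
{
	size_t got = 0;
	ssize_t n;

	while (got < len) {
		n = read(fd, buf + got, len - got);
		if (n == -1)
			return (-1);
		if (n == 0)
			break;
		got += (size_t)n;
	}
	return ((ssize_t)got);
}

/* Count '\n' bytes in p[0..len-1]; no wide-character handling. */
static size_t
count_newlines(const unsigned char *p, size_t len)
{
	const unsigned char *nl;
	size_t count = 0;

	while (len > 0 && (nl = memchr(p, '\n', len)) != NULL) {
		count++;
		len -= (size_t)(nl - p) + 1;
		p = nl + 1;
	}
	return (count);
}

/*
 * Returns 0 if the streams are identical, 1 if they differ (res holds
 * the position), 2 if one stream is a prefix of the other (res->byte
 * holds the number of matching bytes, res->line is 0), -1 on read error.
 */
static int
sketch_cmp(int fd1, int fd2, struct cmp_result *res)
{
	static unsigned char b1[CHUNK], b2[CHUNK];
	unsigned long long off = 0, lines = 0;
	ssize_t n1, n2;
	size_t n, i;

	assert(res != NULL);
	for (;;) {
		n1 = read_full(fd1, b1, CHUNK);
		n2 = read_full(fd2, b2, CHUNK);
		if (n1 == -1 || n2 == -1)
			return (-1);
		n = (size_t)(n1 < n2 ? n1 : n2);
		if (memcmp(b1, b2, n) != 0) {
			/* Byte scan only within the differing block. */
			for (i = 0; b1[i] == b2[i]; i++)
				;
			res->byte = off + i + 1;
			res->line = lines + count_newlines(b1, i) + 1;
			return (1);
		}
		if (n1 != n2) {
			res->byte = off + n;
			res->line = 0;
			return (2);
		}
		if (n == 0)
			return (0);
		lines += count_newlines(b1, n);
		off += n;
	}
}
```

The trade-off in this sketch is one extra memchr() pass over each matching block in exchange for never reading data twice; for -s, -l or -x, where no line number is needed, the count_newlines() call on matching blocks could simply be dropped.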

I couldn't find where POSIX says that either wc or cmp must support wide
characters or multi-byte characters, but for cmp it says that if the
file is not a text file then the line count is simply the number of
<newline> characters. Clearly non-text files consist of just bytes, so
the <newline>s in them must be simply '\n' characters which we don't
want to count anyway.

Bruce

From owner-freebsd-performance@FreeBSD.ORG Mon Jan 16 12:05:54 2012
Date: Mon, 16 Jan 2012 12:05:53 +0000
From: Tom Evans
To: Dieter BSD
Cc: freebsd-performance@freebsd.org
In-Reply-To: <20120115233255.218250@gmx.com>
References: <20120115233255.218250@gmx.com>
Subject: Re: cmp(1) has a bottleneck, but where?

On Sun, Jan 15, 2012 at 11:32 PM, Dieter BSD wrote:
> I recently read somewhere that zfs needs 5 GB memory for each 1 TB of disk.
> People that run zfs obviously don't care about using lots of memory.

You read incorrectly. To run zfs with dedup needs ~ 5GB of RAM per TB,
but this depends upon file size. However, the majority of ZFS users do
not use dedup. My pool is 18 TB with 8 GB of RAM, of which ZFS can only
access 4 GB.

Cheers

Tom