From owner-freebsd-performance@FreeBSD.ORG  Sun Jan 15 23:32:56 2012
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id F23931065678
	for <freebsd-performance@freebsd.org>;
	Sun, 15 Jan 2012 23:32:56 +0000 (UTC)
	(envelope-from dieterbsd@engineer.com)
Received: from mailout-us.gmx.com (mailout-us.gmx.com [74.208.5.67])
	by mx1.freebsd.org (Postfix) with SMTP id B15538FC14
	for <freebsd-performance@freebsd.org>;
	Sun, 15 Jan 2012 23:32:56 +0000 (UTC)
Received: (qmail 20408 invoked by uid 0); 15 Jan 2012 23:32:55 -0000
Received: from 67.206.162.29 by rms-us009.v300.gmx.net with HTTP
Content-Type: text/plain; charset="utf-8"
Date: Sun, 15 Jan 2012 18:32:53 -0500
From: "Dieter BSD" <dieterbsd@engineer.com>
Message-ID: <20120115233255.218250@gmx.com>
MIME-Version: 1.0
To: freebsd-performance@freebsd.org
X-Authenticated: #74169980
X-Flags: 0001
X-Mailer: GMX.com Web Mailer
x-registered: 0
Content-Transfer-Encoding: 8bit
X-GMX-UID: cAdhbyU03zOlNR3dAHAhI7t+IGRvb8Cp
Subject: Re: cmp(1) has a bottleneck, but where?
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Performance/tuning <freebsd-performance.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-performance>
List-Post: <mailto:freebsd-performance@freebsd.org>
List-Help: <mailto:freebsd-performance-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 15 Jan 2012 23:32:57 -0000

> posix_fadvise() should probably be used for large files to tell the
> system not to cache the data. Its man page reminded me of the O_DIRECT
> flag. Certainly if the combined size exceeds the size of main memory,
> O_DIRECT would be good (even for benchmarks that cmp the same files :-).
> But cmp and cp are too old to use it.

8.2 says:
man -k posix_fadvise
posix_fadvise: nothing appropriate

The FreeBSD man pages web page says it is not in 9.0 either.

google found:
http://lists.freebsd.org/pipermail/freebsd-hackers/2011-May/035333.html

So what is this posix_fadvise() man page you mention?

O_DIRECT looked interesting, but I haven't found an explanation of
exactly what it does, and
find /usr/src/sys | xargs grep O_DIRECT | wc -l
188
was a bit much to wade through, so I didn't try O_DIRECT.

>> I wrote a prototype no-features cmp using read(2) and memcmp(3).
>> For large files it is faster than the base cmp and uses less cpu.
>> It is I/O bound rather than CPU bound.
>
> What about using mmap() and memcmp()? mmap() shouldn't be inherently
> much worse than read(). I think it shouldn't and doesn't read
> ahead the whole mmap()ed size (8MB here), since that would be bad for
> latency. So it must page it in when it is accessed, and read ahead
> for that.

cmp 4GB 4GB
52.06 real 14.68 user 5.26 sys

cmp 4GB - < 4GB
44.37 real 33.87 user 5.53 sys

my_cmp 4GB 4GB
41.22 real 5.26 user 5.09 sys
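
For illustration only, here is a minimal sketch of a no-features
comparison built on read(2) and memcmp(3). It is not the my_cmp
prototype timed above, whose source does not appear in this thread.
The names read_full, fds_equal and CMP_BUFSIZE are hypothetical, and
the buffer size is an arbitrary, untuned assumption.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>

#define CMP_BUFSIZE (64 * 1024)    /* assumption: arbitrary, not tuned */

/* Read until the buffer is full or EOF; returns bytes read, -1 on error. */
static ssize_t
read_full(int fd, void *buf, size_t len)
{
    size_t got = 0;

    while (got < len) {
        ssize_t n = read(fd, (char *)buf + got, len - got);

        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted: retry */
            return (-1);
        }
        if (n == 0)
            break;                 /* EOF */
        got += (size_t)n;
    }
    return ((ssize_t)got);
}

/* Returns 1 if both streams are identical, 0 if they differ, -1 on error. */
int
fds_equal(int fd1, int fd2)
{
    static char b1[CMP_BUFSIZE], b2[CMP_BUFSIZE];

    for (;;) {
        ssize_t n1 = read_full(fd1, b1, sizeof(b1));
        ssize_t n2 = read_full(fd2, b2, sizeof(b2));

        if (n1 < 0 || n2 < 0)
            return (-1);
        /* A length mismatch means one stream ended first. */
        if (n1 != n2 || memcmp(b1, b2, (size_t)n1) != 0)
            return (0);
        if (n1 == 0)
            return (1);            /* both reached EOF together */
    }
}
```

This reports only equal or different, like cmp -s; it tracks neither
the byte offset nor the line number of the first difference.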

> there is another thread about how bad mmap() and sendfile() are with
> zfs, because zfs is not merged with the buffer cache so using mmap()
> with it wastes about a factor of 2 of memory; sendfile() uses mmap()
> so using it with zfs is bad too. Apparently no one uses cp or cmp
> with zfs :-), or they would notice its slowness there too.

I recently read somewhere that zfs needs 5 GB memory for each 1 TB of disk.
People that run zfs obviously don't care about using lots of memory.

I only noticed the problem because cmp wasn't reading as fast as expected,
but wasn't cpu bound either.

> I think memcmp() instead of byte comparison for cmp -lx is not very
> complex. More interesting is memcmp() for the general case. For
> small files (<= mmap()ed size), mmap() followed by memcmp(), then
> go back to a byte comp to count the line number when memcmp() fails
> seems good. Going back is messier and slower for large files. In
> the worst case of files larger than memory with a difference at the
> end, it involves reading everything twice, so it is twice as slow
> if it is i/o bound.

Studying the cmp man page, it is... unfortunate. The default
prints the byte and line number if the files differ, so it needs
that info. The -l and -x options just keep going after the first
difference. If you want the first byte to be indexed 0 or 1 you can't
choose the radix independently.

If we only needed the byte count it wouldn't be so bad, but needing
the line count really throws a wrench in the works if we want to use
memcmp(). The only way to avoid needing the line count is -s.
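
The approach quoted above (compare with memcmp(), then go back and
count bytes and lines only when it fails) can be sketched for a single
in-memory buffer as follows. The function name first_diff is
hypothetical, and the 1-based byte and line numbering is an assumption
chosen to match cmp's default output. This sketch does not cover files
too large to hold in memory, where going back means reading the data
again.

```c
#include <stddef.h>
#include <string.h>

/*
 * Returns 0 if the buffers are equal.  Otherwise returns 1 and stores
 * the 1-based byte offset and line number of the first difference.
 */
int
first_diff(const unsigned char *a, const unsigned char *b, size_t len,
    size_t *byte, size_t *line)
{
    size_t i, nl = 1;

    if (memcmp(a, b, len) == 0)
        return (0);                /* fast path: nothing to count */

    /*
     * Slow path, taken at most once.  memcmp() already proved that a
     * difference exists within len bytes, so the loop terminates.
     */
    for (i = 0; a[i] == b[i]; i++)
        if (a[i] == '\n')
            nl++;
    *byte = i + 1;
    *line = nl;
    return (1);
}
```

The newline counting happens only on the slow path, so identical
buffers cost one memcmp() call and nothing else.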

From owner-freebsd-performance@FreeBSD.ORG  Mon Jan 16 01:50:59 2012
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 42CC91065670
	for <freebsd-performance@freebsd.org>;
	Mon, 16 Jan 2012 01:50:59 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail03.syd.optusnet.com.au (mail03.syd.optusnet.com.au
	[211.29.132.184])
	by mx1.freebsd.org (Postfix) with ESMTP id D1DD68FC08
	for <freebsd-performance@freebsd.org>;
	Mon, 16 Jan 2012 01:50:58 +0000 (UTC)
Received: from c211-30-171-136.carlnfd1.nsw.optusnet.com.au
	(c211-30-171-136.carlnfd1.nsw.optusnet.com.au [211.30.171.136])
	by mail03.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	q0G1ophL013460
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Mon, 16 Jan 2012 12:50:54 +1100
Date: Mon, 16 Jan 2012 12:50:51 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Dieter BSD <dieterbsd@engineer.com>
In-Reply-To: <20120115233255.218250@gmx.com>
Message-ID: <20120116115800.R1541@besplex.bde.org>
References: <20120115233255.218250@gmx.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-performance@freebsd.org
Subject: Re: cmp(1) has a bottleneck, but where?
X-List-Received-Date: Mon, 16 Jan 2012 01:50:59 -0000

On Sun, 15 Jan 2012, Dieter BSD wrote:

>> posix_fadvise() should probably be used for large files to tell the
>> system not to cache the data. Its man page reminded me of the O_DIRECT
>> flag. Certainly if the combined size exceeds the size of main memory,
>> O_DIRECT would be good (even for benchmarks that cmp the same files :-).
>> But cmp and cp are too old to use it.
>
> 8.2 says:
> man -k posix_fadvise
> posix_fadvise: nothing appropriate
>
> The FreeBSD man pages web page says it is not in 9.0 either.
>
> google found:
> http://lists.freebsd.org/pipermail/freebsd-hackers/2011-May/035333.html
>
> So what is this posix_fadvise() man page you mention?

Standard in 10.0-current.  Not that I normally run that.  I thought I
remembered an older feature that gave this, and didn't notice that
the man page was so new.  Now I remember that the older feature is
madvise(), which is spelled posix_madvise() in POSIX-speak.  So
mmap() may be good for large files after all, but only with use of
madvise() for large files and complications to determine what is a
large file.

Recent mail about this discussed whether the primary syscall for the
new API should be spelled correctly (as fadvise(), corresponding to
madvise()).  Currently, there is only the verbose posix_fadvise().
The options for posix_fadvise() are a large subset of the ones for
madvise(), but spelled with F instead of M and a verbose POSIX prefix
(e.g., MADV_NORMAL for madvise() and even for posix_madvise() becomes
POSIX_FADV_NORMAL for posix_fadvise()).

> O_DIRECT looked interesting, but I haven't found an explanation of
> exactly what it does, and
> find /usr/src/sys | xargs grep O_DIRECT | wc -l
> 188
> was a bit much to wade through, so I didn't try O_DIRECT.

I have no experience using it, but think it is safe to try to see if
it helps.

>> I think memcmp() instead of byte comparison for cmp -lx is not very
>> complex. More interesting is memcmp() for the general case. For
>> small files (<= mmap()ed size), mmap() followed by memcmp(), then
>> go back to a byte comp to count the line number when memcmp() fails
>> seems good. Going back is messier and slower for large files. In
>> the worst case of files larger than memory with a difference at the
>> end, it involves reading everything twice, so it is twice as slow
>> if it is i/o bound.
>
> Studying the cmp man page, it is... unfortunate. The default
> prints the byte and line number if the files differ, so it needs
> that info. The -l and -x options just keep going after the first
> difference. If you want the first byte to be indexed 0 or 1 you can't
> choose the radix independently.
>
> If we only needed the byte count it wouldn't be so bad, but needing
> the line count really throws a wrench in the works if we want to use
> memcmp(). The only way to avoid needing the line count is -s.

-l or -x also.  The FreeBSD man page isn't clear about when the line
number is printed.  It doesn't say that -l and -x cancel the general
requirement of printing the line number, but they do in practice.
POSIX doesn't have -x, at least in 2001, but it gives the precise
format for -l and there is no line number in it.

Maybe line counting is supposed to be pessimized further by supporting
wide characters.  wc is already fully pessimized for this, but it has
a not-quite-so-slow mode in which it doesn't call mbrtowc() and checks
for '\n' instead of L'\n'.  It also has an extremely fast mode for
wc -c and wc -m, in which for regular files, it just stats the file.

This is another indication that cmp is completely unsuitable for
comparing files for equality.  I couldn't find where POSIX says that
either wc or cmp must support wide characters or multi-byte characters,
but for cmp it says that if the file is not a text file then the line
count is simply the number of <newline> characters.  Clearly non-text
files consist of just bytes, so the <newline>s in them must be simply
'\n' characters which we don't want to count anyway.
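
If a line count over plain bytes is needed after all, counting '\n'
bytes does not require a byte-at-a-time loop. The sketch below uses
memchr(3); the function name count_newlines is hypothetical, and this
is not how cmp or wc is actually implemented.

```c
#include <stddef.h>
#include <string.h>

/* Count '\n' bytes in a buffer; embedded NUL bytes are handled. */
size_t
count_newlines(const void *buf, size_t len)
{
    const char *p = buf;
    const char *end = p + len;
    size_t n = 0;

    while (p < end && (p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
        n++;
        p++;                       /* resume just past this newline */
    }
    return (n);
}
```

Because it treats the input as bytes, no wide-character or multi-byte
decoding is involved.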

Bruce

From owner-freebsd-performance@FreeBSD.ORG  Mon Jan 16 12:05:54 2012
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id EE4BD106566B
	for <freebsd-performance@freebsd.org>;
	Mon, 16 Jan 2012 12:05:54 +0000 (UTC)
	(envelope-from tevans.uk@googlemail.com)
Received: from mail-vw0-f54.google.com (mail-vw0-f54.google.com
	[209.85.212.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 9C4258FC1A
	for <freebsd-performance@freebsd.org>;
	Mon, 16 Jan 2012 12:05:54 +0000 (UTC)
Received: by vbbey12 with SMTP id ey12so445224vbb.13
	for <freebsd-performance@freebsd.org>;
	Mon, 16 Jan 2012 04:05:53 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=googlemail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type;
	bh=uYiTot9Ji2kxfbK9Nl01Tuw1PXhMdZJNaY4VNVWn2kc=;
	b=seWAlQymZZOLhkyoC2ixTSCJ2RZng8dxcovXTzMpGxhawCcdxZJs6hZ7w83iKBbz3n
	iGTjHwzIuuzRBvfIIlbhNyVEFiRitLoPxCcg+S6e8uYbbsu393AOXAGM7dh8gciy8FzY
	GU7bT3KpAzvaaSnGnM+y3Ud+kDi4uLZHhV+oU=
MIME-Version: 1.0
Received: by 10.52.88.193 with SMTP id bi1mr5525662vdb.105.1326715553736; Mon,
	16 Jan 2012 04:05:53 -0800 (PST)
Received: by 10.52.109.106 with HTTP; Mon, 16 Jan 2012 04:05:53 -0800 (PST)
In-Reply-To: <20120115233255.218250@gmx.com>
References: <20120115233255.218250@gmx.com>
Date: Mon, 16 Jan 2012 12:05:53 +0000
Message-ID: <CAFHbX1LH1CW4a1XMkMpdDttbTSTnhDL65VW=UzyC6qFjKGnS2Q@mail.gmail.com>
From: Tom Evans <tevans.uk@googlemail.com>
To: Dieter BSD <dieterbsd@engineer.com>
Content-Type: text/plain; charset=UTF-8
Cc: freebsd-performance@freebsd.org
Subject: Re: cmp(1) has a bottleneck, but where?
X-List-Received-Date: Mon, 16 Jan 2012 12:05:55 -0000

On Sun, Jan 15, 2012 at 11:32 PM, Dieter BSD <dieterbsd@engineer.com> wrote:
> I recently read somewhere that zfs needs 5 GB memory for each 1 TB of disk.
> People that run zfs obviously don't care about using lots of memory.

You read incorrectly. Running zfs with dedup needs ~5 GB of RAM per
TB, but this depends upon file size.

However, the majority of ZFS users do not use dedup. My pool is 18 TB
with 8 GB of RAM, of which ZFS can only access 4 GB.

Cheers

Tom