From: "Dieter BSD"
Date: Thu, 12 Jan 2012 14:31:41 -0500
To: freebsd-performance@freebsd.org
Subject: Re: cmp(1) has a bottleneck, but where?
Message-ID: <20120112193142.218240@gmx.com>

> The hard \xc2\xa0 certainly deserves a :-(.

Agreed. Brain-damaged guilty-until-proven-innocent anti-spam measures
force the use of webmail for outgoing email, which among other
problems inserts garbage. Sorry.

>> A) Should the default vfs.read_max be increased?
>
> Maybe, but I don't buy most claims that larger block sizes are better.

I didn't say anything about block sizes. There needs to be enough
data in memory so that the CPU doesn't run out while the disk is
seeking.

>> B) Can the mmap case be fixed? What is the alleged benefit of
>> using mmap anyway? All I've ever seen are problems.
>
> It is much faster for cases where the file is already in memory. It
> is unclear whether this case is common enough to matter. I guess it
> isn't.

Is there a reasonably efficient way to tell if a file is already in
memory or not? If not, then we have to guess. If the file is larger
than memory, it cannot already be entirely in memory. For real-world
uses, there are 2 files, and not all memory can be used for buffering
files. So cmp could check the file sizes and, if they are larger than
x% of main memory, assume the files are not in memory. There could be
a command-line argument specifying which method to use, or providing
a hint as to whether the files are in memory or not.

I wrote a prototype no-features cmp using read(2) and memcmp(3). For
large files it is faster than the base cmp and uses less CPU. It is
I/O bound rather than CPU bound.

So perhaps use memcmp when possible and decide between read and mmap
based on (something)? Assuming the added performance justifies the
added complexity?
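
To make the idea concrete, here is a rough sketch of the two pieces discussed above: a size-based guess and a read(2)/memcmp(3) compare loop. This is not the prototype itself. The names probably_uncached() and cmp_files(), the 1 MiB buffer size, and the use of sysconf(_SC_PHYS_PAGES) to estimate main memory are all assumptions made for illustration. EINTR handling and cmp's options are left out.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BUFSZ (1024 * 1024)	/* assumed buffer size; tune for the disk */

/*
 * Guess: returns 1 if the combined file size exceeds pct% of physical
 * memory (so the files are probably not cached), else 0.
 * _SC_PHYS_PAGES is not POSIX but exists on FreeBSD and Linux.
 */
int
probably_uncached(unsigned long long size1, unsigned long long size2,
    unsigned pct)
{
	long pages = sysconf(_SC_PHYS_PAGES);
	long pagesz = sysconf(_SC_PAGESIZE);
	unsigned long long phys;

	if (pages <= 0 || pagesz <= 0)
		return 1;	/* cannot tell; assume not cached */
	phys = (unsigned long long)pages * (unsigned long long)pagesz;
	return size1 + size2 > phys / 100 * pct;
}

/* Read until len bytes or EOF; returns bytes read, or -1 on error. */
static ssize_t
read_full(int fd, char *buf, size_t len)
{
	size_t got = 0;

	while (got < len) {
		ssize_t n = read(fd, buf + got, len - got);
		if (n < 0)
			return -1;
		if (n == 0)
			break;	/* EOF */
		got += (size_t)n;
	}
	return (ssize_t)got;
}

/*
 * Returns 0 if the files are identical, 1 if they differ (*off is set
 * to the offset of the first difference), -1 on error.
 */
int
cmp_files(const char *path1, const char *path2, off_t *off)
{
	int fd1 = open(path1, O_RDONLY);
	int fd2 = open(path2, O_RDONLY);
	char *b1 = malloc(BUFSZ), *b2 = malloc(BUFSZ);
	off_t pos = 0;
	int rv = -1;

	if (fd1 < 0 || fd2 < 0 || b1 == NULL || b2 == NULL)
		goto out;
	for (;;) {
		ssize_t n1 = read_full(fd1, b1, BUFSZ);
		ssize_t n2 = read_full(fd2, b2, BUFSZ);
		size_t n, i = 0;

		if (n1 < 0 || n2 < 0)
			goto out;
		n = (size_t)(n1 < n2 ? n1 : n2);
		if (memcmp(b1, b2, n) != 0) {
			while (b1[i] == b2[i])	/* locate first differing byte */
				i++;
			*off = pos + (off_t)i;
			rv = 1;
			goto out;
		}
		if (n1 != n2) {		/* one file is a prefix of the other */
			*off = pos + (off_t)n;
			rv = 1;
			goto out;
		}
		if (n1 == 0) {		/* both at EOF, no difference found */
			rv = 0;
			goto out;
		}
		pos += (off_t)n;
	}
out:
	free(b1);
	free(b2);
	if (fd1 >= 0)
		close(fd1);
	if (fd2 >= 0)
		close(fd2);
	return rv;
}
```

A caller could stat(2) both files, pass st_size to probably_uncached() with some percentage, and only consider mmap when it returns 0. What the percentage should be is exactly the "(something)" that remains open.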