From: Peter Wemm
Date: Wed, 9 Nov 2011 17:47:57 -0800
To: Alfred Perlstein
Cc: Bruce Cran, Ed Schouten, Paul Saab, Jilles Tjoelker, arch@freebsd.org, freebsd-arch@freebsd.org
Subject: Re: [PATCH] fadvise(2) system call
In-Reply-To: <20111110012542.GA6110@elvis.mu.org>

On Wed, Nov 9, 2011 at 5:25 PM, Alfred Perlstein wrote:
> * Paul Saab [111109 16:32] wrote:
>> On Wed, Nov 9, 2011 at 3:44 PM, Bruce Cran wrote:
>> > On 08/11/2011 13:00, John Baldwin wrote:
>> >>
>> >> I think it would be fine to add flags to applications like 'tar' to
>> >> allow users to alter their behavior in specific use cases when it
>> >> makes sense. However, I think there are more workloads for 'tar' than
>> >> the ones you are thinking of and we should be hesitant to change
>> >> applications to use non-default settings.
>> >
>> > Someone's done that for GNU tar on Linux, adding a --no-oscache switch:
>> > http://www.mysqlperformanceblog.com/2010/04/02/fadvise-may-be-not-what-you-expect/
>>
>> So adding this support is good, but not for general purpose.  It's
>> really only good when you're pumping gigs of data through tar.  I did
>> this for libarchive (plus other work for O_DIRECT reading and
>> creating the archive) for copying large amounts of data without
>> impacting a running system. It worked great for this, but then it
>> absolutely fails when extracting a tar archive with millions of little
>> files because of all the sync operations.
>
> I've thought about this and it almost makes sense to have a secondary
> LRU that such pages would wind up in that is much smaller than the system
> one.  I'm pretty sure there are a number of papers on this, but I've not
> looked over them in a long while.

We actually do have fairly extensive anti-swamping mechanisms in
place, but they are in somewhat obscure places or are side effects of
other policies elsewhere.

e.g.: the vmio kva mapping space is limited and the dirty fraction is
enforced there.  This provides the file write based anti-swamping back
pressure.  This of course is completely bypassed with mmap reads/writes.
That's why writing to a huge mmap file hits like a truck, but doing
write() to the same file does not.

>>
>> Anyway, this is a good option to enable and has very practical uses
>> out there, but it should be turned on with an option and not on by
>> default.
>
> What about the operation of just reading the tar archive itself?

Personally, I really don't want to blow away useful cache contents
with a tar file unless I explicitly say so.  I'm generally more
worried about keeping usefully cached random bits of files that are
scattered all over the drive than with a sequentially readable tar
file that is a best case scenario for file reads.  I'd rather read a
2GB file sequentially twice than kick out 2GB of random access data.

In any case, fadvise() is a tool that we should have.  Deciding on
tar(1) policy is an entirely separate thing.

--
Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com; KI6FJV
"All of this is for nothing if we don't go to the stars" - JMS/B5
"If Java had true garbage collection, most programs would delete
themselves upon execution." -- Robert Sewell