From owner-freebsd-arch@FreeBSD.ORG  Sat May 28 04:12:24 2005
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
X-Original-To: freebsd-arch@FreeBSD.org
Delivered-To: freebsd-arch@FreeBSD.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id D9F2816A41C
	for <freebsd-arch@FreeBSD.org>; Sat, 28 May 2005 04:12:24 +0000 (GMT)
	(envelope-from bde@zeta.org.au)
Received: from mailout1.pacific.net.au (mailout1.pacific.net.au [61.8.0.84])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 54F5443D48
	for <freebsd-arch@FreeBSD.org>; Sat, 28 May 2005 04:12:24 +0000 (GMT)
	(envelope-from bde@zeta.org.au)
Received: from mailproxy2.pacific.net.au (mailproxy2.pacific.net.au
	[61.8.0.87])
	by mailout1.pacific.net.au (8.12.3/8.12.3/Debian-7.1) with ESMTP id
	j4S4CJrI008497; Sat, 28 May 2005 14:12:19 +1000
Received: from katana.zip.com.au (katana.zip.com.au [61.8.7.246])
	by mailproxy2.pacific.net.au (8.12.3/8.12.3/Debian-7.1) with ESMTP id
	j4S4CFMC009070; Sat, 28 May 2005 14:12:16 +1000
Date: Sat, 28 May 2005 14:12:16 +1000 (EST)
From: Bruce Evans <bde@zeta.org.au>
X-X-Sender: bde@delplex.bde.org
To: Ken Smith <kensmith@cse.Buffalo.EDU>
In-Reply-To: <1117195655.88498.9.camel@opus.cse.buffalo.edu>
Message-ID: <20050528125326.F81578@delplex.bde.org>
References: <1117139065.82793.20.camel@opus.cse.buffalo.edu>
	<20050527091750.GB91258@stack.nl>
	<1117195655.88498.9.camel@opus.cse.buffalo.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: Marc Olzheim <marcolz@stack.nl>, freebsd-arch@FreeBSD.org
Subject: Re: Modifying file access time upon exec...
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 28 May 2005 04:12:25 -0000

On Fri, 27 May 2005, Ken Smith wrote:

> On Fri, 2005-05-27 at 11:17 +0200, Marc Olzheim wrote:
>> On Thu, May 26, 2005 at 04:24:25PM -0400, Ken Smith wrote:
>>> Any thoughts before I commit it?  The patch itself is pretty small.  But
>>> given the sections of code it's mucking with combined with it adding a
>>> little 'nit' filesystem implementers should be aware of I wanted to run
>>> it by as many clueful eyes as possible before doing the final commit.
>>
>> Has this been run through some kind of real world performance test? I
>> can imagine, for instance, /bin/sh's vnode being updated a lot... Could
>> it be made a mount option?
>
> Bruce did some benchmarking and this approach seemed to be the minimal
> hit on performance of the options we have.  The other things that got
> tested were things like "fake reads".  The whole issue started when the
> exec mechanisms were shifted away from doing file reads in favor of a
> more mmap based mechanism for starting the executables.

The impact is so small that it is hard to see in real world tests, hence
the microbenchmarks, which amplify its effect to 10% or so.

> From his tests the hit seemed minimal.  The noatime mount option seems
> to be the most appropriate thing to use for turning it off, and in that
> case the only cost involved with this addition is the check in exec to
> see if the file is coming from a filesystem that's either noatime or
> readonly.
>
>> I don't see any real problems with it, but perhaps people running
>> executables over NFS filesystems that cannot be mounted with noatime
>> might have an issue, like netbooting diskless machines...

[In a reply, you clarified this to say that another flag might be needed
to disable this new pess^Wfeature since -noatime might not be available
for all file systems.]

Well, if -noatime is not available then you may have already lost
significantly except on write-mostly or exec-mostly file systems.  The
new behaviour only loses significantly in the exec-mostly case, and
then only when execs mostly don't cause reads as a side effect.
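The check in exec that Ken describes (skip the atime update when the file
system is mounted read-only or noatime) amounts to a small predicate.  The
following is a userland sketch under that assumption only.  FreeBSD's real
mount flags are MNT_RDONLY and MNT_NOATIME, but the SKETCH_ constants and
exec_wants_atime_update() are hypothetical names for this example, not
kernel code.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-ins for the kernel's MNT_RDONLY and MNT_NOATIME
 * mount flags; the values are illustrative only.
 */
#define SKETCH_MNT_RDONLY  0x1u
#define SKETCH_MNT_NOATIME 0x2u

/* Return true when exec should ask the file system to mark the atime. */
static bool
exec_wants_atime_update(unsigned int mnt_flags)
{
	/* A read-only mount cannot record any timestamp. */
	if (mnt_flags & SKETCH_MNT_RDONLY)
		return false;
	/* noatime is the administrator's opt-out from this cost. */
	if (mnt_flags & SKETCH_MNT_NOATIME)
		return false;
	return true;
}
```

With either flag set, the only cost left on the exec path is this test.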

For nfs, the -noatime option and atime timestamps generally are horribly
broken.  This brokenness significantly limits the overheads from the
change unless we extend the patch to make atime timestamps on exec
actually work for nfs without changing nfs's basic mishandling of atimes.

An early version of the patch did make atime timestamps sort of work
for nfs.  It did this by setting the atime in vattr (where the current
code intentionally leaves the atime as VNOVAL so that the VOP_SETATTR()
call has no effect for file systems that haven't been changed to
understand VA_EXECVE_ATIME).  This made VOP_SETATTR() set the atime
in the same way as for utimes(2), except that the VA_EXECVE_ATIME
flag modified the behaviour.  A modification is needed to bypass
permissions checking, and I only implemented it for ffs.  Thus for
nfs, the change had much the same overhead as calling utimes(2) after
every exec, and permissions checking was broken.  For ffs, atimes are
cached and are written by delayed writes, so utimes(2) has a relatively
low overhead.  For nfs, however, the timestamps written by utimes(2)
are considered much more precious than most other timestamps: they are
synced immediately, and this involves a slow nfs transaction and a
synchronous write on the server (modulo sync/normal/async mounts and
bugs in these), so everything is slowed down.  OTOH, other timestamps
in nfs are mostly handled more efficiently, by not doing them right.

More on broken -noatime mount option and atime timestamps in nfs:
- Mounting with -noatime on the client has no effect.  It is a general
   bug in the mount utilities that some flags which don't apply to the
   particular file system are silently ignored.  -noatime is one of the
   generic flags which could in theory work for all file systems, so it
   is passed to all sub-mounts and is then confusing for the file systems
   that don't support it.
- Mounting with -noatime on the server does have an effect.  It stops normal
   atime timestamps for reads (only).  This is usually what is wanted, but
   strictly it breaks clients mounted without -noatime.
- Reads on the client are mostly cached, and nfs apparently isn't aware
   that _all_ reads should set atime (unless the client is mounted with
   -noatime), so it doesn't tell the server anything and most reads don't
   change the atime on the server.  It would be too expensive to tell the
   server about all atime changes, so a cache on the client is needed.
   A simple local cache would only work if nothing else looks at the
   timestamps.  The cache must somehow be flushed to the server when
   necessary.  Syncing every second might work OK.  The thing to avoid is
   thousands of transactions every second -- a modern system can easily do
   thousands of reads and/or execs per second provided they are mostly from
   a local cache.
- Execs on the client involve reads on the server unless the file is cached,
   since although exec() uses mmap() and not read(), uncached files can only
   be read using read() on the server.  Thus for nfs, nothing needs to be
   changed for atimes to be set for exec() in the same (wrong) way that they
   are set for read(). 
- For utimes(2) and some other metadata changes on the client, the client
   normally wants to force a synchronous change of the metadata on the
   server.  The client has sufficient control of the details in nfs >=3.
   However, FreeBSD doesn't implement metadata-only sync, so FreeBSD
   servers have to fake it by syncing everything for the file.  This adds
   to overheads and defeats the server's policy of not caring much about
   timestamps except for their efficiency.
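The client-side cache suggested in the third point, flushed about once a
second, could look roughly like the following.  It is a hypothetical
sketch, not how FreeBSD's nfs client works: the table size, the one-second
interval and the transaction counter (standing in for an update sent to
the server) are all assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NFILES         4   /* tiny file table for illustration */
#define SKETCH_FLUSH_INTERVAL 1   /* seconds between flushes */

struct atime_cache {
	long atime[SKETCH_NFILES];
	int dirty[SKETCH_NFILES];
	long last_flush;
	long transactions;        /* simulated updates sent to the server */
};

/* Record a read or exec locally; nothing is sent to the server here. */
static void
cache_note_access(struct atime_cache *c, size_t file, long now)
{
	assert(file < SKETCH_NFILES);
	c->atime[file] = now;
	c->dirty[file] = 1;
}

/* Send at most one update per dirty file, and only once per interval. */
static void
cache_maybe_flush(struct atime_cache *c, long now)
{
	size_t i;

	if (now - c->last_flush < SKETCH_FLUSH_INTERVAL)
		return;
	for (i = 0; i < SKETCH_NFILES; i++) {
		if (c->dirty[i]) {
			c->transactions++;   /* stand-in for one transaction */
			c->dirty[i] = 0;
		}
	}
	c->last_flush = now;
}
```

Thousands of accesses to the same few files within one interval collapse
into one update per file; the price is that anything else looking at the
server's atimes may see values up to one interval stale.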

Bruce