Date: Sat, 28 May 2005 14:12:16 +1000 (EST)
From: Bruce Evans
To: Ken Smith
Cc: Marc Olzheim, freebsd-arch@FreeBSD.org
Subject: Re: Modifying file access time upon exec...
Message-ID: <20050528125326.F81578@delplex.bde.org>

On Fri, 27 May 2005, Ken Smith wrote:

> On Fri, 2005-05-27 at 11:17 +0200, Marc Olzheim wrote:
>> On Thu, May 26, 2005 at 04:24:25PM -0400, Ken Smith wrote:
>>> Any thoughts before I commit it? The patch itself is pretty small.
>>> But given the sections of code it's mucking with, combined with it adding a
>>> little 'nit' filesystem implementers should be aware of, I wanted to run
>>> it by as many clueful eyes as possible before doing the final commit.
>>
>> Has this been run through some kind of real world performance test? I
>> can imagine for instance /bin/sh's vnode is being updated a lot... Would
>> it be eligible for becoming a mount option?
>
> Bruce did some benchmarking and this approach seemed to be the minimal
> hit on performance of the options we have. The other things that got
> tested were things like "fake reads". The whole issue started when the
> exec mechanisms were shifted away from doing file reads in favor of a
> more mmap based mechanism for starting the executables.

The impact is so small that it is hard to see in real world tests. Hence the microbenchmarks, which magnify its effect to 10% or so.

> From his tests the hit seemed minimal. The noatime mount option seems
> to be the most appropriate thing to use for turning it off, and in that
> case the only cost involved with this addition is the check in exec to
> see if the file is coming from a filesystem that's either noatime or
> readonly.
>
>> I don't see any real problems with it, but perhaps people running
>> executables over NFS filesystems that cannot be mounted with noatime
>> might have an issue, like netbooting diskless machines...

[In a reply, you clarified this to say that another flag might be needed to disable this new pess^Wfeature since -noatime might not be available for all file systems.]

Well, if -noatime is not available then you may have already lost significantly, except on write-mostly or exec-mostly file systems. The new behaviour only loses significantly in the exec-mostly case, and then only when execs mostly don't cause reads as a side effect.

For nfs, the -noatime option and atime timestamps generally are horribly broken.
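Ken's point that, on file systems that opt out, the only cost is a check in exec for a noatime or read-only mount can be pictured with a small user-space sketch. It is not the actual patch: the structures, the SKETCH_ flag constants and both functions are hypothetical stand-ins, loosely modelled on FreeBSD's MNT_RDONLY and MNT_NOATIME mount flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-ins for FreeBSD's MNT_RDONLY and MNT_NOATIME.
 * The real flags live in <sys/mount.h> with different values. */
#define SKETCH_MNT_RDONLY  0x1u
#define SKETCH_MNT_NOATIME 0x2u

/* Hypothetical, greatly simplified mount and file records. */
struct sketch_mount {
    unsigned flags;
};

struct sketch_file {
    const struct sketch_mount *mp; /* file system the file lives on */
    time_t atime;                  /* last access time */
};

/* Exec should touch the atime only when the file system is neither
 * read-only nor mounted noatime; this flag test is the whole cost
 * paid by file systems that opt out. */
static bool exec_should_update_atime(const struct sketch_file *fp)
{
    assert(fp != NULL && fp->mp != NULL);
    return (fp->mp->flags & (SKETCH_MNT_RDONLY | SKETCH_MNT_NOATIME)) == 0;
}

/* Simulated exec: mark the file as accessed at time `now` if allowed. */
static void sketch_exec(struct sketch_file *fp, time_t now)
{
    if (exec_should_update_atime(fp))
        fp->atime = now;
}
```

On a noatime or read-only mount the atime stays untouched and the only work done is the single flag test.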
The brokenness of -noatime and atimes in nfs significantly limits the overheads from the change, unless we extend the patch to make atime timestamps on exec actually work for nfs without changing nfs's basic mishandling of atimes.

An early version of the patch did make atime timestamps sort of work for nfs. It did this by setting the atime in vattr (where the current code intentionally leaves the atime as VNOVAL so that the VOP_SETATTR() call has no effect for file systems that haven't been changed to understand VA_EXECVE_ATIME). This made VOP_SETATTR() set the atime in the same way as for utimes(2), except that the VA_EXECVE_ATIME flag modified the behaviour. That modification is needed to bypass permissions checking, and I only implemented it for ffs. Thus for nfs, the change had much the same overhead as utimes(2) after every exec, and permissions checking was broken.

For ffs, atimes are cached and are written by delayed writes, so utimes(2) has a relatively low overhead. But for nfs, the timestamps written by utimes(2) are considered much more precious than most other timestamps: they are synced immediately, and this involves a slow nfs transaction and a synchronous write on the server (modulo sync/normal/async mounts and bugs in these), so everything is slowed down. OTOH, other timestamps in nfs are mostly handled more efficiently by not doing them right.

More on the broken -noatime mount option and atime timestamps in nfs:

- Mounting with -noatime on the client has no effect. It is a general bug in the mount utilities that some flags which don't apply to the particular file system are silently ignored. -noatime is one of the generic flags which could in theory work for all file systems, so it is passed to all sub-mounts and is then confusing for the file systems that don't support it.

- Mounting with -noatime on the server has an effect. It stops normal atime timestamps for reads (only). This is usually what is wanted, but strictly it breaks clients mounted without -noatime.

- Reads on the client are mostly cached, and nfs apparently isn't aware that _all_ reads should set atime (unless the client is mounted with -noatime), so it doesn't tell the server anything and most reads don't change the atime on the server. It would be too expensive to tell the server about all atime changes, so a cache on the client is needed. A simple local cache would only work if nothing else looks at the timestamps. The cache must somehow be flushed to the server when necessary. Syncing every second might work OK. The thing to avoid is thousands of transactions every second; a modern system can easily do thousands of reads and/or execs per second provided they are mostly from a local cache.

- Execs on the client involve reads on the server unless the file is cached, since although exec() uses mmap() and not read(), uncached files can only be read using read() on the server. Thus for nfs, nothing needs to be changed for atimes to be set for exec() in the same (wrong) way that they are set for read().

- For utimes(2) and some other metadata changes on the client, the client normally wants to force a synchronous change of the metadata on the server. The client has sufficient control of the details in nfs >= 3. However, FreeBSD doesn't implement metadata-only sync, so FreeBSD servers have to fake it by syncing everything for the file. This adds to overheads and defeats the server's policy of not caring much about timestamps except for their efficiency.

Bruce
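The client-side atime cache suggested in the bullet on cached reads above (record atimes locally, then sync them to the server about once a second instead of once per read or exec) might look roughly like this. Everything here is hypothetical: the structures, function names and the one-second interval are illustrative only, and a counter stands in for the nfs transactions a real client would send.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical client-side record of one file's access time. */
struct atime_cache_entry {
    time_t atime; /* latest atime recorded locally */
    bool dirty;   /* changed since the last push to the server */
};

/* Hypothetical cache; `transactions` counts simulated server updates. */
struct atime_cache {
    struct atime_cache_entry *entries;
    size_t nentries;
    time_t last_flush;
    unsigned long transactions;
};

/* A local read or exec only updates the cache; no server traffic. */
static void atime_cache_touch(struct atime_cache_entry *e, time_t now)
{
    assert(e != NULL);
    e->atime = now;
    e->dirty = true;
}

/* Called periodically: push dirty atimes at most once per second,
 * one simulated transaction per dirty file. Returns files flushed. */
static size_t atime_cache_maybe_flush(struct atime_cache *c, time_t now)
{
    size_t flushed = 0;

    if (now - c->last_flush < 1)
        return 0;
    for (size_t i = 0; i < c->nentries; i++) {
        if (c->entries[i].dirty) {
            c->entries[i].dirty = false;
            c->transactions++;
            flushed++;
        }
    }
    c->last_flush = now;
    return flushed;
}
```

With thousands of local touches spread over two files, one flush sends two simulated transactions rather than thousands. The cost is the one the message names: anything else looking at the timestamps sees stale values until the next flush.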