From owner-freebsd-fs@FreeBSD.ORG Tue Apr 14 10:08:56 2009 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8B0C110656F0 for ; Tue, 14 Apr 2009 10:08:56 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail10.syd.optusnet.com.au (mail10.syd.optusnet.com.au [211.29.132.191]) by mx1.freebsd.org (Postfix) with ESMTP id 23D528FC08 for ; Tue, 14 Apr 2009 10:08:55 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from c122-107-120-227.carlnfd1.nsw.optusnet.com.au (c122-107-120-227.carlnfd1.nsw.optusnet.com.au [122.107.120.227]) by mail10.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id n3EA8pmG014702 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 14 Apr 2009 20:08:53 +1000 Date: Tue, 14 Apr 2009 20:08:51 +1000 (EST) From: Bruce Evans X-X-Sender: bde@delplex.bde.org To: Rick Macklem In-Reply-To: Message-ID: <20090414180826.J53102@delplex.bde.org> References: <20090413193936.A52183@delplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-fs@freebsd.org Subject: Re: changing semantics of the va_filerev (code review) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 14 Apr 2009 10:08:57 -0000 On Mon, 13 Apr 2009, Rick Macklem wrote: > On Mon, 13 Apr 2009, Bruce Evans wrote: >> va_gen/i_gen/di_gen/st_gen seems to be even more suitable for this >> purpose, but it isn't actually a file generation number like its >> comments say (it is normally set to a random value on file creation >> then never changed) and it is exposed to userland (st_gen). >> > i_gen is used by NFS to create T-stable (valid for a long time, including > a long time after the file is removed) file handles. It is used by > ffs_vptofh() to create the file handles for NFS that are recognized as > representing removed files, even after an i-node gets reused such that > the i-node number now represents another file. Oops, I missed that since nfs's use of i_gen is indirect. What does nfs do for file systems that don't detect removed files, e.g., msdosfs. vptofh and fhtovp routines seem to have too many differences. E.g., file systems based on ffs return ESTALE for removed files, but zfs_fhtovp() returns EINVAL. I just noticed than the increment of i_gen was slightly broken for ffs by a type mismatch in ffs2 (affects ffs1 too). Originally, i_gen had the same type as di_gen (int32_t). Now i_gen has type int64_t but in ffs1, di_gen of course still has type int32_t, and in ffs2, di_gen still has type int32_t (apparently there was insufficient space to expand it). This makes the overflow check in ffs_alloc.c (++ip->i_gen == 0) more broken than before. Previously it only gave undefined behaviour followed by a bogus check when overflow occurs for incrementing from INT32_T_MAX. Now it has no effect, since it takes 293 years of incrementing at a rate of 1GHz to reach overflow at INT64_T_MAX. Overflow now occurs on assignment to di_gen. The result of this bug is almost the the same as removing the silly part of the security code -- the re-randomization on overflow. i_gen may grow larger than UINT32_T_MAX, but usually refresh from the dinode will keep it smaller. When it starts near UINT32_T_MAX and grows larger, the overflow on assignment and a subsequent refresh will make it nearly 0. Except, in 1 in every 2**32 cases, when the overflow makes di_gen exactly 0, the subsequent refresh will randomize i_gen. >> va_ctime should give what you want for all file systems, since it >> should be increased whenever anything changes. However, most file > There are some places where IN_UPDATE gets set, but IN_CHANGE doesn't. Are there? This would be a bug. I checked that ffs doesn't have this bug. > Since the Change attribute must change for every file modification, I > feel safer incrementing it for both IN_UPDATE and IN_CHANGE. (It's 64bits, > so it won't wrap around for a little while.) It would be a large and obvious bug to modify the file data (IN_UPDATE) without setting IN_CHANGE. >> systems always set the nsec part to 0, so va_ctime doesn't track >> all file changes. This is a problem for things like make(1) too, >> so if nsec timestamps aren't available or are take too long or are >> not fine-grained enough, the nsec part should be abused as a generation >> counter so that any change gives a strictly larger timestamp. The >> case where someone sets the clock backwards is broken but won't >> happen often. >> >> Many nonstandard file systems, e.g., msdosfs, have >> no space for an on-disk ctime, so they fake va_ctime using an on-disk >> mtime. Since such file systems don't have many attributes, only a >> few more cases are broken. >> > Yep, that's why ctime/mtime aren't sufficient. > If a read/write file system doesn't have support for it, all you > can do is fake it and hope the client works ok. I suspect the Linux They need to be fixed or faked well enough for make(1) too. When the dinode has no space to spare, something can be done by keeping state in the inode or vnode. This won't work across reboots of course (except by hashing a reboot counter into the generation counts or timestamps) but might be enough for all short-term uses. I'm not sure how much is safe here. Bruce