Date: Tue, 14 Apr 2009 11:59:02 -0400 (EDT) From: Rick Macklem <rmacklem@uoguelph.ca> To: Bruce Evans <brde@optusnet.com.au> Cc: freebsd-fs@freebsd.org Subject: Re: changing semantics of the va_filerev (code review) Message-ID: <Pine.GSO.4.63.0904141110590.13047@muncher.cs.uoguelph.ca> In-Reply-To: <20090414180826.J53102@delplex.bde.org> References: <Pine.GSO.4.63.0904121553050.3872@muncher.cs.uoguelph.ca> <20090413193936.A52183@delplex.bde.org> <Pine.GSO.4.63.0904131036310.27001@muncher.cs.uoguelph.ca> <20090414180826.J53102@delplex.bde.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, 14 Apr 2009, Bruce Evans wrote: [stuff snipped] > > Oops, I missed that since nfs's use of i_gen is indirect. What does > nfs do for file systems that don't detect removed files, e.g., msdosfs. > vptofh and fhtovp routines seem to have too many differences. E.g., An nfs client can always think that a file exists for a short period of time (until client side caches time out) after it has been removed locally or by another client, on the server. The more serious failure occurs when the i-node/directory entry gets reallocated. At that point, the client might access the attributes/data of the new file, thinking it was the old file. (In the worst case, this could persist until the client does a umount() of the file system.) However, typically, unless it has the file open when the file is removed locally on the server or by another client, nothing nasty will happen. (And I think if the client has name caching disabled, nothing nasty can happen.) At least, that's my best guess at an answer. > file systems based on ffs return ESTALE for removed files, but > zfs_fhtovp() returns EINVAL. > I don't know why zfs would choose a different errno, but I don't think that a different errno will have much effect. It's a terminal error in either case. (I can't think of anything clever that a client can do for ESTALE. I wouldn't be surprised if some clients end up translating ESTALE to EINVAL, since POSIX apps don't expect ESTALE.) I suppose someone could argue it violates the RFC, but only if they know that the server should generate NFS3ERR_STALE for that case. > I just noticed than the increment of i_gen was slightly broken for ffs > by a type mismatch in ffs2 (affects ffs1 too). Originally, i_gen had > the same type as di_gen (int32_t). Now i_gen has type int64_t but in > ffs1, di_gen of course still has type int32_t, and in ffs2, di_gen > still has type int32_t (apparently there was insufficient space to > expand it). This makes the overflow check in ffs_alloc.c (++ip->i_gen > == 0) more broken than before. Previously it only gave undefined > behaviour followed by a bogus check when overflow occurs for incrementing > from INT32_T_MAX. Now it has no effect, since it takes 293 years of > incrementing at a rate of 1GHz to reach overflow at INT64_T_MAX. > Overflow now occurs on assignment to di_gen. > > The result of this bug is almost the the same as removing the silly > part of the security code -- the re-randomization on overflow. i_gen > may grow larger than UINT32_T_MAX, but usually refresh from the dinode > will keep it smaller. When it starts near UINT32_T_MAX and grows > larger, the overflow on assignment and a subsequent refresh will make > it nearly 0. Except, in 1 in every 2**32 cases, when the overflow makes > di_gen exactly 0, the subsequent refresh will randomize i_gen. > Sounds like you have a better understanding of this than I. Since all nfs really cares about is that the value of i_gen has changed after the i-node is re-allocated, I doubt this causes grief in practice. Personally, I'd just leave it as a 32bit number and initialize it to some pseudo-random value in a range that is a small fraction of UINT32_T_MAX (maybe 1<->1000000) if it is 0, otherwise just increment it by a small value. (I've already noted that I'm not a big fan of security by obscurity anyhow:-) >>> va_ctime should give what you want for all file systems, since it >>> should be increased whenever anything changes. However, most file >> There are some places where IN_UPDATE gets set, but IN_CHANGE doesn't. > > Are there? This would be a bug. I checked that ffs doesn't have this > bug. > Oops, my mistake. I grep'd again and see it is IN_CHANGE that gets set without IN_UPDATE and not the other way around, which makes sense, since I can't think of how you can modify the data without modifying some attribute. So, the Change attribute only needs to change for IN_CHANGE (with all those uses of "change", it must be good:-). Thanks for pointing this out. > > They need to be fixed or faked well enough for make(1) too. > > When the dinode has no space to spare, something can be done by keeping > state in the inode or vnode. This won't work across reboots of course > (except by hashing a reboot counter into the generation counts or > timestamps) but might be enough for all short-term uses. I'm not sure > how much is safe here. > Yes, definitely. I think doing something like having an in-memory field for va_filerev/i_modrev where the high order bits are initialized by ctime (using whatever bits are valid, given tod clock resolution) when read in and then incrementing by 1 for each change, would be a good compromise. rick
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?Pine.GSO.4.63.0904141110590.13047>