From: Terry Lambert
Date: Mon, 11 Mar 2002 23:19:53 -0800
To: Garance A Drosihn
Cc: Robert Watson, Harti Brandt, Poul-Henning Kamp, arch@FreeBSD.ORG
Subject: Re: Increasing the size of dev_t and ino_t
Message-ID: <3C8DAC19.B1ED58B@mindspring.com>

Garance A Drosihn wrote:
> If UFS2 requires a 64-bit (u)ino_t, then we're going to have to
> make some kind of change to the struct returned by stat(). We
> have also talked about wanting 64-bit fields for time values in
> that same struct. The more I think about it, the more I think
> we should just move towards a 64-bit field for (u)dev_t at the
> same time. Maybe we should wrap these all up into one major
> change, so we can have a st_dev+st_ino which can handle all
> existing filesystems (with some room for expansion).

It's more complicated than just struct stat, I think.

The "struct dirent" that's returned by getdirentries(2) contains
a 32-bit file ID on the FS (d_fileno). This is actually equal to
the value of st_ino returned by stat(2), and the man page makes it
clear that this is the case, at least as far as the st_ino
semantics that Garrett Wollman quoted out of the POSIX spec
(more recent than my 1988 copy). The man page says:

    The d_fileno entry is a number which is unique for each
    distinct file in the filesystem. Files that are linked by
    hard links (see link(2)) have the same d_fileno.

The struct dirent is actually a system version of the directory
information, intended to be FS independent. It's externalized
from the on-disk directory structure. It's "coincidental" that
it matches the FFS on-disk structure.

One of the issues here is that it *does not* match the
externalized value that's sent over NFS; among other things,
this is the reason for the "cookie" argument to the VOP_READDIR
per-VFS interface.

A side issue (not worth discussing at this point, but worth
keeping in mind) is that there is also a fundamental assumption
in this interface that all directory entries within a directory
are on the same volume. This is actually not true for entries
that are directories in use as mount points, and may also not be
true for a translucent FS with e.g. a CDROM and a separate FFS
image unioned to make the CDROM image writable. It may also not
be true on a per-file basis, if the moral equivalent of symlinks
is implemented in the lookup space, rather than in the FS
namespace (e.g. folding of the namespace for various purposes).

So it's probably a good idea to rethink this interface in any
case, to externalize the per-file st_dev information as well (if
the interface is going to be changing anyway, it might as well be
more correctly "a collection of stat information").

And that's one example of an exposure other than the "stat"
interface. The POSIX file locking semantics are another, though
the translation (if any) would be internalized in a layer in the
kernel.
So you're not just talking about a change to the "stat"
structure; you are talking, minimally, about either a conversion
function or a change to the system representation (to maintain
the historical "coincidental" match between the on-disk structure
for UFS2 and the system structure). This has translational
implications, both for the NFS mapping space and for the ABI
modules (e.g. the Linux ABI).

I'll suggest (again) that what wants to happen here is that
VOP_READDIR needs to be broken into two operations: one to get a
block reference, atomically, and another to take a block
reference in native format and convert entries on a case-by-case
basis to an externalized format.

I don't know the UFS2 intent on namespace, but it's likely that
if it's well thought out, it will be two-byte Unicode, so that
fixed field length guarantees are maintained (UTF-7 or UTF-8
encoded data with escapes for the path component separator "/"
and the ASCII NUL character are unacceptable, both from the need
to escape the character when it occurs in a valid multibyte
character, and from the inability to make path component length
guarantees, per POSIX).

So to sum up:

1)	It's not just struct stat

2)	There are real client FS's in common use that will be
	impacted by such a change

3)	There are FS consumers that aren't FS's that will also be
	impacted by such a change, including the ABI code

I expect that we will see changes in these areas anyway, but it's
a good idea to keep in mind that these changes are not as trivial
as they might appear on casual inspection.

It would be nice if the breakage were a single event that was not
often repeated (everything at once), and if there were some
backward compatibility strategy well thought out before it became
a dire need.

-- Terry