Date: Tue, 7 Oct 2014 18:01:50 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: FreeBSD Filesystems <freebsd-fs@freebsd.org>
Cc: Kirk McKusick <mckusick@mckusick.com>, Gleb Kurtsou <gleb@freebsd.org>
Subject: RFC: hackish way to support 64bit d_fileno values
Message-ID: <134743690.60044002.1412719310927.JavaMail.root@uoguelph.ca>
Hi,

Some file systems (ZFS, NFS) can support 64bit filenos (i-node #s) and in
fact I think ZFS will just do so when/if the file system gets large enough.
As such, I think FreeBSD needs to somehow support these someday.

Changing d_fileno to 64bits the obvious way (defining it as uint64_t) is
fraught with challenges. For example, even if UFS copies the on-disk format
to this structure, the directory offsets all change, and entries can even
move to different block numbers when the on-disk directory is tightly packed.

As such, I've thought up this (somewhat hackish;-) alternative that I think
might allow this to be done without introducing serious backwards
compatibility (POLA) violations.

- Define a new field I'll call d_filenohigh, that can live at the end of
  "struct dirent" (after d_name).
- Define a flag for d_type called DT_FILENOEXT as 0x80, that could be or'd
  into d_type to indicate that d_filenohigh exists.

(Since d_filenohigh lives after d_name, it would actually have to be
accessed via a macro or accessor function, but I'll just call it
d_filenohigh for simplicity. It would be a uint32_t properly aligned and
positioned after the end of d_name. It would be included in d_reclen when
it exists.)

Now, file systems like UFS which use 32bit filenos wouldn't change at all.
They would just generate "struct dirent"s like they do now.

File systems that use 64bit filenos (ZFS, NFS, ..) would put the high order
32bits of their fileno in d_filenohigh and set DT_FILENOEXT. (They could do
it always or only when the high order 32bits are non-zero.)

The NFS server could check for DT_FILENOEXT and handle the high order
32bits if it is set.

getdirentries() (and the compatibility code for Linux...) would just clear
the DT_FILENOEXT flag. (ZFS, NFS dirents would appear less densely packed,
but I don't think that should matter, so long as they still obey the "don't
let a struct dirent straddle 2 DIRBLKSIZ blocks" rule.)
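A minimal sketch of the d_filenohigh layout described above, in C. Everything named sk_* is hypothetical and only for illustration: the header struct is a stand-in for the 32bit-fileno "struct dirent" (it is not the system definition), and the helper functions assume the high word sits at the first 4-byte boundary after the name's terminating NUL. Only the DT_FILENOEXT value (0x80) comes from the message itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_DT_FILENOEXT 0x80u   /* flag value proposed in the message */

/* Fixed head of a 32bit-fileno record; d_name follows it (stand-in type). */
struct sk_dirent_hdr {
    uint32_t d_fileno;
    uint16_t d_reclen;
    uint8_t  d_type;
    uint8_t  d_namlen;
};

/* Offset of the high word: first 4-byte boundary past the name's NUL. */
static size_t sk_filenohigh_off(uint8_t namlen)
{
    size_t end = sizeof(struct sk_dirent_hdr) + (size_t)namlen + 1;
    return (end + 3) & ~(size_t)3;
}

/* Write one record; the high word is added only when it is non-zero.
 * Returns the record length, or 0 if the name or buffer does not fit. */
static size_t sk_dirent_put(unsigned char *buf, size_t buflen, uint64_t fileno,
                            uint8_t type, const char *name)
{
    assert(buf != NULL && name != NULL);
    size_t namlen = strlen(name);
    if (namlen > 255)
        return 0;
    uint32_t high = (uint32_t)(fileno >> 32);
    size_t off = sk_filenohigh_off((uint8_t)namlen);
    size_t reclen = off + (high != 0 ? sizeof(high) : 0);
    if (reclen > buflen)
        return 0;

    struct sk_dirent_hdr h;
    h.d_fileno = (uint32_t)fileno;              /* low 32 bits */
    h.d_reclen = (uint16_t)reclen;              /* includes the high word */
    h.d_type = (uint8_t)(high != 0 ? (type | SK_DT_FILENOEXT) : type);
    h.d_namlen = (uint8_t)namlen;

    memset(buf, 0, reclen);                     /* zeroes NUL and padding */
    memcpy(buf, &h, sizeof(h));
    memcpy(buf + sizeof(h), name, namlen);
    if (high != 0)
        memcpy(buf + off, &high, sizeof(high));
    return reclen;
}

/* Accessor: rebuild the 64bit fileno, honouring the extension flag. */
static uint64_t sk_dirent_fileno(const unsigned char *rec)
{
    struct sk_dirent_hdr h;
    uint32_t high = 0;
    memcpy(&h, rec, sizeof(h));
    if (h.d_type & SK_DT_FILENOEXT)
        memcpy(&high, rec + sk_filenohigh_off(h.d_namlen), sizeof(high));
    return ((uint64_t)high << 32) | h.d_fileno;
}

/* What the old getdirentries() path would do: clear the flag, keep d_reclen. */
static void sk_dirent_strip_ext(unsigned char *rec)
{
    struct sk_dirent_hdr h;
    memcpy(&h, rec, sizeof(h));
    h.d_type &= (uint8_t)~SK_DT_FILENOEXT;
    memcpy(rec, &h, sizeof(h));
}
```

Because d_reclen still covers the trailing word after the flag is cleared, an old consumer steps over it as if it were padding, which is why the records merely look less densely packed.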
getdirentries64() would be implemented that would return directory
structures that look like:

    struct dirent64 {
        uint64_t d_fileno;
        uint16_t d_reclen;
        uint8_t  d_type;
        uint8_t  d_namlen;
        char     d_name[MAXNAMLEN + 1]; /* actually variable length */
        uint64_t d_diroff;
    };

(I think POSIX will allow the size of d_fileno to be whatever the OS
chooses.)

d_diroff - This holds the directory offset "cookie" for the directory entry
  in the underlying file system this entry was created from. (For example,
  the byte offset in the UFS directory for UFS or...)
  --> This field can be used to get to the correct position in the
      directory via lseek(). I think "basep" isn't sufficient, since the
      size of each directory entry isn't the same as on-disk, so I think
      you need one for every directory entry.

In turn, telldir(), seekdir() should be able to use these values. (They
need to be defined as returning and using uint64_t instead of "long".
Hopefully this can be done without too much pain?) I haven't looked at the
libc functions yet, so I don't know if there are gotchas.

File systems could generate this new "struct dirent64" natively (some
MNTK_xx flag could indicate this) and then the old getdirentries(2) would
have to do the copying/conversion to the old "struct dirent". It could leave
"gaps" (i.e. d_reclen wouldn't change) so the directory offsets remain the
same as for the underlying file system that generated them.

For file systems that don't generate this new format, getdirentries64(2)
would have to do the copying out to the new format, including filling in
d_diroff correctly.

At some point, I think getdirentries64(2) could be renamed getdirentries(2)
and getdirentries(2) renamed to ogetdirentries(2), along with a renaming of
the old and new "struct dirent"s. I believe this transition could be delayed
until the basic changes had settled in.

Comments anyone? rick

ps: Gleb, if you weren't the guy who already did some 64bit ino_t stuff,
    please let me know, so I can forward this to the right guy.
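One way to picture the variable-length "struct dirent64" record is sketched below in C. The sk64_* names are hypothetical, and two layout choices are assumptions that the message does not spell out: d_diroff is placed at the first 8-byte boundary after the name's terminating NUL, and d_reclen covers it. The cookie value is simply whatever the caller passes in; which offset a file system would store there is left open.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the proposed record; used here only for field offsets. */
struct sk_dirent64 {
    uint64_t d_fileno;
    uint16_t d_reclen;
    uint8_t  d_type;
    uint8_t  d_namlen;
    char     d_name[];  /* NUL-terminated; padding and d_diroff follow */
};

/* Assumed position of d_diroff: first 8-byte boundary past the name's NUL. */
static size_t sk64_diroff_off(uint8_t namlen)
{
    size_t end = offsetof(struct sk_dirent64, d_name) + (size_t)namlen + 1;
    return (end + 7) & ~(size_t)7;
}

/* Build one record in out[]; returns its length, or 0 if it does not fit. */
static size_t sk64_put(unsigned char *out, size_t outlen, uint64_t fileno,
                       uint8_t type, const char *name, uint64_t diroff)
{
    assert(out != NULL && name != NULL);
    size_t namlen = strlen(name);
    if (namlen > 255)
        return 0;
    size_t off = sk64_diroff_off((uint8_t)namlen);
    size_t reclen = off + sizeof(diroff);
    if (reclen > outlen)
        return 0;

    uint16_t r16 = (uint16_t)reclen;
    memset(out, 0, reclen);                     /* zeroes NUL and padding */
    memcpy(out + offsetof(struct sk_dirent64, d_fileno), &fileno, sizeof(fileno));
    memcpy(out + offsetof(struct sk_dirent64, d_reclen), &r16, sizeof(r16));
    out[offsetof(struct sk_dirent64, d_type)] = type;
    out[offsetof(struct sk_dirent64, d_namlen)] = (uint8_t)namlen;
    memcpy(out + offsetof(struct sk_dirent64, d_name), name, namlen);
    memcpy(out + off, &diroff, sizeof(diroff)); /* per-entry seek cookie */
    return reclen;
}

/* Read back the full 64bit fileno. */
static uint64_t sk64_fileno(const unsigned char *rec)
{
    uint64_t v;
    memcpy(&v, rec + offsetof(struct sk_dirent64, d_fileno), sizeof(v));
    return v;
}

/* Read back the cookie a telldir()/seekdir()-style caller would use. */
static uint64_t sk64_diroff(const unsigned char *rec)
{
    uint8_t namlen = rec[offsetof(struct sk_dirent64, d_namlen)];
    uint64_t v;
    memcpy(&v, rec + sk64_diroff_off(namlen), sizeof(v));
    return v;
}
```

Carrying the cookie inside each record means a reader never has to derive a position from record sizes, which differ between the on-disk and copied-out formats.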