Date:      Tue, 7 Oct 2014 18:01:50 -0400 (EDT)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        FreeBSD Filesystems <freebsd-fs@freebsd.org>
Cc:        Kirk McKusick <mckusick@mckusick.com>, Gleb Kurtsou <gleb@freebsd.org>
Subject:   RFC: hackish way to support 64bit d_fileno values
Message-ID:  <134743690.60044002.1412719310927.JavaMail.root@uoguelph.ca>

Hi,

Some file systems (ZFS, NFS) can support 64bit filenos (i-node #s)
and in fact I think ZFS will just do so when/if the file system
gets large enough. As such, I think FreeBSD needs to somehow support
these someday.

Changing d_fileno to 64bits the obvious way (defining it as
uint64_t) is fraught with challenges. For example, even if
UFS copies the on-disk format to this structure, the directory
offsets all change and even move to different block numbers
for cases where the on-disk directory is tightly packed.

As such, I've thought up this (somewhat hackish;-) alternative
that I think might allow this to be done without introducing
serious backwards compatibility (POLA) violations.

- Define a new field I'll call d_filenohigh, that can live at the
  end of "struct dirent" (after d_name).
  Define a flag for d_type called DT_FILENOEXT as 0x80, that could
  be or'd into d_type to indicate that d_filenohigh exists.
  (Since d_filenohigh lives after d_name, it would actually have to
   be accessed via a macro or accessor function, but I'll just call
   it d_filenohigh for simplicity. It would be a uint32_t properly
   aligned and positioned after the end of d_name. It would be
   included in d_reclen when it exists.)

Now, file systems like UFS which use 32bit filenos wouldn't change
at all. They would just generate "struct dirent"s like they do now.

File systems that use 64bit filenos (ZFS, NFS, ..) would put the
high order 32bits of their fileno in d_filenohigh and set DT_FILENOEXT.
(They could do it always or only when the high order 32bits are non-zero.)
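
A minimal sketch of how the record layout and accessor described above
might look. Everything named xd_* is hypothetical and only for
illustration; the records are handled as plain byte buffers so the
example does not depend on the system's own "struct dirent". It assumes
d_filenohigh sits at the first 4-byte boundary after the name's NUL and
is written only when the high order 32bits are non-zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DT_FILENOEXT 0x80 /* flag value proposed in the text */

/* Hypothetical stand-in for the fixed 8-byte head of today's struct dirent. */
struct xd_hdr {
    uint32_t d_fileno;
    uint16_t d_reclen;
    uint8_t  d_type;
    uint8_t  d_namlen;
};

/* Assumed offset of d_filenohigh: first 4-byte boundary past the name's NUL. */
static size_t xd_high_off(uint8_t namlen)
{
    return (sizeof(struct xd_hdr) + namlen + 1 + 3) & ~(size_t)3;
}

/* Pack one record into buf; returns d_reclen, or 0 if it does not fit. */
static size_t xd_pack(unsigned char *buf, size_t bufsz, uint64_t fileno,
                      uint8_t type, const char *name)
{
    size_t namlen = strlen(name);
    uint32_t high = (uint32_t)(fileno >> 32);
    struct xd_hdr h;

    if (namlen > 255)
        return 0;
    /* d_filenohigh is counted in d_reclen only when it is present. */
    size_t reclen = xd_high_off((uint8_t)namlen) +
        (high != 0 ? sizeof(high) : 0);
    if (reclen > bufsz)
        return 0;
    memset(buf, 0, reclen);
    h.d_fileno = (uint32_t)fileno; /* low order 32bits, as today */
    h.d_reclen = (uint16_t)reclen;
    h.d_type = high != 0 ? (uint8_t)(type | DT_FILENOEXT) : type;
    h.d_namlen = (uint8_t)namlen;
    memcpy(buf, &h, sizeof(h));
    memcpy(buf + sizeof(h), name, namlen);
    if (high != 0)
        memcpy(buf + xd_high_off(h.d_namlen), &high, sizeof(high));
    return reclen;
}

/* Accessor: rebuild the full 64bit fileno from one packed record. */
static uint64_t xd_fileno64(const unsigned char *rec)
{
    struct xd_hdr h;
    uint32_t high = 0;

    memcpy(&h, rec, sizeof(h));
    if (h.d_type & DT_FILENOEXT) {
        assert(xd_high_off(h.d_namlen) + sizeof(high) <= h.d_reclen);
        memcpy(&high, rec + xd_high_off(h.d_namlen), sizeof(high));
    }
    return ((uint64_t)high << 32) | h.d_fileno;
}
```

A consumer that knows about the flag (the NFS server, say) would call
something like xd_fileno64(); one that doesn't just reads d_fileno and
skips ahead by d_reclen as it does now.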

The NFS server could check for DT_FILENOEXT set and handle the high
order 32bits if it is set.

getdirentries() (and the compatibility code for Linux...) would just clear
the DT_FILENOEXT flag. (ZFS, NFS dirents would appear less densely packed,
but I don't think that should matter, so long as they still obey the
"don't let a struct dirent straddle 2 DIRBLKSIZ blocks" rule.)

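The flag-clearing step for the old getdirentries() path could be
sketched roughly as below. xd_hdr and xd_strip_ext() are hypothetical
names for illustration, not existing kernel code. Only d_type is
touched and d_reclen is left alone, which is why the entries stay where
they are and just look less densely packed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DT_FILENOEXT 0x80 /* flag value proposed in the text */

/* Hypothetical stand-in for the fixed 8-byte head of today's struct dirent. */
struct xd_hdr {
    uint32_t d_fileno;
    uint16_t d_reclen;
    uint8_t  d_type;
    uint8_t  d_namlen;
};

/*
 * Walk the packed records in buf[0..len) and clear DT_FILENOEXT in each.
 * Returns the number of records visited, or -1 if the buffer is malformed.
 */
static int xd_strip_ext(unsigned char *buf, size_t len)
{
    size_t off = 0;
    int count = 0;

    assert(buf != NULL);
    while (off + sizeof(struct xd_hdr) <= len) {
        struct xd_hdr h;

        memcpy(&h, buf + off, sizeof(h));
        if (h.d_reclen < sizeof(h) || h.d_reclen > len - off)
            return -1;
        h.d_type &= (uint8_t)~DT_FILENOEXT; /* hide the extension */
        memcpy(buf + off, &h, sizeof(h));
        off += h.d_reclen; /* unchanged, so offsets do not move */
        count++;
    }
    return off == len ? count : -1;
}
```
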
A new getdirentries64() would be implemented, returning directory
structures that look like:
   struct dirent64 {
      uint64_t d_fileno;
      uint16_t d_reclen;
      uint8_t  d_type;
      uint8_t  d_namlen;
      char  d_name[MAXNAMLEN + 1]; /* actually variable length */
      uint64_t d_diroff;
   };
(I think POSIX will allow the size of d_fileno to be whatever the
 OS chooses.)

d_diroff - This holds the directory offset "cookie" for the directory
           entry in the underlying file system this entry was created from.
           (For example, the byte offset in the UFS directory for UFS or...)
           --> This field can be used to get to the correct position in
               the directory via lseek(). I think "basep" isn't sufficient,
               since the size of each directory entry isn't the same as
               on-disk, so I think you need one for every directory entry.
               In turn, telldir(), seekdir() should be able
               to use these values. (They need to be defined as returning
               and using uint64_t instead of "long". Hopefully this can
               be done without too much pain?)
            I haven't looked at the libc functions yet, so I don't know
            if there are gotchas.

File systems could generate this new "struct dirent64" natively (some MNTK_xx
flag could indicate this) and then the old getdirentries(2) would have to do the
copying/conversion to old "struct dirent". It could leave "gaps"
(ie d_reclen wouldn't change) so the directory offsets remain the same as for
the underlying file system that generated them.
For file systems that don't generate this new format, getdirentries64(2) would
have to do the copying out to the new format, including filling in d_diroff
correctly.
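
The first direction above (a native "struct dirent64" record converted
down for the old getdirentries(2), leaving a gap) might look something
like this sketch. The d64_* names are hypothetical. The byte offsets
follow from the struct shown earlier with natural alignment; placing
d_diroff at the first 8-byte boundary after the name's NUL, and keeping
only the low order 32bits of d_fileno in the old record, are
assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed byte offsets within a packed "struct dirent64" record. */
enum { N_FILENO = 0, N_RECLEN = 8, N_TYPE = 10, N_NAMLEN = 11, N_NAME = 12 };
/* Byte offsets within today's 32bit "struct dirent" record. */
enum { O_FILENO = 0, O_RECLEN = 4, O_TYPE = 6, O_NAMLEN = 7, O_NAME = 8 };

/* Assumed offset of d_diroff: first 8-byte boundary past the name's NUL. */
static size_t d64_diroff_off(uint8_t namlen)
{
    return ((size_t)N_NAME + namlen + 1 + 7) & ~(size_t)7;
}

/* Build one new-format record; buf must hold at least 280 bytes. */
static uint16_t d64_pack(unsigned char *buf, uint64_t fileno, uint8_t type,
                         const char *name, uint64_t diroff)
{
    size_t namlen = strlen(name);

    assert(namlen <= 255);
    size_t doff = d64_diroff_off((uint8_t)namlen);
    uint16_t reclen = (uint16_t)(doff + sizeof(diroff));
    memset(buf, 0, reclen);
    memcpy(buf + N_FILENO, &fileno, sizeof(fileno));
    memcpy(buf + N_RECLEN, &reclen, sizeof(reclen));
    buf[N_TYPE] = type;
    buf[N_NAMLEN] = (uint8_t)namlen;
    memcpy(buf + N_NAME, name, namlen);
    memcpy(buf + doff, &diroff, sizeof(diroff));
    return reclen;
}

/*
 * Convert one new-format record at src into an old-format record at dst.
 * dst must hold d_reclen bytes. d_reclen is copied unchanged, so the old
 * record has a zeroed gap at its end and directory offsets stay the same.
 */
static void d64_to_old(const unsigned char *src, unsigned char *dst)
{
    uint64_t fileno;
    uint16_t reclen;
    uint8_t namlen = src[N_NAMLEN];

    memcpy(&fileno, src + N_FILENO, sizeof(fileno));
    memcpy(&reclen, src + N_RECLEN, sizeof(reclen));
    uint32_t low = (uint32_t)fileno; /* assumption: high bits are dropped */
    memset(dst, 0, reclen);
    memcpy(dst + O_FILENO, &low, sizeof(low));
    memcpy(dst + O_RECLEN, &reclen, sizeof(reclen));
    dst[O_TYPE] = src[N_TYPE];
    dst[O_NAMLEN] = namlen;
    memcpy(dst + O_NAME, src + N_NAME, namlen);
}
```

The old record always fits, since its header is 4 bytes shorter and it
carries no d_diroff.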

At some point, I think getdirentries64(2) could be renamed getdirentries(2) and
getdirentries(2) renamed to ogetdirentries(2) along with a renaming of the
old and new "struct dirent"s.
I believe this transition could be delayed until the basic changes had settled in.

Comments anyone? rick
ps: Gleb, if you weren't the guy who already did some 64bit ino_t stuff, please
    let me know, so I can forward this to the right guy.
