Date:      Fri, 21 Nov 2014 18:45:52 -0500 (EST)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        FreeBSD Filesystems <freebsd-fs@freebsd.org>
Subject:   Re: RFC: patch to make d_fileno 64bits
Message-ID:  <420608613.5215411.1416613552066.JavaMail.root@uoguelph.ca>
In-Reply-To: <20141121155754.GN17068@kib.kiev.ua>

Kostik wrote:
> On Thu, Nov 20, 2014 at 10:19:14PM -0500, Rick Macklem wrote:
> > The attached patch covers the basics of a way to
> > convert the d_fileno field of "struct dirent" to
> > 64bits. This patch is incomplete and won't even
> > build, but I thought I'd post it in case anyone
> > wanted to take a look and comment on the approach
> > it uses.
> > 
> > - renames the old/current one "struct dirent32"
> > - changes d_fileno to 64bits and adds a 64bit
> >   d_off field for the offset of the underlying
> >   file system
> > - defines a new VOP_READDIR() that will return
> >   the new "struct dirent" that is used as the
> >   default one for a new getdirentries(2).
> > - the old/current getdirentries(2) uses the old
> >   VOP_READDIR32() by default.
> > 
> > For the case of a file system that supports both
> > the new and old VOP_READDIR(), they are used by
> > the corresponding new and old getdirentries(2)
> > syscalls.
> > 
> > For a file system that only supports one of
> > the VOP_READDIR()s, the "struct dirent32"
> > is copied to "struct dirent" (or vice versa).
> > 
> > At this point, all file systems would support
> > the old VOP_READDIR() and I think the new
> > VOP_READDIR() can easily be added for NFS
> > and ZFS. (OpenBSD already has UFS code for
> > essentially a new struct dirent and hopefully
> > that code could be ported easily, too.)
> > 
> > Anyhow, any comments on this approach? rick
> 
> I do not think we need to have in-kernel compatibility shims.
> The work, big but relatively trivial, is to convert filesystems to
> use the new ino_t, even if the on-disk structures still use 32bit
> inode number.
> 
What about old binaries that call getdirentries(2) and expect the old
structure with a 32bit d_fileno, or the Linux compatibility code?
I suspect that there are some old statically linked binaries out there
that do/expect the old getdirentries(2).

Having said that, most apps will use readdir(3). Do we need to somehow
allow old binaries to work with a newer libc? (If so, that's going to be
really nasty.) I had assumed that old libc code would call the old
getdirentries(2) and, as such, having both a working old and a working
new getdirentries(2) would handle old binaries?
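The old/new dispatch from the proposal (each getdirentries(2) uses the matching VOP_READDIR() when the file system provides it, and converts otherwise) might look roughly like the user-space sketch below. Everything here is hypothetical: `fs_readdir_ops` stands in for the two VOPs, `fetch_entry32()`/`fetch_entry64()` for the old and new syscall paths, and `demo_readdir64()` for a file system that only implements the new one. Only the old layout mirrors the existing 32bit struct dirent; the field order of the new layout and returning EOVERFLOW for an inode number that does not fit are assumptions.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAXNAMLEN 255

/* Old layout: 32bit d_fileno (the "struct dirent32" of the proposal). */
struct dirent32_sketch {
    uint32_t d_fileno;
    uint16_t d_reclen;
    uint8_t  d_type;
    uint8_t  d_namlen;
    char     d_name[SKETCH_MAXNAMLEN + 1];
};

/* New layout: 64bit d_fileno plus a 64bit d_off (field order assumed). */
struct dirent64_sketch {
    uint64_t d_fileno;
    uint64_t d_off;
    uint16_t d_reclen;
    uint8_t  d_type;
    uint8_t  d_namlen;
    char     d_name[SKETCH_MAXNAMLEN + 1];
};

/* Hypothetical per-file-system hooks; either one may be NULL. */
struct fs_readdir_ops {
    int (*readdir64)(void *dir, struct dirent64_sketch *out);
    int (*readdir32)(void *dir, struct dirent32_sketch *out);
};

/* New getdirentries(2) path: native hook if present, else widen. */
int fetch_entry64(const struct fs_readdir_ops *ops, void *dir,
                  struct dirent64_sketch *out)
{
    struct dirent32_sketch old;
    int error;

    if (ops->readdir64 != NULL)
        return ops->readdir64(dir, out);
    if (ops->readdir32 == NULL)
        return EOPNOTSUPP;
    if ((error = ops->readdir32(dir, &old)) != 0)
        return error;
    memset(out, 0, sizeof(*out));   /* also NUL-terminates d_name */
    out->d_fileno = old.d_fileno;   /* widening never loses bits */
    out->d_off = 0;                 /* old records carry no offset */
    out->d_type = old.d_type;
    out->d_namlen = old.d_namlen;
    memcpy(out->d_name, old.d_name, old.d_namlen);
    out->d_reclen = (uint16_t)sizeof(*out);
    return 0;
}

/* Old getdirentries(2) path: native hook if present, else narrow. */
int fetch_entry32(const struct fs_readdir_ops *ops, void *dir,
                  struct dirent32_sketch *out)
{
    struct dirent64_sketch ent;
    int error;

    if (ops->readdir32 != NULL)
        return ops->readdir32(dir, out);
    if (ops->readdir64 == NULL)
        return EOPNOTSUPP;
    if ((error = ops->readdir64(dir, &ent)) != 0)
        return error;
    if (ent.d_fileno > UINT32_MAX)
        return EOVERFLOW;           /* not representable in the old ABI */
    memset(out, 0, sizeof(*out));
    out->d_fileno = (uint32_t)ent.d_fileno;
    out->d_type = ent.d_type;
    out->d_namlen = ent.d_namlen;
    memcpy(out->d_name, ent.d_name, ent.d_namlen);
    out->d_reclen = (uint16_t)sizeof(*out);
    return 0;
}

/* Demo file system that only implements the new hook; `dir` points at
 * the inode number it should report. */
static int demo_readdir64(void *dir, struct dirent64_sketch *out)
{
    memset(out, 0, sizeof(*out));
    out->d_fileno = *(const uint64_t *)dir;
    out->d_off = 1;
    out->d_type = 8;                /* DT_REG on FreeBSD */
    out->d_namlen = 3;
    memcpy(out->d_name, "foo", 3);
    out->d_reclen = (uint16_t)sizeof(*out);
    return 0;
}
```

In this model an old binary only ever reaches fetch_entry32(), so it keeps working as long as the inode numbers it sees fit in 32 bits.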

I was trying to avoid data copying for the case of an old getdirentries(2)
by having file systems provide VOP_READDIR() calls for both old and new
structures.
It is certainly possible to have all file systems only produce the new
"struct dirent" and then just do data copying/conversion to the old one.

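If every file system only produced the new records, the old getdirentries(2) would have to repack them, because old records are shorter and each d_reclen must be recomputed. A rough sketch of that copy, assuming hypothetical `new_hdr`/`old_hdr` headers with the NUL-terminated name stored right after the header, records padded to 8 bytes (new) or 4 bytes (old), and EOVERFLOW for inode numbers above 32 bits. `pack_new()` is a hypothetical helper for building test input.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record headers; the NUL-terminated name follows the header. */
struct new_hdr {
    uint64_t d_fileno;
    uint64_t d_off;
    uint16_t d_reclen;
    uint8_t  d_type;
    uint8_t  d_namlen;
};

struct old_hdr {
    uint32_t d_fileno;
    uint16_t d_reclen;
    uint8_t  d_type;
    uint8_t  d_namlen;
};

/* Header + name + NUL, rounded up to `align` (a power of two). */
static size_t rec_size(size_t hdr, size_t namlen, size_t align)
{
    return (hdr + namlen + 1 + align - 1) & ~(align - 1);
}

/* Write one new-format record at buf; returns its d_reclen.
 * Assumes strlen(name) <= 255. */
size_t pack_new(unsigned char *buf, uint64_t ino, uint64_t off, const char *name)
{
    struct new_hdr nh;
    size_t namlen = strlen(name);
    size_t reclen = rec_size(sizeof(nh), namlen, 8);

    memset(&nh, 0, sizeof(nh));
    nh.d_fileno = ino;
    nh.d_off = off;
    nh.d_reclen = (uint16_t)reclen;
    nh.d_namlen = (uint8_t)namlen;
    memset(buf, 0, reclen);
    memcpy(buf, &nh, sizeof(nh));
    memcpy(buf + sizeof(nh), name, namlen);
    return reclen;
}

/* Repack new-format records into old-format ones. Returns bytes written,
 * or -1 with *errp set. Stops early when dst is full. */
long convert_new_to_old(const unsigned char *src, size_t srclen,
                        unsigned char *dst, size_t dstlen, int *errp)
{
    size_t in = 0, out = 0;

    while (in + sizeof(struct new_hdr) <= srclen) {
        struct new_hdr nh;
        struct old_hdr oh;
        size_t oreclen;

        memcpy(&nh, src + in, sizeof(nh));
        if (nh.d_reclen < sizeof(nh) + nh.d_namlen + 1 ||
            in + nh.d_reclen > srclen) {
            *errp = EINVAL;             /* malformed record */
            return -1;
        }
        if (nh.d_fileno > UINT32_MAX) {
            *errp = EOVERFLOW;          /* cannot be expressed in 32 bits */
            return -1;
        }
        oreclen = rec_size(sizeof(oh), nh.d_namlen, 4);
        if (out + oreclen > dstlen) {
            if (out == 0) {
                *errp = EINVAL;         /* dst too small for one record */
                return -1;
            }
            break;
        }
        oh.d_fileno = (uint32_t)nh.d_fileno;
        oh.d_reclen = (uint16_t)oreclen;
        oh.d_type = nh.d_type;
        oh.d_namlen = nh.d_namlen;
        memset(dst + out, 0, oreclen);
        memcpy(dst + out, &oh, sizeof(oh));
        memcpy(dst + out + sizeof(oh), src + in + sizeof(nh), nh.d_namlen);
        out += oreclen;
        in += nh.d_reclen;
    }
    return (long)out;
}
```

This per-record copy is the cost that providing a native old VOP_READDIR() would avoid; a real implementation would also have to tell the caller where to resume after stopping early.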
Btw, I think the new getdirentries(2) will need additional arguments,
since the offset for the underlying file system needs to be provided
along with the "logical offset", which is the byte offset within the
directory being returned as "struct dirent"s.
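To make the two offsets concrete: the logical offset is a byte position in the buffer of records handed back to the caller, while d_off is whatever cookie the file system needs to resume reading. The sketch below maps the first to the second. It assumes a hypothetical `new_hdr` record header and that d_off holds the file system offset of the entry following the record; `fs_offset_for_logical()` and its `start_off` parameter (the file system offset the buffer was read from) are illustrative names, not part of any patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical new-format record header; the name follows it. */
struct new_hdr {
    uint64_t d_fileno;
    uint64_t d_off;      /* assumed: fs offset of the next entry */
    uint16_t d_reclen;
    uint8_t  d_type;
    uint8_t  d_namlen;
};

/* Find the file system offset at which reading should resume so that it
 * continues at byte `logical` of buf. Returns 0 and sets *fs_off, or -1
 * if `logical` is not a record boundary within buf. */
int fs_offset_for_logical(const unsigned char *buf, size_t len,
                          uint64_t start_off, size_t logical,
                          uint64_t *fs_off)
{
    size_t pos = 0;
    uint64_t cookie = start_off;

    while (pos < logical && pos + sizeof(struct new_hdr) <= len) {
        struct new_hdr h;

        memcpy(&h, buf + pos, sizeof(h));
        if (h.d_reclen == 0)
            return -1;               /* corrupt record, avoid looping */
        cookie = h.d_off;            /* resume point after this record */
        pos += h.d_reclen;
    }
    if (pos != logical)
        return -1;
    *fs_off = cookie;
    return 0;
}
```

Because the logical offsets change whenever records are repacked, only the d_off value is meaningful to the file system when the next read starts.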

> Really problematic part of this change is the usermode ABI breakage.
> The struct dirent is only the start of the whole issue. ino_t is
> embedded into more structures which are part of the contract, e.g.
> struct stat.  We have to provide new syscalls which accept or return
> the affected structures.
> 
> And then, there are libraries which embed ino_t into their ABI.
> Immediate example is fts(3) in libc. Look at the FTSENT.fts_ino. Even
> after the base system is fixed by properly providing the compat shims
> and symbol versions for the affected libraries, we get the same
> problem with the binaries not from base.
> 
> Summary of the issue with ino_t is that it is not too hard to fix the
> kernel, compared with the ABI issues which must be solved in
> usermode.
> 
> 
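A small illustration of why widening an inode field inside an exported structure breaks the ABI: `rec_v1`/`rec_v2` below are hypothetical stand-ins for something like struct stat or FTSENT, and `old_binary_reads_size()` models a binary compiled against the old layout. It shows only the failure, not the remedy (new syscalls and symbol versions).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical library struct before and after widening the inode field. */
struct rec_v1 {
    uint32_t dev;
    uint32_t ino;    /* 32bit inode number */
    int64_t  size;
};

struct rec_v2 {
    uint32_t dev;
    uint64_t ino;    /* widened to 64 bits */
    int64_t  size;
};

/* Models a binary compiled against rec_v1: it reads `size` at the old
 * offset, whatever layout the library actually hands it. */
int64_t old_binary_reads_size(const void *obj)
{
    int64_t v;

    memcpy(&v, (const char *)obj + offsetof(struct rec_v1, size), sizeof(v));
    return v;
}
```

On amd64, for instance, the old offset of size (8) now lands on ino, so the old binary silently gets the inode number instead of the size.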
Yes, I was just going to look at d_fileno as a starting point.
(For whatever reason, d_fileno isn't defined as ino_t.)

I was specifically avoiding any use of "ino_t" and saw it as something
that needed to eventually change to 64 bits at the very end.
I was aware of Gleb Kurtsou's work, but didn't realize it lived
in projects/ino64. He had mentioned that he was busy, but
would try to find time to update the patch.
I will look at projects/ino64. It sounds like Kirk
would like to figure it all out in projects/ino64 and
eventually do a "super patch" to head. This sounds fine
to me, if we can pull it off.

rick
