Date:      Sat, 26 Mar 2005 16:30:48 -0500
From:      David Schultz <das@FreeBSD.ORG>
To:        Scott Long <scottl@samsco.org>
Cc:        freebsd-fs@FreeBSD.ORG
Subject:   Re: UFS Subdirectory limit.
Message-ID:  <20050326213048.GA33703@VARK.MIT.EDU>
In-Reply-To: <4244EAFD.1030304@samsco.org>
References:  <200503260011.aa53448@salmon.maths.tcd.ie> <20050326031018.GB41481@VARK.MIT.EDU> <4244EAFD.1030304@samsco.org>

On Fri, Mar 25, 2005, Scott Long wrote:
> David Schultz wrote:
> >On Sat, Mar 26, 2005, David Malone wrote:
> >
> >>There was a discussion on comp.unix.bsd.freebsd.misc about two weeks
> >>ago, where someone had an application that used about 150K
> >>subdirectories of a single directory. They wanted to move this
> >>application to FreeBSD, but discovered that UFS is limited to 32K
> >>subdirectories, because UFS's link count field is a signed 16 bit
> >>quantity. Rewriting the application wasn't an option for them.
> >>
> >>I had a look at how hard it would be to fix this. The obvious route
> >>of increasing the size of the link count field is tricky because
> >>it means changing the struct stat, which has a 16 bit link count
> >>field. This would imply ABI breakage, though it might be worth it.
> >
> >
> >Why not just...
> >
> >- make a new st_nlink field that's 32 bits and put it in the spare
> >  32-bit field in struct stat
> >
> >- rename the old st_nlink to st_onlink and leave it at 16 bits
> >
> >- the kernel would fill in st_onlink with min(st_nlink,SHRT_MAX)
> 
> I thought that we already discussed this in the past year.  There are
> significant compatibility concerns here.  What happens if you use an
> old fsck binary on a new filesystem?  Since you haven't changed the
> magic, it has no way of knowing that nlink needs to be handled
> differently.  It would make it impossible to share a filesystem between
> different versions of FreeBSD, let alone any other BSD.

First of all, I was only talking about how to avoid badly breaking
the stat ABI, not about how to avoid breaking the on-disk FS
format.  However, I think a similar trick could be applied to the
disk inode.
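
A rough sketch of the stat-side idea, in case it helps.  The struct
below is a made-up, trimmed-down stand-in (not the real struct stat
layout), and fill_nlink() is a hypothetical helper name.  It assumes
the legacy field should saturate at the 16-bit maximum so that old
binaries never see a negative or wrapped count.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct stat; only the two fields discussed. */
struct stat_sketch {
	int16_t  st_onlink;	/* legacy 16-bit count seen by old binaries */
	uint32_t st_nlink;	/* new 32-bit count in the former spare field */
};

/* Fill both fields from the true link count, clamping the legacy one. */
static void
fill_nlink(struct stat_sketch *sb, uint32_t nlink)
{
	assert(sb != NULL);
	sb->st_nlink = nlink;
	/* Saturate instead of truncating: old callers see at most SHRT_MAX. */
	sb->st_onlink = (int16_t)(nlink > SHRT_MAX ? SHRT_MAX : nlink);
}
```

With this, a directory holding 150000 links would report 150000 in
st_nlink and SHRT_MAX in st_onlink, while small counts look identical
in both fields.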

There are 24 bytes of reserved space in the UFS2 inode that
current versions of fsck ignore, and four of them could be used to
store a larger nlink field.  The old nlink field would still be
kept up-to-date by newer kernels, which would provide backward
compatibility for older kernels and versions of fsck *provided*
that no inode has a link count above 32767 (in practice, no
directory with more than about 32K subdirectories).  Clearly
there's a fundamental limitation that older software won't be able
to properly handle large directories, but at least small
directories in the new format would be backwards compatible.
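
Something like the following is what I have in mind for the disk
inode.  This is only a sketch: the struct is not the real UFS2 dinode,
the field name di_nlink32 is invented, and clamping the old field once
the count passes 32767 is my assumption about the least harmful thing
to store there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define OLD_NLINK_MAX	32767	/* largest signed 16-bit link count */

/* Hypothetical excerpt of an on-disk inode; names are illustrative. */
struct dinode_sketch {
	int16_t  di_nlink;	/* legacy count, read by old kernels and fsck */
	uint32_t di_nlink32;	/* wide count kept in formerly reserved bytes */
};

/* A new kernel writes both fields every time the link count changes. */
static void
set_nlink(struct dinode_sketch *ip, uint32_t nlink)
{
	assert(ip != NULL);
	ip->di_nlink32 = nlink;
	/* Assumption: clamp the old field when the count no longer fits. */
	ip->di_nlink = (int16_t)(nlink > OLD_NLINK_MAX ? OLD_NLINK_MAX : nlink);
}

/* True if old kernels and old fsck would see the correct count. */
static bool
old_software_safe(const struct dinode_sketch *ip)
{
	return ip->di_nlink32 <= OLD_NLINK_MAX;
}
```

As long as old_software_safe() holds for every inode, the filesystem
looks the same to old and new software; only inodes past the old limit
need a new kernel and a new fsck.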

The only other problem that comes to mind is that older versions
of fsck and older kernels could cause the two nlink fields to get
out of sync.  However, for directories, new kernels should be able
to figure out the correct nlink value from the directory contents
when this happens, since hard links to directories are not
allowed.  For regular files, it should be safe to assume the
larger nlink value is the correct one; this may leak storage, but
a new version of fsck would be able to reclaim it.  Furthermore,
this benign inconsistency would only happen in bizarre situations,
such as switching from a new kernel to an old kernel, adding or
removing hard links using the older kernel, and then switching
back to the new kernel.
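
The recovery rule described above might look roughly like this.  It is
a sketch under a few assumptions: reconcile_nlink() is a hypothetical
helper, the caller has already counted the directory's subdirectories
by scanning its entries, and "in sync" means the old field equals the
wide field clamped to 32767.  It relies on the usual UFS convention
that a directory's link count is 2 plus its number of subdirectories.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define OLD_NLINK_MAX	32767	/* largest signed 16-bit link count */

/*
 * Pick the link count to trust when the legacy and wide fields may
 * disagree.  subdirs is only consulted for directories.
 */
static uint32_t
reconcile_nlink(int16_t old_nlink, uint32_t wide_nlink, bool is_dir,
    uint32_t subdirs)
{
	uint32_t oldv, expect;

	assert(old_nlink >= 0);
	oldv = (uint32_t)old_nlink;
	expect = wide_nlink > OLD_NLINK_MAX ? OLD_NLINK_MAX : wide_nlink;

	/* Fields agree (allowing for the clamp): trust the wide one. */
	if (oldv == expect)
		return wide_nlink;

	/* Directories: recompute from contents ("." + parent entry + ".."s). */
	if (is_dir)
		return subdirs + 2;

	/* Regular files: the larger value is safe; it can only leak space. */
	return oldv > wide_nlink ? oldv : wide_nlink;
}
```

Taking the larger value for files never frees an inode that is still
referenced; at worst the count stays too high until a new fsck fixes
it.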


