Date: Fri, 24 Jun 2011 18:09:04 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Garance A Drosehn <gad@FreeBSD.org>
Cc: freebsd-fs@FreeBSD.org, Robert Watson <rwatson@FreeBSD.org>
Subject: Re: [rfc] 64-bit inode numbers
Message-ID: <1656190156.1051008.1308953344203.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <4E04FC7F.6090801@FreeBSD.org>
Garance A Drosehn wrote:
> On 6/23/11 6:26 PM, Kostik Belousov wrote:
> > On Thu, Jun 23, 2011 at 06:05:56PM -0400, Garance A Drosehn wrote:
> >
> >> Consider the thread "Increasing the size of dev_t and ino_t" from
> >> freebsd-arch in 2002:
> >>
> >> http://docs.freebsd.org/mail/archive/2002/freebsd-arch/20020317.freebsd-arch.html
> >>
> >> In particular, this message by Robert Watson:
> >>
> >> http://docs.freebsd.org/cgi/getmsg.cgi?fetch=139853+0+archive/2002/freebsd-arch/20020317.freebsd-arch
> >>
> >> I just participated in an online conference for OpenAFS, and while it
> >> isn't exactly taking the world by storm, I keep thinking it would be
> >> useful if FreeBSD could map individual AFS volumes to unique dev_t
> >> identifiers. And given the way AFS is implemented (as a global FS
> >> with many cells all reachable at the same time), and given the way most
> >> sites deploy AFS (with thousands or tens-of-thousands of individual
> >> AFS volumes *per site*), that adds up to a lot of values for dev_t.
> >>
> >> The upcoming release of OpenAFS should include a working and pretty
> >> stable AFS client for FreeBSD, so having a larger dev_t would have
> >> a more immediate application than it did back in 2002.
> >>
> > Am I right that the issue is the uniqueness of the dev_t for each
> > AFS volume, as reported by stat(2) ?
> >
> > Shouldn't the AFS client synthesize the dev_t for each new volume
> > mounted ? It seems that the current 32bit dev_t would be enough,
> > since I do not expect to see hundreds of thousands of mounts
> > on an single system.
> >
> > Please note that we do not guarantee dev_t stability across reboots
> > even for real devices.
> >
> The AFS cell at RPI has approximately 40,000 AFS volumes, and each
> volume should have it's own dev_t (IMO).
> That's just counting the collection of AFS volumes which are on RPI
> file servers, and any user sitting on one computer could access AFS
> volumes which are made available by other sites (aka "AFS cells").
> Most RPI users would only have access to maybe 1/4 of those volumes
> which exist at RPI, but we do know that individual users have run
> 'find' over the entire RPI cell looking for whatever they're looking
> for. I once did a run of 'md5deep' on the entire RPI cell, thanks to a
> symlink which I didn't realize was in my home directory!
>
Note that it is the value in mnt_stat.f_fsid that needs to be unique
with respect to other mount points on the machine. If AFS appears to be
one mount point in the FreeBSD client, then the only issue I know of is
how the client is expected to handle changes in dev_t within the mount
point. fts(3) and friends will assume that it is a mount point crossing
when st_dev changes. They will then expect the funny rule that, at the
crossing, the d_ino in the parent directory's dirent will not be the
same as st_ino.

What I do for NFSv4 is synthesize the mnt_stat.f_fsid value and return
that as st_dev for the mounted volume until I see the fsid returned by
the server change. Below that point, I return the fsid from the server
as st_dev, so long as it isn't the same as the synthesized one. That way,
fts(3) and friends can detect the mount point crossings on the server.
"ls -lR" will usually find problems if this is broken.

> So one person can easily trigger the access of 10,000 AFS volumes
> on one computer using one command. That might sound terrifying if
> you imagine it as being 10,000 NFS mounts, but accessing AFS volumes
> isn't the same amount of work as auto-mounting NFS filesystems.
> So ignore whatever problems you might expect to see with 10,000
> filesystems mounted on one computer.
> Just realize that it is very easy for a single user to access tens of
> thousands of AFS volumes from one computer, and it would be "most
> correct" (programming wise) if all of those AFS volumes were to get a
> unique value for dev_t. And of course it's even easier for a
> remote-access system to access tens-of-thousands of AFS volumes, since
> it would have a few dozen users logged in at the same time.
>
> Obviously most computers never access even 30,000 AFS cells before
> they (as the AFS client) will reboot, but I'm wondering how much
> overhead is there in trying to make sure that many different volumes
> are mapped to unique dev_t numbers.
>
> Please realize that I do not mind if people felt that there was no
> need to increase the size of dev_t at this time, and that we should
> wait until we see more of a demand for increasing it. But given the
> project to increase the size of inode numbers, I thought this was a
> good time to also ask about dev_t. I ask about it every few years :-)
>
> --
> Garance Alistair Drosehn = gad@gilead.netel.rpi.edu
> Senior Systems Programmer or gad@freebsd.org
> Rensselaer Polytechnic Institute or drosih@rpi.edu
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
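The reply above says that fts(3) and friends treat any change in st_dev as a mount point crossing. At that crossing, d_ino from the parent's dirent is expected to differ from st_ino. The following is a minimal user-space sketch of that check. It is not fts(3) itself. It scans a single directory with readdir(3) and fstatat(2), and reports each subdirectory whose st_dev differs from the directory's own st_dev. It prints d_ino next to st_ino so the "funny rule" can be observed. The function name list_mount_crossings() is hypothetical, and the code assumes a POSIX.1-2008 system.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: scan one directory and report every subdirectory
 * whose st_dev differs from the directory's own st_dev, i.e. the mount
 * point crossings that fts(3), find(1) and "ls -lR" infer.  For each one
 * it prints d_ino (from readdir(3) on the parent) next to st_ino (from
 * stat on the child); at a mount point these normally differ.
 * Returns the number of crossings found, or -1 if path cannot be read.
 */
int list_mount_crossings(const char *path)
{
    assert(path != NULL);

    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    struct stat parent;
    if (fstat(dirfd(dir), &parent) != 0) {
        closedir(dir);
        return -1;
    }

    int crossings = 0;
    struct dirent *de;
    while ((de = readdir(dir)) != NULL) {
        /* "." and ".." are not children; ".." may legitimately change st_dev. */
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;

        struct stat child;
        /* Do not follow symlinks, similar to an fts(3) FTS_PHYSICAL walk. */
        if (fstatat(dirfd(dir), de->d_name, &child, AT_SYMLINK_NOFOLLOW) != 0)
            continue;
        if (!S_ISDIR(child.st_mode) || child.st_dev == parent.st_dev)
            continue;

        /* st_dev changed: this entry is treated as a mount point. */
        crossings++;
        printf("%s/%s: st_dev %ju -> %ju, d_ino %ju, st_ino %ju\n",
               path, de->d_name,
               (uintmax_t)parent.st_dev, (uintmax_t)child.st_dev,
               (uintmax_t)de->d_ino, (uintmax_t)child.st_ino);
    }
    closedir(dir);
    return crossings;
}
```

On a typical system, calling list_mount_crossings("/") lists mounts such as /dev. Whether d_ino and st_ino actually differ there depends on the file systems involved. A network client that changes st_dev without respecting that rule is what "ls -lR" tends to trip over.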