From: Rick Macklem
To: Garance A Drosehn
Cc: freebsd-fs@FreeBSD.org, Robert Watson
Date: Fri, 24 Jun 2011 18:09:04 -0400 (EDT)
Subject: Re: [rfc] 64-bit inode numbers

Garance A Drosehn wrote:
> On 6/23/11 6:26 PM, Kostik Belousov wrote:
> > On Thu, Jun 23, 2011 at 06:05:56PM -0400, Garance A Drosehn wrote:
> >
> >> Consider the thread "Increasing the size of dev_t and ino_t" from
> >> freebsd-arch in 2002:
> >>
> >> http://docs.freebsd.org/mail/archive/2002/freebsd-arch/20020317.freebsd-arch.html
> >>
> >> In particular, this message by Robert Watson:
> >>
> >> http://docs.freebsd.org/cgi/getmsg.cgi?fetch=139853+0+archive/2002/freebsd-arch/20020317.freebsd-arch
> >>
> >> I just participated in an online conference for OpenAFS, and while
> >> it isn't exactly taking the world by storm, I keep thinking it would
> >> be useful if FreeBSD could map individual AFS volumes to unique dev_t
> >> identifiers. And given the way AFS is implemented (as a global FS
> >> with many cells all reachable at the same time), and given the way
> >> most sites deploy AFS (with thousands or tens-of-thousands of
> >> individual AFS volumes *per site*), that adds up to a lot of values
> >> for dev_t.
> >>
> >> The upcoming release of OpenAFS should include a working and pretty
> >> stable AFS client for FreeBSD, so having a larger dev_t would have
> >> a more immediate application than it did back in 2002.
> >>
> > Am I right that the issue is the uniqueness of the dev_t for each
> > AFS volume, as reported by stat(2)?
> >
> > Shouldn't the AFS client synthesize the dev_t for each new volume
> > mounted? It seems that the current 32bit dev_t would be enough,
> > since I do not expect to see hundreds of thousands of mounts
> > on a single system.
> >
> > Please note that we do not guarantee dev_t stability across reboots
> > even for real devices.
> >
> The AFS cell at RPI has approximately 40,000 AFS volumes, and each
> volume should have its own dev_t (IMO). That's just counting the
> collection of AFS volumes which are on RPI file servers, and any
> user sitting on one computer could access AFS volumes which are
> made available by other sites (aka "AFS cells"). Most RPI users
> would only have access to maybe 1/4 of those volumes which exist
> at RPI, but we do know that individual users have run 'find' over
> the entire RPI cell looking for whatever they're looking for.
> I once did a run of 'md5deep' on the entire RPI cell, thanks to a
> symlink which I didn't realize was in my home directory!
>
Note that it is the value in mnt_stat.f_fsid that needs to be unique
w.r.t. other mount points in the machine. If AFS appears to be one
mount point in the FreeBSD client, then the only issue I know of is
how the client is expected to handle changes in dev_t within the mount
point. fts(3) and friends will assume that it is a mount point crossing
when st_dev changes. They will then expect the funny rule to apply,
namely that the d_ino in the parent directory's dirent will not be the
same as the st_ino returned by stat(2).

What I do for NFSv4 is synthesize the mnt_stat.f_fsid value and return
that as st_dev for the mounted volume until I see the fsid returned by
the server change. Below that point, I return the fsid from the server
as st_dev, so long as it isn't the same as the synthesized one. That way,
fts(3) and friends figure out the mount point crossings within the server.
"ls -lR" will usually find problems if this is broken.

> So one person can easily trigger the access of 10,000 AFS volumes
> on one computer using one command. That might sound terrifying if
> you imagine it as being 10,000 NFS mounts, but accessing AFS volumes
> isn't the same amount of work as auto-mounting NFS filesystems.
> So ignore whatever problems you might expect to see with 10,000
> filesystems mounted on one computer. Just realize that it is very
> easy for a single user to access tens of thousands of AFS volumes
> from one computer, and it would be "most correct" (programming wise)
> if all of those AFS volumes were to get a unique value for dev_t.
> And of course it's even easier for a remote-access system to access
> tens-of-thousands of AFS volumes, since it would have a few dozen
> users logged in at the same time.
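To make the NFSv4 st_dev rule described above concrete, here is a minimal Python model of it. It is a sketch under stated assumptions, not the client code: ServerObject, make_st_dev and crossings are hypothetical names, fsids are plain integers, and the description does not say what happens when a server fsid equals the synthesized value, so this sketch simply raises an error in that case. crossings() mimics the fts(3) heuristic of treating any st_dev change between parent and child as a mount point crossing.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class ServerObject:
    """Hypothetical stand-in for a file or directory seen through the mount."""
    path: str
    server_fsid: int  # fsid attribute as returned by the server


def make_st_dev(mount_fsid: int, synthesized: int) -> Callable[[ServerObject], int]:
    """Return a function computing the st_dev reported to applications.

    mount_fsid:  server fsid of the volume that was actually mounted.
    synthesized: client-generated value, also used as mnt_stat.f_fsid.
    """
    def st_dev(obj: ServerObject) -> int:
        if obj.server_fsid == mount_fsid:
            # Still inside the mounted volume: report the synthesized value.
            return synthesized
        if obj.server_fsid == synthesized:
            # Collision case; not covered by the description, so refuse.
            raise ValueError(f"server fsid {obj.server_fsid:#x} collides with synthesized st_dev")
        # Below a server-side volume boundary: pass the server fsid through.
        return obj.server_fsid
    return st_dev


def crossings(chain: List[ServerObject], st_dev: Callable[[ServerObject], int]) -> List[str]:
    """fts(3)-style check along a parent->child chain: report each path whose
    st_dev differs from its parent's, i.e. an apparent mount point crossing."""
    found = []
    for parent, child in zip(chain, chain[1:]):
        if st_dev(child) != st_dev(parent):
            found.append(child.path)
    return found


if __name__ == "__main__":
    st_dev = make_st_dev(mount_fsid=0x10, synthesized=0x8000)
    chain = [ServerObject("/mnt", 0x10), ServerObject("/mnt/home", 0x10),
             ServerObject("/mnt/home/vol2", 0x22), ServerObject("/mnt/home/vol2/f", 0x22)]
    print(crossings(chain, st_dev))  # ['/mnt/home/vol2']
```

The collision check matters because an st_dev change is the only signal fts(3) gets: a server fsid that happened to equal the synthesized value would make a real server-side volume boundary invisible.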
>
> Obviously most computers never access even 30,000 AFS cells before
> they (as the AFS client) will reboot, but I'm wondering how much
> overhead is there in trying to make sure that many different volumes
> are mapped to unique dev_t numbers.
>
> Please realize that I do not mind if people felt that there was no
> need to increase the size of dev_t at this time, and that we should
> wait until we see more of a demand for increasing it. But given the
> project to increase the size of inode numbers, I thought this was a
> good time to also ask about dev_t. I ask about it every few years :-)
>
> --
> Garance Alistair Drosehn = gad@gilead.netel.rpi.edu
> Senior Systems Programmer or gad@freebsd.org
> Rensselaer Polytechnic Institute or drosih@rpi.edu
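On the overhead question: if the AFS client synthesizes a dev_t per volume, as Kostik suggests, the bookkeeping can be as small as one hash-table entry per volume actually touched. Below is a minimal Python sketch under stated assumptions: VolumeDevMap and dev_for are hypothetical names, a volume is identified by a (cell name, volume id) pair, the cell name in the usage lines is illustrative, and values are handed out sequentially from an arbitrary base within the 32-bit dev_t range mentioned above. The sketch does not check for clashes with dev_t values used by real devices or other file systems. Because the table lives only in memory, the values differ across reboots, which fits the note above that dev_t stability across reboots is not guaranteed.

```python
class VolumeDevMap:
    """Hypothetical per-client table giving each AFS volume a unique dev_t.

    A value is allocated the first time a (cell, volume id) pair is seen and
    reused afterwards; lookups are O(1) on average (dict).
    """

    DEV_T_LIMIT = 2 ** 32  # assumes the current 32-bit dev_t

    def __init__(self, base: int) -> None:
        self._next = base
        self._table = {}  # (cell, volume_id) -> synthesized dev_t

    def dev_for(self, cell: str, volume_id: int) -> int:
        key = (cell, volume_id)
        dev = self._table.get(key)
        if dev is None:
            if self._next >= self.DEV_T_LIMIT:
                raise OverflowError("synthesized dev_t space exhausted")
            dev = self._next
            self._next += 1
            self._table[key] = dev
        return dev

    def __len__(self) -> int:
        return len(self._table)


if __name__ == "__main__":
    devs = VolumeDevMap(base=0x1000)
    # Touch 40,000 volumes in one cell (roughly the RPI cell size quoted above).
    for vol in range(40_000):
        devs.dev_for("rpi.edu", vol)
    print(len(devs), hex(devs.dev_for("rpi.edu", 0)))  # 40000 0x1000
```

With sequential allocation, 40,000 volumes use only a tiny fraction of a 32-bit space; the cost is memory for one entry per volume seen since boot, not the width of dev_t.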