From owner-freebsd-fs Sat May 4 7:31: 9 2002
Delivered-To: freebsd-fs@freebsd.org
Received: from gull.prod.itd.earthlink.net (gull.mail.pas.earthlink.net [207.217.120.84])
    by hub.freebsd.org (Postfix) with ESMTP id 35BD437B41A
    for ; Sat, 4 May 2002 07:31:02 -0700 (PDT)
Received: from pool0048.cvx22-bradley.dialup.earthlink.net ([209.179.198.48] helo=mindspring.com)
    by gull.prod.itd.earthlink.net with esmtp (Exim 3.33 #2)
    id 1740Yu-0006Rr-00; Sat, 04 May 2002 07:31:00 -0700
Message-ID: <3CD3F086.9F400956@mindspring.com>
Date: Sat, 04 May 2002 07:30:30 -0700
From: Terry Lambert
X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony} (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Eric Jacobs
Cc: fs@freebsd.org, Bakul Shah
Subject: Re: Filesystem
References: <200205032031.QAA24496@repulse.cnchost.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Eric Jacobs wrote:
> > Plan9 does ".." right. The same can be done in Unix by
> > storing the rooted path in the kernel for a process'es
> > current working dir. and by following some path rewrite
> > rules:
> >
> > //.. ==
> > //../ == /
> > /../ == /
>
> Those rules aren't valid on the account of syntax alone. You would
> have to know which components are symbolic links. And once you take
> into account symbolic links, you have essentially what namei does
> anyway.
>
> I think what Terry Lambert was saying was that since hard-linking
> directories isn't allowed anyway, there's no need to refcount them,
> except for the subdirectory counting tricks.

I meant that ".." being treated as a link is useful, because the
link count itself can be useful information. However, the trade
off is that it limits the number of subdirectories. The trade off
in the other direction is that you have to be prepared to descend
into the directory.
This isn't really that big a deal these days, now that there is
an attribute bit in the directory entry itself indicating that the
entry is a directory, so it's possible to both avoid the stat and
still get the information, even if the link count is such that it
"indicates" there are no subdirectories. Basically, some software
will have to be hacked to traverse a directory for subdirectories,
instead of just stat'ing the parent inode and only traversing if
the link count is > 2.

The disallowing of hard links on directories was actually my
suggestion from ~1994, on the basis of working around POSIX time
update requirements for hosted file services. If you pretend that
directories are special, and that they aren't files, you can escape
from a number of time updates that would otherwise be a "SHALL
update" vs. a "SHALL mark for update".

Hard links on directories also fail to maintain parent/child
relationships properly. Without such links, you are guaranteed
that you can cache the parent in the child inode, which can let
you further speed reverse traversal. Since it was only ever an
option for root, it's really no big loss.

> > You would also have to deal with middle directories being
> > renamed, filesystems being forcibly unmounted and so on.
> >
> > Not storing the entire path for cwd may have been the right
> > decision for '70s but not since then....
>
> The entire path is stored indirectly via the VFS name cache, so
> getcwd() works _even_ for filesystems which do not implement "..".
> Implementing ".." at the VFS level would be just as simple. Probably
> the only reason it isn't is because it has been traditionally handled
> at the FS level.

The cache implementation LRU's it out. Saving the path-on-open
works when not doing so fails, only because leaf nodes of type
file don't maintain proper parent pointers.

The implementation at the VFS level should be handled by having
real vnodes/inodes for hard links.
Maintaining the link-to-link relationship would require some
additional overhead, but it's minor. Doing this would also allow
you to store the parent inode of any inode... and since non-leaf
inodes are always guaranteed to be directories, the recoverability
of any open file's path to the root is guaranteed. If 128 bytes
is too large a stretch, it can be done with smaller "link nodes",
but the net effect is the same: by moving the link out to an
abstract FS artifact, rather than an artifact of a count and a
directory entry, you gain a lot of benefit.

> > > In any case, it's still an incredibly bad idea to have even a tenth of
> > > that man objects in a single directory, period.
> >
> > IMHO it is a bad idea to not have evolved directories to use a B-tree
> > representation (at least when the number of entries exceed some
> > threshold. Implement mechanisms and leave policies to the users!
>
> If you can handle access considerations yourself, one creative solution
> might be to use getfh(2) and fhopen(2) and store the file handles however
> way you want. This bypasses the kernel lookup entirely.

I mentioned this, as a means of getting a flat (inode) name space.
The only real problem with this (and it's a doozy!) is that the
fsck process expects to have a real directory from which it can
derive the reference count, or the inode is considered "lost" and
will end up in "lost+found" on the next fsck, as an FS inconsistency.

-- Terry
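
For illustration only, here is a minimal sketch of the link-count
shortcut discussed earlier in this message, written in Python rather
than taken from any FreeBSD source. The function name subdirectories()
is hypothetical. It assumes the traditional accounting where a
directory's link count is 2 plus one per subdirectory (each child's
".." entry), and it treats any other count as "unknown", falling back
to a scan that reads the directory-entry type instead of stat'ing
every name.

```python
import os


def subdirectories(path):
    """Return the sorted names of the immediate subdirectories of path.

    Hypothetical helper: not part of any library.
    """
    # Assumed accounting: one link for the name in the parent, one for
    # the directory's own ".", plus one for the ".." of each child
    # directory. Exactly 2 then means there is nothing to descend into,
    # so a single stat() replaces a full scan.
    # Filesystems that do not keep this accounting are assumed to
    # report some other value, in which case we scan below.
    if os.stat(path).st_nlink == 2:
        return []

    names = []
    with os.scandir(path) as entries:
        for entry in entries:
            # is_dir() uses the type recorded in the directory entry
            # when the platform provides it, and only falls back to a
            # stat() call when the type is unknown.
            if entry.is_dir(follow_symlinks=False):
                names.append(entry.name)
    return sorted(names)
```

If ".." stopped counting as a link, the early return would have to go
and only the scan would remain, which is the "software will have to
be hacked" cost described above.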