Date: Sun, 10 Mar 2002 16:15:55 -0500
To: Poul-Henning Kamp
From: Garance A Drosihn
Subject: Re: Increasing the size of dev_t and ino_t
Cc: arch@FreeBSD.ORG
In-Reply-To: <35384.1015748266@critter.freebsd.dk>
References: <35384.1015748266@critter.freebsd.dk>

At 9:17 AM +0100 3/10/02, Poul-Henning Kamp wrote:
>In message Garance A Drosihn writes:
>
>>I don't see how this would work for OpenAFS.  By that I mean that
>>I do not know how the dev_t-pointer that you're talking about is
>>used when implementing something like OpenAFS or ARLA support.
>
>I have no idea what the problem would be, so you will have to tell
>me before I can answer you...

Well, this will be an answer from the user-land perspective.  It is
only an observation of the number of "devices" involved, because I
don't know the details of the underlying implementation.  So, pick
up a few grains of salt, and let's try the following...

First, my starting assumption on the significance of the st_dev
value.  My take on that value is that if two files have the same
value for their device, then you could remove one of those files
and hardlink the other file to the name of the removed file.
Hard links cannot cross device boundaries, but if these two files
have the same value for st_dev then that hard link would not be
crossing a device boundary.  Or, another way to think of it is that
if two files have the same device number, and if they both have an
st_nlink count of 1, then removing one of those files will result in
more space being available for the expansion of the other file
(perhaps after a reboot, to eliminate the question of open file
descriptors keeping that first file around even though you have
unlinked it).

I do not know if the appropriate standards would agree with me on
these views, but they seem like a logical premise.  Otherwise, an
st_dev value would have no special meaning at all.

In afs/openafs/arla, the "device" (in the above sense of the word)
is the AFS-volume.  Disk quotas are applied at the AFS-volume level.
AFS also has the notion that the administrator can move a volume
around between disk partitions, or even disk servers, without the
user noticing.  So, for disk-balancing purposes (among other things),
it is ideal to have a lot of small-ish volumes instead of trying to
cram as much as possible into each volume.

There is also the concept of "read-only" vs "read-write" volumes.
Every "read-only" volume would have a matching "read-write" volume,
but they would be different devices as far as this st_dev value is
concerned.  Each read-write volume can also have a "backup volume",
which is the snapshot of that read-write volume as it was at the
time of the most recent backup (it is also read-only in nature).

Thus, an AFS-cell tends to have a lot of volumes.  In the AFS cell
at RPI, there are over 12,000 user accounts, each of which has its
own AFS-volume (for disk-quota purposes), and each of which has an
AFS-backup-volume.  That's 24,000 volumes just for home directories,
and I am sure we have well over 32K AFS-volumes in the AFS-cell at
RPI.
It's possible we have over 64K distinct AFS-volumes in the cell, but
I don't know how to come up with the exact count for that.

When running AFS, the machine effectively mounts all AFS cells that
are defined in a file called 'CellServDB'.  On our public unix
machines, we define 163 different AFS-cells.  Most of those AFS-cells
are smaller than RPI's AFS-cell, but certainly all the volumes in
those cells add even more unique devices.

One way around all these devices would be to just create st_dev
numbers on the fly, as each volume is referenced, and cache that
value until the next reboot.  That is probably workable, but I am a
little uneasy about it because we (RPI) also had at least one
professor who liked to do a 'find' of EVERYTHING in RPI's AFS-cell,
looking for any publicly-readable files, which he then provided in a
file listing for anyone who was curious.  What would happen to the
machine he runs that 'find' command on?

So, let's drop back and say my initial premise is wrong.  Maybe the
st_dev value is just an arbitrary number with absolutely no special
meaning.  We then have the question of how to map all of these
AFS-volumes into st_dev values (where you might map multiple
AFS-volumes into a single st_dev value, just so you have fewer unique
st_dev values).  Some care would have to be taken in how that mapping
is done, to be sure that two files which are in different AFS-volumes
are recognized as different files even if they have the same value
for st_ino.

I have not looked into the OpenAFS source code yet, but from what I
can see I would guess that what AFS uses for a volume ID (what *it*
uses to keep track of each volume) is a 32-bit number.  I'm seeing
volume-id values like 537,315,825, for instance.

At the same time that we have all these volumes, we can't assume
that all volumes will have fewer than (say) 32K distinct inodes in
them.
The AFS-volumes for users' home directories are pretty small, but we
(RPI) have other AFS-volumes which are hundreds of megabytes, and
which thus can contain a lot of files.  I *think* we even have a few
AFS-volumes which are gigabyte-sized, but I know we try our best to
discourage larger AFS-volumes.

So, my basic observation is that with UFS2 we're probably going to
want to increase the size of st_ino, but I would argue that we can't
do that by *shrinking* the size of st_dev.  I would also argue that
it would make more sense to increase the size of st_dev at the same
time we increase st_ino.

-- 
Garance Alistair Drosehn            =   gad@eclipse.acs.rpi.edu
Senior Systems Programmer               or  gad@freebsd.org
Rensselaer Polytechnic Institute        or  drosih@rpi.edu
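
To make the mapping concern above a bit more concrete, here is a
rough sketch in Python.  It is not taken from the OpenAFS or ARLA
sources.  The 16-bit st_dev width, the "keep the low bits" mapping,
the second volume ID, the inode number, and the names fold_volume_id
and file_identity are all assumptions made up for illustration.  It
only shows that if 32-bit volume IDs are squeezed into a narrower
st_dev, two files in different volumes can end up with the same
(st_dev, st_ino) pair.

```python
DEV_BITS = 16  # assumed width of a narrow st_dev field (illustration only)


def fold_volume_id(volume_id, dev_bits=DEV_BITS):
    """Hypothetical mapping: keep only the low dev_bits bits of a volume ID."""
    return volume_id & ((1 << dev_bits) - 1)


def file_identity(volume_id, inode, dev_bits=DEV_BITS):
    """The (st_dev, st_ino) pair that a program would compare to decide
    whether two names refer to the same file."""
    return (fold_volume_id(volume_id, dev_bits), inode)


# Two different volumes whose IDs differ only above the low 16 bits.
vol_a = 537315825          # a volume ID of the magnitude quoted above
vol_b = vol_a + (1 << 16)  # a second, made-up volume ID

same_inode = 1234          # made-up inode number used in both volumes

# With a 16-bit st_dev the two distinct files look identical.
collides_16 = file_identity(vol_a, same_inode) == file_identity(vol_b, same_inode)

# With a 32-bit st_dev the volume ID fits unchanged and they stay distinct.
collides_32 = (file_identity(vol_a, same_inode, 32)
               == file_identity(vol_b, same_inode, 32))

print(collides_16, collides_32)
```

Any mapping from 32-bit volume IDs into fewer bits has this problem
for some pair of volumes, not just the masking used here, which is
why a wider st_dev looks like the simpler way out.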