To: current@freebsd.org
From: Poul-Henning Kamp
Date: Thu, 07 Apr 2005 23:13:00 +0200
Message-ID: <20331.1112908380@critter.freebsd.dk>
Subject: GEOM architecture and the (lack of) need for foot-shooting

I can see that almost everybody is short a crucial couple of pieces
of the puzzle, so let me try to straighten out some of the many points
which have been fired across on the subject of GEOM and foot-shooting.

First of all, there are tools which do not do the right thing.
Amongst these are bsdlabel, fdisk, boot0cfg and sysinstall.  Most of
them do something right, but none of them gets everything right.  Yet.

Let me try to explain what should happen, using the deletion of an MBR
partition as an example:

If the disk has an MBR which defines three partitions and one of these
is open, then the MBR cannot be written to without informing the
GEOM_MBR instance which implements the contents of that MBR.
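A toy sketch of the rule just described, in Python rather than kernel
C.  ToyMbrSlicer, its methods and the ad0s1..ad0s3 names are
hypothetical; none of this is GEOM code or the g_ctl() API.  It
assumes the slicer keeps a per-partition open count and uses it to
decide what may change, which is the point of asking the slicer
instead of rewriting the MBR behind its back.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """One MBR slot; open_count is an assumed stand-in for access counts."""
    name: str
    open_count: int = 0


@dataclass
class ToyMbrSlicer:
    """Hypothetical stand-in for a GEOM_MBR instance (not real GEOM code)."""
    partitions: dict = field(default_factory=dict)

    def add(self, name):
        self.partitions[name] = Partition(name)

    def open(self, name):
        self.partitions[name].open_count += 1

    def close(self, name):
        part = self.partitions[name]
        if part.open_count == 0:
            raise RuntimeError(f"{name} is not open")
        part.open_count -= 1

    def raw_write_allowed(self):
        # Rewriting the MBR without telling the slicer is only safe
        # when nothing is open on any of its partitions.
        return all(p.open_count == 0 for p in self.partitions.values())

    def delete(self, name):
        # Out-of-band request: the slicer is asked directly, so it can
        # refuse for a busy partition and still allow the idle ones.
        part = self.partitions[name]
        if part.open_count > 0:
            raise PermissionError(f"{name} is open, refusing to delete")
        del self.partitions[name]


mbr = ToyMbrSlicer()
for slot in ("ad0s1", "ad0s2", "ad0s3"):
    mbr.add(slot)
mbr.open("ad0s1")                 # e.g. a mounted filesystem

print(mbr.raw_write_allowed())    # prints False: one partition is open
mbr.delete("ad0s3")               # idle partition, the slicer agrees
print(sorted(mbr.partitions))     # prints ['ad0s1', 'ad0s2']
```

Asking for ad0s1 to be deleted in this model raises PermissionError
until it is closed again.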
The correct way to do that is to use the g_ctl() API, because what is
needed is an out-of-band mechanism to say that we want to lose one of
the partitions.

g_ctl() has not been fully implemented in all classes yet, and
therefore what we currently do is open one of the partitions and issue
an ioctl which hits the GEOM_MBR instance.

This worked fine until recently, when it was discovered that one could
issue ioctls which did "write-like" stuff on a file descriptor open
only for read.  This is pretty counter to what people expect, and we
fixed it.

The problem with that fix is that there may not be any partition we
can open for write: they may all be opened by something else (mounted),
and therefore our attempt to open will fail.

That is where things stand today.

(I'll speak at length about the subject of ioctl and
in-band/out-of-band communications at BSDcan.  Be there!)

Now, "why haven't you finished GEOM?" I hear.

Well, many reasons.

For one thing, I wanted to see how it panned out in all sorts of ways
before I went any further.  It is important to stop every so often and
see if the direction is still sound.

Second, there was a lot of talk about sysinstallNG at the time, and I
thought that would be a great time to revise the entire
userland-edit-disk-layout thing.

Third, I needed a break from it.

Finally, I wanted to give others a chance to join in and help out.  In
a project like this, people tend not to disturb developers who seem to
be on a roll, no matter how much the developer begs for assistance.
The only effective way to get others to join in is to step away and
make space for them.

Now, here is a list of what needs to be done in this general area:

1. Find out which partition format we migrate to instead of BSDlabel,
   which runs out of steam around 2TB.  GPT has been proposed but
   seems to be a rather dead end with Itanic sinking fast.  MBR(ext)
   is not really a solution; they suffer from the same limit as far
   as I know.
   Somebody needs to make a decision, and it is not really a
   technical one.

2. Implement g_ctl in the various slicers and in their userland tools
   (geom_bsd, geom_mbr, geom_mbrext, geom_apple, geom_..., bsdlabel,
   fdisk, boot0cfg, sysinstall etc).

   Somebody smart would implement a sensible generic library and a
   semi-standardized set of requests to all slicers.  This of course
   would be hard work.

3. Implement orphan methods in our filesystems and teach them about
   media which disappear.

4. Stop thinking in quick hacks and start to think in long-term
   architecture.  Just because you could do it in System III doesn't
   mean that it was the right way to do it, and just because you think
   a quick fix is all that is needed doesn't make it so either.

GEOM is a bold step in disk-I/O architecture, but Rome wasn't built in
one day and GEOM won't be built in one release cycle either.

Anybody who expects me to do all of this single-handedly can take a
peek here

	http://people.freebsd.org/%7Epeter/srcsys.window.txt

and go stick their head in a bucket of cold water before telling me I
have to work harder.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.