From owner-freebsd-fs Tue Apr 30 17:58:40 2002
Delivered-To: freebsd-fs@freebsd.org
Received: from harrier.prod.itd.earthlink.net (harrier.mail.pas.earthlink.net [207.217.120.12])
    by hub.freebsd.org (Postfix) with ESMTP id 9335037B417
    for ; Tue, 30 Apr 2002 17:58:34 -0700 (PDT)
Received: from pool0580.cvx40-bradley.dialup.earthlink.net ([216.244.44.70] helo=mindspring.com)
    by harrier.prod.itd.earthlink.net with esmtp (Exim 3.33 #2)
    id 172iRz-0005Aq-00; Tue, 30 Apr 2002 17:58:32 -0700
Message-ID: <3CCF3D98.3495D84D@mindspring.com>
Date: Tue, 30 Apr 2002 17:58:00 -0700
From: Terry Lambert
X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony} (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: utsl@quic.net
Cc: "Andrew P. Lentvorski" , freebsd-fs@freebsd.org
Subject: Re: Non-standard root filesystems
References: <20020429153020.Q16532-100000@mail.allcaps.org> <3CCEC7D5.D22356A0@mindspring.com> <20020430204153.GB3603@quic.net>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

utsl@quic.net wrote:
> On Tue, Apr 30, 2002 at 09:35:33AM -0700, Terry Lambert wrote:
> > FreeBSD treats root mounts as "special", relative to all other
> > mounts. This is a design error, but overcoming it requires a
> > reorganization of the mount code that's not really politically
> > easy to accomplish, even though it's technically very easy.
> >
> > Some of the stuff Poul is doing right now will probably help
> > you in the future with assembling things like RAID-able
> > volumes in the future -- but not help you right now.
>
> Linux has a syscall (pivot_root) to swap the root with another mounted
> filesystem. It is occasionally quite useful, and I've been wondering
> about implementing it (or something similar) on FreeBSD.
>
> Possibly you can tell me why that wouldn't work, or would be a bad
> idea.

Doing that would be very hard. The way mount points work won't
exactly make it impossible, but it won't make it easy.

Here's the architectural fix:

1)  Separate the mount point covering code from the per FS
    mounting code.

2)  Add a separate VOP for setting the "mounted on" information
    into the superblock. Some FS's, like FFS, like to record the
    "last mounted on" information; this is not used for anything
    that I've ever seen right now, so it would probably be OK to
    rip out completely for the moment, though it could later be
    useful for automounting and getting rid of /etc/fstab
    entirely.

3)  When mounting an FS at the VFS_MOUNT layer, simply get a
    pointer into the list of mounted file systems. *DO NOT*
    deal with the mount point covering at all in the per FS
    code!

4)  Deal with the mount point covering in the higher level
    code; this reduces the amount of crap you have to parse
    in a per FS manner anyway. The covering is done by
    referencing the FS in the system mounted FS layer from
    #3 (above).

At this point, from the VFS perspective, all mounts -- root and
non-root -- are exactly the same: you implement the one type of
mount (the "fill in this mount table entry and set up the in core
mount structure data" kind), and it's taken care of. The only
difference between a root and a non-root mount is the vnode
covering code for the mount, and that all uses the same code at a
higher layer.

This would also make your "pivot" FS work correctly. To do that,
you would have to cover an opaque vnode. You could actually do
this with any vnode, by revoking the vnode and making it a deadfs
vnode.

> > As far as software RAID is concerned: it's a bad idea, from a
> > performance perspective; I don't recommend it. Note that I'm
> > the person who did the original user space RAIDframe port to
> > FreeBSD in the mid 1990's, so I'm not just talking out my butt:
> > the amount of overhead for parity calculation and storage is
> > *considerable*, and makes RAID hardware a *much* better idea.
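
For the mount reorganization in steps 1 through 4 above, a rough
user-space sketch may make the split easier to see. It is a toy
model only, not FreeBSD kernel code: every name in it (toy_vnode,
toy_mount, toy_fs_mount, toy_cover, toy_uncover, toy_pivot) is
hypothetical and made up for illustration, and it assumes a fixed-size
mount table with no locking or reference counting. The per-FS step only
fills in a mount table entry; covering is done by one generic routine
that does not care whether the covered vnode is the root.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_MOUNTS 8

struct toy_mount;

/* Hypothetical stand-in for a vnode; not the kernel's struct vnode. */
struct toy_vnode {
    const char *name;
    struct toy_mount *covered_by;   /* mount covering this vnode, or NULL */
};

/* Hypothetical stand-in for an in-core mount structure. */
struct toy_mount {
    int in_use;
    const char *fstype;
    const char *device;
    struct toy_vnode *covers;       /* vnode this mount covers, or NULL */
};

static struct toy_mount toy_mount_table[TOY_MAX_MOUNTS];

/*
 * Step 3: the per-FS mount. It only claims and fills a mount table
 * entry and knows nothing about mount points.
 */
static struct toy_mount *
toy_fs_mount(const char *fstype, const char *device)
{
    for (size_t i = 0; i < TOY_MAX_MOUNTS; i++) {
        struct toy_mount *mp = &toy_mount_table[i];
        if (!mp->in_use) {
            mp->in_use = 1;
            mp->fstype = fstype;
            mp->device = device;
            mp->covers = NULL;
            return mp;
        }
    }
    return NULL;                    /* table full */
}

/*
 * Step 4: generic covering code, shared by root and non-root mounts.
 * Returns 0 on success, -1 if either side is already in use.
 */
static int
toy_cover(struct toy_vnode *vp, struct toy_mount *mp)
{
    if (vp == NULL || mp == NULL || !mp->in_use)
        return -1;
    if (vp->covered_by != NULL || mp->covers != NULL)
        return -1;
    vp->covered_by = mp;
    mp->covers = vp;
    return 0;
}

/* Detach whatever mount covers vp; returns that mount or NULL. */
static struct toy_mount *
toy_uncover(struct toy_vnode *vp)
{
    struct toy_mount *mp = vp->covered_by;
    if (mp != NULL) {
        mp->covers = NULL;
        vp->covered_by = NULL;
    }
    return mp;
}

/*
 * A "pivot" in this model: re-point the opaque root vnode at a
 * different mount table entry. The old root mount stays in the
 * table, uncovered, and is handed back through old_root.
 */
static int
toy_pivot(struct toy_vnode *rootvp, struct toy_mount *new_root,
    struct toy_mount **old_root)
{
    int rc;

    if (rootvp == NULL || new_root == NULL || !new_root->in_use)
        return -1;
    if (new_root->covers != NULL)
        toy_uncover(new_root->covers);
    *old_root = toy_uncover(rootvp);
    rc = toy_cover(rootvp, new_root);
    assert(rc == 0);                /* both sides were just detached */
    (void)rc;
    return 0;
}
```

In this model a caller would mount two file systems with
toy_fs_mount(), cover the root vnode with the first via toy_cover(),
and later call toy_pivot() to swap in the second. Nothing in the per-FS
step changes between the root and non-root cases, which is the point of
the reorganization.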
> I agree with you about the performance. Hardware RAID is faster, more
> reliable, uses less resources, etc. However, many people don't have the
> budget for it.

I guess they don't get RAID. 8-) 8-) 8-) 8-).

> In my case, I have production systems running Linux with software RAID.
> I would much rather run hardware RAID and FreeBSD, but I have no budget
> to buy SCSI RAID controllers. Switching to FreeBSD+Vinum would be a
> reasonable solution, but I can't mirror root, and that creates a
> political problem. I get, "If FreeBSD and Vinum will be better, how come
> you can't mirror the root filesystem?"

How does mirroring the root FS recover after an error?

If you can't load the kernel to load the software RAID, then you
can't run the software RAID to recover from a failure, right?

How does Linux solve this problem? *Does* Linux solve this problem,
or are we really talking about an unrecoverable condition that Linux
lets you get yourself into, but FreeBSD doesn't?

-- Terry

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message