Date: Tue, 30 Mar 1999 22:07:43 +0000 (GMT)
From: Terry Lambert <tlambert@primenet.com>
To: toor@dyson.iquest.net (John S. Dyson)
Cc: tlambert@primenet.com, unknown@riverstyx.net, dyson@iquest.net, freebsd-chat@FreeBSD.ORG
Subject: Re: Linux vs. FreeBSD: The Storage Wars
Message-ID: <199903302207.PAA05079@usr04.primenet.com>
In-Reply-To: <199903302028.PAA16589@dyson.iquest.net> from "John S. Dyson" at Mar 30, 99 03:28:23 pm
> > Why doesn't FreeBSD FS stacking work?
>
> It never did, and there hasn't been much demand.

Hey, speak for yourself. I've gone so far as to approach John Heidemann about rereleasing his code donated to the CSRG under the GPL for a Linux implementation (yes, I'm deadly serious). John's stuff worked before it was damaged into inoperability, and it currently works fine on BSDI.

> It actually would be worthwhile to totally remove the stacking, or fix
> it with a VM approach. It is totally wrong to use buffer/VP approach,
> but there are those who advocate it (too many people are "bp" heads --
> bp's are good only for I/O, not object or caching representation.)

It is wrong to think of vnodes as caching objects instead of backing objects. Yes, I know all of the unified VM and buffer cache centric arguments in favor of this, but the point of having a well defined framework and API is the ability to share FS code with other OS's. And not all other OS's have unified VM and buffer caches. Implementation of a common API must take into account the lowest common denominator, or you will be creating a FreeBSD specific API that is not generally useful.

> FS stacking will not help to gain commercial work, but properly working
> reasonably sized file I/O does.

It is (or should be, since the announcement on these lists last week) well known that Veritas is porting to Linux. This code would work in FreeBSD as well, if the Linux and FreeBSD VFS stacking frameworks were identical API's. Because the API's are not identical (in fact, both are sufficiently fluid and architecturally damaged as to render them nearly useless), this work is bound to be another checkmark in the Linux column that will remain absent from the FreeBSD column.

> > Why was X.25 broken, and then
> > not fixed?
>
> I don't know. I guess that it is an orphan, where the implementation hasn't
> been commercially interesting (or the companies aren't contributing the work
> back.)
> It certainly isn't interesting from a research or fun standpoint.
> Since companies like Whistle do networking, it would be nice to see a
> contribution in that area? (I know that there is supposedly a lot of X.25
> stuff out there, but why hasn't it been supported? Answer: apparently
> other types of X.25 interfacing methods are being used.)

The real answer is that it was broken when someone was permitted to change interfaces upon which it depended, but was not thereafter held accountable for keeping the "unsexy" code working. To use your terminology, "it was a cowboy what done it".

> > Why was LFS broken, and then not fixed?
>
> It was always broken, and has always been basically a festering mess.

I think Margo Seltzer would take some issue with this. I would trust her authority as an FS expert above that of anyone in the core team; after all, file systems are her life's work.

> LFS wasn't rewritten because softupdates has been the better answer
> for most of what LFS can do.

Soft updates as they are realized in FreeBSD are a tiny fraction of what they could, and should, have been. I have had discussions with both Ganger and Patt via email, and discussions in person with Kirk about the general solution for the problem. The FreeBSD solution is far from general (or dependencies would be capable of spanning stacking layers, and it would be possible to build a transaction system into the kernel, accessible from user space, and making such implied data consistency guarantees as are needed to support true database systems).

While Dr. McKusick has made some good points in favor of the less general solution (including "that's not what Whistle paid him to do"), his arguments about dependency representation are not among them. There is no reason for the dependency representation to bloat up as a result of generalizing the relationships.
The code that is lacking generality is not the dependency representation, nor the dependency conflict resolution, but in fact the conflict and dependent event registration mechanisms. Right now, the edges and the nodal relationships are hard coded in the structure of the code. It is entirely possible to replace this code with code that implements resolver and event-of-interest registration at the time filesystems are instanced.

Yes, this requires running Warshall's algorithm at instance time to precalculate the relationships -- BUT THIS IS NOT RUNTIME OVERHEAD, any more than the VOP descriptor arrays should constitute runtime overhead. Clever use of Hamiltonians would allow incremental calculation of Warshall's by precomputing everything but leaf nodes. Sedgewick discusses this algorithm in his book.

Even so, there is still a reason for Journalling and Logging. If nothing else, it allows for deterministic failure recovery, whereas soft updates merely guarantee consistency, without any recourse for software fault tolerance in the face of implied relationships (e.g., the relationship between a "records" file and an "index" file in a simple relational database).

These issues can not be resolved until it is possible to acknowledge a transaction as having completed ONLY AFTER SUFFICIENT INFORMATION IS COMMITTED TO STABLE STORAGE, SUCH THAT IT MAY BE ROLLED FORWARD AFTER A FAILURE.

This distinction is of paramount importance, and can not be over-emphasized.

> It is totally wrong to implement a bp
> based LFS anyway, note the hacks in vfs_bio to support that travesty.

With respect, these are historical artifacts that also applied to the FFS of the same code vintage, and which predate the unification of the VM and buffer cache code. This is a case of failure to cross "T"'s and dot "I"'s during the VM and buffer cache unification wherein the equivalent FFS issues *were* addressed. Code does not mutate.
If code stops working, it is a failure in maintenance, not a failure of the code (presuming it worked beforehand -- and LFS did; it merely lacked a cleaner process to deal with issues like garbage collection and fragmentation -- issues addressed in later versions of Margo's code).

> > Why does the VM
> > system like to write password database pages back to the crontab, if
> > you stress the system by running newsyslog once a minute from a cron
> > that modifies copy-on-write pages mmap'ping the password database into
> > code, as if the pointers in the pwent pointed back to static buffers
> > in the C library?
>
> Which version? and please PR it. I have *never* seen it in person recently,
> and locally hacked kernels can cause unexpected brokenness. The problem
> of modified programs has been fixed a long time ago. Also, it has taken
> awhile to find someone competent to work on the VM/VFS code. There
> is a possibility now, but most of the people with the "balls" to work
> on the code with commit frenzies, are often not careful enough to do so.

I believe Matt has much of this in hand. But it is certainly not finding its way back into 2.x-STABLE, per the development model.

Yes, I know that -current is 4.x now, and 3.1-STABLE is the maintenance target, but the fact remains that these problems were identified during the period of time when the 2.x-STABLE branch was *supposedly* being actively maintained. I *personally* identified two of these problems, in great gory detail, and their existence was "pooh-pooh"'ed until 2.x was no longer an active maintenance release. I had to find explicit demonstration cases for the people who didn't feel like bothering to try to follow my theoretical arguments, and refused to work from anything but concrete examples.

> Time for cowboys is LONG LONG gone, and it seems that cowboys are the most
> commonly available resource.

What do you expect, when you set up camp outside Dodge City? Bankers?
> > In a more general sense: Why are most of the Usenix papers scheduled
> > this year not about work done on FreeBSD, if FreeBSD is the premier
> > research OS? Where is the research?
>
> A lot of work is done privately. Research != papers, there is NO advantage
> for a FreeBSD team member to give away the mechanisms for FreeBSD's behavior.

Malarkey. What do you care if the software running the ATM machine and using the correct algorithm is FreeBSD, or some other software using the correct algorithm? The point of the exercise is to increase overall correctness in the world. What's the point of using a BSD license, if the intent is not to spread the code as far and as wide as possible? C.v. TCP/IP.

Obscurity hurts everyone. The obscurity of the VM algorithms (not to pick favorites, but the VM system is one place where complexity was allowed to grow in FreeBSD unshackled by the "we must understand this if you do" mentality) was, in fact, damaging to Matt's ability to contribute. It was not Matt's cowboy nature, but rather the inability of a core team to impose a vetting process on someone who could spend between 12 and 16 hours a day coding on nothing but FreeBSD.

> > Linux has shown a willingness to implement design that FreeBSD has
> > only given lip service to, time and again. Linux is, unfortunately,
> > where research is taking place.
>
> The ones doing real work will continue
> to use BSD for now. I don't consider the catchup game that you have alluded
> to as "research", but only catchup. You are confusing "catchup" with
> research. Do you see the difference? (Linux's VM research is a very
> entertaining example: can you say lots of knobs that you need to tweak?)
> FreeBSD's VM has lots of knobs, but those knobs are only desirable for
> atypical configurations.

I see the difference. However, the VM system is about the only place that this can be inexpertly defended.
All other places, Linux is close enough that you have to defend such issues with very hard facts. But compare either to SVR4 ES/MP, or Dynix, of 5 years ago, and both FreeBSD and Linux have areas which are still *laughably* primitive, with no apparent interest or desire to address them. SMP is one such area; a firm DDI/DKI is another.

> You can talk a good talk, but I would have adopted your work if it
> was worthwhile to do so (I wanted to, in fact.) I didn't have the
> energy to maintain the mess that your changes would have caused. It
> is better to deal with the mess one knows, rather than the mess
> that one doesn't :-). The changes that did get adopted were good,
> but did require support.
>
> (Sometimes your stuff was good, but much of the time, not complete
> enough.)

"Better the devil you know" has never been a sound technological argument.

I don't need to reach into my own arse for my examples (though such examples abound); I can point at the networking stuff that Garret did, which was brilliant, but which was ripped out due to it not being completed in what someone arbitrarily decided was a timely fashion. There is code from Julian, PHK, and Bruce Evans that falls into this same category. William's serial driver code, or Vadim Antonov's floppy tape driver design (from BSDI). There are literally thousands of such examples.

> > Julian's right; someone needs to do real architectural work.
>
> Time for Julian/Terry BSD.

You've been reading too much advocacy.

I have had sufficient opportunity for such a thing in the past. And I have resisted. I have resisted not only my own opportunity, but that of others, as well. Schism is not the answer, unless you have a social framework ready to go in the post-schism universe.

I am frankly of the opinion now that much of "the FreeBSD problem" is a macro effect of a micro rule, imposed by the tools available, and, in fact, CVS in particular.
Many macro behaviours derive from micro rules which prohibit individual behaviours which are, in fact, available to the group. Like the patchkit before it (something which, sociologically, I still deeply regret), the use of CVS in the current system limits the size, length, magnitude, and duration of branches which diverge from the common vision (and common visions, themselves, are myopic by their natures).

> I am not really interested in armchair
> quarterbacks unless they are willing and able to help solve the problems.

And likewise, for people willing and able to accept that help. The sword of Damocles is a two edged blade, as were most gladius's, and that blade cuts both ways.

> One reason why my code hadn't made it into FreeBSD's tree when I left,
> was because of QC issues. It takes restraint to keep from hacking the
> tree, and yet there is the need for architectural work. But who?

Pick someone. Someone with a vision in excess of six months. Pick Kirk McKusick, if he's willing, or David Greenman, if he can be freed from the morass of crises and minutiae that has obviously been dragging him away from the architect's drafting table. Don't involve the architect(s) (or allow them to involve themselves!) in the petty day-to-day infighting. But for God's sake, pick someone.

> There are precious few people available to do the FreeBSD architectural
> work (who are competent enough.) I do not include myself in that group,
> but would support someone who is willing and able. If my work would not
> be wasted, I would aggressively support such a developer (and continually
> do in the background.)

As would I.

Terry Lambert
terry@lambert.org
---
Any opinions in this posting are my own and not those of my present or previous employers.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-chat" in the body of the message