Date: Wed, 11 Mar 1998 00:40:23 +0000 (GMT)
From: Terry Lambert <tlambert@primenet.com>
To: shimon@simon-shapiro.org
Cc: tlambert@primenet.com, hackers@FreeBSD.ORG
Subject: Re: Fault tolerance issues
Message-ID: <199803110040.RAA13485@usr08.primenet.com>
In-Reply-To: <XFMail.980310120648.shimon@simon-shapiro.org> from "Simon Shapiro" at Mar 10, 98 12:06:48 pm
> >> I always wondered why this is not so. Not even after sync(2).
> >
> > With the old sync process (updated, not syncer), it wasn't very
> > cost effective. It would happen on every sync.
>
> ``Cost Effective'' in what way? Losing a critical file, or corrupting an
> on-line database is a lot less effective than n% loss of speed. N can be
> pretty large here, if you ask users who are in the know.
>
> Again, a switch will be the best solution. Dial in the level of security or
> reliability you desire.

You misunderstand. A soft read-only marking is only instituted if
there is no dirty data to be written. The difference is in reboot
time, since the *only* thing wrong with the disk is the clean flag
isn't set and the superblock information that isn't automagically
replicated is out of sync.

It saves you fsck time after an ungraceful shutdown from a quiescent
state, nothing more.

It wasn't cost effective in the sense that if you marked and unmarked
the thing after every sync, you were marking it frequently enough that
the unmarking represented a significant start latency in the median
to high load case (where the sync wrote all outstanding dirty data,
but there would immediately be more dirty data that needed to be
written the next time).

With the syncer process, the sync clock puts a delay between the
last data in and the last data out -- a sliding window in which you
would not "unship the heads", so to speak. The entire window would
need to be emptied for you to mark the volume soft read-only, and
you would have an entire sync clock in which to "unship the heads".

Basically, you'd implement this by saying "at the next sync interval,
immediately write the superblock as dirty, and on completion, mark
the FS non-soft-RO".

> > The difference is SFT (*Software* Fault Tolerance); that's why Novell
> > is still making money in the server market (or at least one of the
> > reasons).
>
> How many Novell servers have you seen without a UPS behind them?

Generally, or at Novell?
Generally, quite a few. Novell has this luxury because their
threading is cooperative tasking with explicit yield (unless you
yield, all operations run to completion, so you are never more than
one operation away from ground state; like running with sync mounts
on the pre-soft updates FFS).

> Again, MHO is that software should protect against abrupt termination as
> well as it can. But, it is OK to clearly define the constraints, and say
> ``For this I need at least n seconds of continued processing time''.

"Seconds" is a *long* time. I was thinking no more than 30uS or
even 25uS in those countries too poor to afford 60 sine waves per
duty cycle ;-).

If you are thinking about checkpoint/restart -- well, that's a whole
different ballgame. You will need to either revisit memory overcommit,
or have a checkpoint reserve equal to the amount of kernel memory plus
a startup reserve to let you restore state (or reserve a set amount of
main RAM for the job, but that's wasteful).

> > If I have to write all the code myself, it'll be a long time before
> > it gets done. But if I'm serious, I'll write it in vanilla K&R so
> > I can run the C++ branch path analysis tool from the comp.unix.sources
> > archives on it.
>
> I may be blind, and behind the times, but, aside from formal prototypes, I
> fail to see what really improved in the C language since K&R.

That's not the point. The point is that I can automatically generate
code coverage tests for K&R C but not for ANSI C. 8-(.

> > One component that's being overlooked here is QA as opposed to QC.
>
> QC is management measurable (almost). QA is more of a moral issue.

This is where the people who hate ISO 9000 begin to hate it. The
management involvement always seems to take the form of "what would
we like to have measurements on" rather than "what is measurable".
I dread this type of QC management.

QA is more "how can I guarantee that the code matches the intent of
the code".
This is a discussion we should take offline, unless there is a
separate list where it's appropriate.

> The weakness of this environment is that, at times, we bite more than we
> can chew, and that, in FreeBSD in particular, our efforts are diffused;
> We work on a lot of different big things. Instead, we should try to form
> task forces which work on specific things, broadening our expertise level,
> and ensuring maturity of features, rather than count.

Well, organize away! ...8-)

> > I would definitely like to see someone produce a PrestoServ card for
> > FreeBSD (for example). This would get it into a hell of a lot of
> > traditionally "big iron" shops.
>
> And what is a PrestoServ card?

Battery backed RAM for stable storage of NFS writes. Uh... "hard
updates"... 8-).

> > The whole fault tolerance issue is "how do I make small iron look like
> > big iron without getting Tony Overfield to redesign the PC?".
> >
> > 8-).
>
> And who is Tony Overfield? You are talking to some ignorant audience here

Engineer at Dell. Argues hardware and BIOS, occasionally. Good source
for feedback about "how do I talk to PC hardware instead of non-perverse
hardware".

					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message