Date: Wed, 19 Mar 1997 08:40:17 -0700
From: "Russell L. Carter" <rcarter@consys.com>
To: Mike Pritchard <mpp@freefall.freebsd.org>
Cc: jkh@time.cdrom.com (Jordan K. Hubbard), hackers@freebsd.org
Subject: Re: dup3() - I've thought it over and decided...
Message-ID: <199703191540.IAA26553@conceptual.com>
In-Reply-To: Your message of "Wed, 19 Mar 1997 05:57:58 PST." <199703191357.FAA22301@freefall.freebsd.org>

> As for Cray's implementation, yes, it allows you to create a complete
> snapshot of the process, process group, or session. At this point you
> could either kill the proc/pgrp/session for later restart, or allow
> it to keep running and only use the snapshot in case of a system crash.
> I was involved in some work on this that allowed you to checkpoint the
> process on one machine and then restart it on another for load leveling
> purposes.
>
> It was used mainly for checkpoint/restart of long running batch
> jobs submitted via NQS, but it was usable with interactive jobs
> to a degree. There was on-going work for better interactive
> support when I left Cray (see below).

There are some other interesting things you can do with this if you
have it. Fault tolerant ORBs, for instance. If you've got a mission
critical long running app with enough simplicity, you can periodically
checkpoint to reliable storage and restart on another compatible system
with a minimum of fuss should you happen to have any of a myriad of
problems with your first platform. Deep Pockets that have things that
sustain damage are funding stuff like this right now :-)

I've spent part of the last month looking somewhat superficially into
the issues. For SGIs there's something called Hibernator that sorta
works. Cray does appear to be the current state-of-the-art.

Couple checkpointing/process migration with a queuing system like
Codine that understands distributed environments like ORBs, PVM, MPI,
etc., and you have the potential for a pretty fault tolerant,
distributed computing resource based mainly on off-the-shelf hardware.
For long running apps, that is; ISPs are a different problem.

-- 
Russell L. Carter
Voice:(520) 636-2600 FAX:(520) 636-2888
rcarter@consys.com
Conceptual Systems & Software, P.O. Box 1129 Chino Valley AZ 86323
"Before sitting down, always look for ferrets."
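
The "periodically checkpoint to reliable storage and restart" idea from the message can be sketched at the application level. This is only a minimal illustration in Python, not the kernel-level process snapshot that Cray's facility or Hibernator provides: the program saves its own state rather than having the system capture the whole process. The function names (`save_checkpoint`, `load_checkpoint`, `run`), the JSON state format, and the toy summation workload are all assumptions made up for this example.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state to path atomically so a crash never leaves a torn file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".ckpt-")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # push the data to stable storage
        os.replace(tmp, path)     # atomic rename over the previous checkpoint
    except BaseException:
        os.unlink(tmp)            # do not leave a stray temporary file behind
        raise


def load_checkpoint(path, default):
    """Return the last saved state, or default if no checkpoint exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default


def run(path, total_steps, every=100, stop_after=None):
    """Toy long-running job: sum 0..total_steps-1, checkpointing as it goes.

    stop_after simulates a crash: the job returns None after that many
    steps in this run, without saving any progress past the last checkpoint.
    """
    state = load_checkpoint(path, {"step": 0, "acc": 0})
    done = 0
    while state["step"] < total_steps:
        if stop_after is not None and done >= stop_after:
            return None
        state["acc"] += state["step"]
        state["step"] += 1
        done += 1
        if state["step"] % every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["acc"]
```

Calling `run("job.ckpt", 1000, stop_after=250)` and then `run("job.ckpt", 1000)` resumes from the last checkpoint instead of starting over. If the checkpoint file lives on storage that another compatible machine can reach, the second call could just as well happen there, which is the restart-elsewhere scenario the message describes.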