Date: Tue, 24 Oct 1995 22:36:22 -0700 (PDT)
From: Julian Elischer <julian@ref.tfs.com>
To: rcarter@geli.com (Russell L. Carter)
Cc: terry@lambert.org, bugs@ns1.win.net, hackers@FreeBSD.ORG
Subject: Re: process migration
Message-ID: <199510250536.WAA17892@ref.tfs.com>
In-Reply-To: <199510250500.WAA04762@geli.clusternet> from "Russell L. Carter" at Oct 24, 95 10:00:38 pm

> > > There's also other problems:
> > >
> > > 1) File as swap store.  The executable file is acting as its own
> > >    swap store; this means you must reopen the file (which means
> > >    you need its name) and reestablish the flags on the vnode to
> > >    prevent writes to it.
> > write the entire process space including non-resident pages..
> > (implies that shared programs become static)
> >
> > > 2) Memory overcommit.  There very well may not be enough swap
> > >    to checkpoint the program.
> > put it out to a file.......
> > If overcommitted ignore it. Too bad.
> >
> > > 3) Shared libraries.  The shared library mappings must be
> > >    restored, probably separately.
> > static.. quite possibly this might be used in a specialist environment
> > (such as what Russell is working on), where shared libs might not be
> > required in any case
>
> Righto.  Cray Research machines have been checkpointing fine for 10 years.  Of
> course, they only swap and don't page (or didn't use to, I haven't played
> with the SPARC stuff).  Everything is statically linked.  Primitive
> model, works fine with the bulk of *their* workload.
>
> Would work fine with my model too, as long as it just applied to user apps.
>
> A great deal of effort is expended to protect users from themselves, but
> if they need checkpointing, the users often are very savvy about getting
> themselves on the boat.  That includes app developers too.
>
> Note: this is for jobs that run a minimum of several days, sometimes
> weeks.

I could imagine it in the form of a signal that asks the process to
pack itself up.. the system supplies the tools to do so, it just
has to know which things aren't savable..
then it does a foo() call and when it returns, it's a week later..
and it resumes everything it stopped..
not so good for random processes, but good for specialist stuff

>
> Regards,
> Russell