From: Julian Elischer
Message-Id: <199510250536.WAA17892@ref.tfs.com>
Subject: Re: process migration
To: rcarter@geli.com (Russell L. Carter)
Date: Tue, 24 Oct 1995 22:36:22 -0700 (PDT)
Cc: terry@lambert.org, bugs@ns1.win.net, hackers@FreeBSD.ORG
In-Reply-To: <199510250500.WAA04762@geli.clusternet> from "Russell L. Carter" at Oct 24, 95 10:00:38 pm

> > > There's also other problems:
> > >
> > > 1) File as swap store. The executable file is acting as its own
> > >    swap store; this means you must reopen the file (which means
> > >    you need its name) and reestablish the flags on the vnode to
> > >    prevent writes to it.
> > write the entire process space including non resident pages..
> > (implies that shared programs become static)
> >
> > > 2) Memory overcommit. There very well may not be enough swap
> > >    to checkpoint the program.
> > put it out to a file.......
> > If overcommitted ignore it. Too bad.
> >
> > > 3) Shared libraries. The shared library mappings must be
> > >    restored, probably separately.
> > static.. quite possibly this might be used in a specialist environment
> > (such as what Russell is working on), where shared libs might not be
> > required in any case.
>
> Righto. Cray Research machines have been checkpointing fine for 10 years.
> Of course, they only swap and don't page (or didn't use to, I haven't
> played with the SPARC stuff). Everything is statically linked. Primitive
> model, works fine with the bulk of *their* workload.
>
> Would work fine with my model too, as long as it just applied to user apps.
>
> A great deal of effort is expended to protect users from themselves, but
> if they need checkpointing, the users often are very savvy about getting
> themselves on the boat. That includes app developers too.
>
> Note: this is for jobs that run a minimum of several days, sometimes
> weeks.

I could imagine it in the form of a signal that asks the process to
pack itself up.. the system supplies the tools to do so, it just has to
know which things aren't savable.. then it does a foo() call and when it
returns, it's a week later.. and it resumes everything it stopped..
not so good for random processes, but good for specialist stuff

> Regards,
> Russell
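
The "signal that asks the process to pack itself up" idea can be sketched at user level roughly as follows. This is only an illustration under stated assumptions, not anything proposed in the thread: it uses Python on a POSIX system, the names `CHECKPOINT_SIGNAL`, `Job` and `demo` are hypothetical, and the only state it knows how to save is two integers. Anything it does not explicitly save (open files, sockets, shared mappings) is simply lost, which is the "has to know which things aren't savable" part.

```python
import os
import pickle
import signal
import tempfile

# Hypothetical choice of signal; assumes a POSIX system where SIGUSR1 exists.
CHECKPOINT_SIGNAL = signal.SIGUSR1


class Job:
    """A long-running computation whose entire savable state is two integers."""

    def __init__(self, checkpoint_path):
        self.checkpoint_path = checkpoint_path
        self.step = 0
        self.total = 0
        self.checkpoint_requested = False

    def request_checkpoint(self, signum, frame):
        # Signal handler: only record the request. The actual packing up
        # happens at a safe point in the main loop.
        self.checkpoint_requested = True

    def save(self):
        # Write to a temporary file and rename it, so an interrupted write
        # never leaves a truncated checkpoint behind.
        tmp = self.checkpoint_path + ".tmp"
        with open(tmp, "wb") as f:
            pickle.dump({"step": self.step, "total": self.total}, f)
        os.replace(tmp, self.checkpoint_path)

    def restore(self):
        with open(self.checkpoint_path, "rb") as f:
            state = pickle.load(f)
        self.step = state["step"]
        self.total = state["total"]

    def run(self, steps):
        """Work until `steps` is reached; return False if packed up early."""
        while self.step < steps:
            if self.checkpoint_requested:
                self.save()
                self.checkpoint_requested = False
                return False
            self.total += self.step
            self.step += 1
        return True


def demo():
    path = os.path.join(tempfile.mkdtemp(), "job.ckpt")
    job = Job(path)
    signal.signal(CHECKPOINT_SIGNAL, job.request_checkpoint)
    try:
        job.run(500)                             # some work gets done
        os.kill(os.getpid(), CHECKPOINT_SIGNAL)  # "pack yourself up"
        finished = job.run(1000)                 # saves state and stops
    finally:
        signal.signal(CHECKPOINT_SIGNAL, signal.SIG_DFL)
    saved = (job.step, job.total, finished)

    later = Job(path)  # stands in for a fresh process started much later
    later.restore()
    done = later.run(1000)
    return saved, (later.step, later.total, done)


if __name__ == "__main__":
    print(demo())
```

The handler deliberately does nothing but set a flag, so the state is only written out between loop iterations, when it is consistent. A real checkpoint of the whole address space, as discussed above, would need kernel help; this only shows the cooperative shape of the interface.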