From owner-freebsd-hackers@FreeBSD.ORG  Fri Jan 14 02:05:23 2005
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id AC08A16A4CF
	for <freebsd-hackers@freebsd.org>;
	Fri, 14 Jan 2005 02:05:23 +0000 (GMT)
Received: from afields.ca (afields.ca [216.194.67.132])
	by mx1.FreeBSD.org (Postfix) with ESMTP id C4E2443D41
	for <freebsd-hackers@freebsd.org>;
	Fri, 14 Jan 2005 02:05:18 +0000 (GMT)
	(envelope-from afields@afields.ca)
Received: from afields.ca (localhost.afields.ca [127.0.0.1])
	by afields.ca (8.12.11/8.12.11) with ESMTP id j0E25Gqa074691;
	Thu, 13 Jan 2005 21:05:16 -0500 (EST)
	(envelope-from afields@afields.ca)
Received: (from afields@localhost)
	by afields.ca (8.12.11/8.12.11/Submit) id j0E25GL0074690;
	Thu, 13 Jan 2005 21:05:16 -0500 (EST)
	(envelope-from afields)
Date: Thu, 13 Jan 2005 21:05:16 -0500
From: Allan Fields <bsd@afields.ca>
To: Brooks Davis <brooks@one-eyed-alien.net>
Message-ID: <20050114020516.GD26802@afields.ca>
References: <Pine.GSO.4.50L0.0501121412570.2985-100000@faith.cs.utah.edu>
	<20050112214002.GA21038@odin.ac.hmc.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20050112214002.GA21038@odin.ac.hmc.edu>
User-Agent: Mutt/1.4i
cc: freebsd-hackers@freebsd.org
cc: Siddharth Aggarwal <saggarwa@cs.utah.edu>
Subject: Re: process checkpoint restore facility now in DragonFly BSD
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 14 Jan 2005 02:05:24 -0000

On Wed, Jan 12, 2005 at 01:40:02PM -0800, Brooks Davis wrote:
> On Wed, Jan 12, 2005 at 02:17:38PM -0700, Siddharth Aggarwal wrote:
> > 
> > I am responding to a post back in Oct 2003 when the checkpointing feature
> > was announced for DragonFly. I have been doing some research on this, and
> > have seen some projects that use Xen VMM to achieve checkpoints of guest
> > OSes.
> > 
> > So I was looking for inputs from people as to what everyone feels about
> > checkpointing, whether it should be done at the physical machine level or
> > VM level. Pros and Cons of each approach, if any further development was
> > done on DragonFly for checkpoint since then and if it was stopped, why?
> > Are there serious limitations to checkpointing a physical machine?
> > 
> > Sorry for such a vague posting, but I thought this would be a good
> > platform to get some feedback.
> 
> The DragonFly lists would be the logical place to discuss DragonFly
> features.
> 
> From my perspective as a scientific computing user, VM level
> checkpointing is of little use since I get the overhead of the VM and
> I can't easily do the application level checkpointing required to
> checkpoint distributed programs.  There are probably a number of places
> where it is useful in scientific computing, but I don't find it to be
> all that interesting.

IMHO, it all depends on whether process checkpointing is made practical
and reliable enough to be employed for non-trivial programs.  I'm
not entirely convinced that a single system checkpoint is the
ultimate answer, though that is certainly highly desirable.

One potential drawback with full system images is the lack of
support for runtime checkpoints (multiple process checkpoints) and
the lack of a framework for process migration and/or persistence
of a subset of the processes on a system.

Persistence is almost non-existent at all levels, and sessioning is
weak.  A whole solution is needed (integrating the two).  The work
thus far shouldn't be brushed off so easily, as a multi-tiered approach
could be of benefit.

Each level of persistence offers its own pros and cons:
	- Scope & granularity of operation (degrees of flexibility in
	  specification, checkpoint set);
	- Storage options;
	- Interface;
	- Means of coordination;
	- etc.

For process checkpoint: the means to coordinate checkpoints and
satisfy the order of dependencies between processes under checkpoint
is a next step in the implementation path.
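
To make the ordering idea concrete, here is a minimal sketch in
Python.  It is not how DragonFly or any existing checkpoint code
works.  It assumes that a process should be checkpointed only after
the processes it depends on, and the names checkpoint_order and deps
and the example process names are hypothetical.

```python
from graphlib import TopologicalSorter, CycleError  # Python 3.9+


def checkpoint_order(depends_on):
    """Return an order in which every process comes after the
    processes it depends on (assumed checkpoint policy).

    depends_on maps a process name to the set of process names it
    depends on.  Raises CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(depends_on).static_order())


# Hypothetical checkpoint set: client -> server -> db.
deps = {"client": {"server"}, "server": {"db"}, "db": set()}
order = checkpoint_order(deps)

# Circular dependencies cannot be ordered and need another strategy
# (for example, checkpointing the whole group at once).
try:
    checkpoint_order({"a": {"b"}, "b": {"a"}})
    cycle_detected = False
except CycleError:
    cycle_detected = True
```

A coordinator could walk the returned list and checkpoint each
process in turn; the reverse order would be a plausible choice for
restore, though that too is an assumption.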

Building on previous email:
  *     Process Checkpointing Support:
	[..]
	An often overlooked application of process-level persistence
	is fault-tolerance.  It might be possible to have a process
	survive an otherwise fatal system panic and/or hardware
	failure.  [Without having to resume from a whole system
	checkpoint.]
	[..]


> -- Brooks
> 
> -- 
> Any statement of the form "X is the one, true Y" is FALSE.
> PGP fingerprint 655D 519C 26A7 82E7 2529  9BF0 5D8E 8BE9 F238 1AD4

-- 
 Allan Fields, AFRSL - http://afields.ca
 2D4F 6806 D307 0889 6125  C31D F745 0D72 39B4 5541