From: Kip Macy
To: hackers@freebsd.org
Date: Mon, 20 Oct 2003 13:52:07 -0700 (PDT)
Subject: process checkpoint restore facility now in DragonFly BSD
Message-ID: <20031020134532.B63978@demos.bsdclusters.com>

At BSDCon '03 it was mentioned that a process checkpoint / restore facility would be a useful addition to FreeBSD. This post is to announce that Matt and I have added such a facility to DragonFly BSD. It is noteworthy for -hackers as anyone who is interested could still port it with relative ease.

Basically you use it by kldload'ing the checkpt.ko module, which should now be built automatically. You then ^E the program you want to checkpoint, and use the 'checkpt' utility in /usr/bin to resume it from the checkpoint file.
The program is *NOT* killed by this signal; it continues to run after the checkpoint file(s) have been generated. Alternatively, you can send the program any signal that will cause it to coredump and exit. You will then be able to restore from the core dump! In conjunction with a shared file system this can be used for process migration.

The checkpoint program is currently designed to work only with simple programs... it will restore signals, descriptor references to regular files, the VM state (anonymous memory), as well as any nominal file mappings, but it cannot restore sockets, pipes, or device descriptors. So, while you can checkpoint a pipe sequence, you can't really restore it. Pipes, ttys, and common devices (zero, null, bpf) will not be that hard to add. Stream sockets are an open question.

Please note that there are *SEVERE* security issues with this module. The module is not loaded into the kernel by default and, when loaded, can only be used by users in the wheel group. You can change the group requirements with a sysctl (see the manual page for checkpt). The security issues relate to the restoration of signals and file descriptors (in particular, the restoration system call will convert file handles into file descriptors, which could potentially allow any file in the system to be accessed). Matt has put in some basic security checks but they are not meant to be all-encompassing!

It is going into the tree now because Matt and I have done enough work on it that anyone else interested in working on it can theoretically dig in. Significant debugging is still in place. We've left it as a module to facilitate debugging. It should be usable for scientific applications now. It should already work considerably better than the Linux equivalent, what with the regular file descriptor save/restore capability.

Any developer who wishes to work on the checkpointing module and related code is welcome to!
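
For anyone who wants to check ahead of time whether a program fits the "simple programs" restriction described above, here is a rough Python sketch. It is not part of checkpt and does not talk to the module in any way. It only classifies open descriptors with fstat(2) and flags the ones that, going by the description in this post, would not come back after a restore (sockets, pipes, devices). The names descriptor_kind, restorable and audit_descriptors are made up for illustration, and the "regular files only" rule is an assumption taken from the text above, not from the module's source.

```python
import os
import stat


def descriptor_kind(fd):
    """Return a coarse label for what an open descriptor refers to."""
    mode = os.fstat(fd).st_mode
    if stat.S_ISREG(mode):
        return "regular"
    if stat.S_ISFIFO(mode):
        return "pipe"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISCHR(mode) or stat.S_ISBLK(mode):
        return "device"
    return "other"


def restorable(fd):
    """Assumption from the post: only regular-file descriptors are restored."""
    return descriptor_kind(fd) == "regular"


def audit_descriptors(fds):
    """Map each descriptor to a (kind, restorable) pair."""
    return {fd: (descriptor_kind(fd), restorable(fd)) for fd in fds}


if __name__ == "__main__":
    # Report on stdin, stdout and stderr of the current process.
    for fd, (kind, ok) in sorted(audit_descriptors([0, 1, 2]).items()):
        print(fd, kind, "restorable" if ok else "not restorable")
```

Run from an interactive terminal, the three standard descriptors would normally show up as devices (ttys), which matches the note above that tty support still has to be added.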