From: Maninya M
To: freebsd-hackers@freebsd.org
Date: Tue, 14 Feb 2012 19:53:43 +0530
Subject: OS support for fault tolerance

On a multicore desktop computer, if one of the cores fails, FreeBSD crashes. I would like to make the OS tolerate this hardware fault. My strategy is to checkpoint the state of each core to main memory at regular intervals.
When a core fails, its last saved state is retrieved from main memory, and the processes that were running on it are rescheduled on the remaining cores.

I have read that the OS already tolerates such faults on large servers; I need it to do the same on a desktop. I assume I would have to modify the scheduler. I am using FreeBSD 9.0 on an Intel Core i5 quad-core machine.

How do I go about this? What exactly do I need to save as the "state" of a core? What else do I need to know?

I have absolutely no experience with kernel programming or with FreeBSD. Any pointers to good resources on modifying the FreeBSD source code would be greatly appreciated.
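The checkpoint-and-reschedule strategy described above can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions: `struct core_state`, `checkpoint_all()`, `fail_core()` and `init_system()` are hypothetical names, not FreeBSD kernel APIs, and the saved "state" is a tiny illustrative subset of x86-64 registers (a real kernel would also need FPU/SIMD state, the current thread, and more). The sketch shows that work done after the last checkpoint is lost on failover and that the failed core's processes are spread round-robin over the survivors.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NCORES 4
#define NPROCS 8

/* Hypothetical per-core snapshot: a small subset of x86-64 register state. */
struct core_state {
    uint64_t rip;        /* instruction pointer */
    uint64_t rsp;        /* stack pointer */
    uint64_t rflags;     /* flags register */
    int      running_pid; /* process running on this core, -1 if idle */
};

struct core {
    int alive;
    struct core_state live;       /* state while the core executes */
    struct core_state checkpoint; /* last copy saved to main memory */
};

static struct core cores[NCORES];
static int proc_core[NPROCS];     /* core each process is assigned to */

/* Hypothetical setup: all cores alive, processes spread round-robin. */
static void init_system(void) {
    for (int c = 0; c < NCORES; c++) {
        cores[c].alive = 1;
        cores[c].live.rip = 0x1000 + (uint64_t)c;
        cores[c].live.rsp = 0x8000 - 0x100 * (uint64_t)c;
        cores[c].live.rflags = 0x202;
        cores[c].live.running_pid = c;
        memset(&cores[c].checkpoint, 0, sizeof cores[c].checkpoint);
    }
    for (int p = 0; p < NPROCS; p++)
        proc_core[p] = p % NCORES;
}

/* Periodic step: copy each live core's state into its checkpoint slot. */
static void checkpoint_all(void) {
    for (int c = 0; c < NCORES; c++)
        if (cores[c].alive)
            cores[c].checkpoint = cores[c].live;
}

/* On failure: mark the core dead, hand back its last checkpoint (the
 * failed core's live state cannot be trusted), and move its processes
 * round-robin onto the surviving cores. Returns the number of processes
 * moved, or -1 if no core survives. */
static int fail_core(int failed, struct core_state *recovered) {
    assert(failed >= 0 && failed < NCORES);
    cores[failed].alive = 0;
    *recovered = cores[failed].checkpoint;

    int survivors[NCORES], n = 0;
    for (int c = 0; c < NCORES; c++)
        if (cores[c].alive)
            survivors[n++] = c;
    if (n == 0)
        return -1;

    int moved = 0;
    for (int p = 0; p < NPROCS; p++) {
        if (proc_core[p] == failed) {
            proc_core[p] = survivors[moved % n];
            moved++;
        }
    }
    return moved;
}
```

To use it, call `init_system()`, then `checkpoint_all()` on a timer; when a core is detected as failed, `fail_core()` returns the state to resume from and reassigns that core's processes. The hard parts in a real kernel (detecting the failure, and keeping the checkpoint consistent with memory writes made after it) are not modelled here.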