From: Suleiman Souhlal
Date: Mon, 25 Jun 2007 18:26:22 -0700
To: current@freebsd.org
Subject: [PATCH] Machine Check Architecture on amd64
Message-ID: <46806B3E.2060701@FreeBSD.org>

Hi,

I have a simple patch for amd64 that uses the Machine Check Architecture/Exceptions on most recent x86 CPUs to detect memory errors:

http://people.freebsd.org/~ssouhlal/testing/mce-20070621.diff

It will report uncorrected and corrected errors (the latter, only if sysctl machdep.mce.log_corrected=1). You can ask the kernel to panic if it gets an uncorrected error by setting machdep.mce.panic_on_uc=1. All this can be disabled by setting the machdep.mce.enable tunable to 0.

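For anyone trying the patch, the three knobs mentioned above could be set persistently roughly as follows. This is only a sketch under some assumptions: the patch is applied, machdep.mce.panic_on_uc is a runtime sysctl like machdep.mce.log_corrected, and the usual /boot/loader.conf and /etc/sysctl.conf files are used. The values are examples, not recommended defaults.

```
# /boot/loader.conf (boot-time tunables)
# Uncomment to disable the machine check support entirely.
#machdep.mce.enable="0"

# /etc/sysctl.conf (runtime sysctls)
# Also report corrected errors, not just uncorrected ones.
machdep.mce.log_corrected=1
# Panic on an uncorrected error (assumed here to be a sysctl).
machdep.mce.panic_on_uc=1
```

The two sysctls can presumably also be changed on a running system with sysctl(8), for example "sysctl machdep.mce.log_corrected=1", while the tunable only takes effect at boot.
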
I'm still not sure if I want this enabled by default, as I don't have any Intel machines to test this on, but I have tested it on Opteron (both corrected and uncorrected errors).

I would appreciate it if someone would try this, especially if you have Intel machines with bad RAM.

Comments are welcome.

--
Suleiman