From: Suleiman Souhlal
Date: Tue, 26 Jun 2007 00:10:37 -0700
To: Ed Schouten
Cc: current@freebsd.org
Subject: Re: [PATCH] Machine Check Architecture on amd64
Message-Id: <24C31FAE-96D1-40EC-9504-D2A4785B5E93@FreeBSD.org>
In-Reply-To: <20070626065520.GQ27942@hoeg.nl>
References: <46806B3E.2060701@FreeBSD.org> <20070626065520.GQ27942@hoeg.nl>

On Jun 25, 2007, at 11:55 PM, Ed Schouten wrote:

> * Suleiman Souhlal wrote:
>> Hi,
>>
>> I have a simple patch for amd64 that uses the Machine Check
>> Architecture/Exceptions on most recent x86 CPUs to detect memory
>> errors:
>>
>> http://people.freebsd.org/~ssouhlal/testing/mce-20070621.diff
>>
>> It will report uncorrected and corrected errors (the latter only
>> if sysctl machdep.mce.log_corrected=1).
>> You can ask the kernel to panic if it gets an uncorrected error
>> by setting machdep.mce.panic_on_uc=1.
>> All this can be disabled by setting the machdep.mce.enable
>> tunable to 0. I'm still not sure if I want this enabled by
>> default, as I don't have any Intel machines to test this on, but
>> I have tested it on Opteron (both corrected and uncorrected
>> errors).
>>
>> I would appreciate it if someone would try this, especially if
>> you have Intel machines with bad RAM.
>>
>> Comments are welcome.
>
> | /*
> |  * Uncorrected MCEs will generate a #MC, while corrected
> |  * don't, so we have to periodically poll for them.
> |  */
>
> What about adding an option to only print uncorrected MCE's? That's
> the most interesting data and we can get that without using a
> kthread, right?

sysctl machdep.mce.log_corrected=0 machdep.mce.poll_delay=0

will stop reporting the corrected errors and will stop the kthread
(but won't actually kill it; I guess I'll fix that before I commit
the patch).

Thanks,
--
Suleiman