From owner-svn-src-all@freebsd.org Fri Apr 28 18:25:11 2017 Return-Path: Delivered-To: svn-src-all@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id EE4C7D536F4; Fri, 28 Apr 2017 18:25:11 +0000 (UTC) (envelope-from cem@FreeBSD.org) Received: from repo.freebsd.org (repo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id B019A21B; Fri, 28 Apr 2017 18:25:11 +0000 (UTC) (envelope-from cem@FreeBSD.org) Received: from repo.freebsd.org ([127.0.1.37]) by repo.freebsd.org (8.15.2/8.15.2) with ESMTP id v3SIPAkw089577; Fri, 28 Apr 2017 18:25:10 GMT (envelope-from cem@FreeBSD.org) Received: (from cem@localhost) by repo.freebsd.org (8.15.2/8.15.2/Submit) id v3SIPAnU089576; Fri, 28 Apr 2017 18:25:10 GMT (envelope-from cem@FreeBSD.org) Message-Id: <201704281825.v3SIPAnU089576@repo.freebsd.org> X-Authentication-Warning: repo.freebsd.org: cem set sender to cem@FreeBSD.org using -f From: Conrad Meyer Date: Fri, 28 Apr 2017 18:25:10 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: svn commit: r317567 - head/sys/x86/x86 X-SVN-Group: head MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 28 Apr 2017 18:25:12 -0000 Author: cem Date: Fri Apr 28 18:25:10 2017 New Revision: 317567 URL: https://svnweb.freebsd.org/changeset/base/317567 Log: x86 MCA: Fix a deadlock in MCA exception processing In exceptional circumstances, an MCA exception will trigger when the freelist is exhausted. In such a case, no error will be logged on the list and 'mca_count' will not be incremented. Prior to this patch, all CPUs that received the exception would spin forever. With this change, the CPU that detects the error but finds the freelist empty will proceed to panic the machine, ending the deadlock. A follow-up to r260457. Reported by: Ryan Libby Reviewed by: jhb@ Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D10536 Modified: head/sys/x86/x86/mca.c Modified: head/sys/x86/x86/mca.c ============================================================================== --- head/sys/x86/x86/mca.c Fri Apr 28 17:58:15 2017 (r317566) +++ head/sys/x86/x86/mca.c Fri Apr 28 18:25:10 2017 (r317567) @@ -653,7 +653,7 @@ amd_thresholding_update(enum scan_mode m * count of the number of valid MC records found. */ static int -mca_scan(enum scan_mode mode) +mca_scan(enum scan_mode mode, int *recoverablep) { struct mca_record rec; uint64_t mcg_cap, ucmask; @@ -704,7 +704,9 @@ mca_scan(enum scan_mode mode) } if (mode == POLLED) mca_fill_freelist(); - return (mode == MCE ? recoverable : count); + if (recoverablep != NULL) + *recoverablep = recoverable; + return (count); } /* @@ -726,7 +728,7 @@ mca_scan_cpus(void *context, int pending CPU_FOREACH(cpu) { sched_bind(td, cpu); thread_unlock(td); - count += mca_scan(POLLED); + count += mca_scan(POLLED, NULL); thread_lock(td); sched_unbind(td); } @@ -1150,7 +1152,7 @@ void mca_intr(void) { uint64_t mcg_status; - int old_count, recoverable; + int recoverable, count; if (!(cpu_feature & CPUID_MCA)) { /* @@ -1164,20 +1166,18 @@ mca_intr(void) } /* Scan the banks and check for any non-recoverable errors. */ - old_count = mca_count; - recoverable = mca_scan(MCE); + count = mca_scan(MCE, &recoverable); mcg_status = rdmsr(MSR_MCG_STATUS); if (!(mcg_status & MCG_STATUS_RIPV)) recoverable = 0; if (!recoverable) { /* - * Wait for at least one error to be logged before - * panic'ing. Some errors will assert a machine check - * on all CPUs, but only certain CPUs will find a valid - * bank to log. + * Only panic if the error was detected local to this CPU. + * Some errors will assert a machine check on all CPUs, but + * only certain CPUs will find a valid bank to log. */ - while (mca_count == old_count) + while (count == 0) cpu_spinwait(); panic("Unrecoverable machine check exception"); @@ -1199,7 +1199,7 @@ cmc_intr(void) * Serialize MCA bank scanning to prevent collisions from * sibling threads. */ - count = mca_scan(CMCI); + count = mca_scan(CMCI, NULL); /* If we found anything, log them to the console. */ if (count != 0) {