From: "Matthew Fleming"
To: "Terry Kennedy"
Cc: freebsd-current@freebsd.org, freebsd-stable@FreeBSD.org, John Baldwin
Date: Fri, 14 May 2010 08:42:44 -0700
Subject: RE: Crash dump problem - sleeping thread owns a non-sleepable lock during crash dump write
Message-ID: <06D5F9F6F655AD4C92E28B662F7F853E021D4D5E@seaxch09.desktop.isilon.com>
References: <01NN32EOXMYC006UN1@tmk.com> <4BED3912.9080509@FreeBSD.org> <01NN3PQCOFHE006UN1@tmk.com>

> As an aside, this is a quad-core in one package CPU (an X3363).
> On both this box and a similar one with an X5470, console messages
> continue to print out after "the system has been halted - press any key
> to reboot" - in particular, the shutdown makes a bunch of the "behind
> the scenes" management stuff like the virtual keyboard and monitor
> appear. Plugging or unplugging USB devices will go through the whole
> deal of detecting and making their service available.

Oops, you're right that other CPUs are running.

The stop_cpus() call is only made if kdb is entered. doadump() is called
out of boot(), which comes later. At Isilon we've been running with a
patch that does stop_cpus() pretty close to the front of panic(9).

As a design decision it seems reasonable to call stop_cpus() early in
panic(9), simply because most causes of a panic mean something unexpected
has happened, and the sooner the other CPUs aren't running, the more
likely it is that they don't do more damage, leaving the system in a more
useful state for dump or {g,d}db analysis. This should be done before
dump or entering kdb.

I'm cc'ing -current@ since I would like a small discussion of moving the
stop_cpus() call to earlier in panic. If this change is agreeable I can
roll up a patch and test it on CURRENT. I'm not sure yet how many of the
other panic-related changes we have made at Isilon would be required.

Thanks,
matthew
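
To illustrate the ordering argument above, here is a rough user-space sketch in C. It is only a toy model, not the Isilon patch and not FreeBSD kernel code: every sim_* name and the struct are hypothetical, and "side effects" simply counts work that other CPUs manage to do once the panic has started (console output, USB attach, and so on).

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_MAXCPU 8

/* Hypothetical model of a machine; not a FreeBSD structure. */
struct sim_machine {
	int  ncpus;
	bool running[SIM_MAXCPU]; /* true while a CPU can still execute */
	int  side_effects;        /* work done by other CPUs after panic began */
	bool dumped;              /* set once the simulated dump has run */
};

static void
sim_machine_init(struct sim_machine *m, int ncpus)
{
	assert(ncpus > 0 && ncpus <= SIM_MAXCPU);
	m->ncpus = ncpus;
	m->side_effects = 0;
	m->dumped = false;
	for (int i = 0; i < SIM_MAXCPU; i++)
		m->running[i] = (i < ncpus);
}

/* Stand-in for stopping every CPU except the panicking one. */
static void
sim_stop_other_cpus(struct sim_machine *m, int self)
{
	for (int i = 0; i < m->ncpus; i++)
		if (i != self)
			m->running[i] = false;
}

/* Each still-running CPU other than 'self' does one unit of work. */
static void
sim_others_run(struct sim_machine *m, int self)
{
	for (int i = 0; i < m->ncpus; i++)
		if (i != self && m->running[i])
			m->side_effects++;
}

/*
 * Simulated panic path on CPU 'self'.
 * stop_early == true models the proposed ordering (stop first);
 * stop_early == false models other CPUs left running through the dump.
 * Returns the number of side effects that occurred after the panic began.
 */
static int
sim_panic(struct sim_machine *m, int self, bool stop_early)
{
	if (stop_early)
		sim_stop_other_cpus(m, self);
	sim_others_run(m, self);  /* window between panic and dump */
	m->dumped = true;         /* the simulated crash dump */
	sim_others_run(m, self);  /* window after the dump */
	return (m->side_effects);
}
```

Calling sim_panic() on a 4-CPU machine with stop_early set to false lets the three other CPUs keep working in both windows, while setting it to true leaves only the panicking CPU running before the dump starts. The real kernel path obviously has to deal with locks, interrupts and IPIs that this model ignores.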