From: Kris Kennaway
Date: Mon, 28 Jul 2008 21:50:19 +0200
To: stevefranks@ieee.org
Cc: FreeBSD Mailing List
Subject: Re: 'stray irq7's cause hang?
Message-ID: <488E22FB.60203@FreeBSD.org>
In-Reply-To: <539c60b90807280935i50041623pe54b6ad65d5b89b8@mail.gmail.com>

Steve Franks wrote:
> I've got a new system that hangs after about 2 hours - no
> ctrl-alt-esc, not ctrl-alt-Fn, no ctrl-alt-delete.
>
> I tried hints.0.apic.disabled="YES" (that's apic, not acpi) (or
> whatever the correct syntax from the handbook is), but I still get the
> hang, and the stray irq 7's. As far as I can see, there's no other
> dmesg output related.

The stray interrupts may be a red herring. "Stray" means that no driver is handling them, and so there is no driver to screw up :) I see stray irq 7's on an HP ProLiant blade system, and also the hard hangs (it doesn't even reply to an NMI; this means it is almost certainly a hardware error).

However, I am now fairly certain the hangs are associated with disk failure. Several of the blades that were hanging went on to develop DMA errors from ATA, and after I validated the remaining systems with smartctl and took offline yet more blades that failed the self-tests, I have not had the problem recur.

Kris
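
For anyone wanting to try the same kind of smartctl validation, here is a rough sketch of what it might look like. It is not the exact procedure used on the blades above. It assumes smartmontools is installed and that the disk is /dev/ad0 (a guess; substitute your own device). The smart_check function name and the DRY_RUN variable are made up for this example. By default it only prints the commands, so nothing touches the disk until DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: print (or run) a basic set of SMART checks for one disk.
# smart_check and DRY_RUN are hypothetical names used for illustration.
# DRY_RUN defaults to 1, which prints the commands instead of running them.

smart_check() {
    dev="$1"
    run="${DRY_RUN:-1}"
    # -H           overall health assessment
    # -l selftest  log of previous self-test results
    # -t long      start an extended self-test in the background
    for args in "-H" "-l selftest" "-t long"; do
        if [ "$run" = "1" ]; then
            echo "smartctl $args $dev"
        else
            # $args is left unquoted on purpose so "-t long" splits in two
            smartctl $args "$dev"
        fi
    done
}
```

Calling `smart_check /dev/ad0` lists the three commands. The long self-test runs in the background and can take a while, so the self-test log has to be read again with `smartctl -l selftest` once it has finished; a disk that reports failures there is a candidate for being taken offline.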