From: Andre Guibert de Bruet
Date: Wed, 20 Dec 2006 10:30:16 -0500
To: Randall Stewart
Cc: freebsd-current@freebsd.org
Subject: Re: A stuck system
Message-Id: <58281AA0-3738-490C-9EA8-7766033713A2@siliconlandmark.com>
In-Reply-To: <45893F4D.9060104@cisco.com>
References: <45891FE9.4020700@cisco.com> <20061220040151.B88849@xorpc.icir.org> <4589288E.2070509@cisco.com> <45893F4D.9060104@cisco.com>
List-Id: Discussions about the use of FreeBSD-current

On Dec 20, 2006, at 8:49 AM, Randall Stewart wrote:

> Ok, I was wrong on this... I recreated it.. hooked up
> my em0 card to my laptop (right now its isolated
> running the mpi tests and uses the loopback only).
>
> I do a ping
>
> And ta-da the system comes back to life after
> being hung for 15 minutes.
>
> This time I did not see any of the usual syslog messages
> either... of course it was only "stuck" for 15 minutes or
> so...
>
> I will leave the thing running and get it stuck again and
> validate that the msk and usb will also cause the machine
> to come back to life..
>
> Is there any way this could be a lost interupt type problem (remember
> the scheduler is appearing to "stop" scheduling things). OR
> is this a problem with my hardware... somehow failing to
> deliver interupts maybe???

I am seeing something similar on my dual Xeon system. A kernel from
December 13th did not exhibit this behavior, whereas one from the 16th
does. I am able to "revive" the machine by pushing traffic through the
msk0 interface.

Kernel config: http://bling.properkernel.com/freebsd/BLING

Andy

/*  Andre Guibert de Bruet  * 6f43 6564 7020 656f 2e74 4220 7469 6a20 */
/*   Code poet / Sysadmin   * 636f 656b 2e79 5320 7379 6461 696d 2e6e */
/*   GSM: +1 734 846 8758   * 5520 494e 2058 6c73 7565 6874 002e 0000 */
/* WWW: siliconlandmark.com * C/C++, Java, Perl, PHP, SQL, XHTML, XML */