Date: Wed, 20 Dec 2006 10:30:16 -0500
From: Andre Guibert de Bruet <andy@siliconlandmark.com>
To: Randall Stewart <rrs@cisco.com>
Cc: freebsd-current@freebsd.org
Subject: Re: A stuck system
Message-ID: <58281AA0-3738-490C-9EA8-7766033713A2@siliconlandmark.com>
In-Reply-To: <45893F4D.9060104@cisco.com>
References: <45891FE9.4020700@cisco.com> <20061220040151.B88849@xorpc.icir.org> <4589288E.2070509@cisco.com> <45893F4D.9060104@cisco.com>

On Dec 20, 2006, at 8:49 AM, Randall Stewart wrote:

> Ok, I was wrong on this... I recreated it.. hooked up
> my em0 card to my laptop (right now it's isolated
> running the mpi tests and uses the loopback only).
>
> I do a ping
>
> And ta-da the system comes back to life after
> being hung for 15 minutes.
>
> This time I did not see any of the usual syslog messages
> either... of course it was only "stuck" for 15 minutes or
> so...
>
> I will leave the thing running and get it stuck again and
> validate that the msk and usb will also cause the machine
> to come back to life..
>
> Is there any way this could be a lost interrupt type problem (remember
> the scheduler is appearing to "stop" scheduling things). OR
> is this a problem with my hardware... somehow failing to
> deliver interrupts maybe???

I am seeing something similar on my dual Xeon system. It appears that
a kernel from December 13th did not exhibit this behavior, whereas one
from the 16th does. I am able to "revive" the machine by pushing
traffic on the msk0 interface.

Kernel config: http://bling.properkernel.com/freebsd/BLING

Andy

/*  Andre Guibert de Bruet  * 6f43 6564 7020 656f 2e74 4220 7469 6a20 */
/*   Code poet / Sysadmin   * 636f 656b 2e79 5320 7379 6461 696d 2e6e */
/*   GSM: +1 734 846 8758   * 5520 494e 2058 6c73 7565 6874 002e 0000 */
/* WWW: siliconlandmark.com * C/C++, Java, Perl, PHP, SQL, XHTML, XML */