From owner-freebsd-current@FreeBSD.ORG Wed Dec 20 13:50:08 2006
Message-ID: <45893F4D.9060104@cisco.com>
Date: Wed, 20 Dec 2006 08:49:01 -0500
From: Randall Stewart
To: Randall Stewart
Cc: Luigi Rizzo, freebsd-current@freebsd.org
Subject: Re: A stuck system
References: <45891FE9.4020700@cisco.com> <20061220040151.B88849@xorpc.icir.org> <4589288E.2070509@cisco.com>
In-Reply-To: <4589288E.2070509@cisco.com>
List-Id: Discussions about the use of FreeBSD-current

Luigi:

Ok, I was wrong on this... I recreated it.. hooked up my em0 card
to my laptop (right now it's isolated, running the MPI tests, and
uses the loopback only).

I do a ping

And ta-da, the system comes back to life after being hung
for 15 minutes.

This time I did not see any of the usual syslog messages either...
of course it was only "stuck" for 15 minutes or so...

I will leave the thing running, get it stuck again, and
validate that the msk and USB devices will also cause the machine
to come back to life..

Is there any way this could be a lost-interrupt type problem
(remember, the scheduler appears to "stop" scheduling things)?

Or is this a problem with my hardware... somehow failing to
deliver interrupts, maybe???

R

Randall Stewart wrote:
> Luigi Rizzo wrote:
>
>> On Wed, Dec 20, 2006 at 06:35:05AM -0500, Randall Stewart wrote:
>>
>>> All:
>>>
>>> Ok my P4D machine is sitting hung... it's in that
>>> state I mentioned previously.
>>>
>>> It will not respond to network input on the em0 card... i.e.
>>> it won't answer pings..
>>>
>>> I have not tried the new msk0 device...
>>> it's not configured up :-(
>>>
>>> Now, I know from past experience if I hit any key... it will
>>> start up again.. give out various warnings and timeouts.. sometimes
>>> a "clock ran backwards".. possibly.. and then
>>> start working fine again..
>>>
>>> Is there anything I can try to get some information so we can
>>> figure out what's going on...
>>>
>>> It could be a hardware problem... don't know... but
>>> it might not be.. it does look like a lost interrupt... but
>>> that's just a stab-in-the-dark guess..
>>
>>
>>
>> could you try putting a second network card in the box ?
>>
>> if you suspect it is only the 'em' card that is stuck
>> a second one might give you some hints on what is going on.
>>
>> or plug in some usb device and see if there is any daemon
>> responding to the event, etc.
>>
>> cheers
>> luigi
>>
> Ahh.. great idea.. I do have a second motherboard e-net card
> (msk0).. I have the driver loaded for it.. but just have
> not gotten around to enabling it..
>
> But of course that's hindsight..
>
> Let me try my USB device.. I have one of those USB keys that
> I use in meetings that works with FreeBSD.. let me see if that
> "revives" the system.. if so then I can get in and configure up
> the second network :-)
>
> drat.. idiot that I am... I moved the chassis and knocked the
> power cable out..
>
> Ok I will reboot and this time, before running the test that
> will lock it up.. I will enable the network too.. so I will
> have two things to try..
>
> It will take me a few hours to hit the condition again...
>
> I will get back to you with results...sigh..
>
> R
>

-- 
Randall Stewart
NSSTG - Cisco Systems Inc.
803-345-0369
803-317-4952 (cell)