From: Elliot Moore
Date: Wed, 11 Feb 2004 01:48:03 +0000
To: freebsd-stable@freebsd.org
Subject: Re: FreeBSD4.9 - panic: timeout table full - UPDATE

Well, so far so good.

The node is performing well, doing its job, and has been up for 3.5 days.

On Saturday:
  replaced the IDE cables.
  replaced the memory.
  loaded BIOS fail-safe defaults.

OK, due to the node being miles away in a colo, I broke the golden rule
by changing 3 things at the same time, but I can test the cables and
memory in a test FreeBSD node at my leisure.

Good, it's not a FreeBSD problem. (well, apart from it not warning me!
I will investigate 5.x KTR)

Though somebody else may benefit from reading this update in the
archives.

Thanks Doug for your help, and a big shout to the FreeBSD community.

:wq
ells

On 7 Feb 2004, at 07:10, Doug White wrote:

> On Sat, 7 Feb 2004, Elliot Moore wrote:
>
>> Hello all,
>>
>> I have a repetitive kernel panic on FreeBSD-4.9 [fresh installed from
>> CD - no CVS upgrades]
>>
>> =========================
>> panic: timeout table full
>
> Hm, haven't seen this one.
>
> Looking at your config, you may be overtuning by cranking up maxusers
> that high. I suggest leaving it at 0, and letting the system autotune.
> I'd also not suggest changing NMBCLUSTERS unless you have a specific
> reason to do so.
>
>> * [Q] ??: either the number of free ncallouts is depleting over time,
>> or something has stopped responding, causing a rapid increase in the
>> number of timeouts called, or something has stopped clearing its
>> timeout handles - a bad driver?
>
> Could be, or a stuck loop somewhere. Unfortunately, you'd need to be
> watching things when it goes off to see if there are any more kernel
> messages, or if a disk is flipping out, or something like that.
>
>> * [Q] Does somebody know of a method to ask the kernel how many
>> timeouts are assigned and what called them?
>
> You could attach gdb to /dev/kmem and poke around, although that gets
> tricky, and unless you know your way around you won't have much luck.
>
>> To be able to find out how many are left/being used, and therefore
>> work out the rate of depletion, would be helpful in debugging - AND to
>> 'throw in the towel' and reboot safely before it dies!
>> Can this be done? [some inquiry code or a kernel patch]
>> Is there something already in FreeBSD that can do this?
>
> In 5.x there is the KTR mechanism, which can record various kernel
> events. This isn't available in 4.x, however.
>
>> The only quirk I see at boot is this in dmesg:
>> pci0: (vendor=0x8086, dev=0x24c3) at 31.3 irq 7
>
> This is an SMBus controller; if you compile in the intpm driver it
> should get picked up. Not critical to system operation, however.
>
>> And sometimes (note: not all the time) this message after boot or
>> midway through the day:
>> stray irq 7
>>
>> * [Q] This unknown card at irq 7: I imagine from the vendor ID that
>> this is the onboard Intel SMBus/I2C bridge. Could this play a part in
>> this timeout panic?
>
> Doubtful; irq 7 is a junk irq that various things can trigger. Stuck
> interrupts don't schedule callouts.
>
>> * [Q] Is my kernel config at fault? (though GENERIC still panicked)
>
> Good to know that GENERIC also had the problem. I'd stick with GENERIC
> for now unless you have need of a custom driver or configuration;
> easier for the rest of us to debug against :)
>
> It's possible that your disk is flaking out and not accepting commands,
> or has some other sort of failure that causes the ata driver to
> malfunction. Have you tried replacing the disk?
>
>> * [Q] I have a 70 gig UFS+S filesystem (27067418 used inodes). Is it
>> normal for it to take an hour to fsck after the panic?
>
> An hour would be a very long time.
>
> --
> Doug White                    |  FreeBSD: The Power to Serve
> dwhite@gumbysoft.com          |  www.FreeBSD.org
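
For anyone finding this in the archives: the "work out the rate of
depletion" idea from the quoted thread can be sketched roughly as below.
This is only an illustration. It assumes you already have some way of
sampling the number of callouts in use (the thread suggests none exists
out of the box on 4.x, short of poking at /dev/kmem), so the sample
values, the table size and both function names are hypothetical. It fits
a straight line through (time, used) samples and estimates how long
until the table would be full.

```python
from typing import List, Optional, Tuple

# Each sample is (seconds since first sample, callouts in use).
# How the samples are collected is NOT shown; that part is an assumption.
Sample = Tuple[float, float]


def depletion_rate(samples: List[Sample]) -> float:
    """Least-squares slope of 'callouts in use' against time (per second)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    num = sum((t - mean_t) * (u - mean_u) for t, u in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one point in time")
    return num / den


def seconds_until_full(samples: List[Sample], table_size: float) -> Optional[float]:
    """Estimated seconds from the last sample until the table is full.

    Returns None when usage is flat or falling, i.e. no depletion trend.
    """
    rate = depletion_rate(samples)
    if rate <= 0:
        return None
    _, last_used = samples[-1]
    return max(0.0, (table_size - last_used) / rate)


if __name__ == "__main__":
    # Made-up numbers purely for illustration.
    observed = [(0.0, 100.0), (60.0, 160.0), (120.0, 220.0)]
    print(depletion_rate(observed))
    print(seconds_until_full(observed, table_size=1000.0))
```

A watchdog script could call seconds_until_full() periodically and
schedule a clean reboot once the estimate drops below some margin, which
is the "throw in the towel" case from the original question.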