From: "Marc G. Fournier"
To: Antony Mawer
Cc: freebsd-stable@freebsd.org
Date: Fri, 02 Mar 2007 23:13:33 -0400
Subject: Re: Some days, it doesn't pay to upgrade ...
In-Reply-To: <45E60761.8050101@mawer.org>
References: <5F9C60E2708CB953C06B21EA@ganymede.hub.org> <45E60761.8050101@mawer.org>
List-Id: Production branch of FreeBSD source code

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Based on a suggestion from someone on this list, I set up a screen session
with top running, to watch things ... again, after 3 days, the server goes
'out of process' ... this time, of course, I could get in to look around and
kill off processes ...

From what I can tell, a cron job that does nothing but:

    ping -c 1

with a 300 sec timeout, and that runs once a minute, started to 'run over top
of' itself: each new run started before the previous ones had finished, and
the copies piled up ... the host that it is pinging is on the same switch and
has been running fine for 20 days now, and it wasn't until I did the last
upgrade on the server causing the problems that these problems started ...

Coincidence? :)

I'm going to fix the script so that it doesn't try to run over itself ...
anyone know of a problem with the fxp driver in 6-STABLE that might cause the
ping to hang?

- --On Thursday, March 01, 2007 09:51:13 +1100 Antony Mawer wrote:

> On 27/02/2007 11:59 PM, Marc G. Fournier wrote:
>> After 155 days of problem free uptime, I upgraded my 6-STABLE system the
>> other day to the latest cvsup ... 3 days later, the whole thing hung solid
>> with:
>>
>>
>> Feb 27 04:32:49 mars uptimec: The server requested that we do a new login
>> Feb 27 04:33:00 mars kernel: maxproc limit exceeded by uid 0, please see
>> tuning(7) and login.conf(5).
>> Feb 27 04:33:10 mars kernel: maxproc limit exceeded by uid 60, please see
>> tuning(7) and login.conf(5).
>>
>> Stupid question: why isn't there some mechanism that prevents new processes
>> from starting up, instead of locking up the whole server? I'm not asking
>> for the evilness of Linux, where it arbitrarily kills off existing
>> processes, but if maxproc is hit, why continue to try and start up new ones?
>
> What do you define as 'hung solid'? You are unable to get in via SSH? Or at a
> console via iLO/etc?
>
> I've seen this on some of our 6.0-RELEASE machines (along with maxpipekva
> exhausted errors), and you can't SSH in from that point... because sshd forks
> to handle the connection, and all available process slots are used up.
>
> I've thought about writing a background daemon to monitor the logs for signs
> of this (or even to just try and create a short-lived child process by
> fork()ing every 5 minutes or so), and dump information to disk then reboot
> the system when this occurs... it's a work-around for something that
> "shouldn't happen", but it does anyway... once I'm able to identify _what_ is
> causing the build-up of processes, then I might be able to do something about
> killing them...!!!
>
>
> It's quite deceptive from an end-user point of view, because things like
> Apache that are already running keep running, so all they see are strange
> bits and pieces that don't work... and as always, it's one of those things
> that only happens on some clients' machines, but never on any of our test
> machines...
>
> --Antony
>
>
> PS. I haven't disappeared off the face of the earth.. though close.. my
> fiance and I have been busy planning the wedding, and wound up buying a house
> at the same time..!! Will catch up shortly once I get a chance to come up for
> air!!

- ----
Marc G. Fournier    Hub.Org Networking Services (http://www.hub.org)
Email . scrappy@hub.org    MSN . scrappy@hub.org
Yahoo . yscrappy    Skype: hub.org    ICQ . 7615664
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (FreeBSD)

iD8DBQFF6Ofd4QvfyHIvDvMRAmoqAJ9ka8ZQxq0Ciidyy4R60bTmYfxeggCeLz7i
/De9C0Hmdqb22nErxhyUaZA=
=Seo0
-----END PGP SIGNATURE-----
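
One way to keep a once-a-minute cron job from running over itself, as
described in the message above, is to take a non-blocking lock before doing
any work and to exit quietly if a previous run still holds it. The sketch
below is only an illustration of that idea, not the script from the thread:
the original script and its language are not shown, and run_exclusive,
LOCK_PATH, TIMED_OUT and the 30-second default are all hypothetical names and
values chosen for the example.

```python
import fcntl
import os
import subprocess

LOCK_PATH = "/tmp/pingcheck.lock"  # hypothetical lock file location
TIMED_OUT = 124                    # arbitrary status used here for "killed on timeout"


def run_exclusive(cmd, lock_path=LOCK_PATH, timeout=30):
    """Run cmd only if no earlier invocation still holds the lock.

    Returns the command's exit status, TIMED_OUT if it had to be killed,
    or None if the run was skipped because a previous run is still active.
    """
    lock_fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        try:
            # LOCK_NB makes flock fail immediately instead of waiting, so a
            # new run never queues up behind a hung one.
            fcntl.flock(lock_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return None
        try:
            # The timeout (an assumption: shorter than the cron interval)
            # kills a hung child so the lock is eventually released.
            return subprocess.run(cmd, timeout=timeout).returncode
        except subprocess.TimeoutExpired:
            return TIMED_OUT
    finally:
        # Closing the descriptor drops the lock.
        os.close(lock_fd)
```

A cron-driven wrapper would call something like
run_exclusive(["ping", "-c", "1", host]), where host stands in for the target
that is not shown in the message. A skipped run returns None rather than
starting another process, so a hung ping costs one process slot instead of
one per minute. On FreeBSD the same effect may be available without any code
by wrapping the command in lockf(1) from the crontab entry.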