Date: Sat, 03 Mar 2007 01:32:20 -0400
From: "Marc G. Fournier" <scrappy@freebsd.org>
To: Antony Mawer <fbsd-stable@mawer.org>
Cc: freebsd-stable@freebsd.org
Subject: Re: Some days, it doesn't pay to upgrade ...
Message-ID: <3AF45A659F5D4E8DD7260AA1@ganymede.hub.org>
In-Reply-To: <45E60761.8050101@mawer.org>
References: <5F9C60E2708CB953C06B21EA@ganymede.hub.org> <45E60761.8050101@mawer.org>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I don't know how critical this is, but I just thought about it ... this is my
only system running gmirror ... everything seems fine according to gmirror
status, but maybe something is wrong there that I'm not seeing:

Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device vm: provider mirror/vm destroyed.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device vm destroyed.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device md2: provider mirror/md2 destroyed.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device md2 destroyed.
Mar 3 01:25:52 mars kernel: GEOM_STRIPE: Disk mirror/md2 removed from md0.
Mar 3 01:25:52 mars kernel: GEOM_STRIPE: Device md0 removed.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device md1: provider mirror/md1 destroyed.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device md1 destroyed.
Mar 3 01:25:52 mars kernel: GEOM_STRIPE: Disk mirror/md1 removed from md0.
Mar 3 01:25:52 mars kernel: GEOM_STRIPE: Device md0 destroyed.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device md1 created (id=2282154470).
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device md1: provider da1 detected.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device md1: provider da2 detected.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device md1: provider da2 activated.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device md1: provider da1 activated.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device md1: provider mirror/md1 launched.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device md2 created (id=3089402334).
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device md2: provider da3 detected.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device md2: provider da4 detected.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device md2: provider da4 activated.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device md2: provider da3 activated.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device md2: provider mirror/md2 launched.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device vm created (id=2175292049).
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device vm: provider da5 detected.
Mar 3 01:25:52 mars kernel: GEOM_STRIPE: Device md0 created (id=1094782536).
Mar 3 01:25:52 mars kernel: GEOM_STRIPE: Disk mirror/md1 attached to md0.
Mar 3 01:25:52 mars kernel: GEOM_STRIPE: Disk mirror/md2 attached to md0.
Mar 3 01:25:52 mars kernel: GEOM_STRIPE: Device md0 activated.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Force device vm start due to timeout.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device vm: provider da5 activated.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device vm: provider mirror/vm launched.

mirror/md1  COMPLETE  da1 da2
mirror/md2  COMPLETE  da3 da4
mirror/vm   DEGRADED  da5

I'm not using da5 right now, it's just in there ... went with a RAID1+0 vs
RAID5 configuration ...

- --On Thursday, March 01, 2007 09:51:13 +1100 Antony Mawer
<fbsd-stable@mawer.org> wrote:

> On 27/02/2007 11:59 PM, Marc G. Fournier wrote:
>> After 155 days of problem free uptime, I upgraded my 6-STABLE system the
>> other day to the latest cvsup ... 3 days later, the whole thing hung solid
>> with:
>>
>>
>> Feb 27 04:32:49 mars uptimec: The server requested that we do a new login
>> Feb 27 04:33:00 mars kernel: maxproc limit exceeded by uid 0, please see
>> tuning(7) and login.conf(5).
>> Feb 27 04:33:10 mars kernel: maxproc limit exceeded by uid 60, please see
>> tuning(7) and login.conf(5).
>>
>> Stupid question: why isn't there some mechanism that prevents new processes
>> from starting up, instead of locking up the whole server? I'm not asking
>> for the evilness of Linux, where it arbitrarily kills off existing
>> processes, but if maxproc is hit, why continue to try and start up new ones?
>
> What do you define as 'hung solid'? You are unable to get in via SSH? Or at a
> console via iLO/etc?
>
> I've seen this on some of our 6.0-RELEASE machines (along with maxpipekva
> exhausted errors), and you can't SSH in from that point... because sshd forks
> to handle the connection, and all available process slots are used up.
>
> I've thought about writing a background daemon to monitor the logs for signs
> of this (or even to just try and create a short-lived child process by
> fork()ing every 5 minutes or so), and dump information to disk then reboot
> the system when this occurs... it's a work-around for something that
> "shouldn't happen", but it does anyway... once I'm able to identify _what_ is
> causing the build-up of processes, then I might be able to do something about
> killing them...!!!
>
>
> It's quite deceptive from an end-user point of view, because things like
> Apache that are already running keep running, so all they see are strange
> bits and pieces that don't work... and as always, it's one of those things
> that only happens on some clients' machines, but never on any of our test
> machines...
>
> --Antony
>
>
> PS. I haven't disappeared off the face of the earth.. though close.. my
> fiance and I have been busy planning the wedding, and wound up buying a house
> at the same time..!! Will catch up shortly once I get a chance to come up for
> air!!

- ----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email . scrappy@hub.org                              MSN . scrappy@hub.org
Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (FreeBSD)

iD8DBQFF6Qhk4QvfyHIvDvMRAhJ0AKDVibziN1W1TagIapB5GWN3+mbCGACdHd4w
dgT0Xi40Ie/pBeUMB8Pj1go=
=bSuI
-----END PGP SIGNATURE-----
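
The fork() probe that Antony describes in the quoted text (try to create a
short-lived child every 5 minutes or so, and react when that stops working)
could look roughly like the sketch below. It is only an illustration, not
anything posted in the thread: the names can_fork and watch, the 300 second
interval, and the "three failures in a row" threshold are all assumptions, and
what to do on exhaustion (dump information, reboot) is left to a caller-supplied
callback.

```python
import errno
import os
import time


def can_fork(fork=os.fork):
    """Return True if a short-lived child could be created, False if
    fork() fails because of process or memory exhaustion."""
    try:
        pid = fork()
    except OSError as exc:
        # EAGAIN: a process limit was hit; ENOMEM: no memory for a new process.
        if exc.errno in (errno.EAGAIN, errno.ENOMEM):
            return False
        raise
    if pid == 0:
        # Child: exit at once, skipping the parent's cleanup handlers.
        os._exit(0)
    # Parent: reap the child so the probe itself does not leave zombies.
    os.waitpid(pid, 0)
    return True


def watch(interval=300, max_failures=3, on_exhausted=None,
          probe=can_fork, sleep=time.sleep):
    """Probe every `interval` seconds. After `max_failures` consecutive
    failed probes, call `on_exhausted` (hypothetical hook) and return
    the failure count."""
    failures = 0
    while True:
        if probe():
            failures = 0
        else:
            failures += 1
            if failures >= max_failures:
                if on_exhausted is not None:
                    on_exhausted()
                return failures
        sleep(interval)
```

One design point to keep in mind: once process slots really are used up, the
callback cannot rely on spawning external commands either, since that needs a
fork as well, so anything it records would have to be gathered in-process.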
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?3AF45A659F5D4E8DD7260AA1>