Date: Sun, 5 May 2002 21:17:31 -0400
From: Anthony Schneider
To: Patrick Thomas
Cc: freebsd-hackers@FreeBSD.ORG
Subject: Re: what causes a userland to stop, but allows kernel to continue ?
Message-ID: <20020505211731.A1386@mail.slc.edu>
In-Reply-To: <20020505162455.K86733-100000@utility.clubscholarship.com>

FWIW, I've very recently had something similar happen to a 4.5-STABLE box.
The machine was NOT SMP, and the cause, as far as we know, was that /var
had been filled by apache's error_log -- a funky new mod_throttle install
with lots of

critical_acquire() failed: Permission denied
critical_release() failed: Permission denied

entries. Now, I assume that this is not because /var was full, but actually
because of System V semaphore locking in the mod_throttle code.

In mod_throttle-3.1.2...
The critical_acquire() code from mod_throttle.c (assuming
defined(USE_SYSTEM_V_SERIALIZATION)):

    struct critical {
        int id;
        struct sembuf on;
        struct sembuf off;
    };

    static int
    critical_acquire(t_critical *mp)
    {
        for (errno = 0; semop(mp->id, &mp->on, 1) < 0; ) {
            if (errno != EINTR) {
                /*** We really should kill the server here. ***/
                perror("critical_acquire() failed");

                /* Neither of these calls appear to shutdown the
                 * server and its children; exit(APEXIT_CHILDFATAL),
                 * appears to kill only the parent process.
                 */
                ap_start_shutdown();
                return -1;
            }
        }

        return 0;
    }

Livelock, maybe? Is there some sort of internal kernel semaphore table which
might be getting filled up or something? I'd also like to find out more about
this, but sadly, the machine is a remote one and I can't drop into ddb as
suggested...

Thank you all very much. Hope this information is of use.

-Anthony.

On Sun, May 05, 2002 at 04:31:36PM -0700, Patrick Thomas wrote:
>
> So, based on a previous thread, it looks like I have a server whose
> userland halted, essentially, but the kernel continued running.
>
> As evidenced by:
>
> - you can still ping the server just fine
> - you can still connect to running services just fine - if you ssh to it,
> `ssh -v` (verbose) claims a connection is established, but the server
> doesn't respond in any way over that connection. Further, you can telnet
> to POP or IMAP or HTTP ports, and get a connection, but you can't get any
> response.
> - cron does NOT run while the server is in this state - no jobs run
> - no response from the console - caps lock does NOT toggle the LED
>
> So, as was suggested in the previous thread, it looks like my kernel is
> still running, but the userland has halted. There are no log entries that
> give any clue as to why this happened last week.
>
>
> 1. from a theoretical standpoint, how would this happen ?
> 2. Is there any way to watchdog for it and escape from it before the
> userland completely crashes ?
> 3. any previous/old problems that would cause this behavior ?
>
>
> It is a FreeBSD 4.5-RELEASE system, and it is SMP - fairly heavily loaded
> (averages 60% CPU idle in `top` output).
>
> thanks,
>
> PT

-----------------------------------------------
PGP key at:
    http://www.keyserver.net/
    http://www.anthonydotcom.com/gpgkey/key.txt
Home:
    http://www.anthonydotcom.com
-----------------------------------------------
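
A note on the log flood described in the reply: the quoted critical_acquire() reports every failed semop() through perror(), so a persistent error such as EACCES ("Permission denied") can produce one error_log line per call. The following is only a minimal sketch in C of one way to keep the retry-on-EINTR behavior while reporting a persistent failure once per process. The names classify_semop_errno() and sem_apply_quiet() are hypothetical and are not part of mod_throttle or Apache; only semop(), struct sembuf, errno and strerror() are standard. It does not explain why the permission check fails in the first place.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Hypothetical verdict for a failed semop(); not part of mod_throttle. */
enum sem_verdict { SEM_RETRY, SEM_GIVE_UP };

static enum sem_verdict
classify_semop_errno(int err)
{
    /* EINTR only means a signal interrupted the call, so retrying is safe.
     * Any other error (EACCES, EINVAL, EIDRM, ...) is treated as persistent. */
    return err == EINTR ? SEM_RETRY : SEM_GIVE_UP;
}

/*
 * Apply one semaphore operation, retrying on EINTR.
 * Returns 0 on success, or the errno value of the first persistent failure.
 * The failure is written to stderr only once per process, so a repeating
 * error cannot grow the log without bound.
 */
static int
sem_apply_quiet(int semid, struct sembuf *op)
{
    static int reported = 0;

    assert(op != NULL);

    while (semop(semid, op, 1) < 0) {
        int err = errno;

        if (classify_semop_errno(err) == SEM_RETRY)
            continue;

        if (!reported) {
            fprintf(stderr,
                "semop(%d) failed: %s (further failures suppressed)\n",
                semid, strerror(err));
            reported = 1;
        }
        return err;
    }
    return 0;
}
```

A caller would pass the semaphore id and the same struct sembuf it already uses, and treat a non-zero return as the errno of the failure. Whether suppressing repeated messages is appropriate depends on how the module is expected to recover.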