From: Kris Kennaway
To: "David G. Lawrence"
Cc: alc@freeBSD.org, current@freeBSD.org, Kris Kennaway
Date: Fri, 11 Feb 2005 14:50:17 -0800
Subject: Re: do_execve() finding vmspace_destroyed set under load
Message-ID: <20050211225017.GA58711@xor.obsecurity.org>
In-Reply-To: <20050130101403.GM48777@opteron.dglawrence.com>

On Sun, Jan 30, 2005 at 02:14:03AM -0800, David G. Lawrence wrote:
> > > > Needless to say, the scripts get pretty unhappy when they're summarily
> > > > aborted.  What is the cause of this?
> > >
> > >    There are many reasons why an exec can fail - you'd need to collect
> > > more info to be able to say specifically. Speaking generally, the above
> > > code happens because something failed after the process's address space
> > > had been cleared, so there is no process executable image to return
> > > to. The only thing to do in that case is to kill off the process. If
> > > you're only seeing the problem under load, it is probably indicating
> > > that you're running out of a kernel VM pool of some kind.
> >
> > Any suggestions on what to look at to try and debug this further?
>
>    The first thing to do is to add some kernel printf's to do_execve()
> in each of the 'if (error)' cases to determine where the error is occurring.
> It's probably not worth putting them in cases prior to the 'loop through
> the list of image activators', since the vmspace isn't destroyed until
> then.
>    Once you've done that, the cause of the problem should become obvious.

It's the error branch here:

        for (i = 0; error == -1 && execsw[i]; ++i) {
                if (execsw[i]->ex_imgact == NULL ||
                    execsw[i]->ex_imgact == img_first) {
                        continue;
                }
                error = (*execsw[i]->ex_imgact)(imgp);
        }

        if (error) {
                if (error == -1) {
                        if (textset == 0)
                                imgp->vp->v_vflag &= ~VV_TEXT;
                        error = ENOEXEC;
                }
                goto exec_fail_dealloc;
        }

But I forgot to print the value of error... duh :-(

Kris
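
For anyone following along, here is a minimal user-space sketch of the activator loop quoted above, with the value of error printed at the point where it is set. It is not the kernel code: struct image_params and struct execsw here are cut-down stand-ins, and run_activators(), the ex_name field, and the toy imgact_* functions are hypothetical names made up for illustration. It assumes the convention the loop itself implies: an activator returns -1 for "not my format", 0 on success, or an errno value on failure. In the kernel the fprintf() calls would be kernel printf() calls instead.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical user-space stand-ins for the kernel types, not the real ones. */
struct image_params {
    const char *path;
};

typedef int (*imgact_fn)(struct image_params *);

struct execsw {
    imgact_fn   ex_imgact;
    const char *ex_name;    /* illustrative label, used only for logging */
};

/* Toy activators: -1 = not my format, 0 = success, errno value = failure. */
static int imgact_unknown(struct image_params *imgp) { (void)imgp; return -1; }
static int imgact_nomem(struct image_params *imgp) { (void)imgp; return ENOMEM; }
static int imgact_ok(struct image_params *imgp) { (void)imgp; return 0; }

/*
 * Walk a NULL-terminated activator list the way the quoted loop does,
 * logging the error value as soon as an activator reports a real failure.
 */
static int
run_activators(const struct execsw *const *sw, imgact_fn img_first,
    struct image_params *imgp)
{
    int error = -1;
    int i;

    assert(sw != NULL && imgp != NULL);

    for (i = 0; error == -1 && sw[i] != NULL; ++i) {
        /* Skip empty slots and the activator that was already tried. */
        if (sw[i]->ex_imgact == NULL || sw[i]->ex_imgact == img_first)
            continue;
        error = (*sw[i]->ex_imgact)(imgp);
        if (error != 0 && error != -1)
            fprintf(stderr, "exec %s: activator %s failed, error = %d\n",
                imgp->path, sw[i]->ex_name, error);
    }

    if (error == -1) {
        /* Nobody recognized the image: same mapping as the quoted code. */
        fprintf(stderr, "exec %s: no activator matched\n", imgp->path);
        error = ENOEXEC;
    }
    return error;
}
```

Logging inside the loop rather than only in the final 'if (error)' branch also shows which activator produced the value, which the single branch cannot tell you on its own.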