Date: Thu, 5 Apr 2001 17:06:17 +0000 (GMT) From: Terry Lambert <tlambert@primenet.com> To: bde@zeta.org.au (Bruce Evans) Cc: tlambert@primenet.com (Terry Lambert), rwatson@FreeBSD.ORG (Robert Watson), freebsd-arch@FreeBSD.ORG, dillon@FreeBSD.ORG Subject: Re: Eliminate crget() from nfs kernel code? Message-ID: <200104051706.KAA24246@usr01.primenet.com> In-Reply-To: <Pine.BSF.4.21.0104051805550.46796-100000@besplex.bde.org> from "Bruce Evans" at Apr 05, 2001 06:18:44 PM
next in thread | previous in thread | raw e-mail | index | archive | help
> > [ ... crget() ... ] > > > > I am not too happy with crget() at the moment. Even discounting > > the fact that it calls MALLOC(), and does not check the results > > (the new [BAD] semantics permit this to fail under extremely low > > memory conditions [FOR NO GOOD REASON] instead of hanging), it is > > New [BAD] semantics for malloc(..., M_WAITOK) would require some > dead bodies :-). I haven't seen any. Check every occurance of MALLOC() and the two lines immediately following it in the kernel. Some calls with M_WAITOK check for a NULL value for the assigned pointer; others do not. Someone is using it wrong. I've personally seen a panic, preceeded by several warning messages (which only occur with "INVARIANTS" turned on), under heavy network load. Specifically, write a small program that accepts and closes socket connections, while on the other end, have a client which connects as fast as possible. This is (effectively) a simulation of the highest possible HTTP connection traffic. The messages complaining that the memory on the free list has changed out from under it will occur when the system load as a result of the server hits around 98.72%. The panic can occur anywhere. However, it's easy to accelerate the panic by running a program that sleeps 1 second, runs a grep for something, and sleeps again. A lot of the complaints are about "was type cred". If you end up exacerbating the panic with the grep shell script, the pace it _invariably_ occurs is in grep calling access(2) calling crdup(), where, after the allocation, the uihold() fails on the copy of the credential with a "page not present" error on the attempted increment through indirect (incx instruction). When I check the return of the MALLOC(), I don't see a problem. Obviously, someone is either freeing something twice, using memory they are not allowed to use, or freeing something and then continuing to use it. Though there is a race condition that you could drive a truck through in close(2) (the reference count is decremented to zero, menaing that the descriptor could end up reused, prior to a lot of things being done to the now zero referencs descriptor), a gross closing of that hole indicates that it is not a close race that is the problem, and that the problem has to be occuring in the soclose() code, or in the credentials code. Since this only happens under load, I am assuming an overflow, where an allocation from the free list is not being checked for a wrap. So once again: the finger points to MALLOC().. > > If you "fix" crget(), you will also need to fix crdup(). There > > are plenty of places where crdup() is called, not just in the > > access() system call, where it is bogusly used to replace _only_ > > the initial group of the real GID, leaving the groups of the > > effective UID active, falsely yielding access to the file, even > > if the real UID would have not have contained the same group list > > as the effecive UID (gotta love "security" code). > > This just how access() works. It checks the access that you would > have setting the IDs to the real ones. Setting the IDs to the real > ones has no effect on the groups list except possibly for removing/ > changing the effective GID if that is on the list. The problem with this approach is that a priviledged process which calls setgroups(2) using its effective user ID to permit it to make the call will change the group set for the "real" credential, when the access(2) call occurs. The only way I can see around this is to keep separate cred structures for real and effective IDs, and thus keep seperate groups, and pass the "real" cred structure, instead of a hacked "effective" structure down to do the work. The problem with doing this is that there is not a [sg]etegroups(2) call. Since setgroups(2) is a BSD-ism, it seems to me that the only real way to fix this is to introduce two new BSD-isms for manipulating effective group lists, which would operate on the effective cred's grouplist, instead of on the gloabl group list.. I can't say I'm thrilled with the idea. My personal implementation choice would be to make [sg]etgroups(2) take an additional parameter, and have user space make the call via [sg]et{e}groups(3), with the second parameter selecting between effective and real. This could be made binary backward compatible using knowledge of the number of real arguments on the stack to decide on the behaviour, if necessary. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200104051706.KAA24246>