Date: Mon, 11 Feb 2002 23:12:57 -0500 (EST) From: John Baldwin <jhb@FreeBSD.org> To: Terry Lambert <tlambert2@mindspring.com> Cc: Julian Elischer <julian@elischer.org>, Alfred Perlstein <bright@mu.org>, bde@freebsd.org, current@freebsd.org Subject: Re: ucred holding patch, BDE version Message-ID: <XFMail.020211231257.jhb@FreeBSD.org> In-Reply-To: <3C688604.75C0E136@mindspring.com>
next in thread | previous in thread | raw e-mail | index | archive | help
On 12-Feb-02 Terry Lambert wrote: > Then the combination of calls fails. There is no case where > it will succeed when previously it would not. I would argue > that the failure is a lost race condition that exists in the > code anyway, and that this approach merely makes the race > failure more reliably reproducible. 8-). > > I think you could "fix" this case, in any event, by saving a > reference to the credential at the start (or, better, as a > result of the sleep, when you know it's going to happen), > any time you are going to split a system call across multiple > potential blocking operations. That certainly is *NOT* as > common to all system calls as this discussion implies. > > >> The idea of per-thread ucred references is so that a thread will have the >> same >> credentials for the lifetime of a syscall so that the entire syscall is self >> consistent. It also means that except for when we update the ucred >> reference, >> we don't need locks to access thread credentials since the thread references >> are to read-only credentials. We discussed this on -arch _months_ ago and >> everyone agreed with it then. > > As long as the pointer updates are atomic, you don't need to > hold a lock to reference it anyway. You don't care over the > update, since it's a drop of priviledge. Ok, I'll use really small words and diagrams: thread's 1 and 2 belong to process p p->p_ucred starts pointing at memory X CPU 1 is executing thread 1 CPU 2 is executing thread 2 CPU 3 is executing some other random code in the kernel At start both thread 1 and thread 2 have td_ucred == X CPU 1 CPU 2 CPU 3 seteuid(), p_ucred in userland now points to Y ... seteuid(), p_ucred now points to Z ... some syscall, Y's only remaining ref is free'd ... allocates some other kernel structure at Y executes any syscall because on non-i386 arch's, the value of p_ucred may still be sitting in the store buffer and we aren't using a lock, we may get any of the values X, Y, or Z for p_ucred when we test to see if it needs updating. If we get X, we keep using the old reference (this is the acceptable race from before). If we get Z, then we gain a reference to the new ucred. If we get Y, then we increment the refcount of the ucred formerly at Y, in fact corrupting the memory allocated by CPU 3! *BOOM* Is that simple enough? >> > Perhaps this dicsussion is enough impetus to justify >> > revisiting the atomic_t type definitions, which would >> > be useful as reference counted hold/release mechanisms >> > that would obviate the need for locks here? This would >> > at least let you defer getting rid of the per thread >> > cred instances until later. >> >> All having the refcount_t and other refcount_* functions would do is let us >> get >> rid of the per-ucred mutex (actually a pool mutex, so the overhead per ucred >> is >> just a pointer right now). It wouln't change the fact that we need a lock >> to >> make sure p_ucred is up to date before we read a value we need to depend on >> or >> actually use. > > Not true. You can take a reference to it, and the reference > is guaranteed to not change out from under you, so long as > it is counted. Use of a stack variable to store the value > over the count increment allows a compare after the increment, > after which a retry is possible (if there was a race during > the reference taking). Thus there is no need for a lock, e.g.: > > cred * > addreftocred(cred **cpp) > { > cred *rcp; /* stack variable */ > > again: > rcp = *cp; > atomic_plusplus( rcp->count); /* magic atomic crap */ > if( rcp != *cp) { > atomic_mimusminus( rcp->count); > goto again; > } > > return( rcp); > } > > This depends on the rarity of credential changes; in the > degenerate case, where there is a credential change between > each normal system call that is not a credential change, the > overhead is immense. But people who write code like that > can be safely punished without fear. 8-). This code is still broken. How do you know that the value you are reading from cp (well, cpp, looks like you have a typo :-P) isn't stale and that your increment isn't corrupting something? Esp. since you are using atomic ops that means your increment might corrupt somethign long enough to become visible to another CPU and really hose things. There is a race condition in between that plusplus and minusminus (albeit a small one) that is just _begging_ for a kernel panic. > -- Terry -- John Baldwin <jhb@FreeBSD.org> <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve!" - http://www.FreeBSD.org/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-current" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?XFMail.020211231257.jhb>