Date: Thu, 8 Mar 2007 10:55:22 +0100
From: Divacky Roman <xdivac02@stud.fit.vutbr.cz>
To: Ed Maste <emaste@phaedrus.sandvine.ca>
Cc: freebsd-hackers@freebsd.org
Subject: Re: Hung kernel from sysv semaphore semu_list corruption
Message-ID: <20070308095522.GA14973@stud.fit.vutbr.cz>
In-Reply-To: <20070307230731.GA71684@sandvine.com>
References: <20070307230731.GA71684@sandvine.com>

On Wed, Mar 07, 2007 at 06:07:31PM -0500, Ed Maste wrote:
> Nightly tests on our 6.1-based installation using pgsql have resulted in
> a number of kernel hangs, due to a corrupt semu_list (the list ended up
> with a loop).
>
> It seems there are a few holes in the locking in the semaphore code. The
> issue we've encountered comes from semexit_myhook. It obtains a pointer
> to a list element after acquiring SEMUNDO_LOCK, and after dropping the
> lock manipulates the next pointer to remove the element from the list.
>
> The fix below solves our current problem. Any comments?
>
> --- RELENG_6/src/sys/kern/sysv_sem.c Tue Jun 7 01:03:27 2005
> +++ swbuild_plt_boson/src/sys/kern/sysv_sem.c Tue Mar 6 16:13:45 2007
> @@ -1259,16 +1259,17 @@
> struct proc *p;
> {
> struct sem_undo *suptr;
> - struct sem_undo **supptr;
>
> /*
> * Go through the chain of undo vectors looking for one
> * associated with this process.
> */
> SEMUNDO_LOCK();
> - SLIST_FOREACH_PREVPTR(suptr, supptr, &semu_list, un_next) {
> - if (suptr->un_proc == p)
> + SLIST_FOREACH(suptr, &semu_list, un_next) {
> + if (suptr->un_proc == p) {
> + SLIST_REMOVE(&semu_list, suptr, sem_undo, un_next);

This is wrong. You cannot remove an element from a *LIST while it is being
iterated with *LIST_FOREACH, because the macro advances by reading the next
pointer of the element you just removed. Use *LIST_FOREACH_SAFE instead.

Thanks for the patch!

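
To illustrate the point, here is a minimal userland sketch of removing entries during iteration with SLIST_FOREACH_SAFE. It is not the sysv_sem.c code: struct undo, its owner field, and the helpers undo_add, undo_remove_owner and undo_count are hypothetical stand-ins, and the fallback macro is only there because some non-BSD <sys/queue.h> headers do not provide the _SAFE variant.

```c
#include <stdlib.h>
#include <sys/queue.h>

/*
 * Fallback for systems whose <sys/queue.h> lacks the _SAFE variant.
 * Assumption: it is meant to mirror the FreeBSD macro, which saves the
 * next pointer in tvar before the loop body runs.
 */
#ifndef SLIST_FOREACH_SAFE
#define SLIST_FOREACH_SAFE(var, head, field, tvar)			\
	for ((var) = SLIST_FIRST((head));				\
	    (var) && ((tvar) = SLIST_NEXT((var), field), 1);		\
	    (var) = (tvar))
#endif

/* Hypothetical stand-in for struct sem_undo. */
struct undo {
	int owner;			/* stands in for un_proc */
	SLIST_ENTRY(undo) un_next;
};
SLIST_HEAD(undo_head, undo);

/* Allocate an entry for `owner` and push it on the list. */
static void
undo_add(struct undo_head *head, int owner)
{
	struct undo *u = malloc(sizeof(*u));

	if (u == NULL)
		abort();
	u->owner = owner;
	SLIST_INSERT_HEAD(head, u, un_next);
}

/* Remove and free every entry owned by `owner`; return the count removed. */
static int
undo_remove_owner(struct undo_head *head, int owner)
{
	struct undo *u, *tmp;
	int removed = 0;

	/* tmp already holds the successor, so freeing u is safe here. */
	SLIST_FOREACH_SAFE(u, head, un_next, tmp) {
		if (u->owner == owner) {
			SLIST_REMOVE(head, u, undo, un_next);
			free(u);
			removed++;
		}
	}
	return (removed);
}

/* Count the entries left on the list. */
static int
undo_count(struct undo_head *head)
{
	struct undo *u;
	int n = 0;

	SLIST_FOREACH(u, head, un_next)
		n++;
	return (n);
}
```

With plain SLIST_FOREACH the loop would read u's next pointer after u has been unlinked (and here freed), which is the hazard described above. In the kernel the whole walk would also have to stay under SEMUNDO_LOCK.
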
roman
