Date: Wed, 7 Mar 2007 18:07:31 -0500
From: Ed Maste <emaste@phaedrus.sandvine.ca>
To: freebsd-hackers@freebsd.org
Subject: Hung kernel from sysv semaphore semu_list corruption
Message-ID: <20070307230731.GA71684@sandvine.com>

Nightly tests on our 6.1-based installation using pgsql have resulted
in a number of kernel hangs, due to a corrupt semu_list (the list ended
up with a loop).

It seems there are a few holes in the locking in the semaphore code.
The issue we've encountered comes from semexit_myhook. It obtains a
pointer to a list element after acquiring SEMUNDO_LOCK, and after
dropping the lock manipulates the next pointer to remove the element
from the list.

The fix below solves our current problem. Any comments?

--- RELENG_6/src/sys/kern/sysv_sem.c	Tue Jun  7 01:03:27 2005
+++ swbuild_plt_boson/src/sys/kern/sysv_sem.c	Tue Mar  6 16:13:45 2007
@@ -1259,16 +1259,17 @@
 	struct proc *p;
 {
 	struct sem_undo *suptr;
-	struct sem_undo **supptr;
 
 	/*
 	 * Go through the chain of undo vectors looking for one
 	 * associated with this process.
 	 */
 	SEMUNDO_LOCK();
-	SLIST_FOREACH_PREVPTR(suptr, supptr, &semu_list, un_next) {
-		if (suptr->un_proc == p)
+	SLIST_FOREACH(suptr, &semu_list, un_next) {
+		if (suptr->un_proc == p) {
+			SLIST_REMOVE(&semu_list, suptr, sem_undo, un_next);
 			break;
+		}
 	}
 	SEMUNDO_UNLOCK();
 
@@ -1328,8 +1329,9 @@
 	 * Deallocate the undo vector.
 	 */
 	DPRINTF(("removing vector\n"));
+	SEMUNDO_LOCK();
 	suptr->un_proc = NULL;
-	*supptr = SLIST_NEXT(suptr, un_next);
+	SEMUNDO_UNLOCK();
 }
 
 static int
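
For anyone who wants to try the pattern outside the kernel, here is a rough userland sketch of the same idea: find the element and unlink it inside a single critical section, so no pointer into the list is used after the lock is dropped. This is not the kernel code. The names struct undo, undo_list, undo_lock, undo_attach, undo_detach and undo_count are all made up for illustration, a pthread mutex stands in for SEMUNDO_LOCK, and an int owner id stands in for un_proc.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for struct sem_undo; not the kernel definition. */
struct undo {
	SLIST_ENTRY(undo) un_next;	/* list linkage */
	int un_owner;			/* stand-in for un_proc */
};

/* Hypothetical stand-ins for semu_list and SEMUNDO_LOCK. */
static SLIST_HEAD(, undo) undo_list = SLIST_HEAD_INITIALIZER(undo_list);
static pthread_mutex_t undo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Link an element into the list under the lock. */
void
undo_attach(struct undo *su, int owner)
{
	assert(su != NULL);
	pthread_mutex_lock(&undo_lock);
	su->un_owner = owner;
	SLIST_INSERT_HEAD(&undo_list, su, un_next);
	pthread_mutex_unlock(&undo_lock);
}

/*
 * Find the element for this owner and unlink it before the lock is
 * released. Once the lock is dropped the element is private to the
 * caller, so no list pointer has to be touched afterwards.
 * Returns NULL if no element matches.
 */
struct undo *
undo_detach(int owner)
{
	struct undo *su;

	pthread_mutex_lock(&undo_lock);
	SLIST_FOREACH(su, &undo_list, un_next) {
		if (su->un_owner == owner) {
			SLIST_REMOVE(&undo_list, su, undo, un_next);
			break;
		}
	}
	pthread_mutex_unlock(&undo_lock);
	return (su);
}

/* Count the elements under the lock; only used to check the list. */
size_t
undo_count(void)
{
	struct undo *su;
	size_t n = 0;

	pthread_mutex_lock(&undo_lock);
	SLIST_FOREACH(su, &undo_list, un_next)
		n++;
	pthread_mutex_unlock(&undo_lock);
	return (n);
}
```

SLIST_REMOVE walks the list from the head to find the predecessor, so the removal is O(n), but it avoids carrying a saved previous pointer across the unlocked region, which is what went stale in the original code.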