Date: Wed, 7 Mar 2007 18:07:31 -0500
From: Ed Maste <emaste@phaedrus.sandvine.ca>
To: freebsd-hackers@freebsd.org
Subject: Hung kernel from sysv semaphore semu_list corruption
Message-ID: <20070307230731.GA71684@sandvine.com>
Nightly tests on our 6.1-based installation using pgsql have resulted in
a number of kernel hangs, due to a corrupt semu_list (the list ended up
with a loop).
It seems there are a few holes in the locking in the semaphore code. The
issue we've encountered comes from semexit_myhook. It obtains a pointer
to a list element while holding SEMUNDO_LOCK, but only after dropping the
lock does it manipulate the next pointer to remove the element from the
list, so the removal can race with other changes to semu_list.
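
As an illustration only, here is a minimal userspace sketch of the pattern
the fix relies on: the lookup and the unlink happen inside one critical
section. It is not the kernel code. struct undo, undo_list, undo_lock and
the three helper functions are hypothetical stand-ins, a pthread mutex
stands in for SEMUNDO_LOCK, and an int stands in for the struct proc
pointer. Only the SLIST macros from <sys/queue.h> are real.

```c
#include <pthread.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for struct sem_undo; only the fields needed here. */
struct undo {
	SLIST_ENTRY(undo) un_next;	/* list linkage */
	int un_proc;			/* stand-in for struct proc * */
};

SLIST_HEAD(undo_head, undo);

/* Stand-ins for semu_list and SEMUNDO_LOCK. */
static struct undo_head undo_list = SLIST_HEAD_INITIALIZER(undo_list);
static pthread_mutex_t undo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Link an entry at the head of the list under the lock. */
static void
undo_insert(struct undo *u)
{
	pthread_mutex_lock(&undo_lock);
	SLIST_INSERT_HEAD(&undo_list, u, un_next);
	pthread_mutex_unlock(&undo_lock);
}

/*
 * Find the entry for proc and unlink it before the lock is released,
 * so no other thread can change the neighbouring links in between.
 * Returns the detached entry, or NULL if there was none.
 */
static struct undo *
undo_detach(int proc)
{
	struct undo *u;

	pthread_mutex_lock(&undo_lock);
	SLIST_FOREACH(u, &undo_list, un_next) {
		if (u->un_proc == proc) {
			SLIST_REMOVE(&undo_list, u, undo, un_next);
			break;
		}
	}
	pthread_mutex_unlock(&undo_lock);
	return (u);	/* NULL when the loop ran off the end */
}

/* Count the entries currently on the list, under the lock. */
static int
undo_count(void)
{
	struct undo *u;
	int n = 0;

	pthread_mutex_lock(&undo_lock);
	SLIST_FOREACH(u, &undo_list, un_next)
		n++;
	pthread_mutex_unlock(&undo_lock);
	return (n);
}
```

Once undo_detach() returns, the caller owns the entry privately and can
work on it without the lock. The unsafe variant would instead save a
pointer to the previous element's next field under the lock and write
through it later, by which time that field may belong to an entry that
has been removed or reused.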
The fix below solves our current problem. Any comments?
--- RELENG_6/src/sys/kern/sysv_sem.c Tue Jun 7 01:03:27 2005
+++ swbuild_plt_boson/src/sys/kern/sysv_sem.c Tue Mar 6 16:13:45 2007
@@ -1259,16 +1259,17 @@
struct proc *p;
{
struct sem_undo *suptr;
- struct sem_undo **supptr;
/*
* Go through the chain of undo vectors looking for one
* associated with this process.
*/
SEMUNDO_LOCK();
- SLIST_FOREACH_PREVPTR(suptr, supptr, &semu_list, un_next) {
- if (suptr->un_proc == p)
+ SLIST_FOREACH(suptr, &semu_list, un_next) {
+ if (suptr->un_proc == p) {
+ SLIST_REMOVE(&semu_list, suptr, sem_undo, un_next);
break;
+ }
}
SEMUNDO_UNLOCK();
@@ -1328,8 +1329,9 @@
* Deallocate the undo vector.
*/
DPRINTF(("removing vector\n"));
+ SEMUNDO_LOCK();
suptr->un_proc = NULL;
- *supptr = SLIST_NEXT(suptr, un_next);
+ SEMUNDO_UNLOCK();
}
static int
