Date:      Wed, 7 Mar 2007 18:07:31 -0500
From:      Ed Maste <emaste@phaedrus.sandvine.ca>
To:        freebsd-hackers@freebsd.org
Subject:   Hung kernel from sysv semaphore semu_list corruption
Message-ID:  <20070307230731.GA71684@sandvine.com>

Nightly tests on our 6.1-based installation using pgsql have resulted in
a number of kernel hangs, due to a corrupt semu_list (the list ended up
with a loop).

It seems there are a few holes in the locking in the semaphore code.  The
issue we've encountered comes from semexit_myhook.  It obtains a pointer
to a list element, and to the previous element's next pointer, while
holding SEMUNDO_LOCK.  Only after dropping the lock does it write through
that saved next pointer to remove the element from the list, by which
time another exiting process may already have changed the list.
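
To illustrate the failure mode described above, here is a minimal
userland sketch, not the kernel code.  The struct undo type, the helper
names and the integer owner field are all hypothetical stand-ins, and
the interleaving of two exiting processes is replayed sequentially
rather than with real threads.  The last step of the first function
assumes that a freed entry is later reinserted at the head of the list;
that is an assumption about how the entry gets reused, made only to show
how a loop could form.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sem_undo; only link and owner matter. */
struct undo {
	struct undo *next;
	int owner;		/* stand-in for un_proc; 0 means free */
};

struct outcome {
	bool still_listed;	/* entry still reachable after its "removal" */
	bool self_loop;		/* entry points at itself after reuse */
};

/*
 * Find the entry for owner and report the address of the pointer that
 * leads to it, in the spirit of SLIST_FOREACH_PREVPTR.
 */
static struct undo *
find_with_prevptr(struct undo **head, int owner, struct undo ***prevp)
{
	struct undo **pp = head;
	struct undo *e;

	for (e = *head; e != NULL; pp = &e->next, e = e->next)
		if (e->owner == owner)
			break;
	*prevp = pp;
	return (e);
}

/* Walk a loop-free list and report whether target is on it. */
static bool
on_list(const struct undo *head, const struct undo *target)
{
	for (; head != NULL; head = head->next)
		if (head == target)
			return (true);
	return (false);
}

/* Old pattern: both lookups finish before either deferred unlink runs. */
static struct outcome
stale_unlink_demo(void)
{
	struct undo c = { NULL, 3 }, b = { &c, 2 }, a = { &b, 1 };
	struct undo *head = &a, *ea, *eb;
	struct undo **prev_a, **prev_b;
	struct outcome o;

	ea = find_with_prevptr(&head, 1, &prev_a);	/* prev_a == &head */
	eb = find_with_prevptr(&head, 2, &prev_b);	/* prev_b == &a.next */
	assert(ea == &a && eb == &b);

	*prev_a = ea->next;	/* a is unlinked: head -> b -> c */
	*prev_b = eb->next;	/* writes a.next, but a is off the list */
	o.still_listed = on_list(head, eb);

	/* Assumed reuse: b is marked free and reinserted at the head. */
	eb->owner = 0;
	eb->next = head;	/* head is still b */
	head = eb;
	o.self_loop = (eb->next == eb);
	return (o);
}

/* Patched pattern: each unlink happens in the same step as its lookup. */
static bool
unlink_at_lookup_removes_entry(void)
{
	struct undo c = { NULL, 3 }, b = { &c, 2 }, a = { &b, 1 };
	struct undo *head = &a, *e;
	struct undo **prev;

	e = find_with_prevptr(&head, 1, &prev);
	*prev = e->next;	/* head -> b -> c */
	e = find_with_prevptr(&head, 2, &prev);
	*prev = e->next;	/* head -> c */
	return (head == &c && !on_list(head, &a) && !on_list(head, &b));
}
```

In this sketch stale_unlink_demo() leaves the second entry on the list
even though its owner believes it has been removed, while
unlink_at_lookup_removes_entry() shows that doing the unlink in the same
critical section as the lookup avoids the stale pointer.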

The fix below solves our current problem.  Any comments?

--- RELENG_6/src/sys/kern/sysv_sem.c    Tue Jun  7 01:03:27 2005
+++ swbuild_plt_boson/src/sys/kern/sysv_sem.c   Tue Mar  6 16:13:45 2007
@@ -1259,16 +1259,17 @@
        struct proc *p;
 {
        struct sem_undo *suptr;
-       struct sem_undo **supptr;

        /*
         * Go through the chain of undo vectors looking for one
         * associated with this process.
         */
        SEMUNDO_LOCK();
-       SLIST_FOREACH_PREVPTR(suptr, supptr, &semu_list, un_next) {
-               if (suptr->un_proc == p)
+       SLIST_FOREACH(suptr, &semu_list, un_next) {
+               if (suptr->un_proc == p) {
+                       SLIST_REMOVE(&semu_list, suptr, sem_undo, un_next);
                        break;
+               }
        }
        SEMUNDO_UNLOCK();

@@ -1328,8 +1329,9 @@
         * Deallocate the undo vector.
         */
        DPRINTF(("removing vector\n"));
+       SEMUNDO_LOCK();
        suptr->un_proc = NULL;
-       *supptr = SLIST_NEXT(suptr, un_next);
+       SEMUNDO_UNLOCK();
 }

 static int



