Date: Thu, 8 Mar 2007 10:55:22 +0100
From: Divacky Roman <xdivac02@stud.fit.vutbr.cz>
To: Ed Maste <emaste@phaedrus.sandvine.ca>
Cc: freebsd-hackers@freebsd.org
Subject: Re: Hung kernel from sysv semaphore semu_list corruption
Message-ID: <20070308095522.GA14973@stud.fit.vutbr.cz>
In-Reply-To: <20070307230731.GA71684@sandvine.com>
References: <20070307230731.GA71684@sandvine.com>

On Wed, Mar 07, 2007 at 06:07:31PM -0500, Ed Maste wrote:
> Nightly tests on our 6.1-based installation using pgsql have resulted in
> a number of kernel hangs, due to a corrupt semu_list (the list ended up
> with a loop).
>
> It seems there are a few holes in the locking in the semaphore code. The
> issue we've encountered comes from semexit_myhook. It obtains a pointer
> to a list element after acquiring SEMUNDO_LOCK, and after dropping the
> lock manipulates the next pointer to remove the element from the list.
>
> The fix below solves our current problem. Any comments?
>
> --- RELENG_6/src/sys/kern/sysv_sem.c Tue Jun 7 01:03:27 2005
> +++ swbuild_plt_boson/src/sys/kern/sysv_sem.c Tue Mar 6 16:13:45 2007
> @@ -1259,16 +1259,17 @@
>      struct proc *p;
>  {
>      struct sem_undo *suptr;
> -    struct sem_undo **supptr;
>
>      /*
>       * Go through the chain of undo vectors looking for one
>       * associated with this process.
>       */
>      SEMUNDO_LOCK();
> -    SLIST_FOREACH_PREVPTR(suptr, supptr, &semu_list, un_next) {
> -        if (suptr->un_proc == p)
> +    SLIST_FOREACH(suptr, &semu_list, un_next) {
> +        if (suptr->un_proc == p) {
> +            SLIST_REMOVE(&semu_list, suptr, sem_undo, un_next);

This is wrong: you cannot remove an element from a *LIST while it is
being iterated with *LIST_FOREACH. Use *LIST_FOREACH_SAFE instead.

Thanks for the patch!

roman
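
For illustration, here is a minimal userland sketch of the *LIST_FOREACH_SAFE pattern suggested above. It is not the kernel code: struct undo, its owner field, and the helper functions are hypothetical stand-ins for struct sem_undo and un_proc, and no locking is shown. The fallback macro is only there because some <sys/queue.h> versions (glibc, for one) do not ship the _SAFE variant; it is written to match the BSD semantics, where the next pointer is saved before the loop body runs.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Fallback for systems whose <sys/queue.h> lacks the _SAFE variant. */
#ifndef SLIST_FOREACH_SAFE
#define SLIST_FOREACH_SAFE(var, head, field, tvar)              \
    for ((var) = SLIST_FIRST((head));                           \
        (var) && ((tvar) = SLIST_NEXT((var), field), 1);        \
        (var) = (tvar))
#endif

/* Hypothetical stand-in for struct sem_undo. */
struct undo {
    int owner;                  /* plays the role of un_proc */
    SLIST_ENTRY(undo) link;     /* plays the role of un_next */
};
SLIST_HEAD(undo_head, undo);

/* Allocate an entry and push it on the front of the list. */
static void
add_entry(struct undo_head *head, int owner)
{
    struct undo *u = malloc(sizeof(*u));

    assert(u != NULL);
    u->owner = owner;
    SLIST_INSERT_HEAD(head, u, link);
}

/*
 * Remove and free every entry belonging to owner. The _SAFE iterator
 * stores the successor in tmp before the body runs, so freeing u does
 * not affect how the loop advances.
 */
static int
remove_owner(struct undo_head *head, int owner)
{
    struct undo *u, *tmp;
    int removed = 0;

    SLIST_FOREACH_SAFE(u, head, link, tmp) {
        if (u->owner == owner) {
            SLIST_REMOVE(head, u, undo, link);
            free(u);
            removed++;
        }
    }
    return (removed);
}

/* Count the entries still on the list. */
static int
count_entries(struct undo_head *head)
{
    struct undo *u;
    int n = 0;

    SLIST_FOREACH(u, head, link)
        n++;
    return (n);
}
```

With plain SLIST_FOREACH, the loop step would read the link field of the element that was just removed (and here freed). In the kernel, the whole walk-and-remove would also have to happen with SEMUNDO_LOCK held, which is the point of the original patch.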