From: Divacky Roman
To: Ed Maste
Cc: freebsd-hackers@freebsd.org
Date: Thu, 8 Mar 2007 10:55:22 +0100
Subject: Re: Hung kernel from sysv semaphore semu_list corruption
Message-ID: <20070308095522.GA14973@stud.fit.vutbr.cz>
In-Reply-To: <20070307230731.GA71684@sandvine.com>

On Wed, Mar 07, 2007 at 06:07:31PM -0500, Ed Maste wrote:
> Nightly tests on our 6.1-based installation using pgsql have resulted in
> a number of kernel hangs, due to a corrupt semu_list (the list ended up
> with a loop).
>
> It seems there are a few holes in the locking in the semaphore code. The
> issue we've encountered comes from semexit_myhook. It obtains a pointer
> to a list element after acquiring SEMUNDO_LOCK, and after dropping the
> lock manipulates the next pointer to remove the element from the list.
>
> The fix below solves our current problem. Any comments?
>
> --- RELENG_6/src/sys/kern/sysv_sem.c Tue Jun 7 01:03:27 2005
> +++ swbuild_plt_boson/src/sys/kern/sysv_sem.c Tue Mar 6 16:13:45 2007
> @@ -1259,16 +1259,17 @@
>  	struct proc *p;
>  {
>  	struct sem_undo *suptr;
> -	struct sem_undo **supptr;
>
>  	/*
>  	 * Go through the chain of undo vectors looking for one
>  	 * associated with this process.
>  	 */
>  	SEMUNDO_LOCK();
> -	SLIST_FOREACH_PREVPTR(suptr, supptr, &semu_list, un_next) {
> -		if (suptr->un_proc == p)
> +	SLIST_FOREACH(suptr, &semu_list, un_next) {
> +		if (suptr->un_proc == p) {
> +			SLIST_REMOVE(&semu_list, suptr, sem_undo, un_next);

This is wrong. You cannot safely remove an element from a *LIST while it
is being iterated with *LIST_FOREACH, because the loop reads the next
pointer from the element that was just removed. Use *LIST_FOREACH_SAFE
instead.

Thanks for the patch!

roman
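
For illustration, here is a minimal userland sketch of the
*LIST_FOREACH_SAFE pattern suggested above. It is not the kernel code:
struct undo, its owner field, and undo_remove_owner() are hypothetical
stand-ins for struct sem_undo, un_proc, and the loop in semexit_myhook.
The #ifndef fallback is only there because some non-BSD <sys/queue.h>
headers lack the _SAFE variants. In the kernel, the whole walk would
have to run with SEMUNDO_LOCK held.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Fallback mirroring the usual BSD definition, for headers that lack it. */
#ifndef SLIST_FOREACH_SAFE
#define SLIST_FOREACH_SAFE(var, head, field, tvar)			\
	for ((var) = SLIST_FIRST((head));				\
	    (var) && ((tvar) = SLIST_NEXT((var), field), 1);		\
	    (var) = (tvar))
#endif

/* Hypothetical stand-in for struct sem_undo; owner plays the role of un_proc. */
struct undo {
	SLIST_ENTRY(undo) un_next;
	int owner;
};

SLIST_HEAD(undo_list, undo);

/*
 * Unlink every entry owned by `owner` and return how many were removed.
 * The successor is saved in tmp before the body runs, so the loop never
 * reads the next pointer of an element that has already been unlinked.
 */
static int
undo_remove_owner(struct undo_list *head, int owner)
{
	struct undo *u, *tmp;
	int removed = 0;

	assert(head != NULL);
	SLIST_FOREACH_SAFE(u, head, un_next, tmp) {
		if (u->owner == owner) {
			SLIST_REMOVE(head, u, undo, un_next);
			removed++;
		}
	}
	return (removed);
}
```

If at most one entry can match, leaving the loop immediately after the
removal also avoids touching the unlinked element; the _SAFE form is
the one that stays correct when the walk continues.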