From: Ed Maste
To: freebsd-hackers@freebsd.org
Date: Wed, 7 Mar 2007 18:07:31 -0500
Subject: Hung kernel from sysv semaphore semu_list corruption
Message-ID: <20070307230731.GA71684@sandvine.com>

Nightly tests on our 6.1-based installation using pgsql have resulted in a
number of kernel hangs caused by a corrupt semu_list (the list ended up with
a loop). It seems there are a few holes in the locking in the semaphore code.

The issue we've encountered comes from semexit_myhook. While holding
SEMUNDO_LOCK it walks semu_list with SLIST_FOREACH_PREVPTR and saves a
pointer to the link that points at this process's undo vector (supptr).
After the lock has been dropped, it clears un_proc and writes through that
saved pointer to unlink the entry, even though other threads may have
changed the list in the meantime. The fix below solves our current problem:
the entry is now removed while the lock is still held, and un_proc is
cleared under the lock as well.
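To make the failure mode concrete, here is a single-threaded C replay of one
interleaving that can turn the list into a loop. It is a hypothetical model,
not kernel code: struct undo, list_length and replay_unpatched_exit are made
up for illustration, and the "allocator" step assumes, as a simplification,
that a free entry (un_proc == NULL) is reused by inserting it at the head of
the list. It is not claimed to be the exact sequence behind our hangs; it
only shows how a stale saved link can produce a self-loop.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for struct sem_undo with only the fields the race
 * needs.  All names here are made up; this is a model, not kernel code.
 */
struct undo {
	struct undo *next;	/* plays the role of un_next */
	int proc;		/* plays the role of un_proc; 0 means "free" */
};

/*
 * Number of entries reachable from head, or -1 if following next pointers
 * never reaches NULL (Floyd's tortoise-and-hare check).
 */
static int list_length(const struct undo *head)
{
	const struct undo *slow = head, *fast = head;
	int n = 0;

	while (fast != NULL && fast->next != NULL) {
		slow = slow->next;
		fast = fast->next->next;
		if (slow == fast)
			return -1;
	}
	for (; head != NULL; head = head->next)
		n++;
	return n;
}

/*
 * Single-threaded replay of one interleaving against the pre-patch
 * semexit_myhook.  Comments say which steps would hold SEMUNDO_LOCK.
 */
static int replay_unpatched_exit(void)
{
	struct undo c = { NULL, 3 }, b = { &c, 2 }, a = { &b, 1 };
	struct undo *head = &a;			/* semu_list: a -> b -> c */
	struct undo **prevp = &head;
	struct undo *found = head;

	/* Exit hook for proc 2 (locked): find b, remember the link to it. */
	while (found != NULL && found->proc != 2) {
		prevp = &found->next;
		found = found->next;
	}
	assert(found != NULL);
	/* SEMUNDO_UNLOCK(): prevp == &a.next is no longer protected. */

	/* Another thread (locked): unlinks a, the head entry. */
	head = a.next;				/* semu_list: b -> c */

	/* Exit hook resumes (unlocked): mark b free, unlink via the stale link. */
	found->proc = 0;
	*prevp = found->next;			/* writes a.next; b stays listed */

	/* Hypothetical allocator (locked): reuses a free entry at the head. */
	found->next = head;			/* head is b itself: b -> b */
	head = found;
	found->proc = 4;

	return list_length(head);		/* -1: the list now loops */
}
```

Calling replay_unpatched_exit() returns -1: b's next pointer refers to b
itself, so any later walk of the list spins forever.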
Any comments?

--- RELENG_6/src/sys/kern/sysv_sem.c	Tue Jun 7 01:03:27 2005
+++ swbuild_plt_boson/src/sys/kern/sysv_sem.c	Tue Mar 6 16:13:45 2007
@@ -1259,16 +1259,17 @@
 	struct proc *p;
 {
 	struct sem_undo *suptr;
-	struct sem_undo **supptr;
 
 	/*
 	 * Go through the chain of undo vectors looking for one
 	 * associated with this process.
 	 */
 	SEMUNDO_LOCK();
-	SLIST_FOREACH_PREVPTR(suptr, supptr, &semu_list, un_next) {
-		if (suptr->un_proc == p)
+	SLIST_FOREACH(suptr, &semu_list, un_next) {
+		if (suptr->un_proc == p) {
+			SLIST_REMOVE(&semu_list, suptr, sem_undo, un_next);
 			break;
+		}
 	}
 	SEMUNDO_UNLOCK();
 
@@ -1328,8 +1329,9 @@
 	 * Deallocate the undo vector.
 	 */
 	DPRINTF(("removing vector\n"));
+	SEMUNDO_LOCK();
 	suptr->un_proc = NULL;
-	*supptr = SLIST_NEXT(suptr, un_next);
+	SEMUNDO_UNLOCK();
 }
 
 static int
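The ordering the patch enforces can be checked against a small single-threaded
C model: the undo entry is unlinked in the same critical section that found
it, and un_proc is cleared under the lock, so no stale link survives past
SEMUNDO_UNLOCK(). This is a hypothetical sketch, not kernel code: struct undo,
list_length and replay_patched_exit are made up, and the "allocator" step
assumes, as a simplification, that a free entry is reused by inserting it at
the head of the list.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for struct sem_undo with only the fields the race
 * needs.  All names here are made up; this is a model, not kernel code.
 */
struct undo {
	struct undo *next;	/* plays the role of un_next */
	int proc;		/* plays the role of un_proc; 0 means "free" */
};

/*
 * Number of entries reachable from head, or -1 if following next pointers
 * never reaches NULL (Floyd's tortoise-and-hare check).
 */
static int list_length(const struct undo *head)
{
	const struct undo *slow = head, *fast = head;
	int n = 0;

	while (fast != NULL && fast->next != NULL) {
		slow = slow->next;
		fast = fast->next->next;
		if (slow == fast)
			return -1;
	}
	for (; head != NULL; head = head->next)
		n++;
	return n;
}

/*
 * Same kind of interleaving, but following the patch.  Comments say which
 * steps would hold SEMUNDO_LOCK.
 */
static int replay_patched_exit(void)
{
	struct undo c = { NULL, 3 }, b = { &c, 2 }, a = { &b, 1 };
	struct undo *head = &a;			/* semu_list: a -> b -> c */
	struct undo **prevp = &head;
	struct undo *found;

	/* Exit hook for proc 2 (locked): find b and unlink it right away. */
	while (*prevp != NULL && (*prevp)->proc != 2)
		prevp = &(*prevp)->next;
	found = *prevp;
	assert(found != NULL);
	*prevp = found->next;			/* like SLIST_REMOVE: a -> c */
	found->next = NULL;			/* b is detached from the list */
	/* SEMUNDO_UNLOCK(): no pointer into the list is kept past here. */

	/* Another thread (locked): unlinks a, the head entry. */
	head = head->next;			/* semu_list: c */

	/* Exit hook (locked, as in the patch): mark b free. */
	found->proc = 0;

	/* Hypothetical allocator (locked): reuses a free entry at the head. */
	found->next = head;			/* semu_list: b -> c */
	head = found;
	found->proc = 4;

	return list_length(head);		/* 2: b and c, no loop */
}
```

Calling replay_patched_exit() returns 2: the list is b -> c and terminates,
because the exit hook never writes through a link it looked up under an
earlier lock hold.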