From: John Baldwin
To: James Gritton
Cc: freebsd-hackers@freebsd.org, freebsd-stable@freebsd.org
Date: Mon, 23 Jun 2008 14:51:52 -0400
Subject: Re: FreeBSD 6.3 deadlock (vm_map?) with DDB output
Message-Id: <200806231451.52340.jhb@freebsd.org>
In-Reply-To: <485A81FF.1000000@gritton.org>

On Thursday 19 June 2008 11:57:51 am James Gritton wrote:
> John Baldwin wrote:
> > On Sunday 15 June 2008 07:23:19 am Stef Walter wrote:
> >
> >> I've been trying to track down a deadlock on some newish production
> >> servers running FreeBSD 6.3-RELEASE-p2. The deadlock occurs on a
> >> specific (although mundane) hardware configuration, and each of several
> >> servers running this hardware deadlock about once per week.
> >>
> >> Although I suspect that this is not hardware related, from a (naive)
> >> perusal of the attached stack traces.
> >>
> >> Forgive me if my interpretation of this is all wrong, but I'm pretty
> >> desperate for help. So here's my basic understanding of the deadlock:
> >>
> >> These processes seem to be waiting on the page queue mutex:
> >>   sendmail (in vm_mmap > vm_map_find > vm_map_insert > vm_map_pmap_enter)
> >>   bsnmpd (in malloc, uma_large_malloc > page_alloc > kmem_malloc)
> >>   httpd (in trap > trap_pfault > vm_fault)
> >>   [g_up] (in g_vfs_done > bufdone)
> >>
> >> The page queue mutex is held by rsync process:
> >>   rsync (in trap > trap_pfault > vm_fault > pmap_enter)
> >>
> >> Rsync kernel process (in pmap_enter) was interrupted while holding the
> >> page queue lock?
> >>
> >>
> >> Giant is enabled in loader.conf due to the needs of the pf firewall when
> >> dealing with user credentials lookups. I do not believe that Giant plays
> >> into this deadlock. Kernel config attached.
> >>
> >> Any and all help or info is welcome. Thanks in advance.
> >>
> >
> > Try this change:
> >
> > jhb 2007-10-27 22:07:40 UTC
> >
> >   FreeBSD src repository
> >
> >   Modified files:
> >     sys/kern sched_4bsd.c
> >   Log:
> >   Change the roundrobin implementation in the 4BSD scheduler to trigger a
> >   userland preemption directly from hardclock() via sched_clock() when a
> >   thread uses up a full quantum instead of using a periodic timeout to cause
> >   a userland preemption every so often. This fixes a potential deadlock
> >   when IPI_PREEMPTION isn't enabled where softclock blocks on a lock held
> >   by a thread pinned or bound to another CPU. The current thread on that
> >   CPU will never be preempted while softclock is blocked.
> >
> >   Note that ULE already drives its round-robin userland preemption from
> >   sched_clock() as well and always enables IPI_PREEMPT.
> >
> >   MFC after: 1 week
> >
> >   Revision Changes Path
> >   1.108 +8 -29 src/sys/kern/sched_4bsd.c
> >
> > We use it at work on 6.x. W/o this fix, round-robin stops working on 4BSD
> > when softclock() (swi4: clock) blocks on a lock like Giant.
> >
>
> I've been seeing similar troubles on 6.2 and I'll have to give this a
> try as we upgrade to 6.3. I notice "MFC after: 1 week" in the log; it's
> been a week - any chance of seeing this fix rolled into 6.x?

If people confirm it fixes issues I will MFC it. There was some pushback
when I first committed it so I waited on the MFC.

-- 
John Baldwin
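
For anyone following the thread, the difference the quoted commit log describes can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the actual sched_4bsd.c change: `thread_model`, `roundrobin_timeout_model()`, `sched_clock_model()`, `count_preemptions()` and the `QUANTUM_TICKS` value are all hypothetical names invented for the illustration. It models the old scheme as a periodic timeout that can only fire while softclock is able to run, and the new scheme as a per-tick countdown done in the clock handler itself.

```c
#include <stdbool.h>

#define QUANTUM_TICKS 10   /* hypothetical quantum length, in clock ticks */

/* Hypothetical stand-in for per-thread scheduler state. */
struct thread_model {
    int  slice_left;     /* ticks remaining in the current quantum */
    bool need_resched;   /* set when a userland preemption is wanted */
};

/*
 * Old scheme (modelled): a periodic timeout requests the preemption.
 * Timeouts are run by softclock, so nothing happens while it is blocked.
 */
static void
roundrobin_timeout_model(struct thread_model *td, int tick,
    bool softclock_blocked)
{
    if (softclock_blocked)
        return;
    if (tick % QUANTUM_TICKS == 0)
        td->need_resched = true;
}

/*
 * New scheme (modelled): the clock tick handler counts down the running
 * thread's quantum and requests the preemption when it is used up.
 */
static void
sched_clock_model(struct thread_model *td)
{
    if (--td->slice_left <= 0) {
        td->slice_left = QUANTUM_TICKS;
        td->need_resched = true;
    }
}

/* Count how many preemptions a scheme requests over nticks clock ticks. */
int
count_preemptions(bool use_sched_clock, bool softclock_blocked, int nticks)
{
    struct thread_model td = { QUANTUM_TICKS, false };
    int count = 0;

    for (int tick = 1; tick <= nticks; tick++) {
        if (use_sched_clock)
            sched_clock_model(&td);
        else
            roundrobin_timeout_model(&td, tick, softclock_blocked);
        if (td.need_resched) {
            count++;               /* the thread would be switched out here */
            td.need_resched = false;
        }
    }
    return count;
}
```

In this model, calling `count_preemptions(false, true, 100)` yields no preemption requests at all, since the timeout never runs while softclock is blocked, whereas `count_preemptions(true, true, 100)` still requests one per quantum because the countdown does not depend on softclock.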