Date: Wed, 18 Jun 2008 09:17:05 -0400
From: John Baldwin <jhb@freebsd.org>
To: freebsd-hackers@freebsd.org, stef@memberwebs.com
Cc: freebsd-stable@freebsd.org
Subject: Re: FreeBSD 6.3 deadlock (vm_map?) with DDB output
Message-ID: <200806180917.05689.jhb@freebsd.org>
In-Reply-To: <20080615112318.146C1F18512@mx.npubs.com>
References: <20080615112318.146C1F18512@mx.npubs.com>

On Sunday 15 June 2008 07:23:19 am Stef Walter wrote:
> I've been trying to track down a deadlock on some newish production
> servers running FreeBSD 6.3-RELEASE-p2. The deadlock occurs on a
> specific (although mundane) hardware configuration, and each of several
> servers running this hardware deadlock about once per week.
>
> Although I suspect that this is not hardware related, from a (naive)
> perusal of the attached stack traces.
>
> Forgive me if my interpretation of this is all wrong, but I'm pretty
> desperate for help. So here's my basic understanding of the deadlock:
>
> These processes seem to be waiting on the page queue mutex:
> sendmail (in vm_mmap > vm_map_find > vm_map_insert > vm_map_pmap_enter)
> bsnmpd (in malloc, uma_large_malloc > page_alloc > kmem_malloc)
> httpd (in trap > trap_pfault > vm_fault)
> [g_up] (in g_vfs_done > bufdone)
>
> The page queue mutex is held by rsync process:
> rsync (in trap > trap_pfault > vm_fault > pmap_enter)
>
> Rsync kernel process (in pmap_enter) was interrupted while holding the
> page queue lock?
>
>
> Giant is enabled in loader.conf due to the needs of the pf firewall when
> dealing with user credentials lookups. I do not believe that Giant plays
> into this deadlock. Kernel config attached.
>
> Any and all help or info is welcome. Thanks in advance.

Try this change:

jhb         2007-10-27 22:07:40 UTC

  FreeBSD src repository

  Modified files:
    sys/kern             sched_4bsd.c
  Log:
  Change the roundrobin implementation in the 4BSD scheduler to trigger a
  userland preemption directly from hardclock() via sched_clock() when a
  thread uses up a full quantum instead of using a periodic timeout to cause
  a userland preemption every so often.  This fixes a potential deadlock
  when IPI_PREEMPTION isn't enabled where softclock blocks on a lock held
  by a thread pinned or bound to another CPU.  The current thread on that
  CPU will never be preempted while softclock is blocked.

  Note that ULE already drives its round-robin userland preemption from
  sched_clock() as well and always enables IPI_PREEMPT.

  MFC after:      1 week

  Revision  Changes    Path
  1.108     +8 -29     src/sys/kern/sched_4bsd.c

We use it at work on 6.x. Without this fix, round-robin stops working on 4BSD
when softclock() (swi4: clock) blocks on a lock such as Giant.
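
To make the difference between the two schemes concrete, here is a toy
userland model in Python. It is only a sketch under stated assumptions, not
the kernel code and not the contents of the commit: the names (Cpu,
QUANTUM_TICKS, run_timeout_driven, run_tick_driven) and the quantum length
are invented for illustration. It counts how many round-robin preemptions
get requested when softclock stops running partway through.

```python
from dataclasses import dataclass

# Assumed quantum length in clock ticks; an illustrative value only.
QUANTUM_TICKS = 10


@dataclass
class Cpu:
    """Toy per-CPU state. None of these names come from the kernel."""
    ticks_in_quantum: int = 0  # ticks used by the currently running thread
    preemptions: int = 0       # round-robin preemptions requested so far


def run_timeout_driven(total_ticks: int, softclock_blocked_at: int) -> int:
    """Model of the old scheme: a periodic timeout requests preemption.

    Timeouts are run by softclock, so once softclock blocks on a lock
    (from tick `softclock_blocked_at` onward) no more requests are made.
    """
    cpu = Cpu()
    for tick in range(1, total_ticks + 1):
        softclock_runs = tick < softclock_blocked_at
        if softclock_runs and tick % QUANTUM_TICKS == 0:
            cpu.preemptions += 1
    return cpu.preemptions


def run_tick_driven(total_ticks: int, softclock_blocked_at: int) -> int:
    """Model of the new scheme: the per-tick handler counts the quantum.

    `softclock_blocked_at` is accepted only to mirror the signature above.
    It is deliberately unused, because this path does not depend on
    softclock making progress.
    """
    del softclock_blocked_at
    cpu = Cpu()
    for _tick in range(1, total_ticks + 1):
        cpu.ticks_in_quantum += 1
        if cpu.ticks_in_quantum >= QUANTUM_TICKS:
            cpu.ticks_in_quantum = 0
            cpu.preemptions += 1
    return cpu.preemptions


if __name__ == "__main__":
    # softclock blocks at tick 35 of a 100-tick run.
    print("timeout-driven:", run_timeout_driven(100, 35))
    print("tick-driven:   ", run_tick_driven(100, 35))
```

In this model the timeout-driven variant stops requesting preemptions as
soon as softclock blocks, while the tick-driven variant keeps requesting one
per quantum, which is the behaviour the commit log describes.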
--
John Baldwin
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200806180917.05689.jhb>
