Date: Thu, 19 Jun 2008 09:57:51 -0600
From: James Gritton <jamie@gritton.org>
To: freebsd-hackers@freebsd.org
Cc: freebsd-stable@freebsd.org, John Baldwin <jhb@freebsd.org>
Subject: Re: FreeBSD 6.3 deadlock (vm_map?) with DDB output
Message-ID: <485A81FF.1000000@gritton.org>
In-Reply-To: <200806180917.05689.jhb@freebsd.org>
References: <20080615112318.146C1F18512@mx.npubs.com> <200806180917.05689.jhb@freebsd.org>

John Baldwin wrote:
> On Sunday 15 June 2008 07:23:19 am Stef Walter wrote:
>
>> I've been trying to track down a deadlock on some newish production
>> servers running FreeBSD 6.3-RELEASE-p2. The deadlock occurs on a
>> specific (although mundane) hardware configuration, and each of several
>> servers running this hardware deadlock about once per week.
>>
>> Although I suspect that this is not hardware related, from a (naive)
>> perusal of the attached stack traces.
>>
>> Forgive me if my interpretation of this is all wrong, but I'm pretty
>> desperate for help. So here's my basic understanding of the deadlock:
>>
>> These processes seem to be waiting on the page queue mutex:
>>   sendmail (in vm_mmap > vm_map_find > vm_map_insert > vm_map_pmap_enter)
>>   bsnmpd (in malloc, uma_large_malloc > page_alloc > kmem_malloc)
>>   httpd (in trap > trap_pfault > vm_fault)
>>   [g_up] (in g_vfs_done > bufdone)
>>
>> The page queue mutex is held by rsync process:
>>   rsync (in trap > trap_pfault > vm_fault > pmap_enter)
>>
>> Rsync kernel process (in pmap_enter) was interrupted while holding the
>> page queue lock?
>>
>>
>> Giant is enabled in loader.conf due to the needs of the pf firewall when
>> dealing with user credentials lookups. I do not believe that Giant plays
>> into this deadlock. Kernel config attached.
>>
>> Any and all help or info is welcome. Thanks in advance.
>>
>
> Try this change:
>
> jhb         2007-10-27 22:07:40 UTC
>
>   FreeBSD src repository
>
>   Modified files:
>     sys/kern             sched_4bsd.c
>   Log:
>   Change the roundrobin implementation in the 4BSD scheduler to trigger a
>   userland preemption directly from hardclock() via sched_clock() when a
>   thread uses up a full quantum instead of using a periodic timeout to cause
>   a userland preemption every so often.  This fixes a potential deadlock
>   when IPI_PREEMPTION isn't enabled where softclock blocks on a lock held
>   by a thread pinned or bound to another CPU.  The current thread on that
>   CPU will never be preempted while softclock is blocked.
>
>   Note that ULE already drives its round-robin userland preemption from
>   sched_clock() as well and always enables IPI_PREEMPT.
>
>   MFC after:      1 week
>
>   Revision  Changes    Path
>   1.108     +8 -29     src/sys/kern/sched_4bsd.c
>
> We use it at work on 6.x.  W/o this fix, round-robin stops working on 4BSD
> when softclock() (swi4: clock) blocks on a lock like Giant.
>

I've been seeing similar troubles on 6.2 and I'll have to give this a
try as we upgrade to 6.3.  I notice "MFC after: 1 week" in the log; it's
been a week - any chance of seeing this fix rolled into 6.x?

 - Jamie