Date: Sat, 9 Jun 2007 12:20:19 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: John Baldwin <jhb@freebsd.org>
Cc: src-committers@freebsd.org, Kip Macy, cvs-all@freebsd.org, Attilio Rao,
    Bruce Evans, cvs-src@freebsd.org, Kostik Belousov, Jeff Roberson
Subject: Re: cvs commit: src/sys/kern kern_mutex.c
Message-ID: <20070609112753.C15075@besplex.bde.org>
In-Reply-To: <200706081210.24393.jhb@freebsd.org>
References: <200706051420.l55EKEih018925@repoman.freebsd.org>
    <200706071059.41466.jhb@freebsd.org> <20070609002752.M13082@besplex.bde.org>
    <200706081210.24393.jhb@freebsd.org>

On Fri, 8 Jun 2007, John Baldwin wrote:

> On Friday 08 June 2007 10:50:15 am Bruce Evans wrote:
>> On Thu, 7 Jun 2007, John Baldwin wrote:
>>> Hmm, one problem with the non-preemption page zero is that it doesn't
>>> yield the lock when it voluntarily yields.  I wonder if something like
>>> this patch would help things for the non-preemption case:
>>
>> This works well.  It fixed the extra 1.4 million voluntary context
>> switches and even reduced the number by 10-100k.  The real runtime is
>> still up a bit, but user+sys+pgzero time is down a bit.  Some numbers are:

- real time up from 827 seconds (best time with the June 4 kernel) to 835
  seconds (current kernel with this patch and the others in common with the
  June 4 kernel)
- real time mysteriously up from 827 seconds to 832 seconds even with the
  June 4 kernel.

>>> Index: vm_zeroidle.c
>>> ===================================================================
>>> RCS file: /usr/cvs/src/sys/vm/vm_zeroidle.c,v
>>> retrieving revision 1.45
>>> diff -u -r1.45 vm_zeroidle.c
>>> --- vm_zeroidle.c  18 May 2007 07:10:50 -0000  1.45
>>> +++ vm_zeroidle.c  7 Jun 2007 14:56:02 -0000
>>> @@ -147,8 +147,10 @@
>>>  #ifndef PREEMPTION
>>>              if (sched_runnable()) {
>>>                  mtx_lock_spin(&sched_lock);
>>> +                mtx_unlock(&vm_page_queue_free_mtx);
>>>                  mi_switch(SW_VOL, NULL);
>>>                  mtx_unlock_spin(&sched_lock);
>>> +                mtx_lock(&vm_page_queue_free_mtx);
>>>              }
>>>  #endif
>>>          } else {
>>
>> The sched_locks are now of course thread_locks.  I put the vm unlock
>> before the thread lock since the above order seems to risk a LOR.  That
>> may have been a mistake -- we would prefer not to be switched after
>> deciding to do it ourselves.
>
> Actually, the order is on purpose so that we don't waste time doing a
> preemption when we're about to switch anyway.

OK, changed back (not tested with it changed back).
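So that we are talking about the same thing, the !PREEMPTION yield path as
changed back is roughly this (just a sketch of the ordering, with comments on
what I think happens at each step; not the exact code that I timed):

#ifndef PREEMPTION
    if (sched_runnable()) {
        /* Take the spin lock before dropping the sleep lock. */
        mtx_lock_spin(&sched_lock);
        mtx_unlock(&vm_page_queue_free_mtx);
        mi_switch(SW_VOL, NULL);
        /* mi_switch() returns with sched_lock still held. */
        mtx_unlock_spin(&sched_lock);
        mtx_lock(&vm_page_queue_free_mtx);
    }
#endif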
What about after returning from mi_switch()?  The spinlock must be dropped
before acquiring the sleep lock, but that wastes some time if preemption
occurs.  Here we expect to have more work to do, and it seems best to ensure
that we do at least one page of work per context switch if possible.

>>> Index: vm_zeroidle.c
>>> ===================================================================
>>> RCS file: /usr/cvs/src/sys/vm/vm_zeroidle.c,v
>>> retrieving revision 1.45
>>> diff -u -r1.45 vm_zeroidle.c
>>> --- vm_zeroidle.c  18 May 2007 07:10:50 -0000  1.45
>>> +++ vm_zeroidle.c  7 Jun 2007 14:58:39 -0000
>>> @@ -110,8 +110,10 @@
>>>      if (m != NULL && (m->flags & PG_ZERO) == 0) {
>>>          vm_pageq_remove_nowakeup(m);
>>>          mtx_unlock(&vm_page_queue_free_mtx);
>>> +        critical_exit();
>>>          pmap_zero_page_idle(m);
>>>          mtx_lock(&vm_page_queue_free_mtx);
>>> +        critical_enter();
>>>          m->flags |= PG_ZERO;
>>>          vm_pageq_enqueue(PQ_FREE + m->pc, m);
>>>          ++vm_page_zero_count;
>>
>> Next I will try this.  I put the critical_exit() before the vm unlock.
>> mtx_unlock() should be allowed to switch if it wants.  However, we
>> would prefer to keep context switches disabled in the above -- just drop
>> the lock so that other CPUs can proceed.
>
> Again, the order here is on purpose to make sure we don't preempt while
> holding the lock.

critical_enter(9) says that critical regions should not be interlocked with
other synchronization primitives like this.  It makes an exception for spin
locks, but vm_page_queue_free_mtx is a sleep lock.

I think this is as good a place to preempt as any.  In fact, the code can be
simplified by preempting here and not in the main loop.  The vm mutex is
already dropped here, so no extra code is needed to drop it in the main loop.
In the !PREEMPTION case, check sched_runnable() here.

In the PREEMPTION case, can you explain why involuntary preemption never
worked right with SMP for SCHED_4BSD?  I thought that it was because
SCHED_4BSD doesn't do enough IPIs, so it doesn't preempt idle threads running
on other CPUs.  But shouldn't preemption occur when the vm mutex is unlocked
in the above, if sched_runnable() is true and there is something better to
run than pagezero?

>>> @@ -141,20 +143,25 @@
>>>      idlezero_enable = idlezero_enable_default;
>>>
>>>      mtx_lock(&vm_page_queue_free_mtx);
>>> +    critical_enter();
>>>      for (;;) {
>>>          if (vm_page_zero_check()) {
>>>              vm_page_zero_idle();
>>>  #ifndef PREEMPTION
>>>              if (sched_runnable()) {
>>>                  mtx_lock_spin(&sched_lock);
>>> +                mtx_unlock(&vm_page_queue_free_mtx);
>>>                  mi_switch(SW_VOL, NULL);
>>>                  mtx_unlock_spin(&sched_lock);
>>> +                mtx_lock(&vm_page_queue_free_mtx);
>>>              }
>>>  #endif
>>
>> I added the necessary critical_exit/enter() calls here.
>
> They aren't needed in the !PREEMPTION case since the context switch is
> explicit.  The critical sections are only needed for the PREEMPTION case
> really.

They are needed to pass the KASSERT(td_critnest == 1 || ...) in mi_switch().
PREEMPTION ifdefs on the critical sections would be messy.

Next I will try moving the PREEMPTION code to where the vm mutex is dropped
naturally.  I will try the following order:

    // Stay in critical section.
    drop vm mutex       // though critical_enter(9) says this is wrong
    zero the page       // intentionally in critical section
#ifdef PREEMPTION(?)    // even more needed now that the critical
                        // section prevents a switch on the mutex drop
    if (sched_runnable()) {
        thread_lock(...)
        critical_exit()     // if using critical sections with !PRE*
        mi_switch(...)
        thread_unlock(...)
    } else
#endif
        critical_exit()
    // I would like to avoid another switch here if there was a switch
    // above, but don't know how to.
    acquire vm mutex
    critical_enter()

Bruce
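P.S. Fleshed out against the rev 1.45 code quoted above, the order I have in
mind looks something like the following.  This is only an untested sketch;
the lines outside the diff context quoted earlier are paraphrased from memory
and may not match vm_zeroidle.c exactly.

    if (m != NULL && (m->flags & PG_ZERO) == 0) {
        vm_pageq_remove_nowakeup(m);
        /* Drop the sleep lock but stay in the critical section. */
        mtx_unlock(&vm_page_queue_free_mtx);
        /* Zero the page; intentionally still in the critical section. */
        pmap_zero_page_idle(m);
#ifdef PREEMPTION   /* or perhaps also for !PREEMPTION, as discussed above */
        if (sched_runnable()) {
            /*
             * Preempt here, where the vm mutex is already dropped,
             * instead of in the main loop.
             */
            thread_lock(curthread);
            critical_exit();
            mi_switch(SW_VOL, NULL);
            thread_unlock(curthread);
        } else
#endif
            critical_exit();
        mtx_lock(&vm_page_queue_free_mtx);
        critical_enter();
        m->flags |= PG_ZERO;
        vm_pageq_enqueue(PQ_FREE + m->pc, m);
        ++vm_page_zero_count;
    }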