Date:      Sat, 9 Jun 2007 12:20:19 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        John Baldwin <jhb@freebsd.org>
Cc:        src-committers@freebsd.org, Kip Macy <kip.macy@gmail.com>, cvs-all@freebsd.org, Attilio Rao <attilio@freebsd.org>, Bruce Evans <brde@optusnet.com.au>, cvs-src@freebsd.org, Kostik Belousov <kostikbel@gmail.com>, Jeff Roberson <jroberson@chesapeake.net>
Subject:   Re: cvs commit: src/sys/kern kern_mutex.c
Message-ID:  <20070609112753.C15075@besplex.bde.org>
In-Reply-To: <200706081210.24393.jhb@freebsd.org>
References:  <200706051420.l55EKEih018925@repoman.freebsd.org> <200706071059.41466.jhb@freebsd.org> <20070609002752.M13082@besplex.bde.org> <200706081210.24393.jhb@freebsd.org>

On Fri, 8 Jun 2007, John Baldwin wrote:

> On Friday 08 June 2007 10:50:15 am Bruce Evans wrote:
>> On Thu, 7 Jun 2007, John Baldwin wrote:

>>> Hmm, one problem with the non-preemption page zero is that it doesn't
>>> yield the lock when it voluntarily yields.  I wonder if something like
>>> this patch would help things for the non-preemption case:
>>
>> This works well.  It fixed the extra 1.4 million voluntary context switches
>> and even reduced the number by 10-100k.  The real runtime is still up a bit,
>> but user+sys+pgzero time is down a bit.

Some numbers are:
- real time up from 827 seconds (best time with the June 4 kernel) to 835
   seconds (current kernel with this patch and others in common with the
   June 4 kernel)
- real time mysteriously up from 827 seconds to 832 even with the June 4
   kernel.

>>> Index: vm_zeroidle.c
>>> ===================================================================
>>> RCS file: /usr/cvs/src/sys/vm/vm_zeroidle.c,v
>>> retrieving revision 1.45
>>> diff -u -r1.45 vm_zeroidle.c
>>> --- vm_zeroidle.c       18 May 2007 07:10:50 -0000      1.45
>>> +++ vm_zeroidle.c       7 Jun 2007 14:56:02 -0000
>>> @@ -147,8 +147,10 @@
>>> #ifndef PREEMPTION
>>>                        if (sched_runnable()) {
>>>                                mtx_lock_spin(&sched_lock);
>>> +                               mtx_unlock(&vm_page_queue_free_mtx);
>>>                                mi_switch(SW_VOL, NULL);
>>>                                mtx_unlock_spin(&sched_lock);
>>> +                               mtx_lock(&vm_page_queue_free_mtx);
>>>                        }
>>> #endif
>>>                } else {
>>
>> The sched_locks are now of course thread_locks.  I put the vm unlock
>> before the thread lock since the above order seems to risk a LOR.  That
>> may have been a mistake -- we would prefer not to be switched after
>> deciding to do it ourself.
>
> Actually, the order is on purpose so that we don't waste time doing a
> preemption when we're about to switch anyway.

OK, changed back (not tested with it changed back).

What about after returning from mi_switch()?  The spinlock must be dropped
before acquiring the sleep lock, but that wastes some time if preemption
occurs there.  Here we expect to have more work to do, and it seems best to
ensure that at least one page of work gets done per context switch if
possible.
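
For concreteness, the window in question in the !PREEMPTION path of your
first patch (the quoted diff again with the concern marked, not a new
proposal):

        if (sched_runnable()) {
                mtx_lock_spin(&sched_lock);
                mtx_unlock(&vm_page_queue_free_mtx);
                mi_switch(SW_VOL, NULL);
                /*
                 * The spin lock is still held on return, so it must be
                 * dropped before the sleep lock below can be taken, and a
                 * preemption in that window wastes some time right after
                 * the voluntary switch.
                 */
                mtx_unlock_spin(&sched_lock);
                mtx_lock(&vm_page_queue_free_mtx);
        }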

>>> Index: vm_zeroidle.c
>>> ===================================================================
>>> RCS file: /usr/cvs/src/sys/vm/vm_zeroidle.c,v
>>> retrieving revision 1.45
>>> diff -u -r1.45 vm_zeroidle.c
>>> --- vm_zeroidle.c       18 May 2007 07:10:50 -0000      1.45
>>> +++ vm_zeroidle.c       7 Jun 2007 14:58:39 -0000
>>> @@ -110,8 +110,10 @@
>>>        if (m != NULL && (m->flags & PG_ZERO) == 0) {
>>>                vm_pageq_remove_nowakeup(m);
>>>                mtx_unlock(&vm_page_queue_free_mtx);
>>> +               critical_exit();
>>>                pmap_zero_page_idle(m);
>>>                mtx_lock(&vm_page_queue_free_mtx);
>>> +               critical_enter();
>>>                m->flags |= PG_ZERO;
>>>                vm_pageq_enqueue(PQ_FREE + m->pc, m);
>>>                ++vm_page_zero_count;
>>
>> Next I will try this.  I put the critical_exit() before the vm unlock.
>> mtx_unlock() should be allowed to switch if it wants.  However, we
>> would prefer to keep context switches disabled in the above -- just drop
>> the lock so that other CPUs can proceed.
>
> Again, the order here is on purpose to make sure we don't preempt while
> holding the lock.

critical_enter(9) says that critical regions should not be interlocked
with other synchronization primitives like this.  It makes an exception
for spin locks, but vm_page_queue_free_mtx is a sleep lock.
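
With the two patches applied the regions end up overlapping rather than
nesting, i.e. roughly:

        mtx_lock(&vm_page_queue_free_mtx);      /* sleep lock ... */
        critical_enter();                       /* ... then critical section */
        ...
        mtx_unlock(&vm_page_queue_free_mtx);    /* dropped inside the critical section */
        critical_exit();

which is the interlocking the man page cautions against.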

I think this is as good a place to preempt as any.  In fact, the code
can be simplified by preempting here and not in the main loop.  The
vm mutex is already dropped here, so no extra code is needed to drop it
in the main loop.  In the !PREEMPTION case, check sched_runnable()
here.  In the PREEMPTION case, can you explain why involuntary preemption
never worked right with SMP for SCHED_4BSD?  I thought that it was
because SCHED_4BSD doesn't do enough IPIs, so it doesn't preempt idle
threads running on other CPUs, but shouldn't preemption occur when the
vm mutex is unlocked in the above, if sched_runnable() is true and
there is something better to run than pagezero?
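
Something like this is what I mean for the !PREEMPTION case (an untested
sketch using the names from your diff; with the thread_lock changes the
spin lock would of course be thread_lock(curthread) instead of sched_lock):

        vm_pageq_remove_nowakeup(m);
        mtx_unlock(&vm_page_queue_free_mtx);
        pmap_zero_page_idle(m);
#ifndef PREEMPTION
        if (sched_runnable()) {
                /* The vm mutex is already dropped, so just switch here. */
                mtx_lock_spin(&sched_lock);
                mi_switch(SW_VOL, NULL);
                mtx_unlock_spin(&sched_lock);
        }
#endif
        mtx_lock(&vm_page_queue_free_mtx);
        m->flags |= PG_ZERO;
        vm_pageq_enqueue(PQ_FREE + m->pc, m);
        ++vm_page_zero_count;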

>>> @@ -141,20 +143,25 @@
>>>        idlezero_enable = idlezero_enable_default;
>>>
>>>        mtx_lock(&vm_page_queue_free_mtx);
>>> +       critical_enter();
>>>        for (;;) {
>>>                if (vm_page_zero_check()) {
>>>                        vm_page_zero_idle();
>>> #ifndef PREEMPTION
>>>                        if (sched_runnable()) {
>>>                                mtx_lock_spin(&sched_lock);
>>> +                               mtx_unlock(&vm_page_queue_free_mtx);
>>>                                mi_switch(SW_VOL, NULL);
>>>                                mtx_unlock_spin(&sched_lock);
>>> +                               mtx_lock(&vm_page_queue_free_mtx);
>>>                        }
>>> #endif
>>
>> I added the necessary critical_exit/enter() calls here.
>
> They aren't needed in the !PREEMPTION case since the context switch is
> explicit.  The critical sections are only needed for the PREEMPTION case
> really.

They are needed to pass the KASSERT(td_critnest == 1 || ...)  in mi_switch().

PREEMPTION ifdefs on the critical sections would be messy.
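
That is, the !PREEMPTION branch ends up looking something like this (an
untested illustration; exactly where critical_exit() goes relative to the
locks is the fiddly part):

#ifndef PREEMPTION
                if (sched_runnable()) {
                        critical_exit();        /* so mi_switch()'s td_critnest assertion holds */
                        mtx_lock_spin(&sched_lock);
                        mtx_unlock(&vm_page_queue_free_mtx);
                        mi_switch(SW_VOL, NULL);
                        mtx_unlock_spin(&sched_lock);
                        mtx_lock(&vm_page_queue_free_mtx);
                        critical_enter();
                }
#endif

and #ifdef'ing just the critical_exit/enter() pair away for the PREEMPTION
case is the part that gets messy.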

Next I will try moving the PREEMPTION code to where the vm mutex is dropped
naturally.  I will try the following order:

 	// Stay in critical section.
 	drop vm mutex         // though critical_enter(9) says this is wrong
 	zero the page         // intentionally in critical section
#ifdef PREEMPTION(?)          // even more needed now that the critical
                               // section prevents a switch on the mutex drop
 	if (sched_runnable()) {
 		thread_lock(...)
 		critical_exit()    // if using critical sections with !PRE*
 		mi_switch(...)
 		thread_unlock(...)
 	} else
#endif
 		critical_exit()
 	// I would like to avoid another switch here if there was a switch
 	// above, but don't know how to.
 	acquire vm mutex
 	critical_enter()
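
In actual code, something like the following (untested; thread_lock() on
curthread standing in for the old sched_lock, and keeping the question mark
on the ifdef):

        /* still in the critical section */
        mtx_unlock(&vm_page_queue_free_mtx);
        pmap_zero_page_idle(m);         /* intentionally still in the critical section */
#ifdef PREEMPTION
        if (sched_runnable()) {
                thread_lock(curthread);
                critical_exit();        /* drop to the nesting level mi_switch() expects */
                mi_switch(SW_VOL, NULL);
                thread_unlock(curthread);
        } else
#endif
                critical_exit();
        mtx_lock(&vm_page_queue_free_mtx);
        critical_enter();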

Bruce


