From owner-freebsd-current@FreeBSD.ORG  Fri Jun  4 14:28:00 2004
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 6D7F216A4CE
	for <freebsd-current@FreeBSD.org>;
	Fri,  4 Jun 2004 14:28:00 -0700 (PDT)
Received: from root.org (root.org [67.118.192.226])
	by mx1.FreeBSD.org (Postfix) with SMTP id 59E2E43D41
	for <freebsd-current@FreeBSD.org>;
	Fri,  4 Jun 2004 14:28:00 -0700 (PDT)	(envelope-from nate@root.org)
Received: (qmail 52772 invoked by uid 1000); 4 Jun 2004 21:27:52 -0000
Date: Fri, 4 Jun 2004 14:27:52 -0700 (PDT)
From: Nate Lawson <nate@root.org>
To: John Baldwin <jhb@FreeBSD.org>
In-Reply-To: <200406041646.23134.jhb@FreeBSD.org>
Message-ID: <20040604135848.K52586@root.org>
References: <714DBA6C-B565-11D8-B32B-000393AB07D8@verizon.net>
 <200406041108.33170.jhb@FreeBSD.org> <20040604123934.D52021@root.org>
 <200406041646.23134.jhb@FreeBSD.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: freebsd-current@FreeBSD.org
Subject: Re: reboot or shutdown not working with -current
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 04 Jun 2004 21:28:00 -0000

On Fri, 4 Jun 2004, John Baldwin wrote:
> On Friday 04 June 2004 03:46 pm, Nate Lawson wrote:
> > My goal with this originally was to drain all threads out of the idle
> > handler before continuing the shutdown process.  The assumption that
> > changed with the ithread commit was that the idle thread gets to run
> > sometime after an interrupt occurs.  It's actually kind of tough to have
> > a sched switch before any instructions get to execute after the "go to
> > sleep" one since I profile the length of the sleep to figure out how deep
> > a sleep to use the next cycle.  This is to keep sporadically loaded
> > machines responsive.  So if a preemption can happen every sleep, I'll have
> > to redo this approach and go with one that doesn't require profiling and
> > executes no code after the sleep.
>
> I think your current code is fine though.  As soon as you send out the IPI,
> any CPUs that are idle will bounce.  Is there anything dangerous after you
> enable interrupts again on the processor?  For C1, you don't seem to go near
> any ACPI-specific code after resume, and for C2/C3 you seem to defer enabling
> interrupts until you have read the counters (and I assume that C2/C3 resume
> doesn't actually handle the interrupt until you do sti) so you shouldn't have
> to worry about the CPU doing the shutdown since if it ever returns to the
> idlethread it will already be out of the critical section.  Thus, I don't
> think you need the while loop or busy counter at all.

You're right about that.  I had forgotten only C1 automatically re-enables
interrupts.  For C2/C3, the IO port read merely returns and it's up to the
caller to re-enable interrupts.

After looking at the code again, the refcount was to protect accesses to
the softc->cpu_cx_states array.  I assumed that the softc cannot be
touched after device_shutdown returns 0 since the device and its softc
will be gone.  So a thread is sleeping and we IPI all processors and the
idle thread gets switched away from, it may get switched back to after
device_shutdown has complete and the softc is invalid memory.  Then the
array access at the bottom would be invalid.

Another complicating factor that hasn't gone in yet but is sitting in my
tree pending careful analysis is shared use of a single processor's P_LVLx
register.  This is for future machines that have only one P_BLK defined
but shared between multiple processors.  For those machines, only
acpi_cpu0 will have a valid sc->cpu_cx_states array and other processors
will all share it if their P_BLKs are null.  With the global refcount, my
goal was to prevent modifying any idle-related data in the softc (i.e.
re-evaluating registers in response to a notify on _CST) if any threads
are currently idle.  Instead, the appropriate response was to disable
entry to the idle function, evict all threads, make the mods, then
re-enable idle.  (This function in my local tree has been generalized to
acpi_cpu_evict and is used in more than device_shutdown.)  Since this is a
really rare event (i.e. when changing AC line status) but the idle thread
running is really common (every 10 us), using a mutex or sx lock is not as
good.

So I guess I still need a way to guarantee other threads are no longer in
the idle function after calling smp_rendezvous. Since smp_rendezvous
ensures that every CPU has "checked in" before returning, that gets me the
guarantee that device_shutdown won't return before every idle thread has
been awoken.  But that's still not good enough because when the idle
thread is resumed, it will continue after the sti instruction and
reference invalid data at the end of the function.  I'm thinking I can
add a check that idle has been disabled after the call to sleep but before
interrupts are enabled.  If disabled, return immediately and skip the
profiling step.  If the thread was woken via acpi_cpu_evict(), then idling
will always be disabled at this point and the thread will exit immediately
(and not re-enter) without accessing the softc.  If the thread woke up
normally, there's a harmless race.  1. If acpi_cpu_evict() has disabled
entry to idle but not called smp_rendezvous() yet, then the thread will
exit without recording stats.  2.  If acpi_cpu_evict() has not disabled
entry to idle, then it will continue executing the rest of the function
normally.  Hmm, I'll need to think about this a little.

The main goal is to do as little as possible after sleeping as this adds
latency to handling whatever thread is ready to run.  It's ok to do extra
work up front before going to sleep though since the system is idle
anyway.

-Nate