From owner-freebsd-arch@FreeBSD.ORG  Fri Jan 18 11:11:38 2013
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id 8FED5180;
 Fri, 18 Jan 2013 11:11:38 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail27.syd.optusnet.com.au (mail27.syd.optusnet.com.au
 [211.29.133.168])
 by mx1.freebsd.org (Postfix) with ESMTP id B86BCF10;
 Fri, 18 Jan 2013 11:11:37 +0000 (UTC)
Received: from c211-30-173-106.carlnfd1.nsw.optusnet.com.au
 (c211-30-173-106.carlnfd1.nsw.optusnet.com.au [211.30.173.106])
 by mail27.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id r0IBBIhL001611
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
 Fri, 18 Jan 2013 22:11:20 +1100
Date: Fri, 18 Jan 2013 22:11:18 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Ian Lepore <ian@freebsd.org>
Subject: Re: [RFC/RFT] calloutng
In-Reply-To: <1358432027.32417.213.camel@revolution.hippie.lan>
Message-ID: <20130118202123.Y1839@besplex.bde.org>
References: <50CCAB99.4040308@FreeBSD.org> <50CE5B54.3050905@FreeBSD.org> 
 <50D03173.9080904@FreeBSD.org> <20121225232126.GA47692@alchemy.franken.de>
 <50DB4EFE.2020600@FreeBSD.org> <20130106152313.GD26039@alchemy.franken.de>
 <50EBF921.2000304@FreeBSD.org> <20130113180940.GM26039@alchemy.franken.de>
 <50F30CAB.3000001@FreeBSD.org>  <20130114102118.V1045@besplex.bde.org>
 <1358432027.32417.213.camel@revolution.hippie.lan>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.0 cv=Zty1sKHG c=1 sm=1 a=kj9zAlcOel0A:10
 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=mUAV9h2nInsA:10
 a=_K3Ihd90oorLvW07haYA:9 a=CjuIK1q_8ugA:10 a=TEtd8y5WR3g2ypngnwZWYw==:117
Cc: Davide Italiano <davide@freebsd.org>, Alexander Motin <mav@freebsd.org>,
 freebsd-arch@freebsd.org, FreeBSD Current <freebsd-current@freebsd.org>,
 Marius Strobl <marius@alchemy.franken.de>
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 18 Jan 2013 11:11:38 -0000

On Thu, 17 Jan 2013, Ian Lepore wrote:

> On Mon, 2013-01-14 at 11:38 +1100, Bruce Evans wrote:

>> Er, timecounters are called with a spin mutex held in existing code:
>> though it is dangerous to do so, timecounters are called from fast
>> interrupt handlers for very timekeeping-critical purposes:
>> - to implement the TIOCTIMESTAMP ioctl (except this is broken in
>>    -current).  This was a primitive version of pps timestamping.
>> - for pps timestamping.  The interrupt handler (which should be a fast
>>    interrupt handler to minimize latency) calls pps_capture() which
>>    calls tc_get_timecount() and does other "lock-free" accesses to the
>>    timecounter state.  This still works in -current (at least there is
>>    still code for it).
>
> Unfortunately, calling pps_capture() in the primary interrupt context is
> no longer an option with the stock pps driver.  Ever since the ppbus
> rewrite all ppbus children must use threaded handlers.  I tried to fix
> that a couple different ways, and both ended up with crazy-complex code

Hmm, I didn't notice that ppc supported pps (I try not to look at it since
it is ugly :-), and don't know of any version of it that uses non-threaded
handlers (except in FreeBSD-4 before, where normal interrupt handlers
were non-threaded, so ppc had their high latency but not the even higher
latency and overheads of threaded handlers).

OTOH, my x86 RTC interrupt handler is threaded and supports pps, and
I haven't noticed any latency problems with this.  It just can't
possibly give the < ~1 usec jitter that FreeBSD-[3-4] could give ~15
years ago using a fast interrupt handler (there must be only 1 device
using a fast interrupt handler, with this dedicated to pps, else the
multiple fast interrupt handlers will give latency much larger than
~1 usec to each other.  I don't actually use this for anything except
testing whether the RTC can be used for a poor man's pps.

> scattered around the ppbus family just to support the rarely-used pps
> capture.  It would have been easier to do if filter and threaded
> interrupt handlers had the same function signature.
>
> I ended up writting a separate driver that can be used instead of ppc +
> ppbus + pps, since anyone who cares about precise pps capture is
> unlikely to be sharing the port with a printer or plip device or some
> such.

Probably all pps handlers should be special.  On x86 with reasonable
timecounter hardware, say a TSC, it takes about 10 instructions for
an entire pps interrupt handler:

XintrN:
 	pushl	%eax
 	pushl	%edx
 	rdtsc
 	# Need some ugliness for EIO here or later.
 	ss:movl	%eax,ppscap	# Hopefully lock-free via time-domain locking.
 	ss:movl	%edx,ppscap+4
 	popl	%edx
 	popl	%eax
 	iret

After capturing the timecounter hardware value here, you convert it
to a pps event at leisure.  But since this only happens once per second,
it wouldn't be very inefficient to turn the interrupt handler into a
slow high-latency one, even a threaded one, to handle the pps event
and/or other devices attached to the interrupt.

>>    OTOH, all drivers that call pps_capture() from their interrupt handler
>>    then immediately call pps_event().  This has always been very broken,
>>    and became even more broken with SMPng.  pps_event() does many more
>>    timecounter and pps accesses whose locking is unclear at best, and
>>    in some configurations it calls hardpps(), which is only locked by
>>    Giant, despite comments in kern_ntptime.c still saying that it (and
>>    many other functions in kern_ntptime.c) must be called at splclock()
>>    or higher.  splclock() is of course now null, but the locking
>>    requirements in kern_ntptime.c haven't changed much.  kern_ntptime.c
>>    always needed to be locked by the equivalent of a spin mutex, which
>>    is stronger locking than was given by splclock().  pps_event() would
>>    have to aquire the spin mutex before calling hardpps(), although
>>    this is bad for fast interrupt handlers.  The correct implementation
>>    is probably to only do the capture part from fast interrupt handlers.
>
> In my rewritten dedicated pps driver I call pps_capture() from the
> filter handler and pps_event() from the threaded handler.  I never found

That seems right.

> any good documentation on the low-level details of this stuff, and there
> isn't enough good example code to work from.  My hazy memory is that I

THere seem to be no good examples.

> ended up studying the pps_capture() and pps_event() code enough to infer
> that their design intent seems to be to allow you to capture with no
> locking and do the event processing later in some sort of deferred or
> threaded context.

That seems to be the design, but there are no examples of separating
the event from the capture.

I think the correct locking is:
- capture in a fast interrupt handler, into a per-device state that
   is locked by whatever locks all of the state accessed by the fast
   interrupt handler
- switch to a less critical context later:
   - lock this step agains reentrance from itself
   - re-acquire the fast interrupt handler's lock, as strictly necessary
     to access the state locked by that, and copy it to local state.
     Release the fast interrupt handler's lock.  This locking can probably
     be skipped since the capture only happens once per second and it is
     possible to arrange doing this step somewhere in between the
     captures, but that would be more complicated and fragile.
   - acquire the lock that protects all the state accessed by pps_event().
     This is currently Giant for at least the hardpps() parts.  This
     restricts the contexts that can call pps_capture().

Of course, kern_ntptime.c should use finer locking than Giant, so as to not
restrict its callers so much.  Its Giant locking is also violated by calling
ntp_update_second() from the tc_windup() filter.  It's surprising that the
races here are so rarely harmful that no one has reported them being lost.
Maybe they are only lost on leap seconds synchronized with new moons.

>> ...
>> The spinlock in the i8254 timecounter certainly breaks some cases.
>> For example, suppose the lock is held for a timecounter read from
>> normal context.  It masks hardware interrupts on the current CPU (except
>> in my version).  It doesn't mask NMIs or other traps.  So if the NMI
>> or other trap handler does a timecounter hardware call, there is
>> deadlock in at least the !SMP case.  In my version, it blocks normal
>> interrupts later if they occur, but doesn't block fast interrupts, so
>> the pps_capture() call would deadlock if it occurs, like a timecounter
>> call from an NMI.  I avoid this by not using pps in any fast interrupt
>> handler, and by only using the i8254 timecounter for testing.  I do
>> use pps in a (nonstandard) x86 RTC clock interrupt handler.  My clock
>> interrupt handlers are all non-fast to avoid this and other locking
>> problems.
>
> Hrm, now you've got me a bit worried about capturing in the primary
> context.  Not that I have much option, on a 300mhz Geode and similar
> wimpy embedded processors there's enough latency on a theaded handler
> that the pps signal can be de-asserted by time the handler runs
> (precision timing gear often outputs a very narrow pps pulse, 1 - 10uS
> isn't uncommon).

This reminds me that the old Centronics printer interface has similar
timing.  This caused "lost" printer interrupts on x86 (due to the 8259
not latching them?), and the polling to recover from this was handled
poorly by drivers in 386BSD and Linux.  If the hardware does latch
things, then there would still be a problem determining if the interrupt
is for you if it is shared.  In the Centronics interface mapped to
x86, the interrupt signal can be polled for, but it is of course hard
to see since it is short.

> I know I don't have to worry about NMIs on the systems in question, but
> I'm not so sure about "other trap handler".

There aren't many except machine check and debugger traps on x86.  Ones
like pagefaults shouldn't occur while holding locks.

Bruce