From owner-freebsd-arch@FreeBSD.ORG Fri Jan 18 11:11:38 2013 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 8FED5180; Fri, 18 Jan 2013 11:11:38 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail27.syd.optusnet.com.au (mail27.syd.optusnet.com.au [211.29.133.168]) by mx1.freebsd.org (Postfix) with ESMTP id B86BCF10; Fri, 18 Jan 2013 11:11:37 +0000 (UTC) Received: from c211-30-173-106.carlnfd1.nsw.optusnet.com.au (c211-30-173-106.carlnfd1.nsw.optusnet.com.au [211.30.173.106]) by mail27.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id r0IBBIhL001611 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 18 Jan 2013 22:11:20 +1100 Date: Fri, 18 Jan 2013 22:11:18 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Ian Lepore Subject: Re: [RFC/RFT] calloutng In-Reply-To: <1358432027.32417.213.camel@revolution.hippie.lan> Message-ID: <20130118202123.Y1839@besplex.bde.org> References: <50CCAB99.4040308@FreeBSD.org> <50CE5B54.3050905@FreeBSD.org> <50D03173.9080904@FreeBSD.org> <20121225232126.GA47692@alchemy.franken.de> <50DB4EFE.2020600@FreeBSD.org> <20130106152313.GD26039@alchemy.franken.de> <50EBF921.2000304@FreeBSD.org> <20130113180940.GM26039@alchemy.franken.de> <50F30CAB.3000001@FreeBSD.org> <20130114102118.V1045@besplex.bde.org> <1358432027.32417.213.camel@revolution.hippie.lan> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.0 cv=Zty1sKHG c=1 sm=1 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=mUAV9h2nInsA:10 a=_K3Ihd90oorLvW07haYA:9 a=CjuIK1q_8ugA:10 a=TEtd8y5WR3g2ypngnwZWYw==:117 Cc: Davide Italiano , Alexander Motin , freebsd-arch@freebsd.org, FreeBSD Current , Marius Strobl X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Jan 2013 11:11:38 -0000 On Thu, 17 Jan 2013, Ian Lepore wrote: > On Mon, 2013-01-14 at 11:38 +1100, Bruce Evans wrote: >> Er, timecounters are called with a spin mutex held in existing code: >> though it is dangerous to do so, timecounters are called from fast >> interrupt handlers for very timekeeping-critical purposes: >> - to implement the TIOCTIMESTAMP ioctl (except this is broken in >> -current). This was a primitive version of pps timestamping. >> - for pps timestamping. The interrupt handler (which should be a fast >> interrupt handler to minimize latency) calls pps_capture() which >> calls tc_get_timecount() and does other "lock-free" accesses to the >> timecounter state. This still works in -current (at least there is >> still code for it). > > Unfortunately, calling pps_capture() in the primary interrupt context is > no longer an option with the stock pps driver. Ever since the ppbus > rewrite all ppbus children must use threaded handlers. I tried to fix > that a couple different ways, and both ended up with crazy-complex code Hmm, I didn't notice that ppc supported pps (I try not to look at it since it is ugly :-), and don't know of any version of it that uses non-threaded handlers (except in FreeBSD-4 before, where normal interrupt handlers were non-threaded, so ppc had their high latency but not the even higher latency and overheads of threaded handlers). OTOH, my x86 RTC interrupt handler is threaded and supports pps, and I haven't noticed any latency problems with this. It just can't possibly give the < ~1 usec jitter that FreeBSD-[3-4] could give ~15 years ago using a fast interrupt handler (there must be only 1 device using a fast interrupt handler, with this dedicated to pps, else the multiple fast interrupt handlers will give latency much larger than ~1 usec to each other. I don't actually use this for anything except testing whether the RTC can be used for a poor man's pps. > scattered around the ppbus family just to support the rarely-used pps > capture. It would have been easier to do if filter and threaded > interrupt handlers had the same function signature. > > I ended up writting a separate driver that can be used instead of ppc + > ppbus + pps, since anyone who cares about precise pps capture is > unlikely to be sharing the port with a printer or plip device or some > such. Probably all pps handlers should be special. On x86 with reasonable timecounter hardware, say a TSC, it takes about 10 instructions for an entire pps interrupt handler: XintrN: pushl %eax pushl %edx rdtsc # Need some ugliness for EIO here or later. ss:movl %eax,ppscap # Hopefully lock-free via time-domain locking. ss:movl %edx,ppscap+4 popl %edx popl %eax iret After capturing the timecounter hardware value here, you convert it to a pps event at leisure. But since this only happens once per second, it wouldn't be very inefficient to turn the interrupt handler into a slow high-latency one, even a threaded one, to handle the pps event and/or other devices attached to the interrupt. >> OTOH, all drivers that call pps_capture() from their interrupt handler >> then immediately call pps_event(). This has always been very broken, >> and became even more broken with SMPng. pps_event() does many more >> timecounter and pps accesses whose locking is unclear at best, and >> in some configurations it calls hardpps(), which is only locked by >> Giant, despite comments in kern_ntptime.c still saying that it (and >> many other functions in kern_ntptime.c) must be called at splclock() >> or higher. splclock() is of course now null, but the locking >> requirements in kern_ntptime.c haven't changed much. kern_ntptime.c >> always needed to be locked by the equivalent of a spin mutex, which >> is stronger locking than was given by splclock(). pps_event() would >> have to aquire the spin mutex before calling hardpps(), although >> this is bad for fast interrupt handlers. The correct implementation >> is probably to only do the capture part from fast interrupt handlers. > > In my rewritten dedicated pps driver I call pps_capture() from the > filter handler and pps_event() from the threaded handler. I never found That seems right. > any good documentation on the low-level details of this stuff, and there > isn't enough good example code to work from. My hazy memory is that I THere seem to be no good examples. > ended up studying the pps_capture() and pps_event() code enough to infer > that their design intent seems to be to allow you to capture with no > locking and do the event processing later in some sort of deferred or > threaded context. That seems to be the design, but there are no examples of separating the event from the capture. I think the correct locking is: - capture in a fast interrupt handler, into a per-device state that is locked by whatever locks all of the state accessed by the fast interrupt handler - switch to a less critical context later: - lock this step agains reentrance from itself - re-acquire the fast interrupt handler's lock, as strictly necessary to access the state locked by that, and copy it to local state. Release the fast interrupt handler's lock. This locking can probably be skipped since the capture only happens once per second and it is possible to arrange doing this step somewhere in between the captures, but that would be more complicated and fragile. - acquire the lock that protects all the state accessed by pps_event(). This is currently Giant for at least the hardpps() parts. This restricts the contexts that can call pps_capture(). Of course, kern_ntptime.c should use finer locking than Giant, so as to not restrict its callers so much. Its Giant locking is also violated by calling ntp_update_second() from the tc_windup() filter. It's surprising that the races here are so rarely harmful that no one has reported them being lost. Maybe they are only lost on leap seconds synchronized with new moons. >> ... >> The spinlock in the i8254 timecounter certainly breaks some cases. >> For example, suppose the lock is held for a timecounter read from >> normal context. It masks hardware interrupts on the current CPU (except >> in my version). It doesn't mask NMIs or other traps. So if the NMI >> or other trap handler does a timecounter hardware call, there is >> deadlock in at least the !SMP case. In my version, it blocks normal >> interrupts later if they occur, but doesn't block fast interrupts, so >> the pps_capture() call would deadlock if it occurs, like a timecounter >> call from an NMI. I avoid this by not using pps in any fast interrupt >> handler, and by only using the i8254 timecounter for testing. I do >> use pps in a (nonstandard) x86 RTC clock interrupt handler. My clock >> interrupt handlers are all non-fast to avoid this and other locking >> problems. > > Hrm, now you've got me a bit worried about capturing in the primary > context. Not that I have much option, on a 300mhz Geode and similar > wimpy embedded processors there's enough latency on a theaded handler > that the pps signal can be de-asserted by time the handler runs > (precision timing gear often outputs a very narrow pps pulse, 1 - 10uS > isn't uncommon). This reminds me that the old Centronics printer interface has similar timing. This caused "lost" printer interrupts on x86 (due to the 8259 not latching them?), and the polling to recover from this was handled poorly by drivers in 386BSD and Linux. If the hardware does latch things, then there would still be a problem determining if the interrupt is for you if it is shared. In the Centronics interface mapped to x86, the interrupt signal can be polled for, but it is of course hard to see since it is short. > I know I don't have to worry about NMIs on the systems in question, but > I'm not so sure about "other trap handler". There aren't many except machine check and debugger traps on x86. Ones like pagefaults shouldn't occur while holding locks. Bruce