Date:      Mon, 19 Aug 2019 09:52:03 -0600
From:      Ian Lepore <ian@freebsd.org>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        Warner Losh <imp@bsdimp.com>, "freebsd-arm@FreeBSD.org" <freebsd-arm@freebsd.org>
Subject:   Re: dmtpps driver on beaglebone [was: Is it a good idea to use a usb-serial adapter for PPS input? Yes, it is.]
Message-ID:  <8220eafa44d1f3a4d38e3ecfd18f2ff9c7f3ce97.camel@freebsd.org>
In-Reply-To: <20190819152244.GN71821@kib.kiev.ua>
References:  <f1d765814f722fb2c99c9870b3cc2607b4eca2d7.camel@freebsd.org> <B9EFA4D4-C1AD-4181-B421-F6BD53434FA5@dons.net.au> <bfd784f6ac9c939dbe6a243336bc3b6eab02d4f5.camel@freebsd.org> <61B1AAF3-40F6-47BC-8F05-7491C13BF288@dons.net.au> <9E142F1A-5E8C-4410-91F5-7C80B3D0A15B@dons.net.au> <fc1ef92211cfc9aeff3b4bd4335d69ceda5e5223.camel@freebsd.org> <CANCZdfoVJ2ryUr+1uh9iCHKLWZO3MONAB9xv3MpWpzYqhycEyw@mail.gmail.com> <f4963626659c115be8306a079be798afd8f0b38e.camel@freebsd.org> <20190819152244.GN71821@kib.kiev.ua>

On Mon, 2019-08-19 at 18:22 +0300, Konstantin Belousov wrote:
> On Mon, Aug 19, 2019 at 08:47:46AM -0600, Ian Lepore wrote:
> > I first ran into the problem on a beaglebone a few years ago, not that
> > long after the number was reduced to two.  The number 4 is pretty much
> > just a vague-ish memory of the debugging I did back then.  When you
> > don't have enough timehands, the symptom is that you lose most of the
> > pps events (but enough get captured that ntpd will more or less run
> > properly on the reduced data).
> > 
> > The busier the system is, the more events get captured successfully,
> > because the loss is caused when the pps pulse happens while sleeping in
> > cpu_idle(); the path that unwinds out of there leads to calling
> > tc_windup 3 times, iirc, which means the value latched in the
> > timecounter hardware belongs to a set of timehands whose generation
> > count is not valid anymore because of all the tc_windup calls, and the
> > event gets discarded because of the generation mismatch.
> 
> Is the problem specific to the platform then ?

I think it is specific to timecounters that implement pps capture using
the polling mechanism (tc_poll_pps pointer is non-NULL), and I remember
that there was something about the sequence of events that made me
think it would only be a problem on a single-core system.  Or maybe just
way more likely on a single-core system (I've got similarly-structured
drivers on multicore systems that don't have this problem).
But I don't think it was so much platform-specific as specific to that
pair of conditions.
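
The mechanism I mean is the tc_poll_pps hook that tc_windup() invokes on
every windup.  Paraphrasing kern_tc.c from memory (simplified, so don't
hold me to the details), each windup does roughly this:

static void
tc_windup(void)
{
	struct timehands *th, *tho;
	u_int ogen;

	tho = timehands;
	th = tho->th_next;		/* rotate to the next timehands */
	ogen = th->th_generation;
	th->th_generation = 0;		/* mark invalid while updating */
	atomic_thread_fence_rel();

	/* ... copying tho into th and the scale/offset updates elided ... */

	/* Polling-style pps drivers get called from here. */
	if (th->th_counter->tc_poll_pps != NULL)
		th->th_counter->tc_poll_pps(th->th_counter);

	if (++ogen == 0)		/* zero is reserved for "invalid" */
		ogen = 1;
	atomic_store_rel_int(&th->th_generation, ogen);
	timehands = th;
}

So every windup that happens between pps_capture() and pps_event() is
another chance for the captured generation to go stale.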

> 
> The issue you see might come from tc_ticktock(), but it is strange.
> All calls to tc_windup() are serialized by a spinlock.  tc_ticktock(),
> potentially being called from interrupt context, uses a try-lock to
> avoid spinning while another CPU does the update.  But this is
> impossible on a UP machine, because the spinlock in the top half
> disables interrupts, which means that an event that triggers a
> tc_ticktock() call whose trylock fails should be impossible until the
> top half finishes.
> 
> Having too many timehands is bad because a reader gets a higher chance
> of finding an obsolete th pointer.  With the rework of the generation
> counts for timehands, a reader would not use stale data, but it might
> have to spin for a somewhat prolonged time.
> 
> In fact, it might be worth trying to reduce the timehands count from
> two to one, since then there is no non-current th at all.

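For anyone else following the thread: the reader side being described
above is the usual generation handshake in kern_tc.c.  Paraphrasing from
memory (so not the literal code), a reader looks roughly like this, and
it simply retries until it sees a consistent timehands:

void
binuptime(struct bintime *bt)
{
	struct timehands *th;
	u_int gen;

	do {
		th = timehands;
		gen = atomic_load_acq_int(&th->th_generation);
		*bt = th->th_offset;
		bintime_addx(bt, th->th_scale * tc_delta(th));
		atomic_thread_fence_acq();
	} while (gen == 0 || gen != th->th_generation);
}
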
The driver involved is arm/ti/am335x/am335x_dmtpps.c.  The basic
sequence that occurs is:

The dmtpps_poll() function is called (via the timecounter.tc_poll_pps
pointer), and it calls pps_capture(), then schedules a taskqueue_fast
task to do the rest of the processing.  That task function calls
pps_event(), and most of the time pps_event() does nothing because it
hits this early out:

	/* If the timecounter was wound up underneath us, bail out. */
	if (pps->capgen == 0 || pps->capgen !=
	    atomic_load_acq_int(&pps->capth->th_generation))
		return;

I forget the exact sequence, but there was something about getting from
the point where the taskqueue task begins running to the point where it
calls pps_event() that almost always burned through 3 timehands
generations.  If the machine was busy enough, though, sometimes there
would only be one windup and the event would get processed properly.
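
In outline, the driver side looks about like this (simplified from
memory; the helper and field names here are approximations, not the
literal code):

/* Called from tc_windup() via the tc_poll_pps pointer. */
static void
dmtpps_poll(struct timecounter *tc)
{
	struct dmtpps_softc *sc = tc->tc_priv;

	if (dmtpps_hw_latched_pulse(sc)) {	/* made-up helper name */
		pps_capture(&sc->pps_state);	/* snapshots capth/capgen */
		taskqueue_enqueue(taskqueue_fast, &sc->pps_task);
	}
}

/* Runs later, from the fast taskqueue. */
static void
dmtpps_event_task(void *arg, int pending)
{
	struct dmtpps_softc *sc = arg;

	mtx_lock(&sc->pps_mtx);
	/*
	 * By the time we get here, tc_windup() may have run several more
	 * times; if so, capgen no longer matches capth->th_generation and
	 * pps_event() takes the early out quoted above.
	 */
	pps_event(&sc->pps_state, PPS_CAPTUREASSERT);
	mtx_unlock(&sc->pps_mtx);
}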

I don't remember why I used a taskqueue task to separate the capture
and event processing.  Maybe so I could use a mutex to protect the pps
event state, and it was written before I extended the internal pps API
stuff to let drivers use a spin mutex with pps_ioctl().
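
These days the attach code could do something more like the following
instead.  I'm writing this from memory, so check sys/timepps.h for the
real field names:

	/* A spin mutex shared by the driver and the kernel pps code. */
	mtx_init(&sc->pps_mtx, "dmtpps", NULL, MTX_SPIN);

	sc->pps_state.driver_abi = PPS_ABI_VERSION;
	sc->pps_state.driver_mtx = &sc->pps_mtx;
	pps_init_abi(&sc->pps_state);

	/*
	 * The idea being that pps_ioctl() synchronizes with the driver's
	 * capture path using that mutex, so pps_capture() and pps_event()
	 * could both run right in the poll callback, with no taskqueue
	 * hand-off in between.
	 */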

-- Ian



