Date: Mon, 19 Aug 2019 09:52:03 -0600
From: Ian Lepore <ian@freebsd.org>
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: Warner Losh <imp@bsdimp.com>, "freebsd-arm@FreeBSD.org" <freebsd-arm@freebsd.org>
Subject: Re: dmtpps driver on beaglebone [was: Is it a good idea to use a usb-serial adapter for PPS input? Yes, it is.]
Message-ID: <8220eafa44d1f3a4d38e3ecfd18f2ff9c7f3ce97.camel@freebsd.org>
In-Reply-To: <20190819152244.GN71821@kib.kiev.ua>
References: <f1d765814f722fb2c99c9870b3cc2607b4eca2d7.camel@freebsd.org>
 <B9EFA4D4-C1AD-4181-B421-F6BD53434FA5@dons.net.au>
 <bfd784f6ac9c939dbe6a243336bc3b6eab02d4f5.camel@freebsd.org>
 <61B1AAF3-40F6-47BC-8F05-7491C13BF288@dons.net.au>
 <9E142F1A-5E8C-4410-91F5-7C80B3D0A15B@dons.net.au>
 <fc1ef92211cfc9aeff3b4bd4335d69ceda5e5223.camel@freebsd.org>
 <CANCZdfoVJ2ryUr%2B1uh9iCHKLWZO3MONAB9xv3MpWpzYqhycEyw@mail.gmail.com>
 <f4963626659c115be8306a079be798afd8f0b38e.camel@freebsd.org>
 <20190819152244.GN71821@kib.kiev.ua>
On Mon, 2019-08-19 at 18:22 +0300, Konstantin Belousov wrote:
> On Mon, Aug 19, 2019 at 08:47:46AM -0600, Ian Lepore wrote:
> > I first ran into the problem on a beaglebone a few years ago, not that
> > long after the number was reduced to two.  The number 4 is pretty much
> > just a vague-ish memory of the debugging I did back then.  When you
> > don't have enough timehands, the symptom is that you lose most of the
> > pps events (but enough get captured that ntpd will more or less run
> > properly on the reduced data).
> > 
> > The busier the system is, the more events get captured successfully,
> > because the loss is caused when the pps pulse happens while sleeping in
> > cpu_idle(); the path that unwinds out of there leads to calling
> > tc_windup 3 times, iirc, which means the value latched in the
> > timecounter hardware belongs to a set of timehands whose generation
> > count is not valid anymore because of all the tc_windup calls, and the
> > event gets discarded because of the generation mismatch.
> 
> Is the problem specific to the platform then?

I think it is specific to timecounters that implement pps capture using
the polling mechanism (the tc_poll_pps pointer is non-NULL), and I
remember that there was something about the sequence of events that made
me think it would only be a problem on a single-core system.  Or maybe
just way more likely on a single-core system (I've got similarly-
structured drivers on multicore systems that don't have this problem).
But I don't think it was so much platform-specific as specific to that
pair of conditions.

> The issue you see might come from tc_ticktock(), but it is strange.
> All calls to tc_windup() are serialized by a spinlock.  tc_ticktock(),
> potentially being called from interrupt context, uses a try-lock to
> avoid spinning while another CPU does the update.  But this is
> impossible on a UP machine, because the spinlock in the top half
> disables interrupts, which means that an event that triggers the
> tc_ticktock() call and whose trylock fails should be impossible until
> the top half finishes.
> 
> Having too many timehands is bad because a reader gets a higher chance
> of finding an obsolete th pointer.  With the rework of the generation
> counts for timehands, a reader would not use stale data, but it might
> have to spin for a somewhat prolonged time.
> 
> In fact, it might be worth trying to reduce the timehands count from
> two to one, since then there is no non-current th at all.

The driver involved is arm/ti/am335x/am335x_dmtpps.c.  The basic
sequence that occurs is:

The dmtpps_poll() function is called (via the timecounter.tc_poll_pps
pointer), and it calls pps_capture(), then schedules a taskqueue_fast
task to do the rest of the processing.  That task function calls
pps_event(), and most of the time pps_event() does nothing because it
hits this early out:

	/* If the timecounter was wound up underneath us, bail out. */
	if (pps->capgen == 0 ||
	    pps->capgen != atomic_load_acq_int(&pps->capth->th_generation))
		return;

I forget the exact sequence, but there was something about getting from
the point where the taskqueue task begins running to the point where it
calls pps_event() that almost always burned through 3 timehands
generations, but if the machine was busy enough, sometimes there would
only be one windup and the event would get processed properly.
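For concreteness, here is a minimal sketch of the flow being described,
written against the generic pps_capture()/pps_event()/taskqueue(9) KPIs.
The mydev_* names, the softc layout, and the attach-time setup are
hypothetical illustrations; this is not the actual am335x_dmtpps.c code.

/*
 * Hypothetical sketch of the capture-then-defer pattern described above.
 * Assumes the timecounter, pps_state, and task were set up at attach time
 * with tc_init(), pps_init(), and TASK_INIT(..., mydev_pps_task, sc).
 */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/taskqueue.h>
#include <sys/timepps.h>
#include <sys/timetc.h>

struct mydev_softc {
	struct timecounter	tc;
	struct pps_state	pps;
	struct task		pps_task;
};

/* Called from tc_windup() via the timecounter's tc_poll_pps pointer. */
static void
mydev_poll_pps(struct timecounter *tc)
{
	struct mydev_softc *sc = tc->tc_priv;

	/* Latch the current timehands and its generation right away... */
	pps_capture(&sc->pps);

	/* ...but defer the heavier pps_event() work to a taskqueue. */
	taskqueue_enqueue(taskqueue_fast, &sc->pps_task);
}

static void
mydev_pps_task(void *arg, int pending __unused)
{
	struct mydev_softc *sc = arg;

	/*
	 * By the time this task runs, tc_windup() may have cycled the
	 * timehands enough times that the generation captured above no
	 * longer matches, so pps_event() hits its early out and quietly
	 * drops the pulse; that is the failure mode described above.
	 */
	pps_event(&sc->pps, PPS_CAPTUREASSERT);
}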
I don't remember why I used a taskqueue task to separate the capture and
event processing.  Maybe so that I could use a mutex to protect the pps
event state, and it was written before I extended the internal pps API
to let drivers use a spin mutex with pps_ioctl().

-- 
Ian