Date: Mon, 19 Dec 2011 01:42:05 +1100 From: Lawrence Stewart <lstewart@freebsd.org> To: freebsd-arch@freebsd.org Cc: Julien Ridoux <jrid@cubinlab.ee.unimelb.edu.au>, Benjamin Kaduk <kaduk@mit.edu>, Jung-uk Kim <jkim@freebsd.org> Subject: Re: System clock APIs and feed-forward clock integration with BPF Message-ID: <4EEDFBBD.9050809@freebsd.org> In-Reply-To: <4EE9679A.70909@freebsd.org> References: <201111210417.pAL4HOdi023556@svn.freebsd.org> <4ED8575D.2010405@freebsd.org> <201112021927.25234.jkim@FreeBSD.org> <201112022002.49618.jkim@FreeBSD.org> <4EDB6814.9040403@freebsd.org> <405E9C58-F999-45C2-BC20-81ED4CA8A3E9@cubinlab.ee.unimelb.edu.au> <4EE9679A.70909@freebsd.org>
next in thread | previous in thread | raw e-mail | index | archive | help
> On 12/15/11 01:30, Julien Ridoux wrote: >> >> On 04/12/2011, at 11:31 PM, Lawrence Stewart wrote: >> >>> On 12/03/11 12:02, Jung-uk Kim wrote: >>>> On Friday 02 December 2011 07:27 pm, Jung-uk Kim wrote: >>>>> On Thursday 01 December 2011 11:43 pm, Lawrence Stewart wrote: >>>>>> On 12/02/11 03:43, Jung-uk Kim wrote: >>>>>>> On Thursday 01 December 2011 10:11 am, Lawrence Stewart wrote: >>>>>>>> On 11/30/11 05:09, Jung-uk Kim wrote: >>>>>>>>> On Tuesday 29 November 2011 11:13 am, Lawrence Stewart wrote: >>> [snip] >>>>>>>>>> Here's the first of the patches: >>>>>>>>>> >>>>>>>>>> http://people.freebsd.org/~lstewart/patches/misc/ffclock_bpf >>>>>>>>>> _i nt act abi_10.x.r228130.patch >>>>>>>>> >>>>>>>>> I only glanced at it but it looks very close to what I wanted >>>>>>>>> to suggest. >>>>>>>> >>>>>>>> Final candidate patch is at: >>>>>>>> >>>>>>>> http://people.freebsd.org/~lstewart/patches/misc/ffclock_bpf_i >>>>>>>> nt act abi_10.x.r228180.patch >>>>>>>> >>>>>>>> Assuming it passes the "make tinderbox" build I'm currently >>>>>>>> running and no further input is received from interested >>>>>>>> parties, I plan to commit it in ~10 hours. >>>>>>>> >>>>>>>> Changes since the r228130 patch I sent previously: >>>>>>>> >>>>>>>> - The new flags in bpf.h are added unconditionally so that >>>>>>>> they can always be referenced at compile time and a decision >>>>>>>> made at runtime as to whether a flag will be set or not. Using >>>>>>>> one of the new flags when the kernel doesn't have FFCLOCK >>>>>>>> compiled in results in the flag being ignored. An app should >>>>>>>> check for the existence of the "ffclock" kernel feature or the >>>>>>>> "kern.sysclock" sysctl tree before attempting to use the >>>>>>>> flags. >>>>>>>> >>>>>>>> - This patch will hopefully be MFCed at some point, so I added >>>>>>>> a CTASSERT to bpf.c to ensure that the ABI of structs >>>>>>>> bpf_hdr32, bpf_hdr and bpf_xhdr remains intact when FFCLOCK is >>>>>>>> enabled and the union of a ffcounter and struct >>>>>>>> timeval32/timeval/bpf_ts is switched in. >>>>>>>> >>>>>>>> - bpf_gettime() more comprehensively covers all the possible >>>>>>>> cases of flag combination and does sensible things for each >>>>>>>> case (even though some cases are rather silly). >>>>>>>> >>>>>>>> - The snippet of code at the beginning of catchpacket() that >>>>>>>> was manipulating the struct bintime derived from bpf_gettime() >>>>>>>> was gross and has been removed in favour of selecting the >>>>>>>> right {get}bin{up}time() function call in bpf_gettime(). >>>>>>> >>>>>>> I did that to reduce branching. Since you are introducing more >>>>>>> branches, it warrants a function pointer now. >>>>> >>>>> There's another reason, BTW. If mbufs are tagged with timestamps >>>>> (where only monotonic timestamps are allowed), they must be >>>>> converted within the bpf.c. I forgot all the details. :-( >>> >>> We should document this knowledge in some code comments. >>> >>>>>> I see, thanks for the explanation. Could you elaborate a bit more >>>>>> about how you would implement the function pointer idea? I'm also >>>>>> curious in the !FFCLOCK case just how much overhead having the >>>>>> 2-layer nested if/else adds. I'm not an very optimisation savvy >>>>>> person, but I'm wondering if it's actually worth micro-optimising >>>>>> this code. >>>>>> >>>>>> My initial thoughts about your function pointer idea lead to >>>>>> adding a function pointer in the bpf_d and setting it to the >>>>>> appropriate function to get the timestamp from at bpf_d creation >>>>>> or ioctl time. Whilst I like this idea, I can't see how it would >>>>>> work given that the various functions involved in time/ffcounter >>>>>> stamp generation all have different signatures. >>>>>> >>>>>> We could have multiple variants of bpf_gettime() which each call >>>>>> the appropriate underlying functions to generate the appropriate >>>>>> stamp. Would add quite a lot of code but would reduce the >>>>>> overhead of calling bpf_gettime() to an indirect function call + >>>>>> the underlying stamp generation function call. This also solves >>>>>> the problem of multiple function signatures. >>>>> >>>>> Please see my patch: >>>>> >>>>> http://people.freebsd.org/~jkim/bpf_ffclock.diff >>>> >>>> I booted up the kernel and found it just crashes because of stupid >>>> typos. Now a new patch is uploaded in place. Sorry. >>>> >>>> Anyway, I see no regression nor ABI breakage. :-) >>> >>> struct bpf_d being part of the ABI was the main thing I was concerned >>> about, so given that it isn't I like your approach a lot. >>> >>> As noted by Julien, this approach does introduce problems with >>> respect to the follow up patch that adds a permanent bh_ffcounter >>> member to the bpf header. I thought the fromclock() API would >>> sufficiently meet the needs of consumers like BPF, but if we were to >>> proceed with something like Jung-uk's proposed patch I don't think >>> that's true anymore. >>> >>> We decided to bite the bullet and devise an API that is more compact >>> and can return all appropriate time information from all underlying >>> sysclocks in an efficient manner - something like a more generic >>> version of ffclock_abstime(). Julien and I spent quite some time >>> today nutting out details, and Julien has done a proof of concept >>> patch against my proposed BPF patch which looks good as a starting >>> point for discussion: >>> >>> http://www.cubinlab.ee.unimelb.edu.au/~jrid/patches/r228254/sysclock_snap.patch >>> >>> >>> It needs a bit more work and should be split into two patches (one >>> introducing the API and the other being the BPF changes). >>> >>> With something like this though, the BPF patch becomes radically >>> simplified in the FFCLOCK case. We don't even need a function pointer >>> cached in the bpf_d anymore, but could cache a struct sysclock_snap >>> there instead if we really wanted to minimise overhead in bpf_gettime(). >>> >>> We'll have a go at refining the patch tomorrow hopefully, but wanted >>> to put this out there for early consideration. >> >> >> Hi Jung-uk (and all), >> >> It took us a bit of time but we have a new patch. There are a lot of >> changes in the patch, and I will start by summarising our motivation >> for the changes. Hopefully I won't forget anything, but if I leave >> something unclear, don't hesitate to ask Lawrence or me. >> >> Starting with the easy bits, we drop the union in the BPF header and >> followed something similar to what Ben and you proposed. >> >> Next, the core motivation was to ensure that the timestamping function >> be called once only per packet captured, while keeping the ability to >> read both clocks and return the time from one of the two to the user. >> This forced us to redesign the timestamping function and one thing >> leading to another a fair bit of the BPF code. >> >> There 3 orthogonal parameters to the timestamping function in the BPF >> context: >> - which clock is used to return the time >> - what is the format of the value returned >> - and a performance issue (nothing to do with timestamping or >> timekeeping): reading the hardware counter or not >> >> We handle the first one with a new sysclock_getsnapshot() which saves >> both clock parameters when timestamping and potentially reads the >> hardware counter. >> The second one has been covered in previous discussion with the >> addition of BPF_T_FFCOUNTER. >> >> The performance issue created problems. In the current code, the order >> with which the BPF devices are open on one interface impacts the >> timestamping function. Let's consider two BPF devices respectively >> open with BPF_TSTAMP_NORMAL and BPF_TSTAMP_FAST quality. If NORMAL is >> open after FAST, it is at the head of the list, the hardware counter >> is read, and FAST benefits from higher quality timestamps. If FAST is >> open after NORMAL, the timestamping function is called once and return >> last kernel tick time, then it is called a second time for NORMAL, >> this time accessing the hardware counter. >> >> None of these cases are satisfactory. This lead to inconsistent >> behaviour based on the order the BPF are open. An application can >> impact the settings of other BPF devices and other applications / >> kernel consumers. The timestamping suffers from higher latency and >> worse, higher variability of the latency, which is bad. >> >> We then took a different approach, which is inspired by your work on >> BPF. We believe the FAST/NORMAL settings should have 2 level of >> granularity: system wide and per interface. The system wide is the >> default behaviour (for example all BPF devices are NORMAL) and the >> interface one super-seeds the system one. This is implemented by the >> use of new sysctl living in the net.bpf tree. >> >> This enables many scenarios such as: >> - system default is FAST to improve performance because the system >> uses a slow timecounter (eg HPET or ACPI). >> - the system has 3 interfaces. Interface 2 is put in NORMAL mode (need >> for accuracy but not much traffic), and the last can do hardware >> timestamping and is put in BPF_TSTAMP_EXTERNAL mode. >> >> This then naturally led to store the performance setting >> (none/fast/normal/external) for each interface instead of each BPF >> descriptor. >> >> Each BPF descriptor still maintains the other 2 settings: the choice >> of clock and the format of the timestamp independently from each other. >> >> We believe this new patch keeps the spirit of the changes you proposed >> and improves it while accomodating the changes required by the >> feed-forward clock and future evolutions with things like IEEE 1588 >> compatible NICs. >> >> We plan to push this patch as soon as possible. We would really >> appreciate if you could comment on this long email and pach very soon. >> >> The patch can be found here: >> http://www.cubinlab.ee.unimelb.edu.au/~jrid/patches/r228429-v4/ffclock_bpf_intactabi.patch Candidate commit patches can be found at: http://people.freebsd.org/~lstewart/patches/misc/sysclockgetsnap_10.x.r228435.patch http://people.freebsd.org/~lstewart/patches/misc/ffclock_bpf_intactabi_10.x.r228435.patch Relative to Julien's patch, I've split his patch into two, updated bpf.4 documentation, added a net.bpf.tscfg.default sysctl to set the default time stamp for all BPF attached interfaces, and added some cleanup and bug fixes. If I don't hear any objections or receive any requests to hold off committing to allow more time for review, I plan to commit these patches in the next few days. Cheers, Lawrence
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4EEDFBBD.9050809>