Date: Fri, 11 Jun 2010 23:08:24 +1000 (EST)
From: Bruce Evans
To: Jung-uk Kim
Cc: freebsd-net@freebsd.org
Subject: Re: [RFC] BPF timestamping
Message-ID: <20100611215032.U35046@delplex.bde.org>
In-Reply-To: <201006102124.02005.jkim@FreeBSD.org>
References: <201006091444.50560.jkim@FreeBSD.org> <20100610173950.T33647@delplex.bde.org> <201006102124.02005.jkim@FreeBSD.org>

On Thu, 10 Jun 2010, Jung-uk Kim wrote:

> On Thursday 10 June 2010 05:45 am, Bruce Evans wrote:
>> On Wed, 9 Jun 2010, Jung-uk Kim wrote:
>>> bpf(4) can only timestamp packets with microtime(9). I want to
>>> expand it to be able to use different format and resolution. The
>>> ...
>> This has too many timestamp types, yet not one timestamp type which
>> is any good except possibly BPF_T_NONE, and not one monotonic
>> timestamp type.
>> Only external uses and compatibility require use
>> of CLOCK_REALTIME.
>> ...
> Please note that I am not trying to solve timecounter issues here.
> The current BPF timestamping is not too good because of two main
> reasons; 1) it is too slow with some timecounter hardware as you have
> noted and 2) we have no API to change timestamp resolution, accuracy,
> format, offset, or whatever *at all*.
>
> The most common trick for the first problem is using getmicrotime(9)
> instead of microtime() if the users don't care much about its
> accuracy. For those people who want to collect as many packets as
> possible without spending fortunes, it works pretty well. However,
> suppose you have multiple interfaces. You want good timestamps from
> a slower controller (LAN side) and less accurate timestamps from a
> super fast controller (WAN side), but you can't. My patch solves
> this problem by assigning time stamping function per descriptor. So,
> you can use the same resolution but different accuracies, for
> example.

I now think you should provide exactly the same timestamping features
as provided to userland by clock_gettime(2), clock_getres(2) and
clock_getaccprecres(2missing), using essentially the same interface and
code. The userland interface involves clock ids of type clockid_t with
names like CLOCK_REALTIME instead of bpf-specific names and types.
Unfortunately it only supports the timespec format.

> The second problem is little bit harder for us without breaking
> libpcap and its consumers as it expects struct timeval and nothing
> else. That's why I had to introduce new header format with compat
> shims. In fact, struct bpf_hdr (and struct pcap_sf_pkthdr) is really
> obsolete and people have been talking about pcap NG for many years,
> which can store timestamps in variable resolutions and offsets.

Does it prefer or support bintimes?

> However, we can only use the default resolution even if libpcap gets
> the new format because we are stuck with struct bpf_hdr[1].
>
> BTW, I updated my patch, which includes monotonic clocks now.
>
> BPF_T_MICROTIME_MONOTONIC        microuptime(9)
> BPF_T_NANOTIME_MONOTONIC         nanouptime(9)
> BPF_T_BINTIME_MONOTONIC          binuptime(9)
> BPF_T_MICROTIME_MONOTONIC_FAST   getmicrouptime(9)
> BPF_T_NANOTIME_MONOTONIC_FAST    getnanouptime(9)
> BPF_T_BINTIME_MONOTONIC_FAST     getbinuptime(9)
>
> http://people.freebsd.org/~jkim/bpf_tstamp2.diff
>
> Thanks for the hint, Bruce, although you may say there are more bogus
> clock types now. ;-)

Yes, there are far too many, but many are still missing:
- aliases BPF_T_*TIME_PRECISE for BPF_T_*TIME, matching the
  corresponding aliases for clockid_t's. This gives 18 clock ids per
  timecounter instead of only 12. clock_gettime() only supports 6 of
  these (it doesn't support the micro or bin time formats).
- aliases BPF_T_UPTIME* for BPF_T_*TIME_MONOTONIC. This gives 27 clock
  ids per timecounter instead of only 18. clock_gettime() only supports
  9 of these.
- BPF_T_SECOND corresponding to CLOCK_SECOND. clock_gettime() supports
  this.
- BPF_T_THREAD_CPUTIME corresponding to CLOCK_THREAD_CPUTIME_ID, but
  without the bogus _ID suffix. The latter gives the runtime of the
  current thread in nanoseconds. This might be almost useful for bpf
  if all the packets are stamped by the same kernel or user thread.
  Then it would function as a packet id with extra info about the time
  spent processing packets.
- BPF_T_VIRTUAL and BPF_T_PROF corresponding to CLOCK_VIRTUAL and
  CLOCK_PROF. The latter give user and user+sys times for processes.
  They would be about as useful as BPF_T_THREAD_CPUTIME for bpf.
- the total is now 31 for bpf (19 missing) and 13 for clock_gettime().
- multiply this by the number of timecounters. Non-primary
  timecounters should be available iff something has a use for them.
- raw cputicker timestamps. CLOCK_THREAD_CPUTIME_ID's timer uses
  these. These are not available in userland. They are easily
  available in the kernel, by calling cpu_tick(). Scaling them is
  nontrivial.
- raw timecounter reads. These are already available in userland via
  sysctlbyname("kern.timecounter.tc..counter", ...). Strangely, they
  are hard to call from the kernel.

By using normal clock ids and calling kern_clock_gettime(), you can
avoid lots of duplication (including documentation of the bpf clock
ids) and automatically support new normal clock ids. However, I can't
see how to implement the following features as efficiently:
- direct scaling to the final precision (kern_clock_gettime() only
  returns timespecs -- see above)
- delayed scaling to the final precision (bpf seems to make timestamps
  as binuptimes and scale them later)
- avoiding going through layers and switches. bpf goes through several
  layers and switches now, but perhaps it can go directly to the *time()
  function in kern_tc.c via a single function pointer, whereas
  kern_clock_gettime() and delayed scaling have to use a switch or an
  indexed function pointer since their clock id is highly variable.

Bruce
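
[Illustration of the last item above, not part of the patch under
discussion.] A minimal userland sketch of the "single function pointer"
idea, under stated assumptions: struct desc, desc_set_clock() and
desc_stamp() are hypothetical names, not the real struct bpf_d or any
bpf(4) interface, and clock_gettime(2) stands in for the kernel's
*time() functions in kern_tc.c. The switch on the clock id runs once,
when the clock is selected; the per-packet path is then a single
indirect call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical timestamp hook; a kernel version would point at a *time() function. */
typedef int (*tstamp_fn)(struct timespec *);

/* Hypothetical stand-in for a bpf descriptor (NOT the real struct bpf_d). */
struct desc {
	tstamp_fn	d_tstamp;	/* resolved once, when the clock is chosen */
};

/* Userland stand-ins for nanotime(9) and nanouptime(9). */
static int
stamp_realtime(struct timespec *ts)
{
	return (clock_gettime(CLOCK_REALTIME, ts));
}

static int
stamp_monotonic(struct timespec *ts)
{
	return (clock_gettime(CLOCK_MONOTONIC, ts));
}

/*
 * Configuration path: the switch on the clock id is paid here, once per
 * change of clock, not once per packet.  Returns 0 on success and -1 for
 * a clock id this sketch does not handle (the hook is left unchanged).
 */
static int
desc_set_clock(struct desc *d, clockid_t id)
{
	switch (id) {
	case CLOCK_REALTIME:
		d->d_tstamp = stamp_realtime;
		return (0);
	case CLOCK_MONOTONIC:
		d->d_tstamp = stamp_monotonic;
		return (0);
	default:
		return (-1);
	}
}

/* Per-packet path: one indirect call, no switch and no extra layers. */
static int
desc_stamp(const struct desc *d, struct timespec *ts)
{
	assert(d->d_tstamp != NULL);
	return (d->d_tstamp(ts));
}
```

A caller would run desc_set_clock(&d, CLOCK_MONOTONIC) once per
descriptor and then desc_stamp(&d, &ts) for each packet. Going through
an interface keyed by clock id on every packet would instead need the
switch (or an indexed function pointer) each time, which is the cost
described above.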