From: Jung-uk Kim
To: Bruce Evans
Cc: freebsd-net@freebsd.org
Date: Fri, 11 Jun 2010 12:38:19 -0400
Subject: Re: [RFC] BPF timestamping

On Friday 11 June 2010 09:08 am, Bruce Evans wrote:
> On Thu, 10 Jun 2010, Jung-uk Kim wrote:
> > On Thursday 10 June 2010 05:45 am, Bruce Evans wrote:
> >> On Wed, 9 Jun 2010, Jung-uk Kim wrote:
> >>> bpf(4) can only timestamp packets with microtime(9).  I want to
> >>> expand it to be able to use different format and resolution.
> >>> The ...
> >>
> >> This has too many timestamp types, yet not one timestamp type
> >> which is any good except possibly BPF_T_NONE, and not one
> >> monotonic timestamp type.  Only external uses and compatibility
> >> require use of CLOCK_REALTIME.
> >> ...
> >
> > Please note that I am not trying to solve timecounter issues
> > here.  The current BPF timestamping is not too good because of two
> > main reasons; 1) it is too slow with some timecounter hardware as
> > you have noted and 2) we have no API to change timestamp
> > resolution, accuracy, format, offset, or whatever *at all*.
> >
> > The most common trick for the first problem is using
> > getmicrotime(9) instead of microtime() if the users don't care
> > much about its accuracy.  For those people who want to collect as
> > many packets as possible without spending fortunes, it works
> > pretty well.  However, suppose you have multiple interfaces.  You
> > want good timestamps from a slower controller (LAN side) and less
> > accurate timestamps from a super fast controller (WAN side), but
> > you can't.  My patch solves this problem by assigning a time
> > stamping function per descriptor.  So, you can use the same
> > resolution but different accuracies, for example.
>
> I now think you should provide exactly the same timestamping
> features as provided to userland by clock_gettime(2),
> clock_getres(2) and clock_getaccprecres(2missing), using
> essentially the same interface and code.  The userland interface
> involves clock ids of type clockid_t with names like CLOCK_REALTIME
> instead of bpf-specific names and types.  Unfortunately it only
> supports the timespec format.

I thought about using them, but struct timespec isn't good enough.
It has exactly the same problem as struct timeval, i.e.,
sizeof(time_t) and sizeof(long) vary depending on the arch.  Note
that struct bpf_xhdr uses int64_t and uint64_t to work around the
problem.  At least in theory, it should be good enough until we have
to support a 16-byte aligned arch. :-)

> > The second problem is a little bit harder for us without breaking
> > libpcap and its consumers as it expects struct timeval and
> > nothing else.  That's why I had to introduce a new header format
> > with compat shims.  In fact, struct bpf_hdr (and struct
> > pcap_sf_pkthdr) is really obsolete and people have been talking
> > about pcap NG for many years, which can store timestamps in
> > variable resolutions and offsets.
>
> Does it prefer or support bintimes?

It supports bintime.
It does not prefer anything, although the default resolution is
1 usec for backward compatibility with the old pcap format.

> > However, we can only use the default resolution even if libpcap
> > gets the new format because we are stuck with struct bpf_hdr[1].
> >
> > BTW, I updated my patch, which includes monotonic clocks now.
> >
> > BPF_T_MICROTIME_MONOTONIC         microuptime(9)
> > BPF_T_NANOTIME_MONOTONIC          nanouptime(9)
> > BPF_T_BINTIME_MONOTONIC           binuptime(9)
> > BPF_T_MICROTIME_MONOTONIC_FAST    getmicrouptime(9)
> > BPF_T_NANOTIME_MONOTONIC_FAST     getnanouptime(9)
> > BPF_T_BINTIME_MONOTONIC_FAST      getbinuptime(9)
> >
> > http://people.freebsd.org/~jkim/bpf_tstamp2.diff
> >
> > Thanks for the hint, Bruce, although you may say there are more
> > bogus clock types now. ;-)
>
> Yes, there are far too many, but many are still missing:
> - aliases BPF_T_*TIME_PRECISE for BPF_T_*TIME corresponding to the
>   corresponding aliases for clockid_t's.  This gives 18 clock ids
>   per timecounter instead of only 12.  clock_gettime() only
>   supports 6 of these (it doesn't support the micro or bin time
>   formats).
> - aliases BPF_T_UPTIME* for BPF_*TIME_MONOTONIC.  This
>   gives 27 clock ids per timecounter instead of only 18.
>   clock_gettime() only supports 9 of these.
> - BPF_T_SECOND corresponding to CLOCK_SECOND.  clock_gettime()
>   supports this.
> - BPF_T_THREAD_CPUTIME corresponding to CLOCK_THREAD_CPUTIME_ID,
>   but without the bogus _ID suffix.  The latter gives the runtime of
>   the current thread in nanoseconds.  This might be almost useful for
>   bpf if all the packets are stamped by the same kernel or user
>   thread.  Then it would function as a packet id with extra info
>   about the time spent processing packets.
> - BPF_T_VIRTUAL and BPF_T_PROF corresponding to CLOCK_VIRTUAL and
>   CLOCK_PROF.  The latter give user and user+sys times for
>   processes.  They would be about as useful as BPF_T_THREAD_CPUTIME
>   for bpf.
> - the total is now 31 for bpf (19 missing) and 13 for
>   clock_gettime().
> - multiply this by the number of timecounters.
>   Non-primary timecounters should be available iff something has a
>   use for them.
> - raw cputicker timestamps.  CLOCK_THREAD_CPUTIME_ID's timer uses
>   these.  These are not available in userland.  They are easily
>   available in the kernel, by calling cpu_tick().  Scaling them is
>   nontrivial.
> - raw timecounter reads.  These are already available
>   in userland via sysctlbyname("kern.timecounter.tc..counter",
>   ...).  Strangely, they are hard to call from the kernel.

That's really far too many for my taste. :-(  It'll significantly
increase the number of special cases in the switch statement, but I
cannot avoid it (please see below).  I added _MONOTONIC because it
was relatively cheap to implement and important.  I may add some
aliases for _REALTIME, _PRECISE, and _UPTIME if you insist, though.

> By using normal clock ids and calling kern_clock_gettime(), you can
> avoid lots of duplication (including documentation of the bpf clock
> ids) and automatically support new normal clock ids.  However, I
> can't see how to implement the following features as efficiently:
> - direct scaling to the final precision (kern_clock_gettime() only
>   returns timespecs -- see above)
> - delayed scaling to the final precision (bpf seems to make
>   timestamps as binuptimes and scale them later)
> - avoiding going through layers and switches.  bpf goes through
>   several layers and switches now, but perhaps it can go directly to
>   the *time() function in kern_tc.c via a single function pointer,
>   where kern_clock_gettime() and delayed scaling have to use a switch
>   or an indexed function pointer since their clock id is highly
>   variable.

As I said, we cannot use kern_clock_gettime() and clockid_t.  The
code duplication is also a necessary evil because multiple
descriptors may be attached to a single interface, unless you are
effectively asking me to revert the following commit:

http://docs.freebsd.org/cgi/mid.cgi?200607241542.k6OFg5ck098374

Cheers,

Jung-uk Kim
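
For illustration only, the two ideas discussed in this message (a
fixed-width timestamp layout and a timestamping function chosen per
descriptor) can be sketched in userland C.  This is not the code from
bpf_tstamp2.diff: struct ts64, struct capture_desc, tstamp_fn and the
stamp_*() helpers are hypothetical names, and clock_gettime(2) merely
stands in for the kernel's *time(9) routines.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical fixed-width timestamp.  Unlike struct timeval or
 * struct timespec, its size does not depend on time_t or long.
 */
struct ts64 {
	int64_t		sec;	/* whole seconds */
	uint64_t	nsec;	/* nanoseconds, 0..999999999 */
};

_Static_assert(sizeof(struct ts64) == 16, "layout must not vary by ABI");

/* One timestamping routine, selected per descriptor. */
typedef void (*tstamp_fn)(struct ts64 *);

/* Hypothetical stand-in for per-descriptor capture state. */
struct capture_desc {
	tstamp_fn	stamp;
};

/* Read a POSIX clock into the fixed-width format; zero on failure. */
static void
read_clock(clockid_t id, struct ts64 *ts)
{
	struct timespec now;

	if (clock_gettime(id, &now) != 0) {
		ts->sec = 0;
		ts->nsec = 0;
		return;
	}
	ts->sec = (int64_t)now.tv_sec;
	ts->nsec = (uint64_t)now.tv_nsec;
}

/* Wall-clock time, for consumers that need CLOCK_REALTIME. */
static void
stamp_realtime(struct ts64 *ts)
{
	read_clock(CLOCK_REALTIME, ts);
}

/* Monotonic time, unaffected by wall-clock adjustments. */
static void
stamp_monotonic(struct ts64 *ts)
{
	read_clock(CLOCK_MONOTONIC, ts);
}

/* No timestamp at all: the cheapest choice for a very fast interface. */
static void
stamp_none(struct ts64 *ts)
{
	ts->sec = 0;
	ts->nsec = 0;
}

/* Stamp through the descriptor's pointer; no switch on a clock id. */
static void
desc_stamp(const struct capture_desc *d, struct ts64 *ts)
{
	assert(d != NULL && d->stamp != NULL);
	d->stamp(ts);
}
```

With this shape, a LAN-side descriptor could be set up with
stamp_monotonic and a WAN-side one with stamp_none, and neither
choice affects the other even when both are attached to the same
interface.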