From: Jung-uk Kim
To: Bruce Evans
Cc: freebsd-net@FreeBSD.org
Date: Thu, 10 Jun 2010 21:23:59 -0400
Subject: Re: [RFC] BPF timestamping
Message-Id: <201006102124.02005.jkim@FreeBSD.org>
In-Reply-To: <20100610173950.T33647@delplex.bde.org>
References: <201006091444.50560.jkim@FreeBSD.org> <20100610173950.T33647@delplex.bde.org>

On Thursday 10 June 2010 05:45 am, Bruce Evans wrote:
> On Wed, 9 Jun 2010, Jung-uk Kim wrote:
> > bpf(4) can only timestamp packets with microtime(9).  I want to
> > expand it to be able to use different format and resolution.  The
> > patch is here:
> >
> > http://people.freebsd.org/~jkim/bpf_tstamp.diff
> >
> > With this patch, we can select different format and resolution of
> > the timestamps.  It is done via ioctl(2) with BIOCSTSTAMP
> > command.  Similarly, you can get the current format and resolution
> > with BIOCGTSTAMP command.
> > Currently, the following functions are
> > available:
> >
> >     BPF_T_MICROTIME         microtime(9)
> >     BPF_T_NANOTIME          nanotime(9)
> >     BPF_T_BINTIME           bintime(9)
> >     BPF_T_MICROTIME_FAST    getmicrotime(9)
> >     BPF_T_NANOTIME_FAST     getnanotime(9)
> >     BPF_T_BINTIME_FAST      getbintime(9)
> >     BPF_T_NONE              ignore time stamps
>
> This has too many timestamp types, yet not one timestamp type which
> is any good except possibly BPF_T_NONE, and not one monotonic
> timestamp type.  Only external uses and compatibility require use
> of CLOCK_REALTIME.
>
> I recently tried looking at timeout resolution on FreeBSD cluster
> machines using ktrace, and found ktrace unusable for this.  At
> first I blamed the slowness of the default misconfigured
> timecounter ACPI-fast, but the main problem was that I forgot my
> home directory was on nfs, and nfs makes writing ktrace records
> take hundreds of times longer than on local file systems.
> ACPI-fast seemed to be taking nearly 1000 uS, but it was nfs taking
> that long.
>
> Anyway, ACPI-fast takes nearly 1000 nS, which is many times too
> long to be good for timestamping individual syscalls or packets,
> and makes sub-microseconds resolution useless.  The above non-get
> *time() interfaces still use the primary timecounter, and this
> might be slow even if it is not misconfigured.  The above
> get*time() interfaces are fast only at the cost of being broken.
> Among other bugs, their times only change at relatively large
> intervals which should become infinity with tickless kernels.
> (BTW, icmp timestamps are still broken on systems with hz < 100.
> Someone changed microtime() to getmicrotime(), but getmicrotime()
> cannot deliver the resolution of 1 mS supported by icmp timestamps
> unless these intervals are <= 1 mS.)

Please note that I am not trying to solve timecounter issues here.
The current BPF timestamping is not too good for two main reasons:
1) it is too slow with some timecounter hardware, as you have noted,
and 2) we have no API to change timestamp resolution, accuracy,
format, offset, or whatever *at all*.  The most common trick for the
first problem is using getmicrotime(9) instead of microtime(9) if the
users don't care much about accuracy.  For those people who want to
collect as many packets as possible without spending fortunes, it
works pretty well.  However, suppose you have multiple interfaces.
You want good timestamps from a slower controller (LAN side) and less
accurate timestamps from a super fast controller (WAN side), but you
can't.  My patch solves this problem by assigning a timestamping
function per descriptor.  So, you can use the same resolution but
different accuracies, for example.

The second problem is a little bit harder for us to solve without
breaking libpcap and its consumers, as it expects struct timeval and
nothing else.  That's why I had to introduce a new header format with
compat shims.  In fact, struct bpf_hdr (and struct pcap_sf_pkthdr) is
really obsolete and people have been talking about pcap NG for many
years, which can store timestamps in variable resolutions and
offsets.  However, we can only use the default resolution even if
libpcap gets the new format because we are stuck with struct
bpf_hdr[1].

BTW, I updated my patch, which includes monotonic clocks now.

    BPF_T_MICROTIME_MONOTONIC         microuptime(9)
    BPF_T_NANOTIME_MONOTONIC          nanouptime(9)
    BPF_T_BINTIME_MONOTONIC           binuptime(9)
    BPF_T_MICROTIME_MONOTONIC_FAST    getmicrouptime(9)
    BPF_T_NANOTIME_MONOTONIC_FAST     getnanouptime(9)
    BPF_T_BINTIME_MONOTONIC_FAST      getbinuptime(9)

http://people.freebsd.org/~jkim/bpf_tstamp2.diff

Thanks for the hint, Bruce, although you may say there are more bogus
clock types now. ;-)

Enjoy,

Jung-uk Kim

[1] libpcap has had limited support for the pcap NG format since
1.1.0, and my patch was written with that format in mind.
If my patch gets committed, I am going to submit a libpcap patch
upstream to introduce the new struct bpf_xhdr.
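
The per-descriptor idea discussed in this message (each descriptor
picks its own timestamp source, and the consumer picks the resolution)
can be sketched in userland C roughly as follows.  This is only an
illustration under stated assumptions, not code from the patch: every
sketch_* name is hypothetical, clock_gettime(2) stands in for the
kernel's microtime(9)/binuptime(9) family, and the conversion
constants are assumed to mirror the usual bintime arithmetic (a 64-bit
binary fraction of a second).

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for clock_gettime() */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for a bintime: seconds plus a 64-bit binary fraction. */
struct sketch_bintime {
	int64_t  sec;
	uint64_t frac;		/* units of 2^-64 second */
};

/* Hypothetical timestamp source; one is selected per descriptor. */
typedef void (*sketch_stamp_fn)(struct sketch_bintime *);

struct sketch_desc {
	sketch_stamp_fn stamp;	/* NULL means "do not timestamp" */
};

void
sketch_from_clock(clockid_t id, struct sketch_bintime *bt)
{
	struct timespec ts;

	clock_gettime(id, &ts);
	bt->sec = ts.tv_sec;
	/* 18446744073 is floor(2^64 / 10^9): one nanosecond as a binary fraction. */
	bt->frac = (uint64_t)ts.tv_nsec * UINT64_C(18446744073);
}

/* Wall-clock source (assumption: userland analog of a realtime stamp). */
void
sketch_stamp_realtime(struct sketch_bintime *bt)
{
	sketch_from_clock(CLOCK_REALTIME, bt);
}

/* Monotonic source (assumption: userland analog of an uptime stamp). */
void
sketch_stamp_monotonic(struct sketch_bintime *bt)
{
	sketch_from_clock(CLOCK_MONOTONIC, bt);
}

/* Stamp one packet with the descriptor's own source; returns 1 if stamped. */
int
sketch_stamp_packet(const struct sketch_desc *d, struct sketch_bintime *bt)
{
	assert(d != NULL && bt != NULL);
	if (d->stamp == NULL)
		return (0);
	d->stamp(bt);
	return (1);
}

/* Reduce the binary fraction to nanoseconds (truncating). */
uint32_t
sketch_frac_to_nsec(uint64_t frac)
{
	return ((uint32_t)((UINT64_C(1000000000) * (uint32_t)(frac >> 32)) >> 32));
}

/* Reduce the binary fraction to microseconds (truncating). */
uint32_t
sketch_frac_to_usec(uint64_t frac)
{
	return ((uint32_t)((UINT64_C(1000000) * (uint32_t)(frac >> 32)) >> 32));
}
```

A caller would fill in one struct sketch_desc per capture descriptor,
for example a monotonic source on one and no source at all on another,
and call sketch_stamp_packet() for each packet.  Keeping the stamp in
the widest format and reducing it only when a record is written is one
way to serve consumers that still expect microseconds; the conversions
truncate, so a round trip through the binary fraction can lose the
last unit.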