Date: Mon, 14 Jun 2004 08:38:57 -0400
From: Ed Maste <emaste@sandvine.com>
To: 'Sergey Lyubka' <devnull@uptsoft.com>, freebsd-hackers@freebsd.org
Subject: RE: memory mapped packet capturing - bpf replacement ?
Message-ID: <FE045D4D9F7AED4CBFF1B3B813C8533701BD40C7@mail.sandvine.com>
> The module is a netgraph node, called ng_mmq. mmq stands for
> memory-mapped queue. The node has one hook, called "input".
> When this hook is connected,
>  o memory buffer is allocated. size is controlled by the
>    debug.mmq_size sysctl.
>  o a device /dev/mmqX is created, where X is a node ID
>  o /dev/mmqX is mmap-able by the user, mmap() returns an
>    allocated buffer
>  o when packet arrives on hook, it is copied to the buffer,
>    which is actually a ringbuffer. The ringbuffer's head is
>    advanced.
>  o user spins until tail != head, which means new data arrived.
>    Then it reads from ringbuffer, and advances the tail.
>  o no mutexes are used
>
> The code is at
>
> So this is the basic idea. I connected ng_mmq node to my rl0:
> ethernet node via the ng_hub, and benchmarked it against the
> pcap, using the same pcap callback function. Packet processing was
> simulated by the delay() function that just takes some CPU cycles.
> What I have found is:
>  1. bpf seems to be faster, i.e. it drops less packets than mmq
>  2. mmq seems to capture more packets.
>
> This is sample output from the benchmark utility:
> # ./benchmark rl0 /dev/mmq5 1000
> pcap: rcvd: 15061, dropped: 14047, seen: 1000
> mmq: rcvd: 23172, dropped: 21789, seen: 1000
>
> Now, the questions:
> 1. is my interpretation of benchmark results correct?
> 2. if they are correct, why bpf is faster?
> 3. is it OK to have no mutexes for ringbuffer operations ?

Hello Sergey. I haven't looked at your code, but I'll provide some
comments, having implemented an mmapped ringbuffer BPF replacement
myself.

First off, you should be able to do significantly better than vanilla
BPF. Gigabit line rate is doable for "reasonable" sized packets and
good hardware.

Watch how much time you spend in your simulated packet processing.
I also needed to add a delay to my benchmarking, because without it
I'd run into the hardware limit (i.e. 1 Gbps), hiding the effects of
further tweaking. However, if the delay is too great it will overwhelm
the bpf/ringbuffer overhead, making your results less useful. I did my
benchmark by increasing the packet rate until I found the point at
which packets started to be dropped.

In my testing I found the call to microtime() to be quite expensive.
(It will vary depending on which timecounter is being used.)

Is this in an SMP or uniprocessor environment? I think your gain from
a ringbuffer interface will be more significant in the SMP case.

Does the ng_hub cause the packet to be copied? If so you've still got
the same number of copies as vanilla BPF.

Are you using the same snap length (or copying the entire packet) in
each case?

As for question 3, be careful that you're atomically modifying the
head and tail indices/pointers. But yes, you can do it without a
mutex.

-ed
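
The point about question 3 can be illustrated with a minimal sketch,
assuming exactly one producer (the side that copies packets in) and one
consumer (the reader that spins on tail != head). It is written with
userland C11 atomics rather than the kernel's atomic(9) primitives, and
the names ring_put, ring_get, RING_SLOTS and SLOT_BYTES are hypothetical;
none of this is taken from ng_mmq or from any implementation mentioned
in this thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define RING_SLOTS 8u   /* hypothetical slot count; must be a power of two */
#define SLOT_BYTES 64u  /* hypothetical per-packet snap length */

static_assert((RING_SLOTS & (RING_SLOTS - 1)) == 0,
    "RING_SLOTS must be a power of two");

struct slot {
    uint32_t len;               /* number of valid bytes in data[] */
    uint8_t  data[SLOT_BYTES];
};

struct ring {
    _Atomic uint32_t head;      /* written only by the producer */
    _Atomic uint32_t tail;      /* written only by the consumer */
    struct slot slots[RING_SLOTS];
};

static void ring_init(struct ring *r)
{
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer side. Returns false (the packet is dropped) if the ring is full. */
static bool ring_put(struct ring *r, const void *pkt, uint32_t len)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head - tail == RING_SLOTS)      /* indices run free and wrap */
        return false;

    struct slot *s = &r->slots[head & (RING_SLOTS - 1)];
    s->len = len < SLOT_BYTES ? len : SLOT_BYTES;   /* truncate to snap length */
    memcpy(s->data, pkt, s->len);

    /* Release: the slot contents become visible before the new head does. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side. Returns false if there is nothing to read. */
static bool ring_get(struct ring *r, void *buf, uint32_t *len)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (head == tail)
        return false;

    const struct slot *s = &r->slots[tail & (RING_SLOTS - 1)];
    *len = s->len;
    memcpy(buf, s->data, s->len);       /* buf must hold SLOT_BYTES bytes */

    /* Release: the slot is handed back only after it has been copied out. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```

Each index has a single writer, so no mutex is needed, but the
acquire/release pairing matters on SMP: without it the reader could see
the new head before the packet bytes. With more than one producer or
more than one consumer this scheme is no longer sufficient.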