Date: Wed, 5 Nov 2003 23:58:16 -0800
From: Guy Harris <guy@alum.mit.edu>
To: Brian Fundakowski Feldman <green@FreeBSD.org>
Cc: fenner@FreeBSD.org
Subject: Re: bpf/pcap are weird
Message-ID: <20031105235816.E331@quadrajet.sonic.net>

> Okay, this is goofy stuff and breaks a lot of code that otherwise makes
> certain assumptions about pcap/bpf that don't work on FreeBSD. Our
> bpf(4) doesn't actually care about the non-blocking fd flag, and our pcap(3)
> doesn't care at all about BIOCIMMEDIATE.

This is a libpcap deficiency that I will probably fix at some point, as
1) some libpcap applications might want that mode and 2) the way you get
that mode differs on different platforms (some platforms always
implement it, e.g. Linux; other platforms have different ways of
requesting it). It's in my queue along with a number of other libpcap
deficiencies.

> Why do we have BIOCIMMEDIATE?
> It seems like it's what SHOULD be implemented with the non-blocking I/O
> flag

No. BIOCIMMEDIATE and non-blocking mode are different.

BIOCIMMEDIATE mode means "make incoming packets readable immediately;
don't buffer them up until either the store buffer is full or the
timeout expires". This is for use in, for example, applications that
are using BPF to implement network protocols, and want to be able to
respond immediately to incoming packets, as opposed to, for example,
packet capture applications (tcpdump, Ethereal, etc.) which don't
necessarily need to immediately show or save incoming packets and which
might want to try to get as many packets as possible per read on the BPF
device.

It does *NOT* mean "an attempt to read on this device won't block even
if *no* packets are available", nor should it - applications running in
BIOCIMMEDIATE mode would probably still want to block, rather than spin,
if no packets are available.

Non-blocking mode should mean "an attempt to read on this device won't
block, even if there are no packets remaining", so it's not identical to
BIOCIMMEDIATE mode.

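To make the difference between the two modes concrete, here is a minimal user-space model in C of what a single read does in each mode. It is only a sketch under stated assumptions: `model_read`, the `read_outcome` enum and the parameter names are invented for illustration and are not the kernel's code, and the hold and store buffers are reduced to plain byte counts. The non-blocking case with packets only in the store buffer is modelled as failing with EWOULDBLOCK; that is an assumption about the behavior being discussed, not a statement of what it should be.

```c
#include <stdbool.h>

/* Hypothetical outcomes of one read() call in this model. */
enum read_outcome {
    READ_RETURNS_PACKETS, /* data is handed to the caller */
    READ_SLEEPS,          /* caller blocks until the buffer fills or the timeout expires */
    READ_WOULD_BLOCK      /* caller gets EWOULDBLOCK at once */
};

/*
 * immediate: BIOCIMMEDIATE has been set on the descriptor
 * nonblock:  the descriptor is in non-blocking mode
 * hold_len:  bytes in the hold buffer (ready to be read)
 * store_len: bytes in the store buffer (still being filled)
 */
enum read_outcome
model_read(bool immediate, bool nonblock, int hold_len, int store_len)
{
    /* A completed hold buffer is always readable. */
    if (hold_len > 0)
        return READ_RETURNS_PACKETS;

    /* Immediate mode: do not wait for the store buffer to fill. */
    if (immediate && store_len > 0)
        return READ_RETURNS_PACKETS;

    /* Nothing to hand over: the two modes differ only here. */
    if (nonblock)
        return READ_WOULD_BLOCK;
    return READ_SLEEPS;
}
```

In this model, `model_read(true, false, 0, 0)` yields `READ_SLEEPS` (immediate mode still blocks when nothing has arrived), while `model_read(false, true, 0, 0)` yields `READ_WOULD_BLOCK`.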
If used in conjunction with a properly-working "select()" or "poll()" -
i.e., one that causes the timeout timer to start when the "select()" or
"poll()" is done, so that the "select()" or "poll()" will wake up if the
store buffer fills *OR* the timeout expires - then it does need to be
the case that, if the "select()" or "poll()" says a read on the BPF
device will succeed, it will, in fact, succeed. This could be
implemented by having reads in non-blocking mode always do a buffer
rotation if there are packets in the store buffer but not the hold
buffer, just as is the case in BIOCIMMEDIATE mode.

That's currently done in "bpf_read()" - note the "|| timed_out" in the
"if" inside the "while (d->bd_hbuf == 0)" loop. That appears to have
been introduced in 4.5, in revision 1.59.2.8, which was an MFC of
revision 1.86:

    Make bpf's read timeout feature work more correctly with
    select/poll, and therefore with pthreads. I doubt there is any way
    to make this 100% semantically identical to the way it behaves in
    unthreaded programs with blocking reads, but the solution here
    should do the right thing for all reasonable usage patterns.

    The basic idea is to schedule a callout for the read timeout when a
    select/poll is done. When the callout fires, it ends the select if
    it is still in progress, or marks the state as "timed out" if the
    select has already ended for some other reason. Additional logic in
    bpfread then does the right thing in the case where the timeout has
    fired.

    Note, I co-opted the bd_state member of the bpf_d structure. It has
    been present in the structure since the initial import of 4.4-lite,
    but as far as I can tell it has never been used.

    PR: kern/22063 and bin/31649

PR 22063 is "bpf when used with the select system call with timeout
doesn't forward packets on timeout":

    When bpf is accessed via libpcap with the select system call with a
    timeout set if a less than full buffer of packets received on the
    interface (and passed to bpf.c) they will never be returned to
    libpcap even on a timeout.

    OpenBSD has a partial fix for this (it gets the first packet of 9 up
    and leaves the other 8) which I have corrected, reported to OpenBSD
    and ported to FreeBSD.

    As a side note one of the OpenBSD people is working on a better bpf
    implementation and would be interested in help by someone
    knowledgable in the FreeBSD VM system to assist porting his code
    when finished to FreeBSD.

(I think the "better bpf implementation" might be Michael Stolarchuk's
memory-mapped BPF, but I don't know whether it ever saw the light of
day.)

PR 31649 is "libpcap doesn't work with -pthread"; the problem is that
the userland pthreads library requires that "select()"/"poll()" and
non-blocking reads work on anything from which you're trying to read if
you can get long-term waits on it - and that wasn't the case for BPF
devices.

The question then is whether, if *not* used with "select()" or "poll()",
reads should return whatever packets are there, even if the timer hasn't
expired. One could argue that they should, in which case the "if" in
question should also check for "ioflag & IO_NDELAY". I don't know
whether that would cause problems for any applications, though.

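The select/poll timeout handling described in the commit message, and the possible extra check for non-blocking reads, can be sketched as a small user-space state machine in C. This is a rough model, not the kernel source: the `model_*` names, the `M_*` states (stand-ins for whatever is kept in `bd_state`) and the `ndelay_rotates` switch are all hypothetical, and buffers are reduced to byte counts. The model also assumes the callout marks the state as timed out even when it ends a select that is still in progress, so that the following read can see it.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the states kept in bd_state. */
enum model_state { M_IDLE, M_WAITING, M_TIMED_OUT };

struct model_bpf {
    enum model_state state;
    bool immediate;  /* BIOCIMMEDIATE set */
    bool selecting;  /* a select()/poll() is in progress */
    int  hold_len;   /* bytes ready to be read */
    int  store_len;  /* bytes still being accumulated */
    int  wakeups;    /* times a sleeping select()/poll() was woken */
};

/* select()/poll() with nothing readable yet: arm the read timeout. */
void model_poll(struct model_bpf *d)
{
    if (d->hold_len == 0 && !(d->immediate && d->store_len > 0)) {
        d->selecting = true;
        if (d->state == M_IDLE)
            d->state = M_WAITING;  /* callout scheduled */
    }
}

/* The callout fires: note the timeout and end a select still in progress. */
void model_timeout(struct model_bpf *d)
{
    if (d->state != M_WAITING)
        return;
    d->state = M_TIMED_OUT;
    if (d->selecting) {
        d->selecting = false;
        d->wakeups++;
    }
}

/*
 * Returns the number of bytes handed to the caller, or -1 when the read
 * would have to wait (sleep, or EWOULDBLOCK in non-blocking mode).
 * ndelay_rotates selects the variant in which a non-blocking read also
 * rotates the buffers; that variant is only a possibility, not current code.
 */
int model_read(struct model_bpf *d, bool nonblock, bool ndelay_rotates)
{
    bool timed_out = (d->state == M_TIMED_OUT);
    int n;

    d->state = M_IDLE;  /* any pending callout is cancelled */
    if (d->hold_len == 0 && d->store_len != 0 &&
        (d->immediate || timed_out || (nonblock && ndelay_rotates))) {
        d->hold_len = d->store_len;  /* rotate: store becomes hold */
        d->store_len = 0;
    }
    if (d->hold_len == 0)
        return -1;
    n = d->hold_len;
    d->hold_len = 0;
    return n;
}
```

With a partly filled store buffer, calling `model_poll()`, then `model_timeout()`, then `model_read()` returns the buffered bytes, which is the case PR 22063 complains about. A non-blocking read with no timeout pending returns -1 unless `ndelay_rotates` is set.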
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20031105235816.E331>