Date: Tue, 31 May 2011 15:52:12 +0100
From: "Robert N. M. Watson" <rwatson@freebsd.org>
To: George Neville-Neil <gnn@FreeBSD.org>
Cc: Takuya ASADA <syuu@dokukino.com>, soc-status@freebsd.org, Kazuya Goda <gockzy@gmail.com>
Subject: Re: Weekly status report (27th May)
Message-ID: <2EF14D0B-A3A1-4835-B07F-728BAFA5B0CB@freebsd.org>
In-Reply-To: <8259CBF7-B2E6-49C6-A7C4-6682ECBDBB9F@freebsd.org>
References: <BANLkTim=zeRhwGajksbX2fBY9snkcj1h0g@mail.gmail.com> <8259CBF7-B2E6-49C6-A7C4-6682ECBDBB9F@freebsd.org>
On 31 May 2011, at 14:52, George Neville-Neil wrote:

>> - Do we really need to expose queue information and threads to user
>> applications?
>
> There are applications that will want this information.
>
>> Probably most BPF applications need to merge the packet streams from
>> the threads in the end.
>> For example, a sniffer app such as tcpdump or wireshark needs to
>> output the packet dump on the screen; before outputting it, we need
>> to merge the packet streams from each queue into one stream.
>> If so, isn't it better to merge the streams in the kernel, not in
>> userland?
>>
>> I'm not really sure about the use cases of BPF; maybe there are use
>> cases that can benefit from multithreaded BPF?
>
> Certainly there is a case for it, but perhaps not yet. Let's get
> through the work you've already planned first. I see the test case
> isn't written yet, so how are you testing these changes? When I get
> some time, probably next week, I'll want to run some of this code
> myself.

The rationale for exposing queues to userspace explicitly is the same
as the rationale for exposing queues to the OS: it's not just packet
data that has cache issues, but also the program data involved in
processing that packet data.

The reason for having each BPF device carry input and output queue
masks, in my initial thinking, was that we would set them to
0xffffffffffffffff by default, meaning that a particular BPF device
would merge and collect all packets, picking an arbitrary ordering for
interlacing the streams (as it does today). However, an application
might decide to open multiple devices, each with a different single
bit set, in order to receive traffic for a particular queue. The
application could then have particular threads use particular BPF
devices, and, based on the hardware having gotten the flow assignment
right, it could avoid cache line contention for statistics and even
for stateful processing of flows in its different threads. That
actually gives you a spectrum between today's behaviour and greater
levels of granularity, and lets the application decide "how early" to
blend the different queues of data: it can ask the kernel to do it, or
it can do it itself in userspace.

Robert
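P.S. To make the model concrete, here is a rough sketch of how an
application might consume per-queue BPF devices, one per thread. The
BIOCSRXQMASK ioctl and the one-bit-per-queue mask layout are
assumptions for illustration only; nothing like this has been
committed yet, and only BIOCSETIF below is an existing interface.

    /*
     * Hypothetical sketch: open one BPF descriptor per receive queue
     * and restrict each to a single queue via a (not yet committed)
     * BIOCSRXQMASK ioctl. Each descriptor would then be serviced by
     * its own thread, keeping per-queue statistics and flow state
     * out of cross-thread cache line contention.
     */
    #include <sys/types.h>
    #include <sys/ioctl.h>
    #include <sys/socket.h>
    #include <net/if.h>
    #include <net/bpf.h>

    #include <err.h>
    #include <fcntl.h>
    #include <stdint.h>
    #include <string.h>
    #include <unistd.h>

    static int
    open_bpf_for_queue(const char *ifname, int queue)
    {
        struct ifreq ifr;
        uint64_t mask;
        int fd;

        fd = open("/dev/bpf", O_RDONLY);     /* clonable BPF device */
        if (fd < 0)
            err(1, "open /dev/bpf");

        memset(&ifr, 0, sizeof(ifr));
        strlcpy(ifr.ifr_name, ifname, sizeof(ifr.ifr_name));
        if (ioctl(fd, BIOCSETIF, &ifr) < 0)  /* attach to interface */
            err(1, "BIOCSETIF %s", ifname);

        /*
         * The default mask would be all-ones, merging every queue
         * as BPF does today; here we select exactly one queue.
         * BIOCSRXQMASK is hypothetical.
         */
        mask = 1ULL << queue;
        if (ioctl(fd, BIOCSRXQMASK, &mask) < 0)
            err(1, "BIOCSRXQMASK");

        return (fd);
    }

The same descriptor-per-queue pattern would let an application fall
back to today's behaviour simply by leaving the mask at its default.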