Message-ID: <48BD53F9.50002@FreeBSD.org>
Date: Tue, 02 Sep 2008 15:55:53 +0100
From: "Bruce M. Simpson"
To: Luigi Rizzo
Cc: FreeBSD networking and TCP/IP list
In-Reply-To: <20080902105124.GA22832@onelab2.iet.unipi.it>
Subject: Re: how to read dynamic data structures from the kernel (was Re: reading routing table)

Luigi Rizzo wrote:
> do you know if any of the *BSD kernels implements some good mechanism
> to access a dynamic kernel data structure (e.g. the routing tree/trie,
> or even a list or hash table) without the flaws of the two approaches
> i indicate above ?
>

Hahaha. I ran into an isomorphic problem with Net-SNMP at work last week.

There's a need to export the BGP routing table via SNMP. Of course, doing this in our framework at work requires some IPC calls, which always require a select() (or WaitForMultipleObjects()) based continuation.

Net-SNMP doesn't support continuations at the table iterator level, so somehow we need to implement an iterator which can accommodate our blocking IPC mechanism.

[No, we don't use threads, and that would actually create more problems than it solves -- running single-threaded with continuations lets us run lock free, and we rely on the OS's IPC primitives to serialize our code. Works just fine for us so far...]

So we would end up caching the whole primary key range in the SNMP sub-agent on a table OID access, a technique which would allow us to defer the IPC calls provided we walk the entire range of the iterator and cache the keys -- but even THAT is far too much data for the BGP table, which is a trie with ~250,000 entries. I hate SNMP GETNEXT.

Back to the FreeBSD kernel, though. If you look at in_mcast.c, particularly in p4 bms_netdev, this is what happens for the per-socket multicast source filters -- there is the linearization of an RB-tree for setsourcefilter().

This is fine for something with a limit of ~256 entries per socket (why RB for something so small? this is for space vs time -- and also it has to merge into a larger filter list in the IGMPv3 paths.) And the lock granularity is per-socket. However, it doesn't do for something as big as a BGP routing table.

C++ lends itself well to expressing these kinds of smart-pointer idioms, though.

I'm thinking perhaps we need the notion of a sysctl iterator, which allocates a token for walking a shared data structure, and is able to guarantee that the token maps to a valid pointer for the same entry, until its 'advance pointer' operation is called.

Question is, who's going to pull the trigger?

cheers
BMS

P.S. I'm REALLY getting fed up with the lack of openness and transparency largely incumbent in doing work in p4. Come one come all -- we shouldn't need accounts for folk to see and contribute to what's going on, and the stagnation is getting silly. FreeBSD development should not be a committer or chum-of-committer in-crowd.
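
For illustration only, and not taken from bms_netdev or any FreeBSD source: a minimal user-space C sketch of the token idea described in the message, assuming a plain singly linked list as a stand-in for the shared structure. Every name here (struct walk_token, walk_begin(), walk_advance(), table_remove()) is hypothetical, and locking is omitted entirely; a real kernel version would have to take the structure's lock around each call. The token pins its current entry with a reference count, so a concurrent removal is only logical until the walker advances.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a shared kernel structure: a singly linked list. */
struct entry {
    int key;
    int refs;               /* number of walkers pinning this entry */
    int dead;               /* logically removed, waiting for last unpin */
    struct entry *next;
};

struct table {
    struct entry *head;
};

/* Hypothetical token: pins one entry until walk_advance() is called. */
struct walk_token {
    struct table *tbl;
    struct entry *cur;
};

/* Insert a new entry at the head of the list. */
struct entry *table_insert(struct table *t, int key)
{
    struct entry *e = calloc(1, sizeof(*e));
    if (e == NULL)
        return NULL;
    e->key = key;
    e->next = t->head;
    t->head = e;
    return e;
}

/* Unlink and free every entry that is both dead and unpinned. */
void table_reap(struct table *t)
{
    struct entry **pp = &t->head;
    while (*pp != NULL) {
        struct entry *e = *pp;
        if (e->dead && e->refs == 0) {
            *pp = e->next;
            free(e);
        } else {
            pp = &e->next;
        }
    }
}

/* Logical removal; physical removal is deferred while a token pins the entry. */
void table_remove(struct table *t, int key)
{
    for (struct entry *e = t->head; e != NULL; e = e->next)
        if (e->key == key)
            e->dead = 1;
    table_reap(t);
}

/* Skip forward to the first entry that has not been removed. */
struct entry *first_live(struct entry *e)
{
    while (e != NULL && e->dead)
        e = e->next;
    return e;
}

/* Allocate the walk: pin the first live entry, if any. */
void walk_begin(struct walk_token *w, struct table *t)
{
    w->tbl = t;
    w->cur = first_live(t->head);
    if (w->cur != NULL)
        w->cur->refs++;
}

/* The 'advance pointer' operation: pin the next live entry, unpin the old one. */
struct entry *walk_advance(struct walk_token *w)
{
    struct entry *old = w->cur;
    if (old == NULL)
        return NULL;
    assert(old->refs > 0);
    struct entry *next = first_live(old->next);
    if (next != NULL)
        next->refs++;
    old->refs--;
    w->cur = next;
    table_reap(w->tbl);     /* old entry is freed here if it was removed */
    return next;
}
```

Between walk_begin() and walk_advance(), w.cur stays safe to dereference even if table_remove() is called on that key, because dead entries stay linked until their last pin is dropped. The cost of this design choice is that a stalled walker keeps a removed entry alive indefinitely, so a real interface would presumably need a timeout or an explicit way to abandon a token.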