Date: Tue, 02 Sep 2008 15:55:53 +0100 From: "Bruce M. Simpson" <bms@FreeBSD.org> To: Luigi Rizzo <rizzo@iet.unipi.it> Cc: FreeBSD networking and TCP/IP list <freebsd-net@freebsd.org> Subject: Re: how to read dynamic data structures from the kernel (was Re: reading routing table) Message-ID: <48BD53F9.50002@FreeBSD.org> In-Reply-To: <20080902105124.GA22832@onelab2.iet.unipi.it> References: <3170f42f0809010507q6c37a9d5q19649bc261d7656d@mail.gmail.com> <48BBE7B2.4050409@FreeBSD.org> <48BCE4AA.6050807@elischer.org> <3170f42f0809020017k643180efte155a5b5701a40cf@mail.gmail.com> <alpine.BSF.1.10.0809021017500.1150@fledge.watson.org> <20080902105124.GA22832@onelab2.iet.unipi.it>
next in thread | previous in thread | raw e-mail | index | archive | help
Luigi Rizzo wrote: > do you know if any of the *BSD kernels implements some good mechanism > to access a dynamic kernel data structure (e.g. the routing tree/trie, > or even a list or hash table) without the flaws of the two approaches > i indicate above ? > Hahaha. I ran into an isomorphic problem with Net-SNMP at work last week. There's a need to export the BGP routing table via SNMP. Of course doing this in our framework at work requires some IPC calls which always require a select() (or WaitForMultipleObjects()) based continuation. Net-SNMP doesn't support continuations at the table iterator level, so somehow, we need to implement an iterator which can accomodate our blocking IPC mechanism. [No, we don't use threads, and that would actually create more problems than it solves -- running single-threaded with continuations lets us run lock free, and we rely on the OS's IPC primitives to serialize our code. works just fine for us so far...] So we would end up caching the whole primary key range in the SNMP sub-agent on a table OID access, a technique which would allow us to defer the IPC calls providing we walk the entire range of the iterator and cache the keys -- but even THAT is far too much data for the BGP table, which is a trie with ~250,000 entries. I hate SNMP GETNEXT. Back to the FreeBSD kernel, though. If you look at in_mcast.c, particularly in p4 bms_netdev, this is what happens for the per-socket multicast source filters -- there is the linearization of an RB-tree for setsourcefilter(). This is fine for something with a limit of ~256 entries per socket (why RB for something so small? this is for space vs time -- and also it has to merge into a larger filter list in the IGMPv3 paths.) And the lock granularity is per-socket. However it doesn't do for something as big as a BGP routing table. C++ lends itself well to expressing these kinds of smart-pointer idioms, though. I'm thinking perhaps we need the notion of a sysctl iterator, which allocates a token for walking a shared data structure, and is able to guarantee that the token maps to a valid pointer for the same entry, until its 'advance pointer' operation is called. Question is, who's going to pull the trigger? cheers BMS P.S. I'm REALLY getting fed up with the lack of openness and transparency largely incumbent in doing work in p4. Come one come all -- we shouldn't need accounts for folk to see and contribute what's going on, and the stagnation is getting silly. FreeBSD development should not be a committer or chum-of-committer in-crowd.
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?48BD53F9.50002>