From owner-freebsd-net@FreeBSD.ORG Tue Sep 2 10:48:34 2008
Date: Tue, 2 Sep 2008 12:51:24 +0200
From: Luigi Rizzo
To: FreeBSD networking and TCP/IP list
Message-ID: <20080902105124.GA22832@onelab2.iet.unipi.it>
References: <3170f42f0809010507q6c37a9d5q19649bc261d7656d@mail.gmail.com>
 <48BBE7B2.4050409@FreeBSD.org> <48BCE4AA.6050807@elischer.org>
 <3170f42f0809020017k643180efte155a5b5701a40cf@mail.gmail.com>
Subject: how to read dynamic data structures from the kernel
 (was Re: reading routing table)

In the (short so far) thread which I am hijacking, the question came up
of what is a good mechanism for reading the routing table from the
kernel, since FreeBSD currently uses /dev/kmem, which is not always
available and is hard to use with dynamically changing data structures.

The routing table is only one instance of potentially many similar data
structures that we might want to fetch: others are the various firewall
tables (the output of 'ipfw show'), possibly bridging tables, socket
lists and so on.

The issue is actually twofold. The interface problem, or how to pull
bits out of the kernel, is so easy as to be almost irrelevant:
getsockopt, sysctl, kmem, or some special file descriptor does the job
as long as the underlying chunk of data does not change (or can be
locked) during the syscall.

The real problem is that these data structures are dynamic and
potentially large, so the following approach (used e.g. in ipfw)

    enter kernel;
    get shared lock on the structure;
    navigate through the structure and make a linearized copy;
    unlock;
    copyout the linearized copy;

is extremely expensive and has the potential to block other activities
for a long time.

Accessing /dev/kmem and following pointers there has the opposite
problem: you cannot lock the kernel data structure while you navigate
it, so you are likely to follow stale pointers.

What we'd need is some internal representation of the data structure
that could give us individual entries on each call, together with extra
info (a pointer if we can guarantee that it doesn't get stale, something
more if we cannot make the guarantee) to allow the navigation to
continue on the next call.

I believe this is a very old and common problem, so my question is:

  do you know if any of the *BSD kernels implements some good mechanism
  to access a dynamic kernel data structure (e.g. the routing tree/trie,
  or even a list or hash table) without the flaws of the two approaches
  I indicate above?

    cheers
    luigi

[original thread below just for reference, but I believe I made a fair
summary above]

On Tue, Sep 02, 2008 at 10:19:55AM +0100, Robert Watson wrote:
> On Tue, 2 Sep 2008, Debarshi Ray wrote:
>
> >>unfortunately netstat -rn uses /dev/kmem
> >
> >Yes. I also found that FreeBSD's route(8) implementation does not have an
> >equivalent of 'netstat -r'. NetBSD and GNU/Linux implementations have such
> >an option. Any reason for this?
> >Is it because you did not want to muck
> >with /dev/kmem in route(8) and wanted it to work with PF_ROUTE only? I
> >have not yet gone through NetBSD's route(8) code though.
>
> Usually the "reason" for things like this is that no one has written the
> code to do otherwise :-). PF_ROUTE is probably not a good mechanism for
> any bulk data transfer due to the constraints of being a datagram socket,
> although doing it via an iterated dump rather than a simple dump operation
> would probably work. Sysctl is generally a better interface for monitoring
> for various reasons, although it also has limitations. Maintaining
> historic kmem support is important, since it is also the code used for
> interpreting core dumps, and we don't want to lose support for that.
>
> Robert N M Watson
> Computer Laboratory
> University of Cambridge
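
For illustration only, here is a minimal user-space sketch in C of the
"iterated dump" idea: each call returns a bounded chunk of entries plus
a cursor that allows the walk to resume. It assumes entries carry a
unique, ordered key, so the cursor can be the last key returned rather
than a pointer that might go stale. Every name in it (struct table,
struct cursor, table_dump_chunk and so on) is hypothetical and not an
existing kernel interface, and the locking is only marked by comments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* All names below are hypothetical; this is not an existing kernel API. */
struct entry {
	uint32_t	 key;	/* unique; list is sorted by it */
	uint32_t	 value;
	struct entry	*next;
};

struct table {
	struct entry	*head;	/* a kernel version would add a lock here */
};

/* Flat record handed to the caller: contains no kernel pointers. */
struct record {
	uint32_t	key;
	uint32_t	value;
};

/* Resume point: the last key returned, never a pointer that may go stale. */
struct cursor {
	int		started;
	uint32_t	last_key;
};

/* Sorted insert; returns 0, or -1 on duplicate key or allocation failure. */
static int
table_insert(struct table *t, uint32_t key, uint32_t value)
{
	struct entry **pp = &t->head, *e;

	while (*pp != NULL && (*pp)->key < key)
		pp = &(*pp)->next;
	if (*pp != NULL && (*pp)->key == key)
		return (-1);
	if ((e = malloc(sizeof(*e))) == NULL)
		return (-1);
	e->key = key;
	e->value = value;
	e->next = *pp;
	*pp = e;
	return (0);
}

/* Remove the entry with the given key, if present. */
static void
table_remove(struct table *t, uint32_t key)
{
	struct entry **pp = &t->head, *e;

	while (*pp != NULL && (*pp)->key < key)
		pp = &(*pp)->next;
	if ((e = *pp) != NULL && e->key == key) {
		*pp = e->next;
		free(e);
	}
}

/*
 * Copy up to max entries with key > c->last_key (or from the head on the
 * first call) and advance the cursor.  Returns 0 when the walk is done.
 * In a kernel, the lock would be held only for this bounded chunk.
 */
static size_t
table_dump_chunk(const struct table *t, struct cursor *c,
    struct record *buf, size_t max)
{
	const struct entry *e;
	size_t n = 0;

	assert(c != NULL && buf != NULL);
	/* shared lock would be acquired here */
	for (e = t->head; e != NULL && n < max; e = e->next) {
		if (c->started && e->key <= c->last_key)
			continue;	/* linear skip; a tree could seek */
		buf[n].key = e->key;
		buf[n].value = e->value;
		n++;
	}
	/* shared lock would be released here, before any copyout */
	if (n > 0) {
		c->started = 1;
		c->last_key = buf[n - 1].key;
	}
	return (n);
}
```

A caller would start with a zeroed struct cursor and call
table_dump_chunk() repeatedly until it returns 0. The result is not an
atomic snapshot: entries added or removed between calls may or may not
be seen, but the walk never dereferences a stale pointer, and each call
touches at most max entries once the resume point is found. Finding the
resume point is linear in this list-based sketch; a tree or trie could
presumably seek to the key directly.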