Date: Mon, 11 Sep 2000 10:36:32 -0700
From: Alfred Perlstein <bright@wintelcom.net>
To: net@freebsd.org
Subject: Network stack journal.
Message-ID: <20000911103632.E12231@fw.wintelcom.net>
Journal: threading the FreeBSD network stack.

---------------------------------------------

Notes:

When I use a lowercase name for someone, that refers to their freefall login name.

---------------------------------------------

Preface:

I'm writing the journal for several reasons:

1) to provide a place for notes. Because the network stack is so large, there are going to be many parts I'll have to skip over. I'll note what I've skipped so either I can get back to it or someone else can jump in and do it.
2) to document how the locking systems I'm putting in work
3) random thoughts as I progress either towards my goal or insanity

I started working on this a day or two after the SMPng commit which brought FreeBSD mutex primitives and interrupt threads, which was sometime in the first week of Sept 2000.

I started this journal a couple of days after starting my work, so I will detail a few things that have happened so far:

Initially I wanted to place mutex locks in both the socket and socketbuffer structures. That proved to be too painful, so instead I use a lock on the socket and keep the old sleep/flags locking on the socketbuffer. There isn't a race, because the socketbuffer flags are protected by my socket lock and the newly added msleep() function allows me to manipulate the flags and sleep on them safely with my socket mutex interlocked.

I've gone through a lot of the code replacing manipulation of statistical counters with atomic_ operations. Some places have many manipulations (particularly the tcp code), so it may make more sense to keep a local statistics counter on the stack and do a batched update of the global statistics structure under a spinlock. Other alternatives include per-cpu counters, but I've heard many negative comments about doing stats like that.
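To make the two counter strategies above concrete, here is a minimal userland sketch in C11. It is not kernel code and not taken from the work-in-progress diff: the struct, the field names and every demo_* function are hypothetical, and a C11 atomic_flag stands in for a kernel spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical counters; a real statistics structure has many more fields. */
struct demo_stats {
    unsigned long rcv_packets;
    unsigned long rcv_bytes;
    unsigned long rcv_badsum;
};

/* Strategy A: one atomic operation per counter update. */
static atomic_ulong demo_rcv_packets_atomic;

static void demo_count_packet_atomic(void)
{
    atomic_fetch_add_explicit(&demo_rcv_packets_atomic, 1,
        memory_order_relaxed);
}

/* Strategy B: accumulate on the stack, then flush once under a spinlock. */
static struct demo_stats demo_global;
static atomic_flag demo_stats_spin = ATOMIC_FLAG_INIT;

static void demo_spin_lock(void)
{
    /* Busy-wait until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&demo_stats_spin,
        memory_order_acquire))
        ;
}

static void demo_spin_unlock(void)
{
    atomic_flag_clear_explicit(&demo_stats_spin, memory_order_release);
}

/* Add the local counters to the global ones, then zero the local copy. */
static void demo_stats_flush(struct demo_stats *local)
{
    assert(local != NULL);
    demo_spin_lock();
    demo_global.rcv_packets += local->rcv_packets;
    demo_global.rcv_bytes += local->rcv_bytes;
    demo_global.rcv_badsum += local->rcv_badsum;
    demo_spin_unlock();
    memset(local, 0, sizeof(*local));
}

/* Stand-in for an input routine that touches several counters per packet. */
static void demo_input(size_t len, int badsum)
{
    struct demo_stats local = { 0, 0, 0 };

    local.rcv_packets++;
    local.rcv_bytes += len;
    if (badsum)
        local.rcv_badsum++;

    /* One lock acquisition instead of three separate atomic operations. */
    demo_stats_flush(&local);
}
```

The batched form only pays off when a code path touches several counters. For a path that bumps a single counter, the plain atomic increment is probably the cheaper choice.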
Bosko Milekic <bmilekic@dsuper.net> was kind enough to MPsafe the mbuf allocator code. We need to test this: he used await/asleep rather than msleep, and this ought to be checked for validity, as the asleep interface was implemented before SMPng and may not be safe. I'm hoping that Bosko sticks around to help out; he's got some great programming skill and there's a lot of code to work on.

I've already decided that my initial goal is going to be getting udp and tcp4 working. Unfortunately that means I'm most likely not working on:

BRIDGE, DUMMYNET, INET6, NETATALK, NS, IPX, IPSEC, NETGRAPH

I suspect that they can easily be made mpsafe, but they aren't a consideration at this point. I just want to get something working right now, and that means userland<-(tcp/udp)->wire MPsafe code. The good part is that now more than ever developers are active enough to jump in and fix these. And before I get flamed off the earth: I most likely will not be committing until the INET6, IPSEC and NETGRAPH maintainers are comfortable with it.

Malloc is now MPsafe thanks to jasone and jake, which is obviously an important starting point.

I had an interesting discovery the other night: when replacing an spl with a mutex over a particular structure we must be very careful. While the spl is raised we can tsleep, which effectively drops the mutual exclusion. A mutex is not dropped when we sleep, so we must be wary of that when switching over to mutexes in order to avoid deadlocks.

A quick (stupid) example: calling a function to wait for data to arrive on a socket while holding the socket lock, and forgetting to drop the lock before calling it. Normally the spl would be dropped the instant you slept, and the network stack could churn along and dump some data into your socketbuffer. This is no longer the case: the interrupt must also block against your mutex, and if you screw up you block waiting for data while the socket is locked against outside manipulation, including data arrival.
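The safe form of that wait can be sketched in userland C with POSIX threads. This is only an analogy, not the kernel code: pthread_cond_wait() releases the mutex while sleeping and reacquires it before returning, which is roughly the role msleep() plays with the socket mutex as its interlock. The demo_socket structure and the demo_* functions are hypothetical; sb_cc is borrowed as a field name purely for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct demo_socket {
    pthread_mutex_t lock;   /* stands in for the socket mutex */
    pthread_cond_t  cv;     /* stands in for the sleep channel */
    size_t          sb_cc;  /* bytes queued in the receive buffer */
};

#define DEMO_SOCKET_INITIALIZER \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0 }

/*
 * Reader side: wait until data is queued, then consume all of it.
 * The mutex is released for the duration of the sleep, so the
 * delivery path below can take it and queue data.
 */
static size_t demo_wait_for_data(struct demo_socket *so)
{
    size_t n;

    assert(so != NULL);
    pthread_mutex_lock(&so->lock);
    while (so->sb_cc == 0)
        pthread_cond_wait(&so->cv, &so->lock);
    n = so->sb_cc;
    so->sb_cc = 0;
    pthread_mutex_unlock(&so->lock);
    return n;
}

/*
 * Delivery side (the "interrupt" in the example above): it needs the
 * same mutex, so it could never run if the reader slept while holding it.
 */
static void demo_deliver(struct demo_socket *so, size_t len)
{
    assert(so != NULL);
    pthread_mutex_lock(&so->lock);
    so->sb_cc += len;
    pthread_cond_signal(&so->cv);
    pthread_mutex_unlock(&so->lock);
}
```

The broken variant would sleep by some means that keeps so->lock held. demo_deliver() would then block in pthread_mutex_lock() forever and the reader would never wake up.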
So far I think I have a pretty sound system protecting sockets. There is also some preliminary stuff with routes and pcbs, but I need to work on those more.

I've switched the ucred system to use atomic ops, which should make it mpsafe.

Journal continued at:
http://people.freebsd.org/~alfred/mpsafe/stackjournal.txt

Work in progress:
http://people.freebsd.org/~alfred/mpsafe/mpsafestack.diff

Ok, and here begins a time based journal.

----------------------------------------------

Mon Sep 11 10:16:50 PDT 2000

Realized that attempting to thread the tcp_input code before the ether code was a bad idea. The tcp code uses global variables from the IP code, which probably uses globals from the ether code, so I'm working in the wrong direction (or working in a direction that's going to have me spread out too thin).

I've decided to take this route:

ether_input->ip_input->tcp/udp_input->

and

tcp_output->ip_output->ether_output

--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."