Date: Thu, 28 Feb 2008 03:43:57 +0100
From: Marko Zec <zec@icir.org>
To: Kris Kennaway <kris@freebsd.org>
Cc: Marko Zec <zec@freebsd.org>, Brooks Davis <brooks@freebsd.org>, Andre Oppermann <andre@freebsd.org>, Julian Elischer <julian@elischer.org>, FreeBSD Current <current@freebsd.org>
Subject: Re: warning of pending commit attempt.
Message-ID: <200802280343.57576.zec@icir.org>
In-Reply-To: <47C49FAA.1020605@FreeBSD.org>
References: <47C39948.3080907@elischer.org> <47C494B5.2040306@elischer.org> <47C49FAA.1020605@FreeBSD.org>

On Wednesday 27 February 2008 00:24:26 Kris Kennaway wrote:
> Julian Elischer wrote:
> > Kris Kennaway wrote:
> >> Julian Elischer wrote:
> >>> Andre Oppermann wrote:
> >>>> Brooks Davis wrote:
> >>>>> On Mon, Feb 25, 2008 at 08:44:56PM -0800, Julian Elischer wrote:
> >>>>>> At some stage in the next few weeks I will be trying to commit
> >>>>>> Marco Zec's vimage code to -current. (only 'trying' not
> >>>>>> for technical reasons, but political).
> >>>>
> >>>> ...
> >>>>
> >>>>>> Why now?
> >>>>>> The code is in a shape where the compiled out version of the
> >>>>>> system is stable. In the compiled in version, it is functional
> >>>>>> enough to provide nearly all of what people want. It needs
> >>>>>> people with other interests to adapt it to their purposes and
> >>>>>> use it so that it can become a solid product for future
> >>>>>> releases.
> >>>>>
> >>>>> The website has a snapshot with a date over a month old and
> >>>>> many comments about unstable interfaces. I've seen zero
> >>>>> reports of substantial testing...
> >>>>
> >>>> What about locking and SMP scalability? Any new choke points?
> >>>
> >>> not that I've seen.
> >>
> >> That's a less than resounding endorsement :)
> >
> > do the 10Gb ethernet adapters have any major problems?
> > are you willing to answer "no"?
> > should we then rip them from the tree?
>
> Those are small, isolated components, so hardly the same thing as a
> major architectural change that touches every part of the protocol
> stack.
>
> But if someone came along and said "I am going to replace the 10ge
> drivers, but I dunno how well they perform" I'd say precisely the
> same thing.
>
> Presumably someone (if not you, then Marko) has enough of a grasp of
> the architectural changes being proposed to comment about what
> changes (if any) were made to synchronisation models, and whether
> there are new sources of performance overhead introduced.
>
> That person can answer Andre's question.

OK, first my apologies to everybody for being late in jumping into
this thread... I'll attempt to address a few of the questions raised
so far in random order, but SMP scalability definitely tops the
list...

I think it's safe to assume that network stack instances / vimages
will have lifetimes similar to those of jails, i.e. once they get
instantiated, in typical applications vimages would remain static
over extended periods of time, rather than being created and torn
down thousands of times per second like TCP sessions or sockets in
general. Hence, synchronizing access to the global vimage or vnet
lists can probably be accomplished using rmlocks, which are
essentially free for read-only consumers. The current code in p4
still uses a handcrafted shared / exclusive refcounted locking scheme
with refcounts protected by a spinlock, since in 7.0 we don't have
rmlocks yet, but I'll try converting those to rmlocks in the
"official" p4 vimage branch which is tracking HEAD.

Another thing to note is that the frequency of read-only iterations
over vnets is also quite low - mostly this needs to be done only in
the slowtimo(), fasttimo() and drain() networking handlers, i.e. only
a couple of times per second. All iteration points are easy to fgrep
for in the code, given that they are always implemented using
VNET_ITERLOOP macros, which simply vanish when the kernel is compiled
without options VIMAGE.

But most importantly, on the performance-critical datapaths (i.e.
socket - TCP - IP - link layer - device drivers, and vice versa) no
additional synchronization points / bottlenecks were introduced. In
fact, the framework opens up the possibility of replicating some of
the existing contended locks over multiple vnets, potentially
reducing contention in cases where load would be evenly spread over
multiple vimages / vnets.

Other people have asked about vimages and jails: yes, it is possible
to run multiple jails inside a vimage / vnet, with the original
semantics of jails completely preserved.

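As a rough illustration of the interim scheme mentioned above (a
shared / exclusive lock whose counts are guarded by a spinlock,
protecting a rarely changing list that the timer handlers walk), here
is a minimal userspace C sketch. It is not the p4 code: every name in
it (vnet_sketch, vlist_lock, vlist_rlock, slowtimo_sketch and so on)
is made up for the example, the spinlock is a C11 atomic_flag, and
waiters simply spin. An rmlock conversion would presumably replace
the vlist_* calls while leaving the iteration itself alone.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical per-instance state; not the real vnet structure. */
struct vnet_sketch {
	struct vnet_sketch *next;
	int slowtimo_runs;	/* times the slow timer visited this instance */
};

/* Shared/exclusive lock: two counters guarded by a spinlock. */
struct vlist_lock {
	atomic_flag spin;	/* protects readers and writer */
	int readers;		/* current shared holders */
	int writer;		/* 1 while held exclusively */
};

static struct vlist_lock vnet_list_lock = { ATOMIC_FLAG_INIT, 0, 0 };
static struct vnet_sketch *vnet_list_head;

static void
spin_enter(struct vlist_lock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->spin, memory_order_acquire))
		;	/* busy-wait; the spinlock is held only briefly */
}

static void
spin_exit(struct vlist_lock *l)
{
	atomic_flag_clear_explicit(&l->spin, memory_order_release);
}

/* Shared acquire: retry until no writer holds the lock. */
void
vlist_rlock(struct vlist_lock *l)
{
	for (;;) {
		spin_enter(l);
		if (!l->writer) {
			l->readers++;
			spin_exit(l);
			return;
		}
		spin_exit(l);
	}
}

void
vlist_runlock(struct vlist_lock *l)
{
	spin_enter(l);
	assert(l->readers > 0);
	l->readers--;
	spin_exit(l);
}

/* Exclusive acquire: retry until there are no readers and no writer. */
void
vlist_wlock(struct vlist_lock *l)
{
	for (;;) {
		spin_enter(l);
		if (!l->writer && l->readers == 0) {
			l->writer = 1;
			spin_exit(l);
			return;
		}
		spin_exit(l);
	}
}

void
vlist_wunlock(struct vlist_lock *l)
{
	spin_enter(l);
	l->writer = 0;
	spin_exit(l);
}

/* Rare path: a new instance is created and linked into the list. */
void
vnet_sketch_attach(struct vnet_sketch *vn)
{
	vlist_wlock(&vnet_list_lock);
	vn->next = vnet_list_head;
	vnet_list_head = vn;
	vlist_wunlock(&vnet_list_lock);
}

/* Timer path: walk every instance while holding the lock shared. */
int
slowtimo_sketch(void)
{
	struct vnet_sketch *vn;
	int visited = 0;

	vlist_rlock(&vnet_list_lock);
	for (vn = vnet_list_head; vn != NULL; vn = vn->next) {
		/* Only the list is read-only here; concurrent callers
		 * would need separate protection for this counter. */
		vn->slowtimo_runs++;
		visited++;
	}
	vlist_runlock(&vnet_list_lock);
	return (visited);
}
```

The exclusive lock is taken only when an instance is attached, so
with long-lived instances the timer handlers almost always find the
lock uncontended.
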
Non-developers accessing the code: after freebsd.org's p4 to anoncvs
autosyncer died last summer I've tried posting source tarballs every
few weeks on the project's somewhat obscure web site (that Julian has
advertised every now and then on this list):

http://imunes.net/virtnet/

I've just dumped a diff against -HEAD there, and will post new
tarballs in a few minutes as well.

Impact of the changes on device drivers: in general no changes were
needed at the device driver layer, as drivers do not need to be aware
that they are running on a virtualized kernel. Each NIC is logically
attached to one and only one network stack instance at a time, and it
receives data from upper layers and feeds the upper layers with mbufs
in exactly the same manner as it does on the standard kernel. It is
the link layer that demultiplexes the incoming traffic to the
appropriate stack instance...

Overall, there's a lot of cleanup and possibly restructuring work
left to be done on the vimage code in p4, with documenting the new
interfaces probably being the top priority. I'm glad to see such a
considerable amount of (sudden) interest in pushing this code into
the main tree, so now, being smoked out of my rathole, I'll be happy
to work with Julian and other folks to bring the vimage code closer
to CVS and help maintain it one way or another once it hopefully gets
there, be it weeks or months until we reach that point - the sooner
the better of course...

Cheers,

Marko

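The attachment rule for NICs described above can be pictured with a
small userspace C sketch. It assumes the link layer picks the stack
instance from the interface a packet arrived on; the message does not
spell out the mechanism, and all names below (stack_sketch,
nic_sketch, pkt_sketch, link_input_sketch, nic_sketch_move) are
hypothetical, not taken from the vimage code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical network stack instance. */
struct stack_sketch {
	const char *name;
	unsigned long rx_packets;	/* packets delivered to this instance */
};

/* Hypothetical interface: attached to exactly one instance at a time. */
struct nic_sketch {
	const char *name;
	struct stack_sketch *stack;
};

/* Hypothetical received packet; it records the interface it came in on. */
struct pkt_sketch {
	struct nic_sketch *rcvif;
	size_t len;
};

/* Reassign an interface to another instance; the driver is not involved. */
void
nic_sketch_move(struct nic_sketch *nic, struct stack_sketch *to)
{
	assert(nic != NULL && to != NULL);
	nic->stack = to;
}

/*
 * Link-layer input: the driver hands up a packet as usual, and the
 * link layer selects the instance from the receiving interface.
 */
struct stack_sketch *
link_input_sketch(struct pkt_sketch *p)
{
	struct stack_sketch *st;

	assert(p != NULL && p->rcvif != NULL && p->rcvif->stack != NULL);
	st = p->rcvif->stack;
	st->rx_packets++;
	return (st);
}
```

Because the choice of instance happens above the driver, the same
driver code serves whichever instance its interface currently belongs
to.
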