From owner-freebsd-arch@FreeBSD.ORG Sun Jan 6 00:30:33 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3EFF016A420 for ; Sun, 6 Jan 2008 00:30:33 +0000 (UTC) (envelope-from julian@elischer.org) Received: from outS.internet-mail-service.net (outS.internet-mail-service.net [216.240.47.242]) by mx1.freebsd.org (Postfix) with ESMTP id 2503313C46A for ; Sun, 6 Jan 2008 00:30:33 +0000 (UTC) (envelope-from julian@elischer.org) Received: from mx0.idiom.com (HELO idiom.com) (216.240.32.160) by out.internet-mail-service.net (qpsmtpd/0.40) with ESMTP; Sat, 05 Jan 2008 16:30:32 -0800 Received: from julian-mac.elischer.org (localhost [127.0.0.1]) by idiom.com (Postfix) with ESMTP id CE863126E27; Sat, 5 Jan 2008 16:30:30 -0800 (PST) Message-ID: <47802137.8020701@elischer.org> Date: Sat, 05 Jan 2008 16:30:47 -0800 From: Julian Elischer User-Agent: Thunderbird 2.0.0.9 (Macintosh/20071031) MIME-Version: 1.0 To: Vadim Goncharov References: <4772F123.5030303@elischer.org> <477416CC.4090906@elischer.org> <477D2EF3.2060909@elischer.org> In-Reply-To: Content-Type: text/plain; charset=KOI8-R; format=flowed Content-Transfer-Encoding: 7bit Cc: arch@freebsd.org, Ivo Vachkov , Robert Watson , Qing Li , FreeBSD Net Subject: Re: resend: multiple routing table roadmap (format fix) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 06 Jan 2008 00:30:33 -0000 Vadim Goncharov wrote: > 04.01.08 @ 00:52 Julian Elischer wrote: > >>>> By the way, I might add that in the 6.x compat. version I may end up >>>> limiting the feature to 8 tables. 
This is because I need to store some
>>>> stuff in an efficient way in the mbuf, and in a compatible manner
>>>> this is easiest done by stealing the top 4 bits in the mbuf flags word
>>>> and defining them as:
>>>>
>>>> #define M_HAVEFIB  0x10000000
>>>> #define M_FIBMASK  0x07
>>>> #define M_FIBNUM   0xe0000000
>>>> #define M_FIBSHIFT 29
>>>> #define m_getfib(_m, _default) (((_m)->m_flags & M_HAVEFIB) ? \
>>>>     (((_m)->m_flags >> M_FIBSHIFT) & M_FIBMASK) : (_default))
>>>> #define M_SETFIB(_m, _fib) do { \
>>>>     (_m)->m_flags &= ~M_FIBNUM; \
>>>>     (_m)->m_flags |= (M_HAVEFIB | (((_fib) & M_FIBMASK) << M_FIBSHIFT)); \
>>>> } while (0)
>>>>
>>>> This then becomes very easy to change to use a tag or
>>>> whatever is needed in later versions, and the number can
>>>> be expanded past 8 predefined FIBs at that time..
>>> If you want it to be a tag, why spend bits in m_flags and not just
>>> do it as a tag at once? Or is it supposed to completely throw away the
>>> 6.x (possibly 7.x too) implementation in favor of the right thing in 8.0 ?
>>
>> basically yes..
>>
>> I'm looking at just doing tags to start with, but haven't done it
>> yet.. I'm looking for a good bit of tag code to copy :-)
>
> Look at ipfw's O_ALTQ/O_TAG/O_TAGGED (and some other parts), ng_tag.c,
> ng_ipfw.c, ng_ksocket.c and some other stuff :-) Tags are simple, if 16
> bits are enough for you then you do not even have to allocate data, just use
> the tag_id member. Also they are easy to manipulate within netgraph with
> ng_tag, etc. But as a drawback - you have to allocate memory for them, and
> as it is M_NOWAIT, malloc() can return NULL in interrupt threads... So a
> new field in mbuf (or flags) would be better in terms of performance,
> but it will break ABI :(

So that may happen later.. this code is specifically to not break ABIs.
The tag method worries me, as the overhead for potentially every packet
might be too much. An in-mbuf field is the deluxe solution.

>
> I don't have m_tag_alloc() measurements, though.
Doing 'ipfw add 1 tag 1
> ip from any to any' on a 15 kpps 6.2 router didn't cause any noticeable
> slowdown while looking for half a minute at 'systat -vm 1'...

That already has ipfw overhead. It may be noticeable if you are comparing
adding and reading tags in a data path with no ipfw overhead.

>
>> setfib 3 /bin/sh
>>
>> now by default everything you do uses table 3.
>> or even
>>
>> setfib 3 jail {blah}
>>
>> and all the procs in the jail use table 3. You also need to do
>> setfib 3 jexec xxx
>> for extra processes you add to the jail afterwards.
>
> Maybe introduce a field in struct prison to make it possible without
> additional commands?

Yes, it's in my original description email that that may be an option.

>
>>>>>> 2/ packets received on an interface for forwarding.
>>>>>> By default these packets would use table 0,
>>>>>> (or possibly a number settable in a sysctl (not yet)).
>>>>>> but prior to routing the firewall can inspect them (see below).
>>>>>>
>>>>>> 3/ packets inspected by a packet classifier, which can arbitrarily
>>>>>> associate a fib with it on a packet by packet basis.
>>>>>> A fib assigned to a packet by a packet classifier
>>>>>> (such as ipfw) would over-ride a fib associated by
>>>>>> a more default source. (such as cases 1 or 2).
>>> Sounds good. I like the idea of doing routing decisions in the firewall, to not
>>> double kernel code and userspace utilities, like in Linux' iproute2
>>> (which, however, still has a few parameters and relies on firewall
>>> marks for others). However, there are some cases, I think, where it
>>> could be done outside the firewall. For example, make an ifconfig option
>>> to use a specific FIB as a default for all packets outgoing from this
>>> interface's address. But here arises another related question - Linux
>>> allows selecting a specific src IP based on a routing table entry -
>>> destination address (thoughts about pf reply-to/route-to, huh).
>> >> that is default here too if I understand what you are talking about. >> teh src address is selected from the routing table's exit interface. >> In the code I'm showing in perforce, that address would depend on >> which table your process was associated with. (or just the socket if >> you have used the socket option on it before doing the bind/connect) > > What I'm talking about is adding possibility for future MPLS/VRF/etc. > For example, if we make an interface option to use a specific FIB on > that interface, for every incoming packet (put a tag on early input?), > then ARP replies, ICMP redirects (yes, make stack to process them to > particular FIB if specified, not to main) and so on will affect only > this table. Then, it will be possible, say, to have 192.168.0.0/24 on > em0 and also have 192.168.0.0/24 on em1, but that networks are > completely independent of each other on both L2 and L3 (different > customers) - after that, a change allowing to have the same IP address > on different interfaces will lead to complete virtual independence. > Without any vimages - why do we need separate TCP stacks etc. copies on > a router without any jails, under a single administrator's control? > > Yes, this may be difficult with planned L2/L3 separation (currently ARP > table is in fact part of FIB), but it is solvable - say, by binding an > ARP table to one or several FIBs. Moreover, I think that complete stack > virtulization in each jail/vimage is waste of resources - instead one or > several FIBs/interfaces/ARP tables can be bound to each vimage/jail, > possibly with write permissions. I'm a great believer of vimage. I don't want to duplicate that functionality. > > And even all of above is considered a far future and/or will be made > different way, FIB binding to interface is still useful for (both > incoming and) outgoing packets to make a firewall ruleset simpler. 
"maybe" > >>> In relation to this I can remember multipath routing (different >>> metrics?), addresses from one subnet on different ifaces (mask wider >>> /32) and so on. >>> Also it is interesting, how multiple FIBs would interact with >>> host-wide events, such as ICMP redirects (which table should be >>> updated?), storing of TCP stack metrics (MTU, etc.) and hostcache, >>> and so on. How these and above will be solved?.. >> >> I'm not really too knowledgeable about multicast.. typo .. I meant multipath. > > Is multicast and multipath routing the same? > >>> per ifconfig (>1 host per subnet)/icmp redirects/src to prefer, >>> multipath/metrics, tcp stack parameters interaction, iproute2 >> >> I'm not trying to solve problems that need vimage to solve them.. > > Umm, what vimage?.. :) I forgot to clear these keywords written for > myself when writing draft and expaining them in detail,sorry :) Marko's vimage code solves much of this in a much cleaner manner. I'm hoping that we will eventually have multiple routing tables in multiple vimages. > >>>>>> Routing messages would be associated with their >>>>>> process, and thus select one FIB or another. >>> This is not clear. How should the 'route' command work with >>> different FIBs, if they are supposed by admin to be used for >>> forwarding, and not the straight per-process? I think a setfib option >>> is more consistent than running route under setfib command. Also, >>> routing sockets and routing daemons - should they work with only one >>> table?.. >> >> if you do >> setfib 3 route get 1.1.1.1 >> >> you may get a different result from >> >> setfib 2 route get 1.1.1.1 >> >> I will add a fibnum argument to route itself as well but it's not >> needed immediately as long as I have the setfib command. > > OK, but we should think about it in the future. In theory, routing > socket's messages are easily extendable with FIB number in uint16_t, as > message keeps it's length... 
I will do that with the advice of people who know that protocol better
than I do.

>
>>>>>> I have not yet added the changes to ipfw.
>>> Action modifier, like 'ipfw add count setfib 3 ip from any to any' ?
>>> There were thoughts (I heard, as a hack before multiple FIBs) about
>>> making an additional, say, 'nexthop' ipfw action, which acts like
>>> fwd, but does not accept the packet, allowing it to continue through
>>> the firewall ruleset - thus making it more comfortable to separate
>>> routing (imagine 'nexthop tablearg') and filtering. There are
>>> questions with both fwd and the new supposed option: will fwd still
>>> survive? Will it change the output interface, like a complete
>>> rerouting before calling pfil(9) hooks, so that *oif will be changed
>>> to be matched in rules below? pf route-to/reply-to is hanging around...
>>
>> The 'nexthop' call you suggest is problematic because it needs to
>> return information immediately, which is why it is terminal.
>
> Um, why? Why can't it continue through the ruleset? I don't know
> implementation details of routing and 'ipfw fwd', alas,

The way the nexthop/fwd command is implemented, the rule needs to return
to the caller immediately.

>
>> As for the setfib ipfw action, I have now done this in p4.
>>
>> ipfw add 200 setfib 3 ip from any to any in receive em0
>>
>> now works.
>> This lessens the need for associating a fib with an interface as the
>> firewall can do that too..
>>
>> the setfib rule is not terminal. (hmm need to check I did that right.)
>
> Oh, if it works, that's cool.
>
>> you can also do
>> ipfw add 200 skipto 300 ip from any to any hasfib
>> # to select on a packet that has a fib associated with it already.
>> ipfw add 200 skipto 300 ip from any to any fib 4
>> # to select packets that are associated with fib 4
>> ipfw add 200 clrfib ip from any to any
>> # to remove a fib association from the packet.
>
> Do we need a separate keyword 'clrfib' while it could be 'setfib 0' ?
Or
> at least save one opcode in kernel's ipfw. Also, it would be nice to
> have 'setfib tablearg' together with reserving 16 bits for FIB number -
> some systems with hundreds of vlans will want to have more than 256
> tables, I think...

Having an override fib is different from having a fib of 0.
I'm not sure about tablearg yet.. I've considered it but not in the
first version..

>
>>>>>> Interaction with the ARP layer/ LL layer would need to be
>>>>>> revisited as well. Qing Li has been working on this already.
>>> Oh yes, L2 interaction is interesting. How should it work in case of
>>> planned separation of routing and ARP tables?..
>
> I've explained my views about it above...
>

From owner-freebsd-arch@FreeBSD.ORG Sun Jan 6 11:57:57 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3897616A419; Sun, 6 Jan 2008 11:57:57 +0000 (UTC) (envelope-from freebsd@levsha.org.ua) Received: from expo.ukrweb.net (expo.ukrweb.net [193.125.78.116]) by mx1.freebsd.org (Postfix) with ESMTP id DCA0213C468; Sun, 6 Jan 2008 11:57:56 +0000 (UTC) (envelope-from freebsd@levsha.org.ua) Received: from levsha by expo.ukrweb.net with local (Exim 4.68 (FreeBSD)) (envelope-from ) id 1JBTYP-000I6S-M8; Sun, 06 Jan 2008 13:20:33 +0200 Date: Sun, 6 Jan 2008 13:20:33 +0200 From: Mykola Dzham To: Julian Elischer Message-ID: <20080106112033.GA40991@expo.ukrweb.net> References: <4772F123.5030303@elischer.org> <477416CC.4090906@elischer.org> <477D2EF3.2060909@elischer.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <477D2EF3.2060909@elischer.org> X-Operating-System: FreeBSD/5.4-RELEASE-p6 (i386) User-Agent: Mutt/1.5.6i Cc: Qing Li , FreeBSD Net , arch@freebsd.org, Ivo Vachkov , Robert Watson , Vadim Goncharov Subject: Re: resend: multiple routing table roadmap (format fix) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version:
2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 06 Jan 2008 11:57:57 -0000 Julian Elischer wrote: > > setfib 3 /bin/sh > > now by default everythign you do uses table 3. > or even > > setfib 3 jail {blah} > > and all the procs in the jail use table 3. You also need to do > setfib 3 jexec xxx > for extra processes you add to the jail afterwards. Is it possible to deny setfib after setfib N /bin/sh ? Or call setfib from jail? If yes this can be usable for restriction jail on some different fib -- Mykola Dzham, LEFT-(UANIC|RIPE) JID: levsha@jabber.net.ua From owner-freebsd-arch@FreeBSD.ORG Sun Jan 6 13:47:26 2008 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id ECAA016A46E; Sun, 6 Jan 2008 13:47:25 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42]) by mx1.freebsd.org (Postfix) with ESMTP id B5B1F13C474; Sun, 6 Jan 2008 13:47:25 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by cyrus.watson.org (Postfix) with ESMTP id 5C1804B2B0; Sun, 6 Jan 2008 08:47:25 -0500 (EST) Date: Sun, 6 Jan 2008 13:47:24 +0000 (GMT) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: arch@FreeBSD.org Message-ID: <20080106124517.G105@fledge.watson.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: kmacy@FreeBSD.org, net@FreeBSD.org Subject: Network device driver KPI/ABI and TOE X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 06 Jan 2008 13:47:26 -0000 Dear all, Last month, Kip Macy committed support for 
TCP offload to the FreeBSD CVS repository for the Chelsio 10gbps device driver. We've had interest from other vendors in supporting TOE on FreeBSD, although it remains unclear as yet which will end up supporting it. This e-mail is about how we want to treat the TOE interface with respect to third party device driver support, and more specifically to propose that we not consider the TOE interface to be part of our stable network device driver KPI/ABI once it appears in a RELENG_X branch. The background: in the last few FreeBSD versions (late 5.x, 6.x, 7.x), we've attempted to offer network and storage device driver authors a stable KPI and ABI across minor FreeBSD releases. The goal of this has been to allow authors to produce a device driver module for a .0 release, and then have it continue to function for .1, .2, and so on. We've not attempted to formalize the details of this for network device drivers, but implicitly this includes interface stability for things like mbuf and memory management routines, the ifnet interface, locking interfaces and data structures, newbus, busdma, and so on. If we had to, we would break the ABI in order to fix critical bugs (etc), but we try hard to avoid it in order to improve interface stability, and, in general, we choose not to MFC features that would break existing device drivers. TOE comes with a series of defined interfaces in toedev.h (documentation forthcoming) and tcp_offload.h (documentation now in comments). However, TOE implementations must also interact directly with the TCP and other stack internals, including directly accessing socket buffers, routing, the inpcb and tcpcb data structures, TCP and inpcb locking protocols, and so on. 
This happens for two reasons: - First, TOE needs to interact with the contents of sockets and TCP in order to implement the offload (i.e., extracting data from socket buffers to transmit it, putting data into socket buffers on receive, accessing TCP connection properties such as socket options, address bindings, listen state, etc). - Second, TOE hardware implementations often don't implement all of TCP: they may implement the steady state but not TCP TIMEWAIT or connection setup, for example. To get a sense of the level of intimacy of one such driver, it's well worth perusing src/sys/dev/cxgb/ulp/tom in HEAD. This is not a criticism, but I do want people to be aware of what's there before getting involved in this discussion: TOE takes to a whole new level the mantra that layering is good for protocol design, but not good for implementation performance, and spans pretty much all layers of the network stack in its scope. There are serious ABI implications to this approach, as historically we've made significant changes to the TCP and socket buffer internals during -stable branches, such as optimizing performance, adding new TCP features, etc. There's a fairly aggressive list of forthcoming TCP features for 8.0 with MFC plans for several of them, such as congestion control selection and multiple routing tables. I've not attempted to analyze these past or proposed changes in detail to determine how disruptive they would be to a TOE implementation, but my guess is that they might well break TOE drivers, especially historic ones, had TOE been supported at the time. My proposal, and this is really a proposal to drive discussion as much as a proposal for a policy, is that the internal TCP data structures exported via the TOE interfaces and accessed by TOE device drivers *not* be considered ABI/KPI-stable in -STABLE branches. 
While I think we shouldn't intentionally change them to break TOE, it's unrealistic to expect that these network stack internals won't change as part of normal maintenance and feature development that take place in -STABLE branches.

For those who aren't involved in those day-to-day internals, a comparable situation might be if a CAM SCSI storage driver was dependent not only on there being no changes made to the on-disk layout of UFS (even backwards compatible ones), but also the in-memory data structures of soft updates. Any significant changes to soft updates internals would break such device drivers due to a requirement for forward compatibility. In some ways this isn't a perfect comparison, as soft updates isn't under active development, but from a layering and abstraction perspective, it's quite similar.

We don't yet ship TOE in a -STABLE branch, but I believe Kip hopes to MFC TOE support, and with other device driver vendors starting to take a look, I think we want our thoughts on the table regarding this matter. I presume that we'll see the TOE interfaces continue to evolve over the next 6-18 months, and we should make sure that we know whether or not third party device driver authors can expect ABI/KPI stability before, rather than after, it hits a -STABLE branch. On a similar note, these necessary changes to network stack internals will result in modifications to in-tree device drivers, so device driver authors who implement TOE should expect to see the TOE parts of their drivers being significantly modified as development occurs on those other parts of the stack.

There's also the opportunity to think about whether it's possible to harden things in such a way as to not give up our flexibility to keep maintaining and improving TCP (and other related subsystems), yet improving the quality of life for a third party TOE driver maintainer.
For example, might we provide accessor routines for certain data structures, or attempt to structure things to hide more of TCP locking from a TOE implementation? Should we suggest that non-native TOE implementations rely less on our TCP code and provide their own where the hardware doesn't provide a complete implementation, in order to avoid building dependency on things that we know will change?

Robert N M Watson
Computer Laboratory
University of Cambridge

From owner-freebsd-arch@FreeBSD.ORG Sun Jan 6 14:48:17 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EB1A916A417; Sun, 6 Jan 2008 14:48:17 +0000 (UTC) (envelope-from bms@FreeBSD.org) Received: from out4.smtp.messagingengine.com (out4.smtp.messagingengine.com [66.111.4.28]) by mx1.freebsd.org (Postfix) with ESMTP id C31D513C46B; Sun, 6 Jan 2008 14:48:17 +0000 (UTC) (envelope-from bms@FreeBSD.org) Received: from compute1.internal (compute1.internal [10.202.2.41]) by out1.messagingengine.com (Postfix) with ESMTP id 4EF1983258; Sun, 6 Jan 2008 09:30:03 -0500 (EST) Received: from heartbeat2.messagingengine.com ([10.202.2.161]) by compute1.internal (MEProxy); Sun, 06 Jan 2008 09:30:03 -0500 X-Sasl-enc: nhl4+Qhx1Sp1F8iKOI4jppIjlJmSUMq8Ot3NP8CjVKq2 1199629802 Received: from empiric.lon.incunabulum.net (82-35-112-254.cable.ubr07.dals.blueyonder.co.uk [82.35.112.254]) by mail.messagingengine.com (Postfix) with ESMTP id BE3D523475; Sun, 6 Jan 2008 09:30:01 -0500 (EST) Message-ID: <4780E5E7.2070202@FreeBSD.org> Date: Sun, 06 Jan 2008 14:29:59 +0000 From: "Bruce M.
Simpson" User-Agent: Thunderbird 2.0.0.6 (X11/20070928) MIME-Version: 1.0 To: Vadim Goncharov References: <4772F123.5030303@elischer.org> <477416CC.4090906@elischer.org> <477D2EF3.2060909@elischer.org> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Qing Li , FreeBSD Net , arch@freebsd.org, Ivo Vachkov , Robert Watson , Julian Elischer Subject: Re: resend: multiple routing table roadmap (format fix) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 06 Jan 2008 14:48:18 -0000 Vadim Goncharov wrote: > > Is multicast and multipath routing the same? No. They are currently orthogonal. However it makes sense to merge the multicast and unicast forwarding code as currently MROUTING is limited to a fan-out of 32 next-hops only. In multicast, next-hops are normally just interfaces. Also the IETF MANET ad-hoc IP is going to need hooks there; multicast in MANET needs to address its next-hops by their unicast address, and encapsulate the traffic with a header. This is not true link layer multicast -- although it might use link layer multicast to leverage the hash filters in 802.11 MACs. As regards getting ARP out of forwarding tables, this should have happened a long time ago... 
BMS From owner-freebsd-arch@FreeBSD.ORG Sun Jan 6 14:48:18 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EEADE16A419; Sun, 6 Jan 2008 14:48:17 +0000 (UTC) (envelope-from bms@FreeBSD.org) Received: from out4.smtp.messagingengine.com (out4.smtp.messagingengine.com [66.111.4.28]) by mx1.freebsd.org (Postfix) with ESMTP id C330013C478; Sun, 6 Jan 2008 14:48:17 +0000 (UTC) (envelope-from bms@FreeBSD.org) Received: from compute1.internal (compute1.internal [10.202.2.41]) by out1.messagingengine.com (Postfix) with ESMTP id 2611983053; Sun, 6 Jan 2008 09:31:48 -0500 (EST) Received: from heartbeat1.messagingengine.com ([10.202.2.160]) by compute1.internal (MEProxy); Sun, 06 Jan 2008 09:31:48 -0500 X-Sasl-enc: hIZXm1v/XEp5jk+bgdWLzB+SqxKxPZ/jOLMCDbmrXXRi 1199629907 Received: from empiric.lon.incunabulum.net (82-35-112-254.cable.ubr07.dals.blueyonder.co.uk [82.35.112.254]) by mail.messagingengine.com (Postfix) with ESMTP id 06EA110BF0; Sun, 6 Jan 2008 09:31:46 -0500 (EST) Message-ID: <4780E652.5040804@FreeBSD.org> Date: Sun, 06 Jan 2008 14:31:46 +0000 From: "Bruce M. 
Simpson" User-Agent: Thunderbird 2.0.0.6 (X11/20070928) MIME-Version: 1.0 To: Julian Elischer References: <4772F123.5030303@elischer.org> <477416CC.4090906@elischer.org> <477D2EF3.2060909@elischer.org> <47802137.8020701@elischer.org> In-Reply-To: <47802137.8020701@elischer.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Qing Li , FreeBSD Net , Vadim Goncharov , arch@freebsd.org, Ivo Vachkov , Robert Watson Subject: Re: resend: multiple routing table roadmap (format fix) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 06 Jan 2008 14:48:18 -0000 Julian Elischer wrote: >> >> OK, but we should think about it in the future. In theory, routing >> socket's messages are easily extendable with FIB number in uint16_t, >> as message keeps it's length... > > I will do that with the advice of people who know that protocol better > than I do. I'm afraid Linux is still ahead of the game here. They adopted a tag-length-value protocol called NETLINK which solves many of the problems inherent in PF_ROUTE. It even has an RFC. 
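
To illustrate why a tag-length-value layout makes it easy to add a field such as a FIB number to a routing message, here is a minimal sketch in C. It is only an illustration under stated assumptions: the header layout, the tlv_put()/tlv_find() helpers and the TLV_DST/TLV_FIB type codes are hypothetical, and this is not the NETLINK or PF_ROUTE wire format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Generic TLV header (hypothetical layout, not the NETLINK wire format). */
struct tlv_hdr {
	uint16_t type;	/* attribute type code */
	uint16_t len;	/* length of the value bytes only */
};

/* Hypothetical attribute types for a routing message. */
enum { TLV_DST = 1, TLV_FIB = 2 };

/*
 * Append one attribute at offset 'off'.  Returns the new offset,
 * or 0 if the attribute does not fit in the buffer.
 */
static size_t
tlv_put(uint8_t *buf, size_t cap, size_t off, uint16_t type,
    const void *val, uint16_t len)
{
	struct tlv_hdr h = { type, len };

	if (off > cap || cap - off < sizeof(h) + len)
		return (0);
	memcpy(buf + off, &h, sizeof(h));
	memcpy(buf + off + sizeof(h), val, len);
	return (off + sizeof(h) + len);
}

/*
 * Find the first attribute of the given type.  Attributes of other
 * types are skipped, so an old reader ignores fields it does not know.
 */
static const uint8_t *
tlv_find(const uint8_t *buf, size_t used, uint16_t type, uint16_t *lenp)
{
	struct tlv_hdr h;
	size_t off = 0;

	while (used - off >= sizeof(h)) {
		memcpy(&h, buf + off, sizeof(h));
		off += sizeof(h);
		if (used - off < h.len)
			return (NULL);	/* truncated message */
		if (h.type == type) {
			*lenp = h.len;
			return (buf + off);
		}
		off += h.len;
	}
	return (NULL);
}
```

A sender would call tlv_put() once per attribute, and a receiver that predates TLV_FIB would simply never ask for it; that skipping behaviour is the property being contrasted with a fixed-layout message here.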
BMS From owner-freebsd-arch@FreeBSD.ORG Sun Jan 6 17:56:46 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A75F416A469 for ; Sun, 6 Jan 2008 17:56:46 +0000 (UTC) (envelope-from julian@elischer.org) Received: from outF.internet-mail-service.net (outF.internet-mail-service.net [216.240.47.229]) by mx1.freebsd.org (Postfix) with ESMTP id 99B3013C457 for ; Sun, 6 Jan 2008 17:56:46 +0000 (UTC) (envelope-from julian@elischer.org) Received: from mx0.idiom.com (HELO idiom.com) (216.240.32.160) by out.internet-mail-service.net (qpsmtpd/0.40) with ESMTP; Sun, 06 Jan 2008 09:56:45 -0800 Received: from julian-mac.elischer.org (localhost [127.0.0.1]) by idiom.com (Postfix) with ESMTP id 17711126E36; Sun, 6 Jan 2008 09:56:45 -0800 (PST) Message-ID: <4781166D.2010108@elischer.org> Date: Sun, 06 Jan 2008 09:57:01 -0800 From: Julian Elischer User-Agent: Thunderbird 2.0.0.9 (Macintosh/20071031) MIME-Version: 1.0 To: Mykola Dzham References: <4772F123.5030303@elischer.org> <477416CC.4090906@elischer.org> <477D2EF3.2060909@elischer.org> <20080106112033.GA40991@expo.ukrweb.net> In-Reply-To: <20080106112033.GA40991@expo.ukrweb.net> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Qing Li , FreeBSD Net , arch@freebsd.org, Ivo Vachkov , Robert Watson , Vadim Goncharov Subject: Re: resend: multiple routing table roadmap (format fix) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 06 Jan 2008 17:56:46 -0000 Mykola Dzham wrote: > Julian Elischer wrote: >> setfib 3 /bin/sh >> >> now by default everythign you do uses table 3. >> or even >> >> setfib 3 jail {blah} >> >> and all the procs in the jail use table 3. 
You also need to do >> setfib 3 jexec xxx >> for extra processes you add to the jail afterwards. > > Is it possible to deny setfib after setfib N /bin/sh ? Or call setfib > from jail? If yes this can be usable for restriction jail on some > different fib > I hadn't considered that.. though possibly what you want is vimage(). From owner-freebsd-arch@FreeBSD.ORG Sun Jan 6 18:07:49 2008 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9BEE116A417 for ; Sun, 6 Jan 2008 18:07:49 +0000 (UTC) (envelope-from julian@elischer.org) Received: from outB.internet-mail-service.net (outB.internet-mail-service.net [216.240.47.225]) by mx1.freebsd.org (Postfix) with ESMTP id 926D413C44B for ; Sun, 6 Jan 2008 18:07:49 +0000 (UTC) (envelope-from julian@elischer.org) Received: from mx0.idiom.com (HELO idiom.com) (216.240.32.160) by out.internet-mail-service.net (qpsmtpd/0.40) with ESMTP; Sun, 06 Jan 2008 10:07:48 -0800 Received: from julian-mac.elischer.org (localhost [127.0.0.1]) by idiom.com (Postfix) with ESMTP id 39DA7126E32; Sun, 6 Jan 2008 10:07:48 -0800 (PST) Message-ID: <47811904.4060300@elischer.org> Date: Sun, 06 Jan 2008 10:08:04 -0800 From: Julian Elischer User-Agent: Thunderbird 2.0.0.9 (Macintosh/20071031) MIME-Version: 1.0 To: Robert Watson References: <20080106124517.G105@fledge.watson.org> In-Reply-To: <20080106124517.G105@fledge.watson.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: arch@FreeBSD.org, kmacy@FreeBSD.org, net@FreeBSD.org Subject: Re: Network device driver KPI/ABI and TOE X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 06 Jan 2008 18:07:49 -0000 Robert Watson wrote: > > There's also the opportunity to think 
about whether it's possible to
> harden things in such a way as to not give up our flexibility to keep
> maintaining and improving TCP (and other related subsystems), yet
> improving the quality of life for a third party TOE driver maintainer.
> For example, might we provide accessor routines for certain data
> structures, or attempt to structure things to hide more of TCP locking
> from a TOE implementation? Should we suggest that non-native TOE
> implementations rely less on our TCP code and provide their own where
> the hardware doesn't provide a complete implementation, in order to
> avoid building dependency on things that we know will change?

I think the answer is to do as you suggest, and provide some sort of
interface with access methods so that TOE doesn't see so much of the
internal side of the networking, but has methods (no matter how
specialised) to do these things.

Unfortunately I am not sure that can be done in all situations.. for
example I'm not sure you could isolate a change in the mbuf packet
header. (That is a whole different discussion.. I think we may need to
give mbufs a workover for the 21st century but I digress...)

I'll read this again when I have more time.. I'm of course "interested"
due to various bits of work I have going..
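
The accessor-routine idea discussed above can be sketched roughly as follows. This is a minimal illustration, not the actual TOE or TCP interface: struct conn_state, its fields and the conn_*() functions are all hypothetical names invented for this example. The point is only that a driver calling the accessors would not depend on the structure layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stack-private connection state.  In this sketch only the
 * stack side touches the fields; a driver would use the accessors below.
 */
struct conn_state {
	uint32_t snd_nxt;	/* next sequence number to send */
	uint32_t rcv_wnd;	/* advertised receive window */
	int	 nodelay;	/* nonzero if a no-delay option is set */
	uint32_t later_field;	/* a field added later; drivers never see it */
};

/* Read-only accessors (hypothetical names). */
static uint32_t
conn_get_snd_nxt(const struct conn_state *cs)
{
	assert(cs != NULL);
	return (cs->snd_nxt);
}

static uint32_t
conn_get_rcv_wnd(const struct conn_state *cs)
{
	assert(cs != NULL);
	return (cs->rcv_wnd);
}

/* Normalise the flag to 0 or 1 so callers do not rely on its encoding. */
static int
conn_get_nodelay(const struct conn_state *cs)
{
	assert(cs != NULL);
	return (cs->nodelay != 0);
}

/* Mutator: advance the send sequence number; wraps modulo 2^32. */
static void
conn_advance_snd_nxt(struct conn_state *cs, uint32_t len)
{
	assert(cs != NULL);
	cs->snd_nxt += len;
}
```

With this shape, adding or reordering fields in the structure changes only the accessor bodies. It does not address the locking question raised above, and it costs a function call per access unless the accessors are inlined, which would reintroduce the layout dependency.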
> > Robert N M Watson > Computer Laboratory > University of Cambridge > _______________________________________________ > freebsd-arch@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-arch > To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" From owner-freebsd-arch@FreeBSD.ORG Sun Jan 6 18:09:52 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4276116A419 for ; Sun, 6 Jan 2008 18:09:52 +0000 (UTC) (envelope-from julian@elischer.org) Received: from outD.internet-mail-service.net (outD.internet-mail-service.net [216.240.47.227]) by mx1.freebsd.org (Postfix) with ESMTP id 33EF413C45A for ; Sun, 6 Jan 2008 18:09:52 +0000 (UTC) (envelope-from julian@elischer.org) Received: from mx0.idiom.com (HELO idiom.com) (216.240.32.160) by out.internet-mail-service.net (qpsmtpd/0.40) with ESMTP; Sun, 06 Jan 2008 10:09:51 -0800 Received: from julian-mac.elischer.org (localhost [127.0.0.1]) by idiom.com (Postfix) with ESMTP id C6FA5126E1D; Sun, 6 Jan 2008 10:09:50 -0800 (PST) Message-ID: <4781197F.1000105@elischer.org> Date: Sun, 06 Jan 2008 10:10:07 -0800 From: Julian Elischer User-Agent: Thunderbird 2.0.0.9 (Macintosh/20071031) MIME-Version: 1.0 To: "Bruce M. 
Simpson" References: <4772F123.5030303@elischer.org> <477416CC.4090906@elischer.org> <477D2EF3.2060909@elischer.org> <4780E5E7.2070202@FreeBSD.org> In-Reply-To: <4780E5E7.2070202@FreeBSD.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Qing Li , FreeBSD Net , Vadim Goncharov , arch@freebsd.org, Ivo Vachkov , Robert Watson Subject: Re: resend: multiple routing table roadmap (format fix) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 06 Jan 2008 18:09:52 -0000

Bruce M. Simpson wrote:
> Vadim Goncharov wrote:
>>
>> Is multicast and multipath routing the same?
>
> No. They are currently orthogonal.
>
> However it makes sense to merge the multicast and unicast forwarding
> code as currently MROUTING is limited to a fan-out of 32 next-hops only.
> In multicast, next-hops are normally just interfaces.
>
> Also the IETF MANET ad-hoc IP is going to need hooks there; multicast in
> MANET needs to address its next-hops by their unicast address, and
> encapsulate the traffic with a header. This is not true link layer
> multicast -- although it might use link layer multicast to leverage the
> hash filters in 802.11 MACs.
>
> As regards getting ARP out of forwarding tables, this should have
> happened a long time ago...

I'm not 100% convinced of this...
I was, but I think there may still be a place for a cached ARP pointer in the next-hop route to the ARP entry for that next hop.
I DO however think that the ARP stuff should not be accessing its data via the routing table.
> > BMS From owner-freebsd-arch@FreeBSD.ORG Sun Jan 6 18:37:20 2008 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A486B16A418; Sun, 6 Jan 2008 18:37:20 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42]) by mx1.freebsd.org (Postfix) with ESMTP id 88EF013C442; Sun, 6 Jan 2008 18:37:20 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by cyrus.watson.org (Postfix) with ESMTP id 0455B48E35; Sun, 6 Jan 2008 13:37:20 -0500 (EST) Date: Sun, 6 Jan 2008 18:37:19 +0000 (GMT) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Julian Elischer In-Reply-To: <47811904.4060300@elischer.org> Message-ID: <20080106182340.K105@fledge.watson.org> References: <20080106124517.G105@fledge.watson.org> <47811904.4060300@elischer.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@FreeBSD.org, kmacy@FreeBSD.org, net@FreeBSD.org Subject: Re: Network device driver KPI/ABI and TOE X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 06 Jan 2008 18:37:20 -0000 On Sun, 6 Jan 2008, Julian Elischer wrote: >> There's also the opportunity to think about whether it's possible to harden >> things in such a ways as to not give up our flexibility to keep maintaining >> and improving TCP (and other related subsystems), yet improving the quality >> of life for a third party TOE driver maintainer. For example, might we >> provide accessor routines for certain data structures, or attempt to >> structure things to hide more of TCP locking from a TOE implementation? 
>> Should we suggest that non-native TOE implementations rely less on our TCP
>> code and provide there own where the hardware doesn't provide a complete
>> implementation, in order to avoid building dependency on things that we
>> know will change?
>
> I think the answer is to do as you suggest, and provide some sort of
> interface with access methods so that TOE doesn't see so much of the
> internal side of the networking, but has methods (no matter how specialised)
> to do these things.
>
> Unfortunately I am not sure that can be done in all situations.. for example
> I'm not sure you could isolate a change in the mbuf packet header. (That is
> a whole different discussion.. I think we may need to give mbufs a workover
> for the 21st century but I digress...)
>
> I'll read this again when I have more time.. I'm of course "interested" due
> to various bits of work I have going..

Disruptive mbuf packet header layout changes during the lifetime of a -STABLE branch are already precluded by the ABI/KPI policy, since knowledge of the packet header layout is compiled into all network device drivers. What I'm more concerned about is the new exposure of internal data structures and algorithms, and a resulting freeze of those data structures and algorithms if we were to apply our current ABI/KPI policy to the TOE interfaces. I don't think we should apply it there for the foreseeable future, and we should make sure there's no confusion for device vendors or end-users who may have come to rely on the stability of ifnet and related interfaces and hence assume it also applies to the new TOE interfaces.
Robert N M Watson Computer Laboratory University of Cambridge From owner-freebsd-arch@FreeBSD.ORG Sun Jan 6 21:03:18 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B4CD816A419; Sun, 6 Jan 2008 21:03:18 +0000 (UTC) (envelope-from vadim_nuclight@mail.ru) Received: from mx40.mail.ru (mx40.mail.ru [194.67.23.36]) by mx1.freebsd.org (Postfix) with ESMTP id 7682113C458; Sun, 6 Jan 2008 21:03:18 +0000 (UTC) (envelope-from vadim_nuclight@mail.ru) Received: from [78.140.2.250] (port=25243 helo=nuclight.avtf.net) by mx40.mail.ru with esmtp id 1JBceK-000PPB-00; Mon, 07 Jan 2008 00:03:16 +0300 Date: Mon, 07 Jan 2008 03:03:11 +0600 To: "Julian Elischer" , "Bruce M. Simpson" References: <4772F123.5030303@elischer.org> <477416CC.4090906@elischer.org> <477D2EF3.2060909@elischer.org> <4780E5E7.2070202@FreeBSD.org> <4781197F.1000105@elischer.org> From: "Vadim Goncharov" Organization: AVTF TPU Hostel Content-Type: text/plain; format=flowed; delsp=yes; charset=koi8-r MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Message-ID: In-Reply-To: <4781197F.1000105@elischer.org> User-Agent: Opera M2/7.54 (Win32, build 3865) Cc: arch@freebsd.org, Qing Li , Ivo Vachkov , Robert Watson , FreeBSD Net Subject: Re: resend: multiple routing table roadmap (format fix) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 06 Jan 2008 21:03:18 -0000 07.01.08 @ 00:10 Julian Elischer wrote: >>> Is multicast and multipath routing the same? >> No. They are currently orthogonal. >> However it makes sense to merge the multicast and unicast forwarding >> code as currently MROUTING is limited to a fan-out of 32 next-hops >> only. In multicast, next-hops are normally just interfaces. 
>> Also the IETF MANET ad-hoc IP is going to need hooks there; multicast >> in MANET needs to address its next-hops by their unicast address, and >> encapsulate the traffic with a header. This is not true link layer >> multicast -- although it might use link layer multicast to leverage the >> hash filters in 802.11 MACs. >> As regards getting ARP out of forwarding tables, this should have >> happened a long time ago... > > I'm not 100 % convinced of this... > I was, but I think there may still be a place for a cached arp pointer > in hte next hop route to the arp entry for that next hop. > I DO however thing that the arp stuff should nto be accessing its > data via the routing table. Surely, routing table should contain a cached pointer to an entry in L2 table (ARP in case of Ethernet), to not do double lookups. But still separate those tables... -- WBR, Vadim Goncharov From owner-freebsd-arch@FreeBSD.ORG Sun Jan 6 21:22:07 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9F03B16A41A for ; Sun, 6 Jan 2008 21:22:07 +0000 (UTC) (envelope-from jfvogel@gmail.com) Received: from fg-out-1718.google.com (fg-out-1718.google.com [72.14.220.159]) by mx1.freebsd.org (Postfix) with ESMTP id 34DE713C46B for ; Sun, 6 Jan 2008 21:22:07 +0000 (UTC) (envelope-from jfvogel@gmail.com) Received: by fg-out-1718.google.com with SMTP id 16so4885176fgg.35 for ; Sun, 06 Jan 2008 13:22:06 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; bh=UjHAisz+SN1VuzeWsiMWnN5ZkDfyUdMj/Ivt6b53I/U=; b=NsRUvMmfcK5MNcaY5GQf1wNIYc7TJzAcAxqaC1o++QLJ5gSWg1NyRKMkexFE2JbIzMnouhmq0LEARNhY3Hou43MVh0/6nUO6R+2ch7BWG5OK6Xm5LDOEErkJ3PzwmK453htwHNZQn9eek2h3QxUAlNG7SAl5NuXRO/AY5BYudGU= 
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=XGCYEpiM+KHSBeOC8Xgo3qJI/nSS3k8LjbQr1fLTWBfHTea83WN0qn0AFgg8Rid3Inxn7ggqoQlgSEZzgrGYZt+czA39B8ZPNzjqMcMYkbI209NtuM1p2PyfJVeIp4jz6a+Cdkm9YPYaR/yFwxdc7oJiS0zYkwnYWIwpBNx8FJ8= Received: by 10.86.79.19 with SMTP id c19mr19372527fgb.31.1199653010854; Sun, 06 Jan 2008 12:56:50 -0800 (PST) Received: by 10.86.98.15 with HTTP; Sun, 6 Jan 2008 12:56:50 -0800 (PST) Message-ID: <2a41acea0801061256j867053dq69b46664e0283b3e@mail.gmail.com> Date: Sun, 6 Jan 2008 12:56:50 -0800 From: "Jack Vogel" To: "Robert Watson" In-Reply-To: <20080106124517.G105@fledge.watson.org> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <20080106124517.G105@fledge.watson.org> Cc: arch@freebsd.org, kmacy@freebsd.org, net@freebsd.org Subject: Re: Network device driver KPI/ABI and TOE X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 06 Jan 2008 21:22:07 -0000 On Jan 6, 2008 5:47 AM, Robert Watson wrote: ... > My proposal, and this is really a proposal to drive discussion as much as a > proposal for a policy, is that the internal TCP data structures exported via > the TOE interfaces and accessed by TOE device drivers *not* be considered > ABI/KPI-stable in -STABLE branches. While I think we shouldn't intentionally > change them to break TOE, it's unrealistic to expect that these network stack > internals won't change as part of normal maintenance and feature development > that take place in -STABLE branches. 
> > For those who aren't involved in those day-to-day internals, a comparable > situation might be if a CAM SCSI storage driver was dependent not only on > there being no changes made to the on-disk layout of UFS (even backwards > compatible ones), but also the in-memory data structures of soft updates. Any > significant changes to soft updates internals would break such device drivers > due to a requirement for forward compatibility. In some ways this isn't a > perfect comparison, as soft updates isn't under active development, but from a > layering and abstraction perspective, it's quite similar. > > We don't yet ship TOE in a -STABLE branch, but I believe Kip hopes to MFC TOE > support, and with other device driver vendors starting to take a look, I think > we want out thoughts on the table regarding this matter. I presume that we'll > see the TOE interfaces continue to evolve over the next 6-18 months, and we > should make sure that we know whether or not third party device driver authors > can expect ABI/KPI stability before, rather than after, it hits a -STABLE > branch. On a similar note, these necessary changes to network stack internals > will result in modifications to in-tree device drivers, so device driver > authors who implement TOE should expect to see the TOE parts of their drivers > being significantly modified as development occurs on those other parts of the > stack. > > There's also the opportunity to think about whether it's possible to harden > things in such a ways as to not give up our flexibility to keep maintaining > and improving TCP (and other related subsystems), yet improving the quality of > life for a third party TOE driver maintainer. For example, might we provide > accessor routines for certain data structures, or attempt to structure things > to hide more of TCP locking from a TOE implementation? 
Should we suggest that > non-native TOE implementations rely less on our TCP code and provide there own > where the hardware doesn't provide a complete implementation, in order to > avoid building dependency on things that we know will change? I agree Robert, I have hit minor KPI changes during the 6.X evolution and found them annoying. On the other hand, I know what its like when a company has hardware and wants support for it :) Is it perhaps a possible compromise to put in support but leave it defined off by default? Happy New Year BTW :) Jack From owner-freebsd-arch@FreeBSD.ORG Sun Jan 6 21:41:04 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BF72C16A41B for ; Sun, 6 Jan 2008 21:41:04 +0000 (UTC) (envelope-from andre@freebsd.org) Received: from c00l3r.networx.ch (c00l3r.networx.ch [62.48.2.2]) by mx1.freebsd.org (Postfix) with ESMTP id 33F0313C45D for ; Sun, 6 Jan 2008 21:41:03 +0000 (UTC) (envelope-from andre@freebsd.org) Received: (qmail 84605 invoked from network); 6 Jan 2008 21:05:35 -0000 Received: from c00l3r.networx.ch (HELO [127.0.0.1]) ([62.48.2.2]) (envelope-sender ) by c00l3r.networx.ch (qmail-ldap-1.03) with SMTP for ; 6 Jan 2008 21:05:35 -0000 Message-ID: <47814AF0.9070509@freebsd.org> Date: Sun, 06 Jan 2008 22:41:04 +0100 From: Andre Oppermann User-Agent: Thunderbird 1.5.0.14 (Windows/20071210) MIME-Version: 1.0 To: Vadim Goncharov References: <4772F123.5030303@elischer.org> <477416CC.4090906@elischer.org> <477D2EF3.2060909@elischer.org> <4780E5E7.2070202@FreeBSD.org> <4781197F.1000105@elischer.org> In-Reply-To: Content-Type: text/plain; charset=KOI8-R; format=flowed Content-Transfer-Encoding: 7bit Cc: Qing Li , FreeBSD Net , arch@freebsd.org, Ivo Vachkov , Robert Watson , Julian Elischer , "Bruce M. 
Simpson" Subject: Re: resend: multiple routing table roadmap (format fix) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 06 Jan 2008 21:41:04 -0000 Vadim Goncharov wrote: > 07.01.08 @ 00:10 Julian Elischer wrote: > > >>>> Is multicast and multipath routing the same? >>> No. They are currently orthogonal. >>> However it makes sense to merge the multicast and unicast forwarding >>> code as currently MROUTING is limited to a fan-out of 32 next-hops >>> only. In multicast, next-hops are normally just interfaces. >>> Also the IETF MANET ad-hoc IP is going to need hooks there; >>> multicast in MANET needs to address its next-hops by their unicast >>> address, and encapsulate the traffic with a header. This is not true >>> link layer multicast -- although it might use link layer multicast to >>> leverage the hash filters in 802.11 MACs. >>> As regards getting ARP out of forwarding tables, this should have >>> happened a long time ago... >> >> I'm not 100 % convinced of this... >> I was, but I think there may still be a place for a cached arp pointer >> in hte next hop route to the arp entry for that next hop. >> I DO however thing that the arp stuff should nto be accessing its >> data via the routing table. > > Surely, routing table should contain a cached pointer to an entry in L2 > table (ARP in case of Ethernet), to not do double lookups. But still > separate those tables... Locking hell over again. How do you remove an ARP entry without doing a full walk over the entire routing table (some 250K entries for the DFZ)? Make it rmlocks and be done with it. 
-- Andre From owner-freebsd-arch@FreeBSD.ORG Mon Jan 7 05:13:39 2008 Return-Path: Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9723A16A419; Mon, 7 Jan 2008 05:13:39 +0000 (UTC) (envelope-from delphij@delphij.net) Received: from tarsier.delphij.net (delphij-pt.tunnel.tserv2.fmt.ipv6.he.net [IPv6:2001:470:1f03:2c9::2]) by mx1.freebsd.org (Postfix) with ESMTP id 2BBDF13C46E; Mon, 7 Jan 2008 05:13:39 +0000 (UTC) (envelope-from delphij@delphij.net) Received: from tarsier.geekcn.org (unknown [202.108.54.204]) (using TLSv1 with cipher ADH-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by tarsier.delphij.net (Postfix) with ESMTP id D63DB28448; Mon, 7 Jan 2008 13:13:37 +0800 (CST) Received: from localhost (unknown [202.108.54.204]) by tarsier.geekcn.org (Postfix) with ESMTP id 75D18EB2CE2; Mon, 7 Jan 2008 13:13:37 +0800 (CST) X-Virus-Scanned: amavisd-new at geekcn.org Received: from tarsier.geekcn.org ([202.108.54.204]) by localhost (mail.geekcn.org [202.108.54.204]) (amavisd-new, port 10024) with ESMTP id 83o3pk4e7piZ; Mon, 7 Jan 2008 13:13:27 +0800 (CST) Received: from charlie.delphij.net (c-67-161-39-180.hsd1.ca.comcast.net [67.161.39.180]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by tarsier.geekcn.org (Postfix) with ESMTP id A4C95EB0A25; Mon, 7 Jan 2008 13:13:25 +0800 (CST) DomainKey-Signature: a=rsa-sha1; s=default; d=delphij.net; c=nofws; q=dns; h=message-id:date:from:reply-to:organization:user-agent: mime-version:to:cc:subject:x-enigmail-version:openpgp:content-type:content-transfer-encoding; b=brhI+YDv5sOq4w42lTtBdJYKEbCZFm4fi2JvxwBgT+Aqz103gqEPu6qB7VJbVnr50 mxQeuCO11uDU1fVuCr7OA== Message-ID: <4781B4F2.9040707@delphij.net> Date: Sun, 06 Jan 2008 21:13:22 -0800 From: Xin LI Organization: The FreeBSD Project User-Agent: Thunderbird 2.0.0.9 (X11/20071125) MIME-Version: 1.0 To: 
freebsd-arch@FreeBSD.org X-Enigmail-Version: 0.95.5 OpenPGP: id=18EDEBA0; url=http://www.delphij.net/delphij.asc Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: dfr@FreeBSD.org, mengguang@staff.sina.com.cn Subject: Why it does not make sense if msginfo.msgssz is greater than 256? X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: d@delphij.net List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Jan 2008 05:13:39 -0000

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

While perusing sys/kern/sysv_msg.c, we found the following comment:

 * Each message is broken up and stored in segments that are msgssz bytes
 * long. For efficiency reasons, this should be a power of two. Also,
 * it doesn't make sense if it is less than 8 or greater than about 256.
 * Consequently, msginit in kern/sysv_msg.c checks that msgssz is a power of
 * two between 8 and 1024 inclusive (and panic's if it isn't).

And it seems to come from the following comment:

	/*
	 * msginfo.msgssz should be a power of two for efficiency reasons.
	 * It is also pretty silly if msginfo.msgssz is less than 8
	 * or greater than about 256 so ...
	 */

Why is there this limitation (recommendation)?

Thanks in advance!

Cheers,
- --
Xin LI http://www.delphij.net/
FreeBSD - The Power to Serve!
-----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.4 (FreeBSD) iD8DBQFHgbTyi+vbBBjt66ARAuJmAKCAw8qZuYtVMIxjY1BXkNad57BVTACgt6zF 3l5/4Bd55EoNy8aFhm+RXyc= =qWJk -----END PGP SIGNATURE----- From owner-freebsd-arch@FreeBSD.ORG Mon Jan 7 13:33:07 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E35C316A420 for ; Mon, 7 Jan 2008 13:33:07 +0000 (UTC) (envelope-from peterjeremy@optushome.com.au) Received: from fallbackmx09.syd.optusnet.com.au (fallbackmx09.syd.optusnet.com.au [211.29.132.242]) by mx1.freebsd.org (Postfix) with ESMTP id 7737913C44B for ; Mon, 7 Jan 2008 13:33:07 +0000 (UTC) (envelope-from peterjeremy@optushome.com.au) Received: from mail35.syd.optusnet.com.au (mail35.syd.optusnet.com.au [211.29.133.51]) by fallbackmx09.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id m079AAQg000300 for ; Mon, 7 Jan 2008 20:10:10 +1100 Received: from server.vk2pj.dyndns.org (c220-239-20-82.belrs4.nsw.optusnet.com.au [220.239.20.82]) by mail35.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id m079A6Fn030824 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 7 Jan 2008 20:10:07 +1100 Received: from server.vk2pj.dyndns.org (localhost.vk2pj.dyndns.org [127.0.0.1]) by server.vk2pj.dyndns.org (8.14.2/8.14.1) with ESMTP id m079A6oq042389; Mon, 7 Jan 2008 20:10:06 +1100 (EST) (envelope-from peter@server.vk2pj.dyndns.org) Received: (from peter@localhost) by server.vk2pj.dyndns.org (8.14.2/8.14.2/Submit) id m079A6FI042388; Mon, 7 Jan 2008 20:10:06 +1100 (EST) (envelope-from peter) Date: Mon, 7 Jan 2008 20:10:06 +1100 From: Peter Jeremy To: Robert Watson Message-ID: <20080107091006.GN947@server.vk2pj.dyndns.org> References: <20080106124517.G105@fledge.watson.org> <47811904.4060300@elischer.org> <20080106182340.K105@fledge.watson.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; 
boundary="Jl+DbTnyraiZ/loT" Content-Disposition: inline In-Reply-To: <20080106182340.K105@fledge.watson.org> X-PGP-Key: http://members.optusnet.com.au/peterjeremy/pubkey.asc User-Agent: Mutt/1.5.17 (2007-11-01) Cc: arch@freebsd.org Subject: Re: Network device driver KPI/ABI and TOE X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Jan 2008 13:33:08 -0000 --Jl+DbTnyraiZ/loT Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

On Sun, Jan 06, 2008 at 06:37:19PM +0000, Robert Watson wrote:
>What I'm more concerned about is the new exposure of internal data
>structures and algorithms, and a resulting freeze of those data structures
>and algorithms if we were to apply our current ABI/PI policy to the TOE
>interfaces.

Whilst I doubt TOE will directly affect me in the short term, I would be disappointed if general TCP improvements could not be MFCd because it would change the TOE ABI.

I believe that TOE is a fairly new and not completely mature feature. Is it possible that further experience with TOE may also lead to changes in the interfaces between TOE and the rest of the kernel, irrespective of the kernel innards?

If we do decide to expose a set of interfaces where we do not guarantee the API/ABI, those interfaces need to be clearly documented as such. Solaris (eg) has an "interface stability" section in some of its man pages - maybe we should look at something similar.

--
Peter Jeremy
Please excuse any delays as the result of my ISP's inability to implement an MTA that is either RFC2821-compliant or matches their claimed behaviour.
--Jl+DbTnyraiZ/loT Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.4 (FreeBSD) iD8DBQFHgexu/opHv/APuIcRApYhAKC9EKp/l7oOUQ6e1LsVP6CX7GjrqwCfXQDo bh+YEFsAQ8qIC/roNZECnQE= =HldK -----END PGP SIGNATURE----- --Jl+DbTnyraiZ/loT-- From owner-freebsd-arch@FreeBSD.ORG Mon Jan 7 21:18:45 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4AFF516A417 for ; Mon, 7 Jan 2008 21:18:45 +0000 (UTC) (envelope-from jorapur@yahoo.com) Received: from web81002.mail.mud.yahoo.com (web81002.mail.mud.yahoo.com [68.142.199.82]) by mx1.freebsd.org (Postfix) with SMTP id 040B913C447 for ; Mon, 7 Jan 2008 21:18:44 +0000 (UTC) (envelope-from jorapur@yahoo.com) Received: (qmail 18879 invoked by uid 60001); 7 Jan 2008 20:52:03 -0000 DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com; h=X-YMail-OSG:Received:X-Mailer:Date:From:Subject:To:MIME-Version:Content-Type:Message-ID; b=KKzclkUfc3Bdv3n0y3IUAc9/9rjZyr7XyooB6SmeUbTMXjbm8kxDJusnnfRkL7o72O0qzbXQW3lv6VqjhlkKz2jOfyB9g+rHt7PztJKb+Q0mKZekJYs1lcuaM8oMAYyacPO+ANO6Hhy7LIQeFjmjaMQgyd0RDThjrT6lNIjpezE=; X-YMail-OSG: cPkwdQwVM1lLFuj3ZaCncvB0mYxrOi8fKPQGY7ZblxY7bVc673gIO85QUI58acu_DfCYzbyP2Q-- Received: from [64.209.101.202] by web81002.mail.mud.yahoo.com via HTTP; Mon, 07 Jan 2008 12:52:03 PST X-Mailer: YahooMailRC/818.31 YahooMailWebService/0.7.158.1 Date: Mon, 7 Jan 2008 12:52:03 -0800 (PST) From: Sanjeev Jorapur To: freebsd-arch@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Message-ID: <709798.18649.qm@web81002.mail.mud.yahoo.com> X-Mailman-Approved-At: Mon, 07 Jan 2008 21:22:48 +0000 Subject: TOE & RDMA support for NetXen hardware X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: 
List-Subscribe: , X-List-Received-Date: Mon, 07 Jan 2008 21:18:45 -0000

I am re-posting this, since my first post didn't seem to get through.

Hello,

Thanks to Sam for pointing us to the discussions on TOE support. I do realize that we are jumping in late to this discussion, but here is the current state of our hardware / software.

While we at NetXen currently don't have a FreeBSD driver, we do have a shipping TOE product under Linux & Windows. Our TOE hardware supports the following key features:

- Full or partial offload. Full offload is used by Linux (and FreeBSD when we have a driver), while partial is used by Windows.
- Ability to configure which connections are offloaded. Rather than offloading all connections by default, the connections are offloaded based on administrative action. The system admin can decide whether to offload based on TCP port, TCP tuple, IP address, application name, etc.
- libpcap / tcpdump capability.

I looked at the TOE API, and the offload connect / listen and data paths are similar to what we would support. Our customers do like the ability to control which connections get offloaded, so offloading all by default is not desirable.

I could not make out how libpcap / tcpdump would be supported in the TOE API.

Regarding the hardware filters, how are those hooked to the kernel infrastructure? Is a separate user program expected to control the hardware filter?

We also have RDMA support in the hardware and will be interested in an RDMA driver.

Sanjeev.
From owner-freebsd-arch@FreeBSD.ORG Tue Jan 8 18:29:45 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7461A16A419; Tue, 8 Jan 2008 18:29:45 +0000 (UTC) (envelope-from vadimnuclight@tpu.ru) Received: from relay1.tpu.ru (relay1.tpu.ru [213.183.112.102]) by mx1.freebsd.org (Postfix) with ESMTP id A5D5513C455; Tue, 8 Jan 2008 18:29:44 +0000 (UTC) (envelope-from vadimnuclight@tpu.ru) Received: from localhost (localhost.localdomain [127.0.0.1]) by relay1.tpu.ru (Postfix) with ESMTP id 1C8181048BF; Wed, 9 Jan 2008 00:29:42 +0600 (NOVT) X-Virus-Scanned: amavisd-new at tpu.ru Received: from relay1.tpu.ru ([127.0.0.1]) by localhost (relay1.tpu.ru [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id f3E19LpcdGLu; Wed, 9 Jan 2008 00:29:39 +0600 (NOVT) Received: from mail.main.tpu.ru (mail.main.tpu.ru [10.0.0.3]) by relay1.tpu.ru (Postfix) with ESMTP id 486D0104888; Wed, 9 Jan 2008 00:29:39 +0600 (NOVT) Received: from mail.tpu.ru ([213.183.112.105]) by mail.main.tpu.ru with Microsoft SMTPSVC(6.0.3790.3959); Wed, 9 Jan 2008 00:29:38 +0600 Received: from nuclight.avtf.net ([82.117.64.107]) by mail.tpu.ru over TLS secured channel with Microsoft SMTPSVC(6.0.3790.3959); Wed, 9 Jan 2008 00:29:33 +0600 Date: Wed, 09 Jan 2008 00:29:28 +0600 To: "Andre Oppermann" References: <4772F123.5030303@elischer.org> <477416CC.4090906@elischer.org>
<477D2EF3.2060909@elischer.org> <4780E5E7.2070202@FreeBSD.org> <4781197F.1000105@elischer.org> <47814AF0.9070509@freebsd.org> From: "Vadim Goncharov" Organization: AVTF TPU Hostel Content-Type: text/plain; format=flowed; delsp=yes; charset=koi8-r MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Message-ID: In-Reply-To: <47814AF0.9070509@freebsd.org> User-Agent: Opera M2/7.54 (Win32, build 3865) X-OriginalArrivalTime: 08 Jan 2008 18:29:33.0482 (UTC) FILETIME=[6A3E30A0:01C85224] Cc: Qing Li , FreeBSD Net , arch@freebsd.org, Ivo Vachkov , Robert Watson , Julian Elischer , "Bruce M. Simpson" Subject: Re: resend: multiple routing table roadmap (format fix) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 Jan 2008 18:29:45 -0000

07.01.08 @ 03:41 Andre Oppermann wrote:

> Vadim Goncharov wrote:
>> 07.01.08 @ 00:10 Julian Elischer wrote:
>>
>>>>> Is multicast and multipath routing the same?
>>>> No. They are currently orthogonal.
>>>> However it makes sense to merge the multicast and unicast forwarding
>>>> code as currently MROUTING is limited to a fan-out of 32 next-hops
>>>> only. In multicast, next-hops are normally just interfaces.
>>>> Also the IETF MANET ad-hoc IP is going to need hooks there;
>>>> multicast in MANET needs to address its next-hops by their unicast
>>>> address, and encapsulate the traffic with a header. This is not true
>>>> link layer multicast -- although it might use link layer multicast to
>>>> leverage the hash filters in 802.11 MACs.
>>>> As regards getting ARP out of forwarding tables, this should have
>>>> happened a long time ago...
>>>
>>> I'm not 100% convinced of this...
>>> I was, but I think there may still be a place for a cached arp pointer
>>> in the next hop route to the arp entry for that next hop.
>>> I DO however think that the arp stuff should not be accessing its
>>> data via the routing table.
>> Surely, routing table should contain a cached pointer to an entry in
>> L2 table (ARP in case of Ethernet), to not do double lookups. But still
>> separate those tables...
>
> Locking hell over again. How do you remove an ARP entry without doing
> a full walk over the entire routing table (some 250K entries for the
> DFZ)? Make it rmlocks and be done with it.

Why a full walk, why such a dumb way? To remove an ARP entry for host A.B.C.D from an L2 table of the form (A.B.C.D -> 00:01:02:03:04:05), it is enough to do a (usual speed) routing lookup for host A.B.C.D and set one pointer in its rtentry to NULL, or remove the rtentry (if it is implemented as a cloned route). Thus, when a routing lookup is done during regular forwarding (a table read), we already have FAST access - one pointer dereference - to its L2 table entry, be it ARP or any other L2 type (support for which becomes easy once L2 and L3 are separated). And on every modification of the L2 table - which is RARE - do a lookup at the usual speed to modify the cached pointer. Compare this with a scheme where EVERY forwarded packet needs a DOUBLE lookup: after the routing one, another in the L2 table.

The current routing table implementation, for all the disadvantages of combining L2 and L3, gets one HUGE benefit from that same combining - performance. So never, ever, ever, ever try to split L2 from L3 at the cost of that performance - in that case it should never be split, despite all the disadvantages, or you'll become an enemy of many, many users. Especially since caching allows things to be done reasonably fast.
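
The cached-pointer argument above can be sketched in userland C. This is only an illustration under stated assumptions, not FreeBSD kernel code: the names rt_entry, l2_entry, forward_resolve and l2_remove are hypothetical, the routing table is a flat array standing in for the radix tree, every on-link host is assumed to have its own (cloned) host route, and locking is left out entirely. It is meant to show that forwarding pays for an L2 table search only when the cache is empty, and that removing an ARP entry takes one ordinary routing lookup rather than a walk over the whole table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define L2_SLOTS 64  /* hypothetical size of the L2 (ARP) table */
#define RT_SLOTS 16  /* hypothetical number of host routes */

struct l2_entry {
    uint32_t ip;      /* IPv4 address, host byte order */
    uint8_t  mac[6];  /* link-layer address */
    int      in_use;
};

struct rt_entry {
    uint32_t         dst;       /* host route destination */
    struct l2_entry *l2_cache;  /* cached L2 entry, NULL if not resolved */
    int              in_use;
};

struct l2_entry l2_table[L2_SLOTS];
struct rt_entry rt_table[RT_SLOTS];
unsigned slow_path_lookups;     /* L2 searches done while forwarding */

/* Search the L2 table, probing from the hashed slot onwards. */
struct l2_entry *l2_lookup(uint32_t ip)
{
    for (unsigned i = 0; i < L2_SLOTS; i++) {
        struct l2_entry *e = &l2_table[(ip + i) % L2_SLOTS];
        if (e->in_use && e->ip == ip)
            return e;
    }
    return NULL;
}

/* Store a new L2 entry in the first free slot; NULL if the table is full. */
struct l2_entry *l2_add(uint32_t ip, const uint8_t *mac)
{
    assert(mac != NULL);
    for (unsigned i = 0; i < L2_SLOTS; i++) {
        struct l2_entry *e = &l2_table[(ip + i) % L2_SLOTS];
        if (!e->in_use) {
            e->ip = ip;
            memcpy(e->mac, mac, sizeof(e->mac));
            e->in_use = 1;
            return e;
        }
    }
    return NULL;
}

/* Stand-in for the radix tree lookup: a linear scan over host routes. */
struct rt_entry *rt_lookup(uint32_t dst)
{
    for (unsigned i = 0; i < RT_SLOTS; i++)
        if (rt_table[i].in_use && rt_table[i].dst == dst)
            return &rt_table[i];
    return NULL;
}

/* Install a host route with an empty L2 cache; NULL if no slot is free. */
struct rt_entry *rt_add(uint32_t dst)
{
    for (unsigned i = 0; i < RT_SLOTS; i++) {
        if (!rt_table[i].in_use) {
            rt_table[i].dst = dst;
            rt_table[i].l2_cache = NULL;
            rt_table[i].in_use = 1;
            return &rt_table[i];
        }
    }
    return NULL;
}

/* Forwarding path: one routing lookup, then a pointer dereference when the
 * cache is filled; the L2 table is searched only on a cache miss. */
const uint8_t *forward_resolve(uint32_t dst)
{
    struct rt_entry *rt = rt_lookup(dst);
    if (rt == NULL)
        return NULL;
    if (rt->l2_cache == NULL) {
        slow_path_lookups++;
        rt->l2_cache = l2_lookup(dst);
    }
    return rt->l2_cache != NULL ? rt->l2_cache->mac : NULL;
}

/* Rare path: drop an L2 entry and clear the cached pointer in its route. */
void l2_remove(uint32_t ip)
{
    struct l2_entry *e = l2_lookup(ip);
    struct rt_entry *rt = rt_lookup(ip);
    if (rt != NULL && rt->l2_cache == e)
        rt->l2_cache = NULL;
    if (e != NULL)
        e->in_use = 0;
}
```

Resolving the same host twice searches the L2 table once. After l2_remove() the next resolve misses and returns NULL, which is the point where a real stack would start address resolution. The sketch does not cover the case raised later in the thread, where a single interface route covers many on-link hosts.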
-- WBR, Vadim Goncharov From owner-freebsd-arch@FreeBSD.ORG Tue Jan 8 19:02:40 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6FE5916A418; Tue, 8 Jan 2008 19:02:40 +0000 (UTC) (envelope-from qing.li@bluecoat.com) Received: from whisker.bluecoat.com (whisker.bluecoat.com [216.52.23.28]) by mx1.freebsd.org (Postfix) with ESMTP id B3CC813C458; Tue, 8 Jan 2008 19:02:37 +0000 (UTC) (envelope-from qing.li@bluecoat.com) Received: from bcs-mail2.internal.cacheflow.com (bcs-mail2.internal.cacheflow.com [10.2.2.59]) by whisker.bluecoat.com (8.13.8/8.13.8) with ESMTP id m08IkkkN029628; Tue, 8 Jan 2008 10:46:47 -0800 (PST) X-MimeOLE: Produced By Microsoft Exchange V6.5 Content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable Date: Tue, 8 Jan 2008 10:46:42 -0800 Message-ID: <305C539CA2F86249BF51CDCE8996AFF4096E123E@bcs-mail2.internal.cacheflow.com> In-Reply-To: X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: resend: multiple routing table roadmap (format fix) Thread-Index: AchQp56/Ql+VUKdpSjGv0EsTFz+khQBfunIg References: <4772F123.5030303@elischer.org> <477416CC.4090906@elischer.org> <477D2EF3.2060909@elischer.org><4780E5E7.2070202@FreeBSD.org> <4781197F.1000105@elischer.org> From: "Li, Qing" To: "Vadim Goncharov" , "Julian Elischer" , "Bruce M. 
Simpson" Cc: arch@freebsd.org, Ivo Vachkov , Robert Watson , FreeBSD Net , Qing Li Subject: RE: resend: multiple routing table roadmap (format fix) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 Jan 2008 19:02:40 -0000

>
> Surely, routing table should contain a cached pointer to an
> entry in L2 table (ARP in case of Ethernet), to not do double
> lookups. But still separate those tables...
>

The routing table contains only the interface route; from this interface route the L2 table is accessed for on-net hosts. So it's a one-to-many relationship. How do you propose the L2 entry caching be done?

-- Qing

From owner-freebsd-arch@FreeBSD.ORG Tue Jan 8 19:03:58 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AB29116A419; Tue, 8 Jan 2008 19:03:58 +0000 (UTC) (envelope-from qing.li@bluecoat.com) Received: from whisker.bluecoat.com (whisker.bluecoat.com [216.52.23.28]) by mx1.freebsd.org (Postfix) with ESMTP id 337DB13C448; Tue, 8 Jan 2008 19:03:57 +0000 (UTC) (envelope-from qing.li@bluecoat.com) Received: from bcs-mail2.internal.cacheflow.com (bcs-mail2.internal.cacheflow.com [10.2.2.59]) by whisker.bluecoat.com (8.13.8/8.13.8) with ESMTP id m08J3rpv001532; Tue, 8 Jan 2008 11:03:53 -0800 (PST) X-MimeOLE: Produced By Microsoft Exchange V6.5 Content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable Date: Tue, 8 Jan 2008 11:03:47 -0800 Message-ID: <305C539CA2F86249BF51CDCE8996AFF4096E12A7@bcs-mail2.internal.cacheflow.com> In-Reply-To: X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: resend: multiple routing table roadmap (format fix) Thread-Index:
AchSJHwwqXVcLPAaSm26guZBFGjinAAAsGMg References: <4772F123.5030303@elischer.org> <477416CC.4090906@elischer.org> <477D2EF3.2060909@elischer.org> <4780E5E7.2070202@FreeBSD.org><4781197F.1000105@elischer.org> <47814AF0.9070509@freebsd.org> From: "Li, Qing" To: "Vadim Goncharov" , "Andre Oppermann" Cc: Qing Li , FreeBSD Net , arch@freebsd.org, Ivo Vachkov , Robert Watson , Julian Elischer , "Bruce M. Simpson" Subject: RE: resend: multiple routing table roadmap (format fix) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 Jan 2008 19:03:58 -0000

>
> Why a full walk, why such a dumb way?
>

Correct, we don't do a full walk.

>
> To remove an ARP entry for host A.B.C.D in L2 table of form
> (A.B.C.D -> 00:01:02:03:04:05), it is enough to do a (usual speed)
> routing lookup for host A.B.C.D and modify a one pointer in
> its rtentry to NULL or remove rtentry (if it's selected to
> be implemented as cloned). Thus, when on regular forwarding
> (table read) a routing lookup is done, we already have a FAST
> access - one pointer dereference - for its L2 table entry,
> be it ARP or any other L2 type (which support becoming easily
> with separation of L2 and L3). And on every modification of
> L2 table - which is RARE - do lookup with usual speed to
> modify cached pointer. Compare it with a scheme where for
> EVERY forwarded packet, there is a need for DOUBLE lookup -
> after a routing one, do another in L2 table.
>

Is it really a double lookup though?

With the current routing table that contains the ARP entries, a search has to proceed past the interface route further down the routing tree, and the depth depends on the number of ARP entries in the table.

With L2/L3 separation, the routing search stops at the interface route, and further search for the exact entry continues in a separate L2 table.

From a high level it does seem there could be performance issues such as a cache invalidation problem; however, I cannot quantify at this point what that degradation translates into, and what impact it has on the overall scheme of things. I am not sure if anyone can quantify such a performance question at this point.

>
> Current routing table implementation, with all disadvantages
> of combining
> L2 and L3, have from the same combining a one HUGE benefit -
> performance.
> And never, ever, ever, ever even try to split L2 from L3 with
> losing that performance - then it should be still never
> split, despite all disadvantages, and you'll become an enemy
> of many, many users. Especially while caching allows to do
> things reasonably fast.
>

No disagreement here.

-- Qing

From owner-freebsd-arch@FreeBSD.ORG Tue Jan 8 22:05:09 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E240716A417 for ; Tue, 8 Jan 2008 22:05:09 +0000 (UTC) (envelope-from andre@freebsd.org) Received: from c00l3r.networx.ch (c00l3r.networx.ch [62.48.2.2]) by mx1.freebsd.org (Postfix) with ESMTP id 6D07513C448 for ; Tue, 8 Jan 2008 22:05:09 +0000 (UTC) (envelope-from andre@freebsd.org) Received: (qmail 75516 invoked from network); 8 Jan 2008 21:29:18 -0000 Received: from c00l3r.networx.ch (HELO [127.0.0.1]) ([62.48.2.2]) (envelope-sender ) by c00l3r.networx.ch (qmail-ldap-1.03) with SMTP for ; 8 Jan 2008 21:29:18 -0000 Message-ID: <4783F398.801@freebsd.org> Date: Tue, 08 Jan 2008 23:05:12 +0100 From: Andre Oppermann User-Agent: Thunderbird 1.5.0.14 (Windows/20071210) MIME-Version: 1.0 To: "Li, Qing" References: <4772F123.5030303@elischer.org> <477416CC.4090906@elischer.org> <477D2EF3.2060909@elischer.org>
<4780E5E7.2070202@FreeBSD.org><4781197F.1000105@elischer.org> <47814AF0.9070509@freebsd.org> <305C539CA2F86249BF51CDCE8996AFF4096E12A7@bcs-mail2.internal.cacheflow.com> In-Reply-To: <305C539CA2F86249BF51CDCE8996AFF4096E12A7@bcs-mail2.internal.cacheflow.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Qing Li , FreeBSD Net , arch@freebsd.org, Ivo Vachkov , Robert Watson , Vadim Goncharov , "Bruce M. Simpson" , Julian Elischer Subject: Re: resend: multiple routing table roadmap (format fix) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 Jan 2008 22:05:10 -0000

Li, Qing wrote:
>> To remove an ARP entry for host A.B.C.D in L2 table of form
>> (A.B.C.D -> 00:01:02:03:04:05), it is enough to do a (usual speed)
>> routing lookup for host A.B.C.D and modify a one pointer in
>> its rtentry to NULL or remove rtentry (if it's selected to
>> be implemented as cloned). Thus, when on regular forwarding
>> (table read) a routing lookup is done, we already have a FAST
>> access - one pointer dereference - for its L2 table entry,
>> be it ARP or any other L2 type (which support becoming easily
>> with separation of L2 and L3). And on every modification of
>> L2 table - which is RARE - do lookup with usual speed to
>> modify cached pointer. Compare it with a scheme where for
>> EVERY forwarded packet, there is a need for DOUBLE lookup -
>> after a routing one, do another in L2 table.
>>
>
> Is it really a double lookup though ?
>
> With the current routing table that contains the ARP entries,
> a search has to proceed past the interface route further down
> the routing tree, and the depth depends on the number of ARP
> entries in the table.
>
> With L2/L3 separation, the routing search stops at the interface
> route, and further search for the exact entry continues
> in a separate L2 table.
>
> From a high level it does seem there could be performance
> issues such as cache invalidation problem, however, I cannot
> quantify at this point what that degradation translates into,
> and what impact it has on the overall scheme of things.
> I am not sure if anyone can quantify such performance question
> at this point.

No. We have to profile the new implementation together with the appropriate locking changes.

>> Current routing table implementation, with all disadvantages
>> of combining
>> L2 and L3, have from the same combining a one HUGE benefit -
>> performance.
>> And never, ever, ever, ever even try to split L2 from L3 with
>> losing that performance - then it should be still never
>> split, despite all disadvantages, and you'll become an enemy
>> of many, many users. Especially while caching allows to do
>> things reasonably fast.
>>
>
> No disagreement here.

We have to consider two aspects here:

1. the locking changes (for example switching to rmlocks, which are way less expensive than even normal rwlocks or mutexes) *may* compensate for the additional table lookup.

2. architectural benefits from a clear and strict layering that help us to easily maintain and develop the code in the future, *provided* the performance impact is only very small.

Having a clean architecture is well worth maybe one to three percent performance in the mid and long term IMHO.

People with the ultimate need for speed have to maintain their own trees anyway (Bluecoat, Juniper, Sandvine, Isilon,...) and can afford to cut some more corners. If one is tuning a machine for a very particular purpose, one can tightly glue layers together without having to take care of the general purpose principles of a generic operating system such as stock FreeBSD.
I'm all for squeezing out the last bit of performance in stock FreeBSD, however not at the expense of a clean system architecture. Almost all attempts to cut those corners have bitten us badly after only a few moons, when underlying hardware realities change (see P-IV Netburst assumptions vs. current Core2/AMD64 reality; nobody really cares about Netburst and its horrible locking overhead anymore).

-- Andre

From owner-freebsd-arch@FreeBSD.ORG Tue Jan 8 22:23:07 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A549D16A417 for ; Tue, 8 Jan 2008 22:23:07 +0000 (UTC) (envelope-from andre@freebsd.org) Received: from c00l3r.networx.ch (c00l3r.networx.ch [62.48.2.2]) by mx1.freebsd.org (Postfix) with ESMTP id 263DE13C459 for ; Tue, 8 Jan 2008 22:23:06 +0000 (UTC) (envelope-from andre@freebsd.org) Received: (qmail 76409 invoked from network); 8 Jan 2008 21:47:16 -0000 Received: from c00l3r.networx.ch (HELO [127.0.0.1]) ([62.48.2.2]) (envelope-sender ) by c00l3r.networx.ch (qmail-ldap-1.03) with SMTP for ; 8 Jan 2008 21:47:16 -0000 Message-ID: <4783F7CE.5060600@freebsd.org> Date: Tue, 08 Jan 2008 23:23:10 +0100 From: Andre Oppermann User-Agent: Thunderbird 1.5.0.14 (Windows/20071210) MIME-Version: 1.0 To: Peter Jeremy References: <20080106124517.G105@fledge.watson.org> <47811904.4060300@elischer.org> <20080106182340.K105@fledge.watson.org> <20080107091006.GN947@server.vk2pj.dyndns.org> In-Reply-To: <20080107091006.GN947@server.vk2pj.dyndns.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: arch@freebsd.org, Robert Watson Subject: Re: Network device driver KPI/ABI and TOE X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue,
08 Jan 2008 22:23:07 -0000

Peter Jeremy wrote:
> On Sun, Jan 06, 2008 at 06:37:19PM +0000, Robert Watson wrote:
>> What I'm more concerned about is the new exposure of internal data
>> structures and algorithms, and a resulting freeze of those data structures
>> and algorithms if we were to apply our current ABI/PI policy to the TOE
>> interfaces.
>
> Whilst I doubt TOE will directly affect me in the short term, I would be
> disappointed if general TCP improvements could not be MFCd because it
> would change the TOE ABI.
>
> I believe that TOE is fairly new and not a completely mature feature.
> Is it possible that further experience with TOE may also lead to
> changes in the interfaces between TOE and the rest of the kernel,
> irrespective of the kernel innards?

Certainly. I agree with Robert that we should not guarantee a stable TOE KPI/ABI yet. TOE is a relatively young technology and so far we've only seen one piece of hardware for it, together with its assumptions and implementation issues. It is also far more complex than simpler features such as checksum offloading or segmentation offloading.

IMHO we should not make it fully stable until we've gained more experience with it and have seen a second or third hardware implementation making use of it, perhaps with slightly differing assumptions and requirements. OTOH there shouldn't be any deliberate breakage of TOE without a good justification for it.
-- Andre From owner-freebsd-arch@FreeBSD.ORG Tue Jan 8 23:00:24 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1209416A46E for ; Tue, 8 Jan 2008 23:00:24 +0000 (UTC) (envelope-from julian@elischer.org) Received: from outE.internet-mail-service.net (outE.internet-mail-service.net [216.240.47.228]) by mx1.freebsd.org (Postfix) with ESMTP id ED7B813C461 for ; Tue, 8 Jan 2008 23:00:23 +0000 (UTC) (envelope-from julian@elischer.org) Received: from mx0.idiom.com (HELO idiom.com) (216.240.32.160) by out.internet-mail-service.net (qpsmtpd/0.40) with ESMTP; Tue, 08 Jan 2008 15:00:23 -0800 Received: from julian-mac.elischer.org (localhost [127.0.0.1]) by idiom.com (Postfix) with ESMTP id 2DBD0126E61; Tue, 8 Jan 2008 15:00:22 -0800 (PST) Message-ID: <4784009B.6030601@elischer.org> Date: Tue, 08 Jan 2008 15:00:43 -0800 From: Julian Elischer User-Agent: Thunderbird 2.0.0.9 (Macintosh/20071031) MIME-Version: 1.0 To: Andre Oppermann References: <4772F123.5030303@elischer.org> <477416CC.4090906@elischer.org> <477D2EF3.2060909@elischer.org> <4780E5E7.2070202@FreeBSD.org><4781197F.1000105@elischer.org> <47814AF0.9070509@freebsd.org> <305C539CA2F86249BF51CDCE8996AFF4096E12A7@bcs-mail2.internal.cacheflow.com> <4783F398.801@freebsd.org> In-Reply-To: <4783F398.801@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: "Li, Qing" , Qing Li , FreeBSD Net , arch@freebsd.org, Ivo Vachkov , Robert Watson , Vadim Goncharov , "Bruce M. 
Simpson" Subject: Re: resend: multiple routing table roadmap (format fix) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 Jan 2008 23:00:24 -0000 Andre Oppermann wrote: > > People with the ultimate need for speed have to maintain their own > trees anyway (Bluecoat, Juniper, Sandvine, Isilon,...) and can afford > to cut some more corners anyway. We are trying to get away from that. We are trying to get more BACK from those companies. From owner-freebsd-arch@FreeBSD.ORG Wed Jan 9 02:01:37 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AA17A16A419; Wed, 9 Jan 2008 02:01:37 +0000 (UTC) (envelope-from bms@FreeBSD.org) Received: from out5.smtp.messagingengine.com (out5.smtp.messagingengine.com [66.111.4.29]) by mx1.freebsd.org (Postfix) with ESMTP id 726CD13C442; Wed, 9 Jan 2008 02:01:37 +0000 (UTC) (envelope-from bms@FreeBSD.org) Received: from compute2.internal (compute2.internal [10.202.2.42]) by out1.messagingengine.com (Postfix) with ESMTP id 0D70C85FC3; Tue, 8 Jan 2008 21:01:37 -0500 (EST) Received: from heartbeat2.messagingengine.com ([10.202.2.161]) by compute2.internal (MEProxy); Tue, 08 Jan 2008 21:01:37 -0500 X-Sasl-enc: RjBDbkTnj0Bc3CtIiwbJbcM1ruLxeQ/GCXz9mIcFutFw 1199844096 Received: from empiric.lon.incunabulum.net (82-35-112-254.cable.ubr07.dals.blueyonder.co.uk [82.35.112.254]) by mail.messagingengine.com (Postfix) with ESMTP id E34101626D; Tue, 8 Jan 2008 21:01:35 -0500 (EST) Message-ID: <47842AFE.8070504@FreeBSD.org> Date: Wed, 09 Jan 2008 02:01:34 +0000 From: "Bruce M. 
Simpson" User-Agent: Thunderbird 2.0.0.6 (X11/20070928) MIME-Version: 1.0 To: Vadim Goncharov References: <4772F123.5030303@elischer.org> <477416CC.4090906@elischer.org> <477D2EF3.2060909@elischer.org> <4780E5E7.2070202@FreeBSD.org> <4781197F.1000105@elischer.org> <47814AF0.9070509@freebsd.org> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Andre Oppermann , Qing Li , FreeBSD Net , arch@freebsd.org, Ivo Vachkov , Robert Watson , Julian Elischer Subject: Re: resend: multiple routing table roadmap (format fix) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 09 Jan 2008 02:01:37 -0000

Vadim Goncharov wrote:
> Compare it with a scheme where for EVERY forwarded packet, there is a
> need for DOUBLE lookup - after a routing one, do another in L2 table.

ARP lookups will generally use a cheap hash once split. What's the problem? The PATRICIA lookups are more expensive, to be sure. Don't forget, though, that with moving L2 info out of PATRICIA, those host routes disappear from the table too, and thus their overhead during the tree walk. rmlocks for L2 and L3 are probably going to be cheaper compared to a global mutex.

>
> Current routing table implementation, with all disadvantages of
> combining L2 and L3, have from the same combining a one HUGE benefit -
> performance. And never, ever, ever, ever even try to split L2 from L3
> with losing that performance - then it should be still never split,
> despite all disadvantages, and you'll become an enemy of many, many
> users. Especially while caching allows to do things reasonably fast.
>

I disagree. The architectural benefits of taking ARP cache entries out of the routing table seem quite clear to me.
Other implementations have done this and seen it bear fruit, and your argument here sounds like hyperbole rather than cogent and reasoned argument about why this shouldn't be done. If you have grave doubts about this which the rest of us aren't seeing, publish benchmarks? One place to start might be to take Qing's code, run with it, and look seriously at it in a profiler such as Valgrind. But I'm preaching to the choir here... Cheers BMS From owner-freebsd-arch@FreeBSD.ORG Wed Jan 9 02:06:05 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D45B416A4A0; Wed, 9 Jan 2008 02:06:05 +0000 (UTC) (envelope-from bms@FreeBSD.org) Received: from out5.smtp.messagingengine.com (out5.smtp.messagingengine.com [66.111.4.29]) by mx1.freebsd.org (Postfix) with ESMTP id 9958213C447; Wed, 9 Jan 2008 02:06:05 +0000 (UTC) (envelope-from bms@FreeBSD.org) Received: from compute2.internal (compute2.internal [10.202.2.42]) by out1.messagingengine.com (Postfix) with ESMTP id 3DE3985F13; Tue, 8 Jan 2008 21:06:05 -0500 (EST) Received: from heartbeat1.messagingengine.com ([10.202.2.160]) by compute2.internal (MEProxy); Tue, 08 Jan 2008 21:06:05 -0500 X-Sasl-enc: k+UHqlKM+pfnJTAa3FSJHjNndXK9hHQikPWGoKt6x914 1199844343 Received: from empiric.lon.incunabulum.net (82-35-112-254.cable.ubr07.dals.blueyonder.co.uk [82.35.112.254]) by mail.messagingengine.com (Postfix) with ESMTP id 316D2C34C; Tue, 8 Jan 2008 21:05:42 -0500 (EST) Message-ID: <47842BF5.8060907@FreeBSD.org> Date: Wed, 09 Jan 2008 02:05:41 +0000 From: "Bruce M. 
Simpson" User-Agent: Thunderbird 2.0.0.6 (X11/20070928) MIME-Version: 1.0 To: Julian Elischer References: <4772F123.5030303@elischer.org> <477416CC.4090906@elischer.org> <477D2EF3.2060909@elischer.org> <4780E5E7.2070202@FreeBSD.org><4781197F.1000105@elischer.org> <47814AF0.9070509@freebsd.org> <305C539CA2F86249BF51CDCE8996AFF4096E12A7@bcs-mail2.internal.cacheflow.com> <4783F398.801@freebsd.org> <4784009B.6030601@elischer.org> In-Reply-To: <4784009B.6030601@elischer.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: "Li, Qing" , Andre Oppermann , Qing Li , FreeBSD Net , arch@freebsd.org, Ivo Vachkov , Robert Watson , Vadim Goncharov Subject: Re: resend: multiple routing table roadmap (format fix) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 09 Jan 2008 02:06:05 -0000 Julian Elischer wrote: > Andre Oppermann wrote: >> >> People with the ultimate need for speed have to maintain their own >> trees anyway (Bluecoat, Juniper, Sandvine, Isilon,...) and can afford >> to cut some more corners anyway. > > We are trying to get away from that. We are trying to get more BACK > from those companies. > I know I keep rattling my sabre about co-operative development in "that" IRC channel. The IP stack stuff everyone is looking at right now is just one example of the kind of development which organisations are normally not prepared to sponsor other than in the context of their own projects -- which is fair enough, they are, after all, acting in their own interests, even though we all stand to gain more from mutualism. The weevil is eating away at the apple from the inside, the question is, who's going to tell it like it is -- and who's actually going to do something about it? Hint: The grass is not necessarily greener on the Linux side of the fence. 
cheers BMS From owner-freebsd-arch@FreeBSD.ORG Wed Jan 9 07:07:44 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B39FE16A417 for ; Wed, 9 Jan 2008 07:07:44 +0000 (UTC) (envelope-from kip.macy@gmail.com) Received: from wa-out-1112.google.com (wa-out-1112.google.com [209.85.146.176]) by mx1.freebsd.org (Postfix) with ESMTP id 75DF613C45B for ; Wed, 9 Jan 2008 07:07:44 +0000 (UTC) (envelope-from kip.macy@gmail.com) Received: by wa-out-1112.google.com with SMTP id k17so265037waf.3 for ; Tue, 08 Jan 2008 23:07:44 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; bh=uWFRnyOGy9ZKJRzLiYooqk88zKCMMUeJaLbRJtKkbQg=; b=lJC7diOuQCWMobOjusSusS/Vnhd9LTiMQE7eGhNvM/Mhj3Qcbg4Lzz/7+DJ5FxisbxpAP+nKhvRIVPD6tqT8CuaP3OAxDWFq4023tszaYkCxSUdHpex3rnjbebeZwvigiRRAHO0zig03Fht2VPh7Xdg1Bl9eH73Iy89mvXdybps= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=sCP+n5IC3yeq6uc9bI8E5SqhAitAI6HRrQRhB2FecZZVSUrUxGPCPLeNupwiS0QHlWyPd5Z0B+QqcdxLc1ZXOCOXiKY6HY8O02VqQQFNDJ6fQluv2Qxwr1RavTI+kohcuI1So6D5/AXRRmC+q13ckYFPnM2rreAJKqvCaMRJ/MA= Received: by 10.115.79.1 with SMTP id g1mr463564wal.43.1199862463993; Tue, 08 Jan 2008 23:07:43 -0800 (PST) Received: by 10.114.255.11 with HTTP; Tue, 8 Jan 2008 23:07:43 -0800 (PST) Message-ID: Date: Tue, 8 Jan 2008 23:07:43 -0800 From: "Kip Macy" To: "Kevin Oberman" In-Reply-To: <20071219180601.C170945014@ptavv.es.net> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <20071219180601.C170945014@ptavv.es.net> Cc: 
freebsd-arch@freebsd.org Subject: Re: TOE support issues X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 09 Jan 2008 07:07:44 -0000

On Dec 19, 2007 10:06 AM, Kevin Oberman wrote:
> I have come up with several questions about the supportability of
> TOE.

Sorry for the delay.

>
> 1. Packet capture. Can I use tcpdump or other libpcap tools with TOE
> cards? Can the card do pcap in its own microcode?

Yes. All traffic can be captured. That functionality is not currently supported but is planned.

>
> 2. Statistics. What statistics are available with TOE? I know the
> Chelsio card keeps all kinds of potentially interesting stats as well
> as the basic packet and error counts. Can these be made available to user
> code, management tools, and such stuff?

The standard TCP and IP MIBs are available. A large number of other statistics are also available: numbers of packets of different sizes, classes of errors, pause frames, etc. However, I did not port those over from Linux because I didn't know of any good way of exporting them. I will probably just add a sysctl node to make them visible.

> 3. The Chelsio card has some very impressive, but as far as I can tell,
> undocumented capabilities for things like traffic shaping and
> policing. Any of these available?

The traffic manager and packet classification features are fully supported today. This is undergoing license packaging right now. This will be available as a separately licensable feature shortly. By way of clarification, all cards support it, but it isn't enabled by default.
-Kip From owner-freebsd-arch@FreeBSD.ORG Wed Jan 9 07:25:09 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 34CDE16A419 for ; Wed, 9 Jan 2008 07:25:09 +0000 (UTC) (envelope-from kip.macy@gmail.com) Received: from wa-out-1112.google.com (wa-out-1112.google.com [209.85.146.180]) by mx1.freebsd.org (Postfix) with ESMTP id EBFE613C45B for ; Wed, 9 Jan 2008 07:25:08 +0000 (UTC) (envelope-from kip.macy@gmail.com) Received: by wa-out-1112.google.com with SMTP id k17so273746waf.3 for ; Tue, 08 Jan 2008 23:25:08 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; bh=Fo/qx0ThTQp6OVnIjdHKMoB7v3+tWcFWE+13wKYdF/w=; b=BknOOk884tJ62LSIkLZL3S4Z8gYkRIP+5pBz0qOL6C3hbNmwS0NE0Tz+lQC5IcA+/BdabZiZXfsN2jJojGLErxfqYQhEw1qU1vNLYyJnBjLPKxqDx3FQF8jRawMTdGCqBJ10BjThxxATYuG2SAoxNwM6rb+Ariw5CD/6iQQZuBg= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=krrb19oa07+Od7R2WQSTSOMFfBvEYAAt859CY2t2/1vDjOzqsLHwvaJFoDrFgr7fCQF66lXk7UnolSghzVCIbIq7wcShpjg/4L3MY3Ky1xLK4AxgUhMiXGbshdWrhXRUWYyfl+O0dKY15CiLAA82X+0IKEYrje4Ntmkb47nNyL0= Received: by 10.115.77.1 with SMTP id e1mr456005wal.103.1199863508682; Tue, 08 Jan 2008 23:25:08 -0800 (PST) Received: by 10.114.255.11 with HTTP; Tue, 8 Jan 2008 23:25:08 -0800 (PST) Message-ID: Date: Tue, 8 Jan 2008 23:25:08 -0800 From: "Kip Macy" To: "Sanjeev Jorapur" In-Reply-To: <709798.18649.qm@web81002.mail.mud.yahoo.com> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: 
<709798.18649.qm@web81002.mail.mud.yahoo.com>
Cc: freebsd-arch@freebsd.org
Subject: Re: TOE & RDMA support for NetXen hardwar
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: ,
X-List-Received-Date: Wed, 09 Jan 2008 07:25:09 -0000

> > - Ability to configure which connections are offloaded. Rather than
> > offloading all connections by default, the connections are offloaded
> > based on administrative action. The system admin can decide whether
> > to offload based on TCP port, TCP tuple, IP address, application
> > name, etc.

I've largely assumed in porting over the Chelsio driver that the
currently supported features are what their customers want. Providing a
policy mechanism was not one of the apparent requirements. However, I
have no issues with adding a mechanism for tying it in to PF/IPF/IPFW. I
simply have not thought about an API for that at this time. If you have
suggestions I'm more than happy to give them close consideration.

> - libpcap / tcpdump capability.

This is not implemented in the Linux driver yet. Their hardware supports
this and obviously I will need to extend the BPF interface to provide
support for this. I have not given a great deal of thought to
functionality not supported by the Linux driver.

> > Our customers do like the ability to control which connections get
> > offloaded, so offloading all by default is not desirable.

I understand. I'm open to suggestions on how to extend the API to
support this cleanly in conjunction with the various firewalls.

> I could not make out how libpcap / tcpdump would be supported in the
> > TOE API.

As mentioned above that has not yet been addressed.

> Regarding the hardware filters, how are those hooked to the kernel
> > infrastructure ? Is that expected to be a separate user program
> > to control the hardware filter ?
Presumably it will be tied in to the firewall rules. Chelsio wishes to license that as separate functionality and have not yet resolved the specifics, so it is not yet available. > > We also have RDMA support in the hardware and will be interested in > a RDMA driver. I don't think that will involve much kernel change as it will re-use most of the OFED framework. Thanks for your input. -Kip From owner-freebsd-arch@FreeBSD.ORG Wed Jan 9 14:19:38 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2510A16A417 for ; Wed, 9 Jan 2008 14:19:38 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: from fg-out-1718.google.com (fg-out-1718.google.com [72.14.220.152]) by mx1.freebsd.org (Postfix) with ESMTP id A9CE113C442 for ; Wed, 9 Jan 2008 14:19:37 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: by fg-out-1718.google.com with SMTP id 16so290068fgg.35 for ; Wed, 09 Jan 2008 06:19:35 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:sender:to:subject:mime-version:content-type:content-transfer-encoding:content-disposition:x-google-sender-auth; bh=7FxIcbOi6PYFptfW2P94W8QiCIjb3Hl8qLrzkzc+p3s=; b=jzEKDCtCW8eqFgh7BhZLmL3LyNkhXntYdBw+ChAAI+6tFvCaCfAfkdPUiQ9AgYD2KHzEo8xMuQMlOQ2euUhb/7S1Y/BrQTCWA9B8TMnLDPYpOFbL4SJBCrtoxg96BfzlsBHU2UfUxHrGj5fV7Mve4tBHGtr8anYFWXZ/tVOuOGs= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:sender:to:subject:mime-version:content-type:content-transfer-encoding:content-disposition:x-google-sender-auth; b=wlqPBD6dNav1ezmjaI+YixB2QWJzgWMHXHJaO0dvzoEPGlhE75UMz1NUuA83s5DFD7EqeRfyYMSnELCu/skwrgvx2PoD2VjI2NaklYMedzEEoqBcyRZ+KWscycYFzA10XN6VKuS66CInbzPLfZ3BorLc7SWCw/Ve7cBwWmeFKDU= Received: by 10.86.66.1 with SMTP id o1mr700877fga.23.1199888375803; Wed, 09 Jan 2008 06:19:35 -0800 (PST) Received: by 
10.86.28.19 with HTTP; Wed, 9 Jan 2008 06:19:35 -0800 (PST)
Message-ID: <3bbf2fe10801090619x1ce5a178x1731db272c8d20fd@mail.gmail.com>
Date: Wed, 9 Jan 2008 15:19:35 +0100
From: "Attilio Rao"
Sender: asmrookie@gmail.com
To: current@freebsd.org, arch@freebsd.org, fs@freebsd.org
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
X-Google-Sender-Auth: 58fec5b188aea1bf
Cc:
Subject: [PATCH] lockmgr and VFS plans
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: ,
X-List-Received-Date: Wed, 09 Jan 2008 14:19:38 -0000

Hi,
as explained in earlier e-mails, lockmgr() is about to undergo a massive
restructuring.
The work is progressing along two tracks: the first involves fixing
consumer code so that it is completely agnostic of implementation
details, which makes it cleaner and more robust. The second involves
providing a good replacement for the current functions and a faster
implementation.
lockmgr() is an old primitive widely used in our VFS subsystem, so this
overhaul will necessarily involve the VFS subsystem to some extent,
particularly for the first track.

Part of this overhaul (in these preliminary stages) consists of removing
the 'thread' argument from the lockmgr() interface, which also makes the
same argument to the VFS functions (vn_lock(), VOP_LOCK() and
VOP_UNLOCK()) useless. This removal can be done in a 'stacked' way and
can be split into 2 different stages: the first will clean up only
vn_lock(), while the second will be more aggressive and will involve VFS
heavily, fixing VOP_LOCK1() and VOP_UNLOCK().
This patch removes the 'thread' argument from vn_lock():
http://people.freebsd.org/~attilio/vn_lock.diff

What I'm looking for is:
- objections to this
- testers (even though a small crowd has already offered to test this patch)

I test-compiled and ran LINT with this patch and it works perfectly, but
a wider audience would be better.

I would also appreciate it a lot if people planning to make changes to
lockmgr or VFS would coordinate their efforts with me, even on small
changes.

Thanks,
Attilio

--
Peace can only be achieved by understanding - A. Einstein

From owner-freebsd-arch@FreeBSD.ORG Wed Jan 9 21:01:15 2008
Return-Path:
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DDA8016A417 for ; Wed, 9 Jan 2008 21:01:15 +0000 (UTC) (envelope-from pho@holm.cc)
Received: from relay00.pair.com (relay00.pair.com [209.68.5.9]) by mx1.freebsd.org (Postfix) with SMTP id 8FD0813C46E for ; Wed, 9 Jan 2008 21:01:15 +0000 (UTC) (envelope-from pho@holm.cc)
Received: (qmail 81358 invoked from network); 9 Jan 2008 20:34:33 -0000
Received: from unknown (HELO peter.osted.lan) (unknown) by unknown with SMTP; 9 Jan 2008 20:34:33 -0000
X-pair-Authenticated: 83.95.197.164
Received: from peter.osted.lan (localhost.osted.lan [127.0.0.1]) by peter.osted.lan (8.13.6/8.13.6) with ESMTP id m09KYXun014071; Wed, 9 Jan 2008 21:34:33 +0100 (CET) (envelope-from pho@peter.osted.lan)
Received: (from pho@localhost) by peter.osted.lan (8.13.6/8.13.6/Submit) id m09KYXpL014070; Wed, 9 Jan 2008 21:34:33 +0100 (CET) (envelope-from pho)
Date: Wed, 9 Jan 2008 21:34:33 +0100
From: Peter Holm
To: Attilio Rao
Message-ID: <20080109203433.GA13933@peter.osted.lan>
References: <3bbf2fe10801090619x1ce5a178x1731db272c8d20fd@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <3bbf2fe10801090619x1ce5a178x1731db272c8d20fd@mail.gmail.com>
User-Agent: Mutt/1.4.2.1i
Cc:
arch@freebsd.org, current@freebsd.org, fs@freebsd.org
Subject: Re: [PATCH] lockmgr and VFS plans
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: ,
X-List-Received-Date: Wed, 09 Jan 2008 21:01:16 -0000

On Wed, Jan 09, 2008 at 03:19:35PM +0100, Attilio Rao wrote:
> Hi,
> as previously explained in past e-mails, lockmgr() is going to face a
> massive restructuration.
> The work is progressing on two different rails: the former involves
> fixing consumers code in order to make it completely implementative
> details agnostic, in order to make it cleaner and more robust. The
> latter involves giving a good replacement for current functions and a
> faster implementation.
> lockmgr() is an old primitive widely used in our VFS subsystem, so
> this overhaul would involve someway VFS subsystem necessarily, in
> particular about the former line of development.
>
> Part of this overhaul (for this preliminary stages) consists in
> removing the 'thread' argument from the lockmgr() interface which also
> means making useless the same argument about VFS functions (vn_lock,
> VOP_LOCK() and VOP_UNLOCK()). This removal can be done in a 'stacked'
> way and can be split in 2 different stages: the former will clean
> up only vn_lock() while the latter will be more aggressive and it will
> involve hardly VFS, fixing VOP_LOCK1() and VOP_UNLOCK(). This patch
> removes the 'thread' argument from vn_lock():
> http://people.freebsd.org/~attilio/vn_lock.diff
>

I'll try and test it this weekend.

> What I'm looking for is:
> - objections to this
> - testers (even if a small crowd already offered to test this patch)
>
> I test-compiled and ran LINT with this patch and it works
> perfectly, but a wider audience would be better.
> > I also would appreciate a lot if people planning to do changes to > lockmgr or VFS would coordinate their efforts with me, even on small > changes. > > Thanks, > Attilio > > > -- > Peace can only be achieved by understanding - A. Einstein > _______________________________________________ > freebsd-arch@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-arch > To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" -- Peter Holm From owner-freebsd-arch@FreeBSD.ORG Wed Jan 9 22:00:43 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E720C16A41A for ; Wed, 9 Jan 2008 22:00:43 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from speedfactory.net (mail.speedfactory.net [66.23.216.219]) by mx1.freebsd.org (Postfix) with ESMTP id 9A80613C459 for ; Wed, 9 Jan 2008 22:00:43 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from server.baldwin.cx (unverified [66.23.211.162]) by speedfactory.net (SurgeMail 3.8q) with ESMTP id 227932122-1834499 for ; Wed, 09 Jan 2008 17:01:56 -0500 Received: from localhost.corp.yahoo.com (john@localhost [127.0.0.1]) (authenticated bits=0) by server.baldwin.cx (8.13.8/8.13.8) with ESMTP id m09M0VS5095474 for ; Wed, 9 Jan 2008 17:00:31 -0500 (EST) (envelope-from jhb@freebsd.org) From: John Baldwin To: freebsd-arch@freebsd.org Date: Wed, 9 Jan 2008 17:00:37 -0500 User-Agent: KMail/1.9.6 References: <200712271704.44796.jhb@FreeBSD.org> <200712281745.08144.jhb@freebsd.org> In-Reply-To: <200712281745.08144.jhb@freebsd.org> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200801091700.37594.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH authentication, not delayed by milter-greylist-2.0.2 (server.baldwin.cx [127.0.0.1]); Wed, 09 Jan 2008 17:00:31 -0500 (EST) X-Virus-Scanned: 
ClamAV 0.91.2/5459/Wed Jan 9 11:00:29 2008 on server.baldwin.cx
X-Virus-Status: Clean
X-Spam-Status: No, score=-4.4 required=4.2 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.1.3
X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on server.baldwin.cx
Subject: Re: kernel features MIB
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: ,
X-List-Received-Date: Wed, 09 Jan 2008 22:00:44 -0000

On Friday 28 December 2007 05:45:07 pm John Baldwin wrote:
> On Thursday 27 December 2007 05:04:44 pm John Baldwin wrote:
> > At work we don't have a pretty API for this at all, but I'm thinking for
> > FreeBSD we can do this:
> >
> > FEATURE(foo, "description of foo")
> >
> > which is a macro to create the 'kern.features.foo' node and set it to 1. Then
> > we could have a routine in libc:
> >
> > int feature_present(const char *name);
> >
> > That returns a boolean to indicate if a given feature is present or not by
> > invoking sysctlbyname(3), etc.
> >
> > Any objections to the idea?
>
> So here's a bikeshed question I have no idea for. Which header should
> feature_present()'s prototype go in? I anticipate this routine being
> used in libc itself, so I don't think it can go into libutil.

I went with the _BSD_VISIBLE portion of <unistd.h> since it is sort of
similar to sysconf(3), which is also in that header, and we already have
several other prototypes in the _BSD_VISIBLE section. I think I still
prefer feature_present(3) to adding new sysconf(3) constants as this is
simpler to maintain (we don't have to add a new constant that maps to a
sysctl for each feature).
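
The lookup described above could be sketched roughly as follows. This is only an illustration and not the code in the patch: fake_mib, fake_sysctl_int() and feature_present_sketch() are hypothetical names, and the small table stands in for the kern.features sysctl subtree so that the sketch builds on any system. A real version would call sysctlbyname(3) where the stub is used. The "bar" node set to 0 is also an assumption made for the example, since FEATURE() as proposed always sets the node to 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the kern.features sysctl subtree. */
struct feature_node {
	const char *oid;
	int value;
};

static const struct feature_node fake_mib[] = {
	{ "kern.features.foo", 1 },
	{ "kern.features.bar", 0 },
};

/*
 * Models an integer sysctlbyname(3) read: returns 0 and stores the value
 * on success, or -1 if the OID does not exist.
 */
static int
fake_sysctl_int(const char *oid, int *out)
{
	size_t i;

	for (i = 0; i < sizeof(fake_mib) / sizeof(fake_mib[0]); i++) {
		if (strcmp(fake_mib[i].oid, oid) == 0) {
			*out = fake_mib[i].value;
			return 0;
		}
	}
	return -1;
}

/*
 * Returns 1 if kern.features.<name> exists and is non-zero, else 0.
 * A missing node and an over-long name are both treated as "absent".
 */
int
feature_present_sketch(const char *name)
{
	char oid[128];
	int n, value;

	assert(name != NULL);
	n = snprintf(oid, sizeof(oid), "kern.features.%s", name);
	if (n < 0 || (size_t)n >= sizeof(oid))
		return 0;
	if (fake_sysctl_int(oid, &value) != 0)
		return 0;
	return value != 0;
}
```

Treating a failed lookup as "not present" keeps the caller's check down to a single boolean test, which seems to be the point of preferring this over per-feature sysconf(3) constants.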
Patch is at http://www.FreeBSD.org/~jhb/patches/feature_present.patch -- John Baldwin From owner-freebsd-arch@FreeBSD.ORG Thu Jan 10 04:06:42 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E5A8916A418; Thu, 10 Jan 2008 04:06:42 +0000 (UTC) (envelope-from lastewart@swin.edu.au) Received: from swin.edu.au (gpo2.cc.swin.edu.au [136.186.1.222]) by mx1.freebsd.org (Postfix) with ESMTP id 7B64F13C459; Thu, 10 Jan 2008 04:06:42 +0000 (UTC) (envelope-from lastewart@swin.edu.au) Received: from [136.186.229.95] (lstewart.caia.swin.edu.au [136.186.229.95]) by swin.edu.au (8.13.6.20060614/8.13.1) with ESMTP id m0A3Cr1l018960; Thu, 10 Jan 2008 14:12:54 +1100 Message-ID: <47858D35.6060006@swin.edu.au> Date: Thu, 10 Jan 2008 14:12:53 +1100 From: Lawrence Stewart User-Agent: Thunderbird 1.5.0.9 (X11/20070123) MIME-Version: 1.0 To: Andre Oppermann References: <20071219123305.Y95322@fledge.watson.org> <47693DBD.6050104@swin.edu.au> <476A45D6.6030305@freebsd.org> In-Reply-To: <476A45D6.6030305@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-1.4 required=5.0 tests=ALL_TRUSTED autolearn=disabled version=3.1.9 X-Spam-Checker-Version: SpamAssassin 3.1.9 (2007-02-13) on gpo2.cc.swin.edu.au Cc: James Healy , arch@freebsd.org, Robert Watson , net@freebsd.org Subject: Re: Coordinating TCP projects X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 10 Jan 2008 04:06:43 -0000 Hi Andre, Andre Oppermann wrote: > Lawrence Stewart wrote: [snip] >> Jim and I recently discussed the idea of implementing autotuning of >> the TCP reassembly queue size based on analysis of some experimental >> work we've been doing. 
It's a small project, but we feel it would be
>> worth implementing. Details follow...
>>
>>
>> Problem description:
>>
>> Currently, "net.inet.tcp.reass.maxqlen" specifies the maximum number
>> of segments that can be held in the reassembly queue for a TCP
>> connection. The current default value is 48, which equates to approx.
>> 69k of buffer space if MSS = 1448 bytes. This means that if the TCP
>> window grows to be more than 48 segments wide, and a packet is lost,
>> the receiver will buffer the next 48 segments in the reassembly queue
>> and subsequently drop all the remaining segments in the window because
>> the reassembly buffer is full i.e. 1 packet loss in the network can
>> equate to many packet losses at the receiver because of insufficient
>> buffering. This obviously has a negative impact on performance in
>> environments where there is non-zero packet loss.
>>
>> With the addition of automatic socket buffer tuning in FreeBSD 7, the
>> ability for the TCP window to grow above 48 segments is going to be
>> even more prevalent than it is now, so this issue will continue to
>> affect connections to FreeBSD based TCP receivers.
>>
>> We observed that the socket receive buffer size provides a good
>> indication of the expected number of bytes in flight for a connection,
>> and can therefore serve as the figure to base the size of the
>> reassembly queue on.
>
> I've got a rewritten and much more efficient tcp_reass() function
> in my local tree. I'll import it into Perforce next week with all
> the other stuff. You may want to base your auto-sizing work on it.
> The only missing parts are some statistics gathering.
>

Whereabouts is this code? A cursory browse through the Perforce web
front-end reveals nothing. We're going to start work on the TCP
reassembly queue autotuning patch now, and if you think we should base
it on your new tcp_reass() we need to have a look at it.
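
For reference, the arithmetic in the problem description quoted above can be sketched as below. The function names are hypothetical, and the sizing rule in reass_qlen_for_rcvbuf() is just one plausible reading of "base the size of the reassembly queue on the socket receive buffer", not the design of the planned patch. The drop count assumes the lost packet is the first segment of the window.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes the reassembly queue can hold: one MSS-sized segment per slot. */
uint32_t
reass_queue_bytes(uint32_t maxqlen, uint32_t mss)
{
	return maxqlen * mss;
}

/*
 * Segments discarded at the receiver after a single loss, assuming the
 * lost packet is the first one in a window of window_segs segments.
 * The maxqlen segments behind the hole are queued, the rest are dropped.
 */
uint32_t
reass_dropped_segments(uint32_t window_segs, uint32_t maxqlen)
{
	uint32_t behind_hole;

	assert(window_segs > 0);
	behind_hole = window_segs - 1;
	return behind_hole > maxqlen ? behind_hole - maxqlen : 0;
}

/*
 * Hypothetical autotuning rule: enough queue slots to cover the socket
 * receive buffer, rounded up to a whole segment.
 */
uint32_t
reass_qlen_for_rcvbuf(uint32_t rcvbuf_bytes, uint32_t mss)
{
	assert(mss > 0);
	return rcvbuf_bytes / mss + (rcvbuf_bytes % mss != 0);
}
```

With the defaults from the description, reass_queue_bytes(48, 1448) gives 69504 bytes, which is the "approx. 69k" figure. For a 100-segment window, reass_dropped_segments(100, 48) gives 51, so one loss in the network becomes 51 further losses at the receiver.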
Cheers, Lawrence From owner-freebsd-arch@FreeBSD.ORG Thu Jan 10 09:12:55 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1A8AF16A421 for ; Thu, 10 Jan 2008 09:12:55 +0000 (UTC) (envelope-from andre@freebsd.org) Received: from c00l3r.networx.ch (c00l3r.networx.ch [62.48.2.2]) by mx1.freebsd.org (Postfix) with ESMTP id 4E16613C43E for ; Thu, 10 Jan 2008 09:12:53 +0000 (UTC) (envelope-from andre@freebsd.org) Received: (qmail 11939 invoked from network); 10 Jan 2008 08:36:46 -0000 Received: from c00l3r.networx.ch (HELO [127.0.0.1]) ([62.48.2.2]) (envelope-sender ) by c00l3r.networx.ch (qmail-ldap-1.03) with SMTP for ; 10 Jan 2008 08:36:46 -0000 Message-ID: <4785E19A.2040102@freebsd.org> Date: Thu, 10 Jan 2008 10:12:58 +0100 From: Andre Oppermann User-Agent: Thunderbird 1.5.0.14 (Windows/20071210) MIME-Version: 1.0 To: Lawrence Stewart References: <20071219123305.Y95322@fledge.watson.org> <47693DBD.6050104@swin.edu.au> <476A45D6.6030305@freebsd.org> <47858D35.6060006@swin.edu.au> In-Reply-To: <47858D35.6060006@swin.edu.au> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: James Healy , arch@freebsd.org, Robert Watson , net@freebsd.org Subject: Re: Coordinating TCP projects X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 10 Jan 2008 09:12:55 -0000 Lawrence Stewart wrote: > Hi Andre, > > Andre Oppermann wrote: >> Lawrence Stewart wrote: > > [snip] > >>> Jim and I recently discussed the idea of implementing autotuning of >>> the TCP reassembly queue size based on analysis of some experimental >>> work we've been doing. It's a small project, but we feel it would be >>> worth implementing. Details follow... 
>>> >>> >>> Problem description: >>> >>> Currently, "net.inet.tcp.reass.maxqlen" specifies the maximum number >>> of segments that can be held in the reassembly queue for a TCP >>> connection. The current default value is 48, which equates to approx. >>> 69k of buffer space if MSS = 1448 bytes. This means that if the TCP >>> window grows to be more than 48 segments wide, and a packet is lost, >>> the receiver will buffer the next 48 segments in the reassembly queue >>> and subsequently drop all the remaining segments in the window >>> because the reassembly buffer is full i.e. 1 packet loss in the >>> network can equate to many packet losses at the receiver because of >>> insufficient buffering. This obviously has a negative impact on >>> performance in environments where there is non-zero packet loss. >>> >>> With the addition of automatic socket buffer tuning in FreeBSD 7, the >>> ability for the TCP window to grow above 48 segments is going to be >>> even more prevalent than it is now, so this issue will continue to >>> affect connections to FreeBSD based TCP receivers. >>> >>> We observed that the socket receive buffer size provides a good >>> indication of the expected number of bytes in flight for a >>> connection, and can therefore serve as the figure to base the size of >>> the reassembly queue on. >> >> I've got a rewritten and much more efficient tcp_reass() function >> in my local tree. I'll import it into Perforce next week with all >> the other stuff. You may want to base your auto-sizing work on it. >> The only missing parts are some statistics gathering. >> > > Where abouts is this code? A cursory browse through the Perforce web > front-end reveals nothing. We're going to start work on the TCP > reassembly queue autotuning patch now and if you think we should base it > on your new tcp_reass() we need to have a look at it. I'll put everything into Perforce this evening (European time). 
Christmas/newyear didn't provide as much spare time as I had hoped. ;-) -- Andre From owner-freebsd-arch@FreeBSD.ORG Thu Jan 10 15:26:22 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 597E616A418 for ; Thu, 10 Jan 2008 15:26:22 +0000 (UTC) (envelope-from rrs@cisco.com) Received: from sj-iport-2.cisco.com (sj-iport-2-in.cisco.com [171.71.176.71]) by mx1.freebsd.org (Postfix) with ESMTP id 2FA5213C442 for ; Thu, 10 Jan 2008 15:26:22 +0000 (UTC) (envelope-from rrs@cisco.com) X-IronPort-AV: E=Sophos;i="4.24,267,1196668800"; d="scan'208";a="11135899" Received: from sj-dkim-1.cisco.com ([171.71.179.21]) by sj-iport-2.cisco.com with ESMTP; 10 Jan 2008 06:58:17 -0800 Received: from sj-core-1.cisco.com (sj-core-1.cisco.com [171.71.177.237]) by sj-dkim-1.cisco.com (8.12.11/8.12.11) with ESMTP id m0AEwHN0032725 for ; Thu, 10 Jan 2008 06:58:17 -0800 Received: from xbh-sjc-211.amer.cisco.com (xbh-sjc-211.cisco.com [171.70.151.144]) by sj-core-1.cisco.com (8.12.10/8.12.6) with ESMTP id m0AEwHSb010312 for ; Thu, 10 Jan 2008 14:58:17 GMT Received: from xfe-sjc-211.amer.cisco.com ([171.70.151.174]) by xbh-sjc-211.amer.cisco.com with Microsoft SMTPSVC(6.0.3790.1830); Thu, 10 Jan 2008 06:58:10 -0800 Received: from [127.0.0.1] ([171.68.225.134]) by xfe-sjc-211.amer.cisco.com with Microsoft SMTPSVC(6.0.3790.1830); Thu, 10 Jan 2008 06:58:10 -0800 Message-ID: <478631ED.2030108@cisco.com> Date: Thu, 10 Jan 2008 09:55:41 -0500 From: Randall Stewart User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.7.13) Gecko/20070601 X-Accept-Language: en-us, en MIME-Version: 1.0 To: freebsd-arch@freebsd.org Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-OriginalArrivalTime: 10 Jan 2008 14:58:10.0251 (UTC) FILETIME=[374639B0:01C85399] DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; l=4995; t=1199977097; x=1200841097; 
c=relaxed/simple; s=sjdkim1004; h=Content-Type:From:Subject:Content-Transfer-Encoding:MIME-Version; d=cisco.com; i=rrs@cisco.com; z=From:=20Randall=20Stewart=20 |Subject:=20Routing=20in=20the=20network=20=3A-) |Sender:=20; bh=vBtBrJv4Tw15hDOspCfmS+tbHPTS8G4JghD64XE4kO8=; b=WXTe1iCf1yWxZi+2GHVZ67cB9SNy0t8cJ5JDAvPeNHpx/YIHD2/F5aUhFX eUWSTkAtWqgxDIKZJdYbiiQTWWng9eeCGjiF/7T0284LnXtgkux5ZmKPWIx4 G2u+VPqwdCxw40JYlgzs+5kOObY6yb/CFHvXgZPK9Uf/hAUcPj0kI=;
Authentication-Results: sj-dkim-1; header.From=rrs@cisco.com; dkim=pass ( sig from cisco.com/sjdkim1004 verified; );
Subject: Routing in the network :-)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: ,
X-List-Received-Date: Thu, 10 Jan 2008 15:26:22 -0000

Hi all:

A number of years ago, Itojun and I played off and on with some
modifications to both the routing table and to a "new" interface that
could be used by transports to gain routing information.

I am contemplating digging back in my archives and building a p4 branch
that would have the changes for folks to look at.. But before I go to
all that trouble I want to have a discussion about this here ;-)

This will be a longish email so if you get bored easily or just don't
care about routing/networks and all that fun, you have been warned :-)

The basic concept:

So say I am at home and have purchased two DSL's. One from AT&T (don't
you love the new ma-bell) and the other from SpeakEasy (Note I had this
until I moved out to the country, now I am lucky to have one DSL.. but
many can do this if they want)...

So my home looked like:

     IP-A       IP-S
       |          |
       |          |
       |          |
    ,__|__________|___
    |                 |
    |                 |
    |  lakerest.net   |
    |                 |
    |_________________|

Now life is good, I have some degree of fault tolerance, right?

So AT&T (IP-A) gives me the default route to IP-A1 and SpeakEasy (IP-S)
gives me the default route to IP-S1.

Life is not so good...
how do I plumb these in the routing table? I can say

   route add default IP-A1

or

   route add default IP-S1

But I cannot have both. And worse, if I had a connection up to
FreeBSD.net and AT&T's network went down.. and I happened to have put
the first command in.. my network connection would stop...

What would be nice is if I had a way to add BOTH routes into the
kernel.. and when Layer 4 realized there were some major problems going
on it could "use" the alternate route (i.e. via IP-S1) and life would
once again be good..

Ok, yes, the observant person out there will say.. wait, IP-S1 will
NEVER allow your packets through since they probably do ingress
filtering.. yes I am aware of this.. but this would *NOT* hold true for
some device in the network talking to some other device in the network..
*OR* for speakeasy.. at least not circa 2004.. since speakeasy did *NOT*
do ingress filtering and my way back former employer (AT&T) *DID* do
ingress filtering..

So the idea is rather simple:

1) Allow multiple routes on any level of the kernel patricia trees.

2) Add an additional interface to the routing code so that a transport
   protocol could query the routing table for additional support...
   i.e. excuse me, the route that I had no longer seems to be working,
   do you have an alternate gateway?

Now I admit for TCP these API's would have limited use.. but for SCTP
these are golden.. since both sides know about all addresses and thus
you get a form of true network diversity out of this little software
change. Now yes, this does not help you if both your DSL's go out to the
same pole outside your house, and a truck hits the pole... but it *DOES*
help you if your network provider dies somewhere back in the CO *OR* if
your cat decides it really likes the red cable running across your
carpet to AT&T's DSL and it thinks chewing on it would be fun :-)

So what was required way back in 4.x when Itojun and I did this work..
(note that Itojun called his changes RADIX_MPATH, which did NOT include
my alternate routing lookup code).

a) For radix.c there were just a few simple changes that removed the
   restriction that prevents duplicate routes at any level of the tree.

b) For route.c a new method is added.. this is a bit of code, not huge
   but some.

c) One thing I added but took back out was some changes to the "route
   delete" api... can't remember exactly where.. but basically the
   delete does not look at the destination ... i.e. with the changes
   Itojun and I had cooked up if you said:

      route add default IP-1
      route add default IP-2
      route add default IP-3

   and then went.. oops.. I don't want IP-2, you could NOT say

      route delete default IP-2

   .. well you could but it did no good.. it removed the first one
   (IP-1). I had a fix for this but Itojun thought it was too radical
   since it changed an interface to one of the routing routines... so we
   just settled for the fact that if you did that you got to have the
   pleasure of using:

      route delete default

   3 times.. and then starting again...

So is it worth my time resurrecting these patches for 8.0?

Objections (being in a routing company I know there will be a lot of
them.. gee the routing system is supposed to do that.. etc etc).

Comments would be welcome before I dust off the patches..

R

--
Randall Stewart
NSSTG - Cisco Systems Inc.
803-345-0369 803-317-4952 (cell) From owner-freebsd-arch@FreeBSD.ORG Thu Jan 10 15:33:10 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 17AE216A469; Thu, 10 Jan 2008 15:33:10 +0000 (UTC) (envelope-from rrs@cisco.com) Received: from sj-iport-1.cisco.com (sj-iport-1-in.cisco.com [171.71.176.70]) by mx1.freebsd.org (Postfix) with ESMTP id DF22013C458; Thu, 10 Jan 2008 15:33:09 +0000 (UTC) (envelope-from rrs@cisco.com) X-IronPort-AV: E=Sophos;i="4.24,267,1196668800"; d="scan'208";a="6682633" Received: from sj-dkim-1.cisco.com ([171.71.179.21]) by sj-iport-1.cisco.com with ESMTP; 10 Jan 2008 07:05:08 -0800 Received: from sj-core-4.cisco.com (sj-core-4.cisco.com [171.68.223.138]) by sj-dkim-1.cisco.com (8.12.11/8.12.11) with ESMTP id m0AF58U7013328; Thu, 10 Jan 2008 07:05:08 -0800 Received: from xbh-sjc-231.amer.cisco.com (xbh-sjc-231.cisco.com [128.107.191.100]) by sj-core-4.cisco.com (8.12.10/8.12.6) with ESMTP id m0AF58tg020190; Thu, 10 Jan 2008 15:05:08 GMT Received: from xfe-sjc-212.amer.cisco.com ([171.70.151.187]) by xbh-sjc-231.amer.cisco.com with Microsoft SMTPSVC(6.0.3790.1830); Thu, 10 Jan 2008 07:05:07 -0800 Received: from [127.0.0.1] ([171.68.225.134]) by xfe-sjc-212.amer.cisco.com with Microsoft SMTPSVC(6.0.3790.1830); Thu, 10 Jan 2008 07:05:06 -0800 Message-ID: <4786338D.5050801@cisco.com> Date: Thu, 10 Jan 2008 10:02:37 -0500 From: Randall Stewart User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.7.13) Gecko/20070601 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Robert Watson References: <20071219123305.Y95322@fledge.watson.org> In-Reply-To: <20071219123305.Y95322@fledge.watson.org> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-OriginalArrivalTime: 10 Jan 2008 15:05:07.0052 (UTC) FILETIME=[2FB502C0:01C8539A] DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; l=6785; t=1199977508; 
x=1200841508; c=relaxed/simple; s=sjdkim1004; h=Content-Type:From:Subject:Content-Transfer-Encoding:MIME-Version; d=cisco.com; i=rrs@cisco.com; z=From:=20Randall=20Stewart=20 |Subject:=20Re=3A=20Coordinating=20TCP=20projects |Sender:=20; bh=xASVwf4yxlw0TpW7wsy7fFFUREOxmnZArXsC1tPpVJk=; b=PJTto/Ip0qIWS6nG4SKQ8wHZ8BqoGJ/i/mwo8QwygJHTdjExMa+HxYs7CD 3UH2WOAlaq6v8ENEoa/0/UW2PxoQXyCHeN7f312Th0Ru5M65tuqfke1AzKgg uFOk0RM/cZIQb/njOZ9sD9RKq9ABlh2zjIwJrSypadq0K2evhHx4U=;
Authentication-Results: sj-dkim-1; header.From=rrs@cisco.com; dkim=pass ( sig from cisco.com/sjdkim1004 verified; );
Cc: James Healy , arch@freebsd.org, Lawrence Stewart , net@freebsd.org
Subject: Re: Coordinating TCP projects
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: ,
X-List-Received-Date: Thu, 10 Jan 2008 15:33:10 -0000

Robert:

One thing I would like to point out for one of Lawrence's projects is
that SCTP is also hanging around in the kernel, and as part of one of
our URP's (which is also where Lawrence's project came from.. if I
remember right)... we added "selectable" congestion control to SCTP..
well it was not really a URP come to think of it.. but what I kept an
intern busy doing last summer :-)

Now, the SCTP code DID NOT do kernel loadable CC modules like
Lawrence's... which is cool.. so .. I wonder..

Would it be possible to take what Lawrence did and generalize it so that
*ANY* transport could use it.. i.e. both TCP and SCTP. This would yield
an interesting advantage in that any time one added a CC algorithm all
transports would have access to it.

Not having looked at the patches yet, what may be missing in the TCP
code is the ability to select amongst multiple CC algorithms... we
actually have this down to the SCTP association level.. So I can in
theory have different associations out of the same box using different
CC modules...
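
To make the idea concrete, here is a rough sketch of what a transport-neutral CC hook could look like. None of this comes from Lawrence's patches or from the SCTP code: struct cc_state, struct cc_algo, cc_register(), cc_lookup() and the "newreno-sketch" algorithm are all hypothetical names, and the window arithmetic is a deliberately simplified NewReno-style rule. The only point is that an algorithm written against a small shared state structure can be picked per TCP connection or per SCTP association by name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Per-connection congestion state, owned by whichever transport uses it. */
struct cc_state {
	uint32_t cwnd;		/* congestion window, bytes */
	uint32_t ssthresh;	/* slow-start threshold, bytes */
	uint32_t mss;		/* maximum segment size, bytes */
};

/* One algorithm, written only in terms of cc_state (hypothetical API). */
struct cc_algo {
	const char *name;
	void (*ack_received)(struct cc_state *, uint32_t acked_bytes);
	void (*loss_detected)(struct cc_state *);
};

#define CC_MAX_ALGOS 8
static const struct cc_algo *cc_registry[CC_MAX_ALGOS];
static size_t cc_count;

/* Find a registered algorithm by name, or NULL if there is none. */
const struct cc_algo *
cc_lookup(const char *name)
{
	size_t i;

	for (i = 0; i < cc_count; i++)
		if (strcmp(cc_registry[i]->name, name) == 0)
			return cc_registry[i];
	return NULL;
}

/* Returns 0 on success, -1 if the table is full or the name is taken. */
int
cc_register(const struct cc_algo *algo)
{
	assert(algo != NULL && algo->name != NULL);
	if (cc_count == CC_MAX_ALGOS || cc_lookup(algo->name) != NULL)
		return -1;
	cc_registry[cc_count++] = algo;
	return 0;
}

/* Simplified NewReno-style growth: slow start, then about 1 MSS per RTT. */
static void
newreno_ack(struct cc_state *s, uint32_t acked)
{
	if (s->cwnd == 0 || s->cwnd < s->ssthresh) {
		s->cwnd += acked < s->mss ? acked : s->mss;
	} else {
		uint32_t incr = s->mss * s->mss / s->cwnd;
		s->cwnd += incr > 0 ? incr : 1;
	}
}

/* On loss, halve the window but keep at least two segments. */
static void
newreno_loss(struct cc_state *s)
{
	uint32_t half = s->cwnd / 2;
	uint32_t floor = 2 * s->mss;

	s->ssthresh = half > floor ? half : floor;
	s->cwnd = s->ssthresh;
}

const struct cc_algo cc_newreno_sketch = {
	"newreno-sketch", newreno_ack, newreno_loss
};
```

A TCP connection and an SCTP association would each keep their own struct cc_state plus a pointer obtained from cc_lookup(), so two associations out of the same box can point at different algorithms, and a newly registered algorithm is immediately visible to both transports.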
There might be some good ideas we can harvest from both approaches and make available to all transports...

Just some thoughts mind you :-)

R

Robert Watson wrote:
>
> Dear all,
>
> It is rapidly becoming clear that quite a few of us have Big Plans for the TCP implementation over the next 12-18 months. It's important that we get the plans out on the table now so that everyone working on these projects is aware of the larger context. This will encourage collaboration, but also allow us to manage the risks inevitably associated with having several simultaneous projects going on in a very complex software base. With that in mind, here are the large projects I'm currently aware of:
>
> Project                  Flag Wavers        Status
> -------                  -----------        ------
> TCP offload              Kip Macy           Moving to CVS and under
>                                             review and testing; one
>                                             supporting device driver.
>
> TCP congestion control   Sam Leffler,       At least one prototype
>                          Rui Paulo,         implementation, to move to p4
>                          Andre Oppermann,
>                          Kip Macy,
>                          Lawrence Stewart,
>                          James Healy
>
> TCP overhaul             Andre Oppermann    Glimmer in eye, to move to
>                                             p4.
>
> TCP lock granularity/    Robert Watson      Glimmer in eye, to occur in
> increased parallelism                       p4.
>
> TCP timer unification    Andre Oppermann,   Previously committed, and to
>                          Mike Silbersack    be reintroduced via p4.
>
> Monitoring ABI cleanup   Robert Watson      Glimmer in eye, to occur in
>                                             p4.
>
> Looking at the above, it sounds like a massive amount of work taking place, so we will need to coordinate carefully. I'd like to encourage people to avoid creating unnecessary dependencies between changes, and to be especially careful in coordinating potentially MFCable changes. There are (at least) two conflicting scheduling desires in play here:
>
> - A desire to merge MFCable changes early, so that they aren't entangled with un-mergeable changes. This will simplify merging and also maximize the extent to which testing in HEAD will apply to them once merged to RELENG_7.
>
> - A desire to merge large-scale infrastructural changes early so that they see the greatest exposure, and so that they can be introduced incrementally over a longer period of time to shake each out.
>
> Both of these are valid perspectives, and will need to be balanced. I have a few questions, then, for people involved in these or other projects:
>
> (0) Is your project in the above list? If not, could you send out a reply talking a bit about the project, who's involved, where it's taking place, etc.
>
> (1) What is your availability to shepherd the project through its entire cycle, including early prototyping, design review, development, implementation review, testing, and the inevitable long debugging tail that all TCP projects have?
>
> (2) When do you think your implementation will reach a prototype phase appropriate for an expanded circle of reviewers? When do you think it might be ready for commit? Keep in mind that we're now a month or so into the 18-month cycle for 8.0, and that all serious TCP work should be completed at least six months before the end of the cycle.
>
> (3) What potential interactions of note exist between your project and the others being planned? Are there explicit dependencies?
>
> (4) Do you anticipate an MFC cycle for your work to RELENG_7?
>
> I'd like for us to create a wiki page tracking these various projects, and pointing at per-project resources. Once the discussion has settled a bit, I can take responsibility for creating such a page, but will need everyone involved to help maintain it, as well as to maintain pages (on the wiki or elsewhere) regarding the status of the projects. I think it also makes a lot of sense for participants in the projects to send occasional updates and reports to net@/arch@ in order to keep people who can't track things day-to-day in the loop, and to invite review.
>
> At the end of the day, we must be clear: the only way even a fraction of these projects can happen in time for 8.0 is if there is careful planning, coordination, and exceptional care taken in the review and testing of the changes. We cannot have the 8.0 release cycle put at risk the way the 7.0 cycle was due to inadequately reviewed and tested patches entering the tree under the assumption that problems would somehow be magically found and fixed before the release by the relatively small population of -CURRENT users. Experience tells us that changes must be extensively reviewed and tested before they enter the tree.
>
> I'm really looking forward to the 8 development cycle, and the work that's in the pipeline is really very exciting. It will take quite a bit of dedication to make it all happen, but if even only a small part of it happens, it will still be very good news.
>
> Robert N M Watson
> Computer Laboratory
> University of Cambridge
> _______________________________________________
> freebsd-arch@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-arch
> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"
>

--
Randall Stewart
NSSTG - Cisco Systems Inc.
803-345-0369 803-317-4952 (cell)

From owner-freebsd-arch@FreeBSD.ORG Thu Jan 10 18:45:56 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4C92316A46B for ; Thu, 10 Jan 2008 18:45:56 +0000 (UTC) (envelope-from qingli@speakeasy.net) Received: from wmail1.sea5.speakeasy.net (wmail1.sea5.speakeasy.net [69.17.117.157]) by mx1.freebsd.org (Postfix) with ESMTP id 23F4E13C461 for ; Thu, 10 Jan 2008 18:45:56 +0000 (UTC) (envelope-from qingli@speakeasy.net) Received: from wmail.speakeasy.net (localhost [127.0.0.1]) by wmail1.sea5.speakeasy.net (Postfix) with ESMTP id 7811C8008; Thu, 10 Jan 2008 10:29:03 -0800 (PST) Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 From: Qing Li To: freebsd-arch@freebsd.org, Randall Stewart , qingli@freebsd.org X-Origin: 12.178.37.11 Date: Thu, 10 Jan 2008 10:29:03 PST Message-Id: <30834.1199989743@speakeasy.net> X-Mailer: AtMail 4.61 - 12.178.37.11 - qingli@speakeasy.net Cc: Subject: Re: Routing in the network :-) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: qingli@speakeasy.net List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 10 Jan 2008 18:45:56 -0000

Interesting you are bringing this up ... I actually sent a similar email to freebsd-net@ about 2 years ago and had one response back (it was a polite no).

I back ported and integrated the radix_mpath changes from KAME into FreeBSD 5.4 and the changes are working well right now in a production environment. Changes were also necessary in quite a few places throughout the netinet/ files, e.g., address initialization functions such as in_ifinit().

I actually discussed what I have done with itojun back in August of 2007.

>
> On Thu Jan 10 6:55 , Randall Stewart sent:
>
> Hi all:
>
> A number of years ago, Itojun and I had played off and on with some modifications to both the routing table and to a "new" interfaces that could be used by transports to gain routing information.
>
> I am contemplating digging back in my archives and building a p4 branch that would have the changes for folks to look at.. But before I go to all that trouble I want to have a discussion about this here ;-)
>
> This will be a longish email so if you get bored easily or just don't care about routing/networks and all that fun, you have been warned :-)
>
> The basic concept:
>
> So say I am at home and have purchased two DSL's. One from AT&T (don't you love the new ma-bell) and the other from SpeakEasy (Note I had this until I moved out to the country now I am lucky to have one DSL.. but many can do this if they want)... So my home looked like:
>
>
>    IP-A       IP-S
>     |          |
>     |          |
>     |          |
>  ,__|__________|___
>  |                 |
>  |                 |
>  |  lakerest.net   |
>  |                 |
>  |_________________|
>
> Now life is good, I have some degree of fault tolerance right?
>
> So AT&T (IP-A) gives me the default route to IP-A1 and Speak Easy gives me the default route to IP-S1. Life is not so good... how do I plumb these in the routing table?
>
> I can say
>
>    route add default IP-A1
> or
>    route add default IP-S1
>
> But I cannot have both. And worse if I had a connection up to FreeBSD.net and AT&T's network went down.. and I happened to have put the first command in.. my network connection would stop...
>
> What would be nice if I had a way to add BOTH routes into the kernel.. and when Layer 4 realized there was some major problems going on it could "use" the alternate route (i.e. via IP-S1) and life would once again be good..
>
> Ok, yes, the observant person out there will say.. wait IP-S1 will NEVER allow your packets through since they probably do ingress filtering.. yes I am aware of this.. but this would *NOT* hold true for some device in the network talking to some other device in the network.. *OR* for speakeasy.. at least not circa 2004.. since speakeasy did *NOT* do ingress filtering and my way back former employer (AT&T) *DID* do ingress filtering..
>
> So the idea is rather simple:
>
> 1) Allow multiple routes on any level of the kernel patricia trees.
>

This is done.

>
> 2) Add an additional interface to the routing code so that a transport protocol could query the routing table for additional support... i.e. excuse me, the route that I had no longer seems to be working, do you have an alternate gateway?
>

There was an inp_route field in the in_pcb{} structure but that field was later removed by Andre in 5.5. I never quite understood why but I did find that field to be rather useful.

    union {
        /* placeholder for routing entry */
        struct route inc4_route;
    #if 1 /* def NEW_STRUCT_ROUTE */
        struct route inc6_route;
    #else
        struct route_in6 inc6_route;
    #endif
    } inc_dependroute;

I used this field for caching and it gets flushed when there is a routing table change. Works out well.

>
> Now I admit for TCP these API's would have limited use..
>

That depends ... :-)

>
> but for SCTP these are golden.. since both sides know about all addresses and thus you get a form of true network diversity out of this little software change.
>
> Now yes, this does not help you if both your DSL's go out to the same pole outside your house, and a truck hits the pole... but it *DOES* help you if your network provider dies somewhere back in the CO *OR* if your cat decides it really likes the red cable running across your carpet to AT&T's DSL and it thinks chewing on it would be fun :-)
>
> So what was required way back in 4.x when Itojun and I did this work.. (note that Itojun called his changes RADIX_MPATH which did NOT include my alternate routing lookup code).
>
> a) For radix.c there were just a few simple changes that removed the restriction that prevents duplicate routes at any level of the tree.
>
> b) For route.c a new method is added.. this is a bit of code not huge but some.
>

The rtrequest1() function needed a bit of work but not so huge.

>
> c) One thing I added but took back out, was some changes to the "route delete" api... can't remember exactly where.. but basically the delete does not look at the destination ... i.e. with the changes Itojun and I had cooked up if you said:
>    route add default IP-1
>    route add default IP-2
>    route add default IP-3
>
> and then when.. oops.. I don't want IP-2, you could NOT say route delete default IP-2.. well you could but it did no good.. it removed the first one (IP-1). I had a fix for this but Itojun thought it was too radical since it changed an interface to one of the routing routines... so we just settled for the fact that if you did that you got to have the pleasure of using:
>    route delete default
> 3 times.. and then starting again...
>

I have been enhancing the code for some time now ... I can do both route delete and even route modification (I added route preferences in addition to ECMP). I have 7 fundamental test cases to perform on the implementation to ensure both correctness and compatibility.

>
> So is it worth my time resurrecting these patches for 8.0? Objections (being in a routing company I know there will be a lot of them.. gee the routing system is supposed to do that.. etc etc).
>
> Comments would be welcome before I dust off the patches..
>

I would like to get these changes made into 8.0. If there is enough interest out there, I'd be happy to share my implementation and we probably can collaborate on this effort if that works for you.
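
For readers following the thread, the behaviour discussed above (several gateways under one destination, a lookup that skips a gateway the transport has given up on, and a delete that honours the named gateway) can be sketched in user-space C. This is only an illustration under stated assumptions: `mp_route`, `mp_add`, `mp_delete` and `mp_lookup_alt` are hypothetical names, not functions from the RADIX_MPATH patches or from route.c, and a small flat array stands in for the radix tree.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MP_MAX_GW 4   /* arbitrary limit for the sketch */

/* One next hop stored under a destination (hypothetical layout). */
struct mp_gw {
    char name[16];    /* gateway label, e.g. "IP-A1" */
    int  pref;        /* lower value is preferred */
    int  used;        /* non-zero when the slot is occupied */
};

/* A destination that may carry several gateways at once. */
struct mp_route {
    struct mp_gw gw[MP_MAX_GW];
};

/* Add a gateway; -1 if the name is too long, duplicated, or no slot is free. */
int mp_add(struct mp_route *rt, const char *name, int pref)
{
    int slot = -1;

    assert(rt != NULL && name != NULL);
    if (strlen(name) >= sizeof(rt->gw[0].name))
        return -1;
    for (int i = 0; i < MP_MAX_GW; i++) {
        if (!rt->gw[i].used) {
            if (slot < 0)
                slot = i;
        } else if (strcmp(rt->gw[i].name, name) == 0) {
            return -1;
        }
    }
    if (slot < 0)
        return -1;
    strcpy(rt->gw[slot].name, name);
    rt->gw[slot].pref = pref;
    rt->gw[slot].used = 1;
    return 0;
}

/* Delete exactly the named gateway, not merely the first entry found. */
int mp_delete(struct mp_route *rt, const char *name)
{
    for (int i = 0; i < MP_MAX_GW; i++) {
        if (rt->gw[i].used && strcmp(rt->gw[i].name, name) == 0) {
            rt->gw[i].used = 0;
            return 0;
        }
    }
    return -1;
}

/*
 * Return the most preferred gateway other than `failed` (which may be
 * NULL), or NULL when no alternate exists.
 */
const char *mp_lookup_alt(const struct mp_route *rt, const char *failed)
{
    const struct mp_gw *best = NULL;

    for (int i = 0; i < MP_MAX_GW; i++) {
        const struct mp_gw *g = &rt->gw[i];

        if (!g->used || (failed != NULL && strcmp(g->name, failed) == 0))
            continue;
        if (best == NULL || g->pref < best->pref)
            best = g;
    }
    return best != NULL ? best->name : NULL;
}
```

With IP-1, IP-2 and IP-3 added at preferences 10, 20 and 30, `mp_delete(&rt, "IP-2")` removes only IP-2, and a transport that has given up on IP-1 gets IP-3 back from `mp_lookup_alt(&rt, "IP-1")`.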
-- Qing

R

--
Randall Stewart
NSSTG - Cisco Systems Inc.
803-345-0369 803-317-4952 (cell)

From owner-freebsd-arch@FreeBSD.ORG Thu Jan 10 20:29:58 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2A57416A41B for ; Thu, 10 Jan 2008 20:29:58 +0000 (UTC) (envelope-from julian@elischer.org) Received: from outP.internet-mail-service.net (outP.internet-mail-service.net [216.240.47.239]) by mx1.freebsd.org (Postfix) with ESMTP id 0127713C467 for ; Thu, 10 Jan 2008 20:29:57 +0000 (UTC) (envelope-from julian@elischer.org) Received: from mx0.idiom.com (HELO idiom.com) (216.240.32.160) by out.internet-mail-service.net (qpsmtpd/0.40) with ESMTP; Thu, 10 Jan 2008 12:29:57 -0800 Received: from julian-mac.elischer.org (localhost [127.0.0.1]) by idiom.com (Postfix) with ESMTP id 422E9126E90; Thu, 10 Jan 2008 12:29:56 -0800 (PST) Message-ID: <4786805B.6010303@elischer.org> Date: Thu, 10 Jan 2008 12:30:19 -0800 From: Julian Elischer User-Agent: Thunderbird 2.0.0.9 (Macintosh/20071031) MIME-Version: 1.0 To: Randall Stewart References: <478631ED.2030108@cisco.com> In-Reply-To: <478631ED.2030108@cisco.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-arch@freebsd.org Subject: Re: Routing in the network :-) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 10 Jan 2008 20:29:58 -0000

Randall Stewart wrote:
> Hi all:
>
> A number of years ago, Itojun and I had played off and on with some modifications to both the routing table and to a "new" interfaces that could be used by transports to gain routing information.
>
> I am contemplating digging back in my archives and building a p4 branch that would have the changes for folks to look at.. But before I go to all that trouble I want to have a discussion about this here ;-)
>
> This will be a longish email so if you get bored easily or just don't care about routing/networks and all that fun, you have been warned :-)
>
> The basic concept:
>
> So say I am at home and have purchased two DSL's. One from AT&T (don't you love the new ma-bell) and the other from SpeakEasy (Note I had this until I moved out to the country now I am lucky to have one DSL.. but many can do this if they want)... So my home looked like:
>
>
>    IP-A       IP-S
>     |          |
>     |          |
>     |          |
>  ,__|__________|___
>  |                 |
>  |                 |
>  |  lakerest.net   |
>  |                 |
>  |_________________|
>
> Now life is good, I have some degree of fault tolerance right?
>
> So AT&T (IP-A) gives me the default route to IP-A1 and Speak Easy gives me the default route to IP-S1. Life is not so good... how do I plumb these in the routing table?
>
> I can say
>
>    route add default IP-A1
> or
>    route add default IP-S1
>
> But I cannot have both. And worse if I had a connection up to FreeBSD.net and AT&T's network went down.. and I happened to have put the first command in.. my network connection would stop...
>

well you'd be hosed anyhow because the return packets would STILL be routed to AT&T because you have an AT&T address as a source address. Sending the packets out the other way wouldn't help. SCTP ok, but UDP/TCP you're screwed for that session. For load sharing, I always use NAT of course so the return packets come back to the right place, and if I truly had two flaky providers I'd actually tunnel to a common third point using Mpd.

> What would be nice if I had a way to add BOTH routes into the kernel.. and when Layer 4 realized there was some major problems going on it could "use" the alternate route (i.e. via IP-S1) and life would once again be good..
>
> Ok, yes, the observant person out there will say.. wait IP-S1 will NEVER allow your packets through since they probably do ingress filtering.. yes I am aware of this.. but this would *NOT* hold true for some device in the network talking to some other device in the network.. *OR* for speakeasy.. at least not circa 2004.. since speakeasy did *NOT* do ingress filtering and my way back former employer (AT&T) *DID* do ingress filtering..
>
> So the idea is rather simple:
>
> 1) Allow multiple routes on any level of the kernel patricia trees.
>
> 2) Add an additional interface to the routing code so that a transport protocol could query the routing table for additional support... i.e. excuse me, the route that I had no longer seems to be working, do you have an alternate gateway?
>
> Now I admit for TCP these API's would have limited use..

yeah return packets would still not get back

> but for SCTP these are golden.. since both sides know about all addresses and thus you get a form of true network diversity out of this little software change.

I see no harm in this change in general but I think it should be part of your SCTP capacity..as SCTP is the only real user of this I think. You MIGHT be able to use it in an organisation where you control both next hops..

>
> Now yes, this does not help you if both your DSL's go out to the same pole outside your house, and a truck hits the pole... but it *DOES* help you if your network provider dies somewhere back in the CO *OR* if your cat decides it really likes the red cable running across your carpet to AT&T's DSL and it thinks chewing on it would be fun :-)

Cats have the ability to distinguish between blues and greens, but lack the ability to pick out shades of red :-)

>
> So what was required way back in 4.x when Itojun and I did this work.. (note that Itojun called his changes RADIX_MPATH which did NOT include my alternate routing lookup code).
>
> a) For radix.c there were just a few simple changes that removed the restriction that prevents duplicate routes at any level of the tree.
>
> b) For route.c a new method is added.. this is a bit of code not huge but some.
>
> c) One thing I added but took back out, was some changes to the "route delete" api... can't remember exactly where.. but basically the delete does not look at the destination ... i.e. with the changes Itojun and I had cooked up if you said:
>    route add default IP-1
>    route add default IP-2
>    route add default IP-3
>
> and then when.. oops.. I don't want IP-2, you could NOT say route delete default IP-2.. well you could but it did no good.. it removed the first one (IP-1). I had a fix for this but Itojun thought it was too radical since it changed an interface to one of the routing routines... so we just settled for the fact that if you did that you got to have the pleasure of using:
>    route delete default
> 3 times.. and then starting again...
>
>
> So is it worth my time resurrecting these patches for 8.0? Objections (being in a routing company I know there will be a lot of them.. gee the routing system is supposed to do that.. etc etc).
>
> Comments would be welcome before I dust off the patches..

I think that these patches are of interest. I'd certainly put them in P4..
> > R From owner-freebsd-arch@FreeBSD.ORG Thu Jan 10 21:09:02 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7022916A417 for ; Thu, 10 Jan 2008 21:09:02 +0000 (UTC) (envelope-from martin_voros@yahoo.com) Received: from web55514.mail.re4.yahoo.com (web55514.mail.re4.yahoo.com [206.190.58.223]) by mx1.freebsd.org (Postfix) with SMTP id 0022513C459 for ; Thu, 10 Jan 2008 21:09:01 +0000 (UTC) (envelope-from martin_voros@yahoo.com) Received: (qmail 1521 invoked by uid 60001); 10 Jan 2008 20:42:20 -0000 DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com; h=X-YMail-OSG:Received:X-Mailer:Date:From:Subject:To:MIME-Version:Content-Type:Message-ID; b=Z0inDIYrOlo2stJk4tZFlSDr9hUXrEq3yF0P8F9n7zqDSVmr/gwJfHgNR6w660jnUCEHNuBw3tRzxnBiO4QtJnolg9Do7ENZMfZDwQW9ao2qurpubcFzS+2pNg+p5keC1QnPOmuKdJFgIaqdzLujj8PVAKMLHyq71iLEVswgaAE=; X-YMail-OSG: MYdpgSMVM1nodhaYS97.NzS6qXcRBkgtj9ooUau9dVeZ6dWgs0QZoV9RaIOOHsOvrlcfNDhZngjlMiMF3YvK.YlUJqbCGaQEhROK.B4YcZWviXV0UA4- Received: from [77.247.224.21] by web55514.mail.re4.yahoo.com via HTTP; Thu, 10 Jan 2008 12:42:20 PST X-Mailer: YahooMailRC/818.31 YahooMailWebService/0.7.158.1 Date: Thu, 10 Jan 2008 12:42:20 -0800 (PST) From: Martin Voros To: Attilio Rao , current@freebsd.org, arch@freebsd.org, fs@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Message-ID: <806000.98905.qm@web55514.mail.re4.yahoo.com> X-Mailman-Approved-At: Thu, 10 Jan 2008 22:08:51 +0000 Cc: Subject: Re: [PATCH] lockmgr and VFS plans X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 10 Jan 2008 21:09:02 -0000 ----- Original Message ---- > From: Attilio Rao > To: current@freebsd.org; arch@freebsd.org; fs@freebsd.org > Sent: Wednesday, 
January 9, 2008 3:19:35 PM
> Subject: [PATCH] lockmgr and VFS plans
>
> ...........
> What I'm looking for is:
> - objections to this
> - testers (even if a small crowd already offered to test this patch)
>
> I test-compiled and ran LINT with this patch and it works perfectly, but a wider audience would be better.

Hi Attilio

I compiled it without any problems and am now running it on my test installation. It seems to work fine.

Martin

From owner-freebsd-arch@FreeBSD.ORG Thu Jan 10 22:43:12 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6C5FE16A41B for ; Thu, 10 Jan 2008 22:43:12 +0000 (UTC) (envelope-from gnn@neville-neil.com) Received: from outbound0.mx.meer.net (outbound0.mx.meer.net [209.157.153.23]) by mx1.freebsd.org (Postfix) with ESMTP id 62E4D13C4D3 for ; Thu, 10 Jan 2008 22:43:12 +0000 (UTC) (envelope-from gnn@neville-neil.com) Received: from mail.meer.net (mail.meer.net [209.157.152.14]) by outbound0.mx.meer.net (8.12.10/8.12.6) with ESMTP id m0AMgQ7T098620; Thu, 10 Jan 2008 14:42:26 -0800 (PST) (envelope-from gnn@neville-neil.com) Received: from mail2.meer.net (mail2.meer.net [64.13.141.16]) by mail.meer.net (8.13.3/8.13.3/meer) with ESMTP id m0AMgQPF020735; Thu, 10 Jan 2008 14:42:26 -0800 (PST) (envelope-from gnn@neville-neil.com) Received: from gnnbsd.hudson-trading.com.neville-neil.com ([66.150.84.1]) (authenticated bits=0) by mail2.meer.net (8.14.1/8.14.1) with ESMTP id m0AMgPmF030746; Thu, 10 Jan 2008 14:42:25 -0800 (PST) (envelope-from gnn@neville-neil.com) Date: Thu, 10 Jan 2008 17:42:24 -0500 Message-ID: <7iy7axuwtr.wl%gnn@neville-neil.com> From: gnn@freebsd.org To: Randall Stewart In-Reply-To: <478631ED.2030108@cisco.com>
References: <478631ED.2030108@cisco.com> User-Agent: Wanderlust/2.14.0 (Africa) SEMI/1.14.6 (Maruoka) FLIM/1.14.8 (=?ISO-8859-4?Q?Shij=F2?=) APEL/10.6 Emacs/21.3 (amd64--freebsd) MULE/5.0 (SAKAKI) MIME-Version: 1.0 (generated by SEMI 1.14.6 - "Maruoka") Content-Type: text/plain; charset=US-ASCII Cc: freebsd-arch@freebsd.org Subject: Re: Routing in the network :-) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 10 Jan 2008 22:43:12 -0000 I'll just chime in. We have talked about this type of stuff and done bits of it so often that we really should finally do this stuff. So, yeah, set up a p4 and let's see what we can get done here. Later, George From owner-freebsd-arch@FreeBSD.ORG Thu Jan 10 22:50:55 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 637B116A419; Thu, 10 Jan 2008 22:50:55 +0000 (UTC) (envelope-from gnn@neville-neil.com) Received: from outbound0.mx.meer.net (outbound0.mx.meer.net [209.157.153.23]) by mx1.freebsd.org (Postfix) with ESMTP id 5722813C46E; Thu, 10 Jan 2008 22:50:55 +0000 (UTC) (envelope-from gnn@neville-neil.com) Received: from mail.meer.net (mail.meer.net [209.157.152.14]) by outbound0.mx.meer.net (8.12.10/8.12.6) with ESMTP id m0AMos7T098964; Thu, 10 Jan 2008 14:50:55 -0800 (PST) (envelope-from gnn@neville-neil.com) Received: from mail2.meer.net (mail2.meer.net [64.13.141.16]) by mail.meer.net (8.13.3/8.13.3/meer) with ESMTP id m0AMosra024013; Thu, 10 Jan 2008 14:50:54 -0800 (PST) (envelope-from gnn@neville-neil.com) Received: from gnnbsd.hudson-trading.com.neville-neil.com ([66.150.84.1]) (authenticated bits=0) by mail2.meer.net (8.14.1/8.14.1) with ESMTP id m0AMosoP032324; Thu, 10 Jan 2008 14:50:54 -0800 (PST) (envelope-from 
gnn@neville-neil.com) Date: Thu, 10 Jan 2008 17:50:53 -0500 Message-ID: <7ive61uwfm.wl%gnn@neville-neil.com> From: gnn@freebsd.org To: Robert Watson In-Reply-To: <20080106124517.G105@fledge.watson.org> References: <20080106124517.G105@fledge.watson.org> User-Agent: Wanderlust/2.14.0 (Africa) SEMI/1.14.6 (Maruoka) FLIM/1.14.8 (=?ISO-8859-4?Q?Shij=F2?=) APEL/10.6 Emacs/21.3 (amd64--freebsd) MULE/5.0 (SAKAKI) MIME-Version: 1.0 (generated by SEMI 1.14.6 - "Maruoka") Content-Type: text/plain; charset=US-ASCII Cc: arch@freebsd.org, kmacy@freebsd.org, net@freebsd.org Subject: Re: Network device driver KPI/ABI and TOE X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 10 Jan 2008 22:50:55 -0000 At Sun, 6 Jan 2008 13:47:24 +0000 (GMT), rwatson wrote: > > > There's also the opportunity to think about whether it's possible to > harden things in such a ways as to not give up our flexibility to > keep maintaining and improving TCP (and other related subsystems), > yet improving the quality of life for a third party TOE driver > maintainer. For example, might we provide accessor routines for > certain data structures, or attempt to structure things to hide more > of TCP locking from a TOE implementation? Should we suggest that > non-native TOE implementations rely less on our TCP code and provide > there own where the hardware doesn't provide a complete > implementation, in order to avoid building dependency on things that > we know will change? > Given the intimacy that I just perused in the code, basically the driver knows a lot about internal TCP data structures, I think we need to think about a kernel KPI just for these things. I'm not very happy that there are things like cxgb_tcp_ctlinput() although I do know that cleaning that kind of thing up and making a better KPI will be hard. 
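
One way to picture the accessor-routine idea quoted above is an opaque handle: the offload driver sees only a forward declaration plus a few functions, so the structure layout can change without touching the driver. The sketch below is user-space C with made-up names (`toy_tcpcb`, `toy_tcp_get_snd_una`, `toy_tcp_get_snd_nxt`, `toy_tcp_ack_advance`, `toy_toe_handle_ack`); it is an assumption-laden illustration, not the FreeBSD TCP KPI, and it ignores locking entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ---- What a TOE driver would be allowed to see (hypothetical KPI) ---- */
struct toy_tcpcb;                                   /* opaque to drivers */
uint32_t toy_tcp_get_snd_una(const struct toy_tcpcb *tp);
uint32_t toy_tcp_get_snd_nxt(const struct toy_tcpcb *tp);
int      toy_tcp_ack_advance(struct toy_tcpcb *tp, uint32_t ack);

/* ---- Driver side: uses only the accessors declared above ---- */
/* Returns the number of bytes newly acknowledged, or -1 for a bad ACK. */
long toy_toe_handle_ack(struct toy_tcpcb *tp, uint32_t ack)
{
    uint32_t before;

    assert(tp != NULL);
    before = toy_tcp_get_snd_una(tp);
    if (toy_tcp_ack_advance(tp, ack) != 0)
        return -1;
    return (long)(uint32_t)(toy_tcp_get_snd_una(tp) - before);
}

/* ---- Stack side: the layout is private and free to change ---- */
struct toy_tcpcb {
    uint32_t snd_una;   /* oldest unacknowledged sequence number */
    uint32_t snd_nxt;   /* next sequence number to send */
};

uint32_t toy_tcp_get_snd_una(const struct toy_tcpcb *tp)
{
    return tp->snd_una;
}

uint32_t toy_tcp_get_snd_nxt(const struct toy_tcpcb *tp)
{
    return tp->snd_nxt;
}

/* Accept an ACK only if it lies in (snd_una, snd_nxt], modulo 2^32. */
int toy_tcp_ack_advance(struct toy_tcpcb *tp, uint32_t ack)
{
    uint32_t in_flight = tp->snd_nxt - tp->snd_una;
    uint32_t acked = ack - tp->snd_una;

    if (acked == 0 || acked > in_flight)
        return -1;
    tp->snd_una = ack;
    return 0;
}
```

In a real tree the structure definition would live in a file the driver never includes; keeping everything in one file here only serves to make the sketch compile on its own.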
Best, George From owner-freebsd-arch@FreeBSD.ORG Thu Jan 10 23:24:55 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9F19D16A469 for ; Thu, 10 Jan 2008 23:24:55 +0000 (UTC) (envelope-from kip.macy@gmail.com) Received: from wa-out-1112.google.com (wa-out-1112.google.com [209.85.146.177]) by mx1.freebsd.org (Postfix) with ESMTP id 85ACE13C4E1 for ; Thu, 10 Jan 2008 23:24:55 +0000 (UTC) (envelope-from kip.macy@gmail.com) Received: by wa-out-1112.google.com with SMTP id k17so1474325waf.3 for ; Thu, 10 Jan 2008 15:24:54 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; bh=HWsKAumDoNRzjvygoNoi2AGET8UDaW8Ctaof99Ixyac=; b=XLwM5CYP2Lv2Kr8jD5tIJ3tkqc24HYA+jqUSWKuCiVHAPOsafWBg6FOtwgFN/bL/+hZTcHSBV1kZML9VRjPcVHgpHcpVXM/6Ut6ibYMLmGWPjEDVgeS/urkV7i4yUkVwZE0sCc+4CXoObW9kAJ/TlvB+qLISUsl6v52B/B1DuA0= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=P1EH2uYnyGfoHwtptPKf5cDi+/aPieBkCPhHbh4VVLYjwpTNWrcPaeIiPmQw4sW7/qFknw8H5P5CDk34B0aQwEThxqTz0/StsqQi4oiiIJ1Js3hvRwDrc12hHtL221FPYJAoog5ZuYiSz8jeybe1UwBYKOijfkpJXKCYZ/94rTA= Received: by 10.114.61.1 with SMTP id j1mr2882494waa.62.1200005792257; Thu, 10 Jan 2008 14:56:32 -0800 (PST) Received: by 10.114.255.11 with HTTP; Thu, 10 Jan 2008 14:56:32 -0800 (PST) Message-ID: Date: Thu, 10 Jan 2008 14:56:32 -0800 From: "Kip Macy" To: gnn@freebsd.org In-Reply-To: <7ive61uwfm.wl%gnn@neville-neil.com> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <20080106124517.G105@fledge.watson.org> 
<7ive61uwfm.wl%gnn@neville-neil.com> Cc: arch@freebsd.org, Robert Watson , kmacy@freebsd.org, net@freebsd.org Subject: Re: Network device driver KPI/ABI and TOE X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 10 Jan 2008 23:24:55 -0000 On Jan 10, 2008 2:50 PM, wrote: > At Sun, 6 Jan 2008 13:47:24 +0000 (GMT), > rwatson wrote: > > > > > > There's also the opportunity to think about whether it's possible to > > harden things in such a ways as to not give up our flexibility to > > keep maintaining and improving TCP (and other related subsystems), > > yet improving the quality of life for a third party TOE driver > > maintainer. For example, might we provide accessor routines for > > certain data structures, or attempt to structure things to hide more > > of TCP locking from a TOE implementation? Should we suggest that > > non-native TOE implementations rely less on our TCP code and provide > > there own where the hardware doesn't provide a complete > > implementation, in order to avoid building dependency on things that > > we know will change? > > > > Given the intimacy that I just perused in the code, basically the > driver knows a lot about internal TCP data structures, I think we need > to think about a kernel KPI just for these things. I'm not very happy > that there are things like cxgb_tcp_ctlinput() although I do know that > cleaning that kind of thing up and making a better KPI will be hard. Although you are correct in the need for a more thought out KPI, that is actually not a good example. Although the way it is currently implemented is not multi-TOE friendly tcp_ctlinput is the correct way to extend socket options. A better example is the way it needs to know the specifics of not only the tcpcb, but the inpcb, and parts of the socket as well. 
By extension it needs to understand the subtleties of inpcb and pcbinfo locking. This is, needless to say, quite fragile. -Kip From owner-freebsd-arch@FreeBSD.ORG Fri Jan 11 01:24:25 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E30CC16A417; Fri, 11 Jan 2008 01:24:25 +0000 (UTC) (envelope-from lastewart@swin.edu.au) Received: from outbound.icp-qv1-irony-out2.iinet.net.au (outbound.icp-qv1-irony-out2.iinet.net.au [203.59.1.107]) by mx1.freebsd.org (Postfix) with ESMTP id 55CE413C45B; Fri, 11 Jan 2008 01:24:25 +0000 (UTC) (envelope-from lastewart@swin.edu.au) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ao8CAPhQhkd8qBDq/2dsb2JhbACpfA X-IronPort-AV: E=Sophos;i="4.24,269,1196607600"; d="scan'208";a="264939163" Received: from unknown (HELO newbox.caia.swin.edu.au) ([124.168.16.234]) by outbound.icp-qv1-irony-out2.iinet.net.au with ESMTP; 11 Jan 2008 10:14:10 +0900 Message-ID: <4786C2DA.3030407@swin.edu.au> Date: Fri, 11 Jan 2008 12:14:02 +1100 From: Lawrence Stewart User-Agent: Thunderbird 2.0.0.4 (X11/20070625) MIME-Version: 1.0 To: Randall Stewart References: <20071219123305.Y95322@fledge.watson.org> <4786338D.5050801@cisco.com> In-Reply-To: <4786338D.5050801@cisco.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: James Healy , arch@freebsd.org, Robert Watson , net@freebsd.org Subject: Transport layer congestion control ideas (was Re: Coordinating TCP projects) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 11 Jan 2008 01:24:26 -0000 Hi Randall, Comments inline... 
Randall Stewart wrote: > Robert: > > One thing I would like to point out for one of Lawrence's projects > is that SCTP is also hanging around in the kernel and as > part of one of our URP's (which is also where Lawrence's project > came from.. if I remember right)... we added "selectable" > congestion control to SCTP.. well it was not really a URP > come to think of it.. but what I kept an intern busy doing > last summer :-) > > Now, the SCTP code DID NOT do kernel loadable CC modules > like Lawrence's... which is cool.. so .. I wonder.. > > Would it be possible to take what Lawrence did and generalize > it so that *ANY* transport could use it.. i.e. both TCP and SCTP. > This would yield an interesting advantage in that any time one > added a CC algorithm all transports would have access to it. Interesting idea... some thought would be required to figure out how to abstract the differences between transports into a generic set of information passed to a CC algorithm to do its job. Nothing specific comes to me immediately as I'm not familiar enough with the SCTP implementation to identify commonalities relevant to CC off the top of my head, but I suspect it wouldn't be *that* much work. Would require some changes to our current KPI. Not sure what changes to SCTP would be involved. Certainly interested to hear/discuss ideas on this to flesh out whether it's something worth pursuing. > > Not having looked at the patches yet, what may be missing in > the TCP code is to select amongst multiple CC algorithms... we > actually have this down to the SCTP association level.. So I > can in theory have different associations out of the same > box using different CC modules... Our TCP patch currently supports the use of a different CC algo per connection (read: per tcb), and allows selection of a system-wide default CC algo via sysctl (which can be used crudely to change algos used by connections at initialisation).
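As a rough illustration of what a transport-neutral set of CC hooks might look like, here is a minimal sketch. Everything in it is an assumption made for illustration: the names cc_state, cc_algo, cc_lookup and "reno_like" are hypothetical and are not taken from the TCP patch or from the SCTP code, and the algorithm is a simplified Reno-style AIMD rather than any in-tree implementation. The point is only that the algorithm sees a small shared state structure, so a TCP connection or an SCTP association could hand it the same thing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical transport-neutral congestion state (not a FreeBSD structure). */
struct cc_state {
    uint32_t cwnd;      /* congestion window, bytes */
    uint32_t ssthresh;  /* slow-start threshold, bytes */
    uint32_t mss;       /* largest payload per packet, bytes */
};

/* Hypothetical hook table that either transport could call into. */
struct cc_algo {
    const char *name;
    void (*ack_received)(struct cc_state *s, uint32_t acked);
    void (*loss_detected)(struct cc_state *s);
};

/* Simplified Reno-style growth on each ACK. */
static void reno_like_ack(struct cc_state *s, uint32_t acked)
{
    assert(s->mss > 0);
    if (s->cwnd < s->ssthresh || s->cwnd < s->mss) {
        /* Slow start: grow by the bytes acked, at most one MSS per ACK. */
        s->cwnd += acked < s->mss ? acked : s->mss;
    } else {
        /* Congestion avoidance: roughly one MSS per window of ACKs. */
        uint32_t inc = (uint32_t)((uint64_t)s->mss * s->mss / s->cwnd);
        s->cwnd += inc ? inc : 1;
    }
}

/* Simplified multiplicative decrease: halve, but keep at least 2 * MSS. */
static void reno_like_loss(struct cc_state *s)
{
    uint32_t half = s->cwnd / 2;
    s->ssthresh = half > 2 * s->mss ? half : 2 * s->mss;
    s->cwnd = s->ssthresh;
}

static const struct cc_algo cc_algos[] = {
    { "reno_like", reno_like_ack, reno_like_loss },
};

/* Look an algorithm up by name; returns NULL if it is not registered. */
const struct cc_algo *cc_lookup(const char *name)
{
    for (size_t i = 0; i < sizeof(cc_algos) / sizeof(cc_algos[0]); i++)
        if (strcmp(cc_algos[i].name, name) == 0)
            return &cc_algos[i];
    return NULL;
}
```

A per-connection or per-association selection would then amount to storing the pointer returned by cc_lookup() next to the transport's own control block, with a name-based lookup being the natural thing for a sysctl or socket option to drive.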
Jim and I just finished adding the {set|get}sockopt plumbing 2 days ago that allows overriding the system default CC algo on a TCP socket (both at initialisation and dynamically during use). It'll be in Perforce shortly after a bit more testing. As a first step towards streamlining TCP/SCTP CC interactions, I imagine it would be straightforward enough to generalise the sockopt plumbing a bit more to specify a "TRANSPORT_CONGESTION" sockopt instead of the currently used "TCP_CONGESTION" and have your SCTP code also respond to set/gets of this option on SCTP sockets. Given you already have the modular CC capabilities, it should be a minor addition. Happy to send you our patch if you want a squiz at it. > > There might be some good ideas we can harvest from both approaches > and make available to all transports... Too many cool ideas! :) Let the good times (and ideas) roll on. [snip Robert's initial email] Cheers, Lawrence From owner-freebsd-arch@FreeBSD.ORG Fri Jan 11 02:19:34 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5BCF016A41A for ; Fri, 11 Jan 2008 02:19:34 +0000 (UTC) (envelope-from andre@freebsd.org) Received: from c00l3r.networx.ch (c00l3r.networx.ch [62.48.2.2]) by mx1.freebsd.org (Postfix) with ESMTP id C5E6A13C448 for ; Fri, 11 Jan 2008 02:19:33 +0000 (UTC) (envelope-from andre@freebsd.org) Received: (qmail 56574 invoked from network); 11 Jan 2008 01:43:18 -0000 Received: from c00l3r.networx.ch (HELO [127.0.0.1]) ([62.48.2.2]) (envelope-sender ) by c00l3r.networx.ch (qmail-ldap-1.03) with SMTP for ; 11 Jan 2008 01:43:18 -0000 Message-ID: <4786D23A.1080509@freebsd.org> Date: Fri, 11 Jan 2008 03:19:38 +0100 From: Andre Oppermann User-Agent: Thunderbird 1.5.0.14 (Windows/20071210) MIME-Version: 1.0 To: Lawrence Stewart References: <20071219123305.Y95322@fledge.watson.org> <47693DBD.6050104@swin.edu.au> <476A45D6.6030305@freebsd.org> 
<47858D35.6060006@swin.edu.au> In-Reply-To: <47858D35.6060006@swin.edu.au> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: James Healy , arch@freebsd.org, Robert Watson , net@freebsd.org Subject: Re: Coordinating TCP projects X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 11 Jan 2008 02:19:34 -0000 Lawrence Stewart wrote: >> I've got a rewritten and much more efficient tcp_reass() function >> in my local tree. I'll import it into Perforce next week with all >> the other stuff. You may want to base your auto-sizing work on it. >> The only missing parts are some statistics gathering. >> > > Where abouts is this code? A cursory browse through the Perforce web > front-end reveals nothing. We're going to start work on the TCP > reassembly queue autotuning patch now and if you think we should base it > on your new tcp_reass() we need to have a look at it. The first cut is now at //depot/projects/tcp_reass/ however I made a mistake when creating the branch and now the code is in the same changeset as the branching itself. Doesn't make it easy to do a diff. Have to see how I can fix that. Anyway, have a look and I'm going to finish/fix the code tomorrow evening. 
-- Andre From owner-freebsd-arch@FreeBSD.ORG Fri Jan 11 19:23:46 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E111816A417 for ; Fri, 11 Jan 2008 19:23:46 +0000 (UTC) (envelope-from gallatin@cs.duke.edu) Received: from duke.cs.duke.edu (duke.cs.duke.edu [152.3.140.1]) by mx1.freebsd.org (Postfix) with ESMTP id A35A613C43E for ; Fri, 11 Jan 2008 19:23:46 +0000 (UTC) (envelope-from gallatin@cs.duke.edu) Received: from grasshopper.cs.duke.edu (grasshopper.cs.duke.edu [152.3.145.30]) by duke.cs.duke.edu (8.14.0/8.14.0) with ESMTP id m0BJNauZ013938 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 11 Jan 2008 14:23:37 -0500 (EST) Received: (from gallatin@localhost) by grasshopper.cs.duke.edu (8.12.9p2/8.12.9/Submit) id m0BJN8Tk008940; Fri, 11 Jan 2008 14:23:08 -0500 (EST) (envelope-from gallatin) From: Andrew Gallatin MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <18311.49715.457070.397815@grasshopper.cs.duke.edu> Date: Fri, 11 Jan 2008 14:23:08 -0500 (EST) To: Jeff Roberson In-Reply-To: <20071219211025.T899@desktop> References: <20071219211025.T899@desktop> X-Mailer: VM 6.75 under 21.1 (patch 12) "Channel Islands" XEmacs Lucid Cc: arch@freebsd.org Subject: Re: Linux compatible setaffinity. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 11 Jan 2008 19:23:47 -0000 Jeff Roberson writes: > I have implemented a linux compatible sched_setaffinity() call which is > somewhat crippled. This allows a userspace process to supply a bitmask of > processors which it will run on. 
I have copied the linux interface such > that it should be api compatible because I believe it is a sensible > interface and they beat us to it by 3 years. I'm somewhat surprised that this has not hit the tree yet. What happened? Wasn't the consensus that it was a good thing? FWIW, I was too busy to reply at the time, but I agree that the Apple interface is nice. However, sometimes one needs a hard CPU binding interface like this one, and I don't see any reason to defer adding this interface in favor of the Apple one, since they are somewhat orthogonal. I'd be strongly in favor of having a hard CPU binding interface. Thanks for working on this, Drew From owner-freebsd-arch@FreeBSD.ORG Fri Jan 11 20:38:31 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DC4E016A419 for ; Fri, 11 Jan 2008 20:38:31 +0000 (UTC) (envelope-from kip.macy@gmail.com) Received: from wa-out-1112.google.com (wa-out-1112.google.com [209.85.146.181]) by mx1.freebsd.org (Postfix) with ESMTP id B3F9913C442 for ; Fri, 11 Jan 2008 20:38:31 +0000 (UTC) (envelope-from kip.macy@gmail.com) Received: by wa-out-1112.google.com with SMTP id k17so2099137waf.3 for ; Fri, 11 Jan 2008 12:38:31 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; bh=7NTYT0AZLIeT0UzGPsk/Y6qZydwXNnFM8QrMG+3Ko5A=; b=RjAsDvyoLqOM3kEeJmZaTB3w9lSeluvFbtGmMgsNkjz1R5slzc8bBB9e2PxIE7eI5yQ0jfsvKFUvVzK1dHncGHAC+HKfjREX5hTD3Z1jIP3VoFk2ZXJ/rrCI8rcl69DSMimHBjYtG0SddH5C1tg0PxbcRgkOpy8BGCuOb3ZtzvI= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; 
b=PddqmrLCNsOKUqK0STh2VFYhENclHX/3Q43eDWayxkV4F3JFWH7eGD8ngECcDH+1duFLEN0mxoSwpE0vTePyN/24ryjeouKgEnLLOje6TtQ10YtWTzgV3zk/r6Q0HO5YcNMd4jebOYg6y/OWJXH04v8axkFGUXTI1ZHjqvz3Ul4= Received: by 10.114.36.1 with SMTP id j1mr4150016waj.35.1200083911194; Fri, 11 Jan 2008 12:38:31 -0800 (PST) Received: by 10.114.255.11 with HTTP; Fri, 11 Jan 2008 12:38:31 -0800 (PST) Message-ID: Date: Fri, 11 Jan 2008 12:38:31 -0800 From: "Kip Macy" To: "Andrew Gallatin" In-Reply-To: <18311.49715.457070.397815@grasshopper.cs.duke.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <20071219211025.T899@desktop> <18311.49715.457070.397815@grasshopper.cs.duke.edu> Cc: arch@freebsd.org Subject: Re: Linux compatible setaffinity. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 11 Jan 2008 20:38:31 -0000 On Jan 11, 2008 11:23 AM, Andrew Gallatin wrote: > > Jeff Roberson writes: > > I have implemented a linux compatible sched_setaffinity() call which is > > somewhat crippled. This allows a userspace process to supply a bitmask of > > processors which it will run on. I have copied the linux interface such > > that it should be api compatible because I believe it is a sensible > > interface and they beat us to it by 3 years. > > I'm somewhat surprised that this has not hit the tree yet. What > happened? Wasn't the consensus that it was a good thing? > > FWIW, I was too busy to reply at the time, but I agree that the Apple > interface is nice. However, sometimes one needs a hard CPU binding > interface like this one, and I don't see any reason to defer adding > this interface in favor of the Apple one, since they are somewhat > orthogonal. I'd be strongly in favor of having a hard CPU binding > interface. 
> > Thanks for working on this, > Regardless of what the "optimal" API is, we should support this for the benefit of Linux applications. Last I looked more applications were developed on Linux than on FreeBSD. Can someone give a good reason why this should not go in? -Kip From owner-freebsd-arch@FreeBSD.ORG Fri Jan 11 20:52:43 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8663316A46D for ; Fri, 11 Jan 2008 20:52:43 +0000 (UTC) (envelope-from deischen@freebsd.org) Received: from mail.netplex.net (mail.netplex.net [204.213.176.10]) by mx1.freebsd.org (Postfix) with ESMTP id 4E0A713C455 for ; Fri, 11 Jan 2008 20:52:43 +0000 (UTC) (envelope-from deischen@freebsd.org) Received: from sea.ntplx.net (sea.ntplx.net [204.213.176.11]) by mail.netplex.net (8.14.2/8.14.2/NETPLEX) with ESMTP id m0BKqXnc001974; Fri, 11 Jan 2008 15:52:33 -0500 (EST) X-Virus-Scanned: by AMaViS and Clam AntiVirus (mail.netplex.net) X-Greylist: Message whitelisted by DRAC access database, not delayed by milter-greylist-4.0 (mail.netplex.net [204.213.176.10]); Fri, 11 Jan 2008 15:52:33 -0500 (EST) Date: Fri, 11 Jan 2008 15:52:33 -0500 (EST) From: Daniel Eischen X-X-Sender: eischen@sea.ntplx.net To: Andrew Gallatin In-Reply-To: <18311.49715.457070.397815@grasshopper.cs.duke.edu> Message-ID: References: <20071219211025.T899@desktop> <18311.49715.457070.397815@grasshopper.cs.duke.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@freebsd.org Subject: Re: Linux compatible setaffinity. 
X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Daniel Eischen List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 11 Jan 2008 20:52:43 -0000 On Fri, 11 Jan 2008, Andrew Gallatin wrote: > > Jeff Roberson writes: > > I have implemented a linux compatible sched_setaffinity() call which is > > somewhat crippled. This allows a userspace process to supply a bitmask of > > processors which it will run on. I have copied the linux interface such > > that it should be api compatible because I believe it is a sensible > > interface and they beat us to it by 3 years. > > I'm somewhat surprised that this has not hit the tree yet. What > happened? Wasn't the consensus that it was a good thing? > > FWIW, I was too busy to reply at the time, but I agree that the Apple > interface is nice. However, sometimes one needs a hard CPU binding > interface like this one, and I don't see any reason to defer adding > this interface in favor of the Apple one, since they are somewhat > orthogonal. I'd be strongly in favor of having a hard CPU binding > interface. I favor the Solaris API which allows you to specify either a process or a thread (LWP) and a processor set. 
-- DE From owner-freebsd-arch@FreeBSD.ORG Fri Jan 11 21:12:35 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7499316A419; Fri, 11 Jan 2008 21:12:35 +0000 (UTC) (envelope-from gallatin@cs.duke.edu) Received: from duke.cs.duke.edu (duke.cs.duke.edu [152.3.140.1]) by mx1.freebsd.org (Postfix) with ESMTP id 3B62B13C457; Fri, 11 Jan 2008 21:12:35 +0000 (UTC) (envelope-from gallatin@cs.duke.edu) Received: from grasshopper.cs.duke.edu (grasshopper.cs.duke.edu [152.3.145.30]) by duke.cs.duke.edu (8.14.0/8.14.0) with ESMTP id m0BLCPjo008392 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 11 Jan 2008 16:12:25 -0500 (EST) Received: (from gallatin@localhost) by grasshopper.cs.duke.edu (8.12.9p2/8.12.9/Submit) id m0BLBYQt010180; Fri, 11 Jan 2008 16:11:34 -0500 (EST) (envelope-from gallatin) From: Andrew Gallatin MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <18311.56221.562219.702112@grasshopper.cs.duke.edu> Date: Fri, 11 Jan 2008 16:11:34 -0500 (EST) To: Daniel Eischen In-Reply-To: References: <20071219211025.T899@desktop> <18311.49715.457070.397815@grasshopper.cs.duke.edu> X-Mailer: VM 6.75 under 21.1 (patch 12) "Channel Islands" XEmacs Lucid Cc: arch@freebsd.org Subject: Re: Linux compatible setaffinity. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 11 Jan 2008 21:12:35 -0000 Daniel Eischen writes: > On Fri, 11 Jan 2008, Andrew Gallatin wrote: > > > > > Jeff Roberson writes: > > > I have implemented a linux compatible sched_setaffinity() call which is > > > somewhat crippled. This allows a userspace process to supply a bitmask of > > > processors which it will run on. 
I have copied the linux interface such > > > that it should be api compatible because I believe it is a sensible > > > interface and they beat us to it by 3 years. > > > > I'm somewhat surprised that this has not hit the tree yet. What > > happened? Wasn't the consensus that it was a good thing? > > > > FWIW, I was too busy to reply at the time, but I agree that the Apple > > interface is nice. However, sometimes one needs a hard CPU binding > > interface like this one, and I don't see any reason to defer adding > > this interface in favor of the Apple one, since they are somewhat > > orthogonal. I'd be strongly in favor of having a hard CPU binding > > interface. > > I favor the Solaris API which allows you to specify either > a process or a thread (LWP) and a processor set. Honestly, I don't care what the API is. I just want a way to do hard CPU binding. Since Jeff has a patch, I'm strongly in favor of doing it his way. A bird in the hand beats 2 in the bush. :) Drew From owner-freebsd-arch@FreeBSD.ORG Sat Jan 12 13:15:37 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DA8E316A420 for ; Sat, 12 Jan 2008 13:15:37 +0000 (UTC) (envelope-from des@des.no) Received: from tim.des.no (tim.des.no [194.63.250.121]) by mx1.freebsd.org (Postfix) with ESMTP id 7FFCA13C46E for ; Sat, 12 Jan 2008 13:15:37 +0000 (UTC) (envelope-from des@des.no) Received: from tim.des.no (localhost [127.0.0.1]) by spam.des.no (Postfix) with ESMTP id 656FD20C6; Sat, 12 Jan 2008 14:15:28 +0100 (CET) X-Spam-Tests: AWL X-Spam-Learn: disabled X-Spam-Score: -0.2/3.0 X-Spam-Checker-Version: SpamAssassin 3.2.3 (2007-08-08) on tim.des.no Received: from ds4.des.no (des.no [80.203.243.180]) by smtp.des.no (Postfix) with ESMTP id 4A00D20C1; Sat, 12 Jan 2008 14:15:28 +0100 (CET) Received: by ds4.des.no (Postfix, from userid 1001) id 15BD28449F; Sat, 12 Jan 2008 14:15:28 +0100 (CET) From: 
=?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= To: Daniel Eischen References: <20071219211025.T899@desktop> <18311.49715.457070.397815@grasshopper.cs.duke.edu> Date: Sat, 12 Jan 2008 14:15:28 +0100 In-Reply-To: (Daniel Eischen's message of "Fri\, 11 Jan 2008 15\:52\:33 -0500 \(EST\)") Message-ID: <863at36v7z.fsf@ds4.des.no> User-Agent: Gnus/5.110006 (No Gnus v0.6) Emacs/22.1 (berkeley-unix) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Cc: arch@freebsd.org, Andrew Gallatin Subject: Re: Linux compatible setaffinity. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 12 Jan 2008 13:15:37 -0000 Daniel Eischen writes: > I favor the Solaris API which allows you to specify either a process > or a thread (LWP) and a processor set. Cf Kip. Regardless of which API we choose for FreeBSD applications, we should also implement the Linux API to simplify the porting of Linux applications. 
DES --=20 Dag-Erling Sm=C3=B8rgrav - des@des.no From owner-freebsd-arch@FreeBSD.ORG Sat Jan 12 15:06:19 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3907016A421 for ; Sat, 12 Jan 2008 15:06:19 +0000 (UTC) (envelope-from deischen@freebsd.org) Received: from mail.netplex.net (mail.netplex.net [204.213.176.10]) by mx1.freebsd.org (Postfix) with ESMTP id DD0F813C478 for ; Sat, 12 Jan 2008 15:06:18 +0000 (UTC) (envelope-from deischen@freebsd.org) Received: from sea.ntplx.net (sea.ntplx.net [204.213.176.11]) by mail.netplex.net (8.14.2/8.14.2/NETPLEX) with ESMTP id m0CF6DYZ025702; Sat, 12 Jan 2008 10:06:14 -0500 (EST) X-Virus-Scanned: by AMaViS and Clam AntiVirus (mail.netplex.net) X-Greylist: Message whitelisted by DRAC access database, not delayed by milter-greylist-4.0 (mail.netplex.net [204.213.176.10]); Sat, 12 Jan 2008 10:06:13 -0500 (EST) Date: Sat, 12 Jan 2008 10:06:14 -0500 (EST) From: Daniel Eischen X-X-Sender: eischen@sea.ntplx.net To: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= In-Reply-To: <863at36v7z.fsf@ds4.des.no> Message-ID: References: <20071219211025.T899@desktop> <18311.49715.457070.397815@grasshopper.cs.duke.edu> <863at36v7z.fsf@ds4.des.no> MIME-Version: 1.0 Content-Type: MULTIPART/MIXED; BOUNDARY="-559023410-959030623-1200150374=:12677" Cc: arch@freebsd.org, Andrew Gallatin Subject: Re: Linux compatible setaffinity. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Daniel Eischen List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 12 Jan 2008 15:06:19 -0000 This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. 
---559023410-959030623-1200150374=:12677 Content-Type: TEXT/PLAIN; charset=utf-8; format=flowed Content-Transfer-Encoding: QUOTED-PRINTABLE On Sat, 12 Jan 2008, Dag-Erling Smørgrav wrote: > Daniel Eischen writes: >> I favor the Solaris API which allows you to specify either a process >> or a thread (LWP) and a processor set. > > Cf Kip. Regardless of which API we choose for FreeBSD applications, we > should also implement the Linux API to simplify the porting of Linux > applications. This doesn't sound like a widely used Linux API, and anyone who needs it could easily figure out how to translate sched_setaffinity() into: pset_bind(psetid_t pset, idtype_t idtype, id_t id, psetid_t *opset); where id is the process or thread id, and idtype is P_PID or P_TID. The linux compat ABI will want sched_setaffinity() but that doesn't mean we should provide it natively, when there is an API that isn't as short-sighted as the Linux API. Note that Solaris also has a set of command-line interfaces for binding processes or threads to processors. We've used this in the past, under Solaris 6 or 7 I think, but not under any more recent releases.
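The translation described above can be sketched in a few lines. This is a minimal, self-contained model, assuming an in-memory stand-in for processor sets: model_pset_create(), model_pset_bind(), model_allowed_cpus() and compat_setaffinity() are all hypothetical names, not Solaris, Linux, or FreeBSD interfaces. Only the argument order of the bind call follows the prototype quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory model of processor sets, for illustration only. */
#define MODEL_MAXPSET 8
#define MODEL_MAXID   16

typedef enum { MODEL_P_PID = 0, MODEL_P_TID = 1 } model_idtype_t;

static uint32_t model_pset_cpus[MODEL_MAXPSET];  /* CPU bitmask of each set */
static int      model_pset_used[MODEL_MAXPSET];
static int      model_binding[2][MODEL_MAXID];   /* set id per id, 0 = unbound */

/* Create a set holding the CPUs in mask; returns an id >= 1, or 0 on failure. */
int model_pset_create(uint32_t mask)
{
    if (mask == 0)
        return 0;
    for (int i = 0; i < MODEL_MAXPSET; i++) {
        if (!model_pset_used[i]) {
            model_pset_used[i] = 1;
            model_pset_cpus[i] = mask;
            return i + 1;
        }
    }
    return 0;
}

/* Bind a process or thread to a set; the old binding is stored in *opset. */
int model_pset_bind(int pset, model_idtype_t idtype, int id, int *opset)
{
    assert(idtype == MODEL_P_PID || idtype == MODEL_P_TID);
    if (id < 0 || id >= MODEL_MAXID)
        return -1;
    if (pset < 1 || pset > MODEL_MAXPSET || !model_pset_used[pset - 1])
        return -1;
    if (opset != NULL)
        *opset = model_binding[idtype][id];
    model_binding[idtype][id] = pset;
    return 0;
}

/* CPUs the given process or thread may run on; 0 means it is unbound. */
uint32_t model_allowed_cpus(model_idtype_t idtype, int id)
{
    if (id < 0 || id >= MODEL_MAXID)
        return 0;
    int pset = model_binding[idtype][id];
    return pset ? model_pset_cpus[pset - 1] : 0;
}

/*
 * Linux-style shim: "run thread tid on the CPUs in mask".
 * It creates a fresh set on every call; a real shim would have to
 * reuse or destroy sets instead of leaking them.
 */
int compat_setaffinity(int tid, uint32_t mask)
{
    int pset = model_pset_create(mask);
    if (pset == 0)
        return -1;
    return model_pset_bind(pset, MODEL_P_TID, tid, NULL);
}
```

The shim is thin because a bitmask is just an anonymous processor set. Going the other way is harder: a named set that several threads share, and that an administrator can later change, has no direct equivalent in a per-thread mask.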
--=20 DE ---559023410-959030623-1200150374=:12677-- From owner-freebsd-arch@FreeBSD.ORG Sat Jan 12 17:00:56 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D18D816A47C for ; Sat, 12 Jan 2008 17:00:56 +0000 (UTC) (envelope-from rrs@cisco.com) Received: from sj-iport-3.cisco.com (sj-iport-3-in.cisco.com [171.71.176.72]) by mx1.freebsd.org (Postfix) with ESMTP id 3D93813C45A for ; Sat, 12 Jan 2008 17:00:56 +0000 (UTC) (envelope-from rrs@cisco.com) Received: from sj-dkim-3.cisco.com ([171.71.179.195]) by sj-iport-3.cisco.com with ESMTP; 12 Jan 2008 09:00:54 -0800 Received: from sj-core-5.cisco.com (sj-core-5.cisco.com [171.71.177.238]) by sj-dkim-3.cisco.com (8.12.11/8.12.11) with ESMTP id m0CH0sNZ030917; Sat, 12 Jan 2008 09:00:54 -0800 Received: from xbh-sjc-231.amer.cisco.com (xbh-sjc-231.cisco.com [128.107.191.100]) by sj-core-5.cisco.com (8.12.10/8.12.6) with ESMTP id m0CH0ssj015528; Sat, 12 Jan 2008 17:00:54 GMT Received: from xfe-sjc-211.amer.cisco.com ([171.70.151.174]) by xbh-sjc-231.amer.cisco.com with Microsoft SMTPSVC(6.0.3790.1830); Sat, 12 Jan 2008 09:00:54 -0800 Received: from [127.0.0.1] ([171.68.225.134]) by xfe-sjc-211.amer.cisco.com with Microsoft SMTPSVC(6.0.3790.1830); Sat, 12 Jan 2008 09:00:53 -0800 Message-ID: <4788F1AE.4030502@cisco.com> Date: Sat, 12 Jan 2008 11:58:22 -0500 From: Randall Stewart User-Agent: Mozilla/5.0 (X11; U; Linux i386; en-US; rv:1.7.13) Gecko/20070601 X-Accept-Language: en-us, en MIME-Version: 1.0 To: qingli@speakeasy.net References: <30834.1199989743@speakeasy.net> In-Reply-To: <30834.1199989743@speakeasy.net> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-OriginalArrivalTime: 12 Jan 2008 17:00:53.0568 (UTC) FILETIME=[B0FABC00:01C8553C] DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; l=7212; t=1200157254; x=1201021254; c=relaxed/simple; s=sjdkim3002; 
h=Content-Type:From:Subject:Content-Transfer-Encoding:MIME-Version; d=cisco.com; i=rrs@cisco.com; z=From:=20Randall=20Stewart=20 |Subject:=20Re=3A=20Routing=20in=20the=20network=20=3A-) |Sender:=20; bh=mxajSDvZkgyShEbSCwhl0Ayr5dac1as3Fyb8nqqQz6o=; b=GjsRmXVF81gnqy5BxTauHFjwkU+38LJDa/85cDWU+nXQIVEO6vNiKbk5Kq NWabuv251RVod7zDljSjBBhg+UKPUeZ1vzOsst47p41HcoJOyj/hbEEZjtVB cLmZ4C512O; Authentication-Results: sj-dkim-3; header.From=rrs@cisco.com; dkim=pass ( sig from cisco.com/sjdkim3002 verified; ); Cc: qingli@freebsd.org, freebsd-arch@freebsd.org Subject: Re: Routing in the network :-) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 12 Jan 2008 17:00:56 -0000 Qing Li: Ok, the branch is created :-) //depot/user/rrs/alt_route Please go ahead and pull a copy and add your changes.. unless you would like me to resurrect my old patches.. (let me know).. If you put your patches in ping me to tell me they are in .. and then I will resurrect my old alternate route lookup patch and the changes to SCTP to use it :-) Ping me either way R Qing Li wrote: > Interesting you are bringing this up ... I actually sent a similar email to freebsd-net@ > about 2 years ago and had one response back (it was a polite no). > > I back ported and integrated the radix_mpath changes from KAME into FreeBSD 5.4 > and the changes are working well right now in production environment. Changes > were also necessary in quite a few places throughout the netinet/ files, e.g., > address initialization functions such as in_ifinit(). > > I actually discussed what I have done with itojun back in August of 2007. 
> > >>On Thu Jan 10 6:55 , Randall Stewart sent: >> >>Hi all: >> >>A number of years ago, Itojun and I had played off and on >>with some modifications to both the routing table and to a >>"new" interfaces that could be used by transports to gain >>routing information. >> >>I am contemplating digging back in my archives and building >>a p4 branch that would have the changes for folks to look at.. >>But before I go to all that trouble I want to have a discussion >>about this here ;-) >> >>This will be a longish email so if you get bored easily or just >>don't care about routing/networks and all that fun, you have >>been warned :-) >> >>The basic concept: >> >>So say I am at home and have purchased two DSL's. One from >>AT&T (don't you love the new ma-bell) and the other from >>SpeakEasy (Note I had this until I moved out to the country >>now I am lucky to have one DSL.. but many can do this if they >>want)... So my home looked like: >> >> >>IP-A IP-S >>| | >>| | >>| | >>,__|__________|___ >>| | >>| | >>| lakerest.net | >>| | >>|_________________| >> >>Now life is good, I have some degree of >>fault tolerance right? >> >>So AT&T (IP-A) gives me the default route to IP-A1 >>and Speak Easy gives me the default route to IP-S1. >>Life is not so good... how do I plumb these in the >>routing table? >> >>I can say >> >>route add default IP-A1 >>or >>route add default IP-S1 >> >>But I cannot have both. And worse if I had a connection >>up to FreeBSD.net and AT&T's network went down.. and I >>happened to have put the first command in.. my network >>connection would stop... >> >>What would be nice if I had a way to add BOTH routes >>into the kernel.. and when Layer 4 realized there was some >>major problems going on it could "use" the alternate >>route (i.e. via IP-S1) and life would once again be >>good.. >> >>Ok, yes, the observant person out there will say.. wait >>IP-S1 will NEVER allow your packets through since they >>probably do ingress filtering.. 
yes I am aware of this.. but >>this would *NOT* hold true for some device in the network >>talking to some other device in the network.. *OR* for >>speakeasy.. at least not circa 2004.. since speakeasy >>did *NOT* do ingress filtering and my way back former >>employer (AT&T) *DID* do ingress filtering.. >> >>So the idea is rather simple: >> >>1) Allow multiple routes on any level of the kernel >>patricia trees. >> > > > This is done. > > >>2) Add an additional interface to the routing code >>so that a transport protocol could query the >>routing table for additional support... i.e. >>excuse me, the route that I had no longer seems >>to be working, do you have an alternate gateway? >> > > > There was a inp_route field in the in_pcb{} structure but > that field was later removed by Andre in 5.5. I never quite > understood why but I did find that field to be rather useful. > > union { > /* placeholder for routing entry */ > struct route inc4_route; > #if 1 /* def NEW_STRUCT_ROUTE */ > struct route inc6_route; > #else > struct route_in6 inc6_route; > #endif > } inc_dependroute; > > I used this field for caching and it gets flushed when > there is a routing table change. Works out good. > > >>Now I admit for TCP these API's would have limited use.. >> > > > That depends ... :-) > > >>but for SCTP these are golden.. since both sides know >>about all addresses and thus you get a form of true >>network diversity out of this little software change. >> >> >>Now yes, this does not help you if both your DSL's >>go out to the same pole outside your house, and a >>truck hits the pole... but it *DOES* help you if >>your network provider dies somewhere back in the CO >>running across your carpet to AT&T's DSL and it thinks >>chewing on it would be fun :-) >> >>So what was required way back in 4.x when Itojun and >>I did this work.. (note that Itojun called his changes >>RADIX_MPATH which did NOT include my alternate >>routing lookup code). 
>> >>a) For radix.c there were just a few simple changes that >>removed the restriction that prevents duplicate routes >>at any level of the tree. >> >>b) For route.c a new method is added.. this is a bit >>of code not huge but some. >> > > > The rtrequest1() function needed a bit of work but not so huge. > > >>c) One thing I added but took back out, was some changes to >>the "route delete" api... can't remember exactly where.. but >>basically the delete does not look at the destination ... i.e. >>with the changes Itojun and I had cooked up if you said: >>route add default IP-1 >>route add default IP-2 >>route add default IP-3 >> >>and then when.. oops.. I don't want IP-2, you could NOT >>say route delete default IP-2.. well you could but it did >>no good.. it removed the first one (IP-1). I had a fix for >>this but Itojun thought it was too radical since it changed >>an interface to one of the routing routines... so we just settled >>for the fact that if you did that you got to have the pleasure >>of using: >>route delete default >>3 times.. and then starting again... >> > > > I have been enhancing the code for some time now ... > > I can do both route delete and even route modification (I added > route preferences in addition to ECMP). > > I have 7 fundamental test cases to perform on the implementation to ensure > both correctness and compatibility. > > >>So is it worth my time resurrecting these patches for 8.0? Objections >>(being in a routing company I know there will be a lot of them.. >>gee the routing system is supposed to do that.. etc etc). >> >>Comments would be welcome before I dust off the patches.. >> > > > I would like to get these changes made into 8.0. > > If there is enough interest out there, I'd be happy to share my implementation > and we probably can collaborate on this effort if that works for you. > > -- Qing > > > > R -- Randall Stewart NSSTG - Cisco Systems Inc. 
803-345-0369 803-317-4952 (cell)

From owner-freebsd-arch@FreeBSD.ORG Sat Jan 12 18:34:07 2008
Return-Path:
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id 573C616A417
    for ; Sat, 12 Jan 2008 18:34:07 +0000 (UTC)
    (envelope-from rwatson@FreeBSD.org)
Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42])
    by mx1.freebsd.org (Postfix) with ESMTP id 12CEF13C442
    for ; Sat, 12 Jan 2008 18:34:06 +0000 (UTC)
    (envelope-from rwatson@FreeBSD.org)
Received: from fledge.watson.org (fledge.watson.org [209.31.154.41])
    by cyrus.watson.org (Postfix) with ESMTP id 2A07C46DAB;
    Sat, 12 Jan 2008 13:34:06 -0500 (EST)
Date: Sat, 12 Jan 2008 18:34:06 +0000 (GMT)
From: Robert Watson
X-X-Sender: robert@fledge.watson.org
To: Andrew Gallatin
In-Reply-To: <18311.49715.457070.397815@grasshopper.cs.duke.edu>
Message-ID: <20080112182948.F36731@fledge.watson.org>
References: <20071219211025.T899@desktop>
    <18311.49715.457070.397815@grasshopper.cs.duke.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: arch@freebsd.org
Subject: Re: Linux compatible setaffinity.
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 12 Jan 2008 18:34:07 -0000

On Fri, 11 Jan 2008, Andrew Gallatin wrote:

> I'm somewhat surprised that this has not hit the tree yet. What happened?
> Wasn't the consensus that it was a good thing?

I think Jeff just got busy with other stuff.

> FWIW, I was too busy to reply at the time, but I agree that the Apple
> interface is nice. However, sometimes one needs a hard CPU binding
> interface like this one, and I don't see any reason to defer adding this
> interface in favor of the Apple one, since they are somewhat orthogonal.
> I'd be strongly in favor of having a hard CPU binding interface.

The Apple API is nice in terms of capabilities, but we wouldn't be able to
use it directly as it is Mach-esque (as I understand it). Of course, Jeff's
implementation of the Linux API doesn't actually fully implement the API (it
doesn't support constraining the CPU set vs. binding to one CPU, and the
patch as-provided didn't support querying the binding). I agree I'd like to
see it in the tree, if only because it would let me eliminate local hacks I
have that do the same thing, but we should think about other interfaces that
are more expressive in the longer term.

For example, one thing I like about the Apple interface is the ability to
specify general strategies for affinity rather than specific affinities --
"these threads like to be together, but they don't mind where that is".
Likewise, the Solaris facility to be able to change a CPU set and have all
the things pinned to it follow the centrally-administered set is a nice
match for our concept of Jail.

Finally, if we do want it to work well with Jail, and we want Jails to be
able to be pinned to sets of CPUs, we also need a nested concept of how to
handle affinity, in the event that the set of CPUs a Jail is running on
changes, in which case perhaps you want relative numbering within the jail,
or some other similar notion.

Sounds like a nice whiteboard session at the BSDCan developer summit...
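
The "relative numbering within the jail" idea can be made concrete with a
small userland sketch. Everything here is an assumption for illustration:
`cpumask_t`, `jail_cpu_to_abs()` and `jail_mask_to_abs()` are hypothetical
names, not an existing FreeBSD, Linux, Solaris or Apple interface, and a
64-bit mask stands in for whatever CPU set type a real implementation would
use. Code inside the jail names CPUs 0..n-1, and the translation to
absolute CPU ids happens against the jail's current set.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical representation: bit i set means absolute CPU i is in the set. */
typedef uint64_t cpumask_t;

#define	CPUMASK_BITS	64

/*
 * Map a jail-relative CPU index to an absolute CPU id: relative index n
 * names the n-th CPU (counting from 0) present in the jail's set.
 * Returns -1 when the jail has fewer than n + 1 CPUs.
 */
static int
jail_cpu_to_abs(cpumask_t jail_set, int rel)
{
	int cpu;

	if (rel < 0)
		return (-1);
	for (cpu = 0; cpu < CPUMASK_BITS; cpu++) {
		if ((jail_set & ((cpumask_t)1 << cpu)) == 0)
			continue;
		if (rel-- == 0)
			return (cpu);
	}
	return (-1);
}

/*
 * Translate a jail-relative affinity mask into an absolute one.
 * Relative CPUs that do not exist in the jail are silently dropped.
 */
static cpumask_t
jail_mask_to_abs(cpumask_t jail_set, cpumask_t rel_mask)
{
	cpumask_t abs_mask = 0;
	int rel, cpu;

	for (rel = 0; rel < CPUMASK_BITS; rel++) {
		if ((rel_mask & ((cpumask_t)1 << rel)) == 0)
			continue;
		cpu = jail_cpu_to_abs(jail_set, rel);
		if (cpu >= 0)
			abs_mask |= (cpumask_t)1 << cpu;
	}
	/* The result can never escape the jail's own set. */
	assert((abs_mask & ~jail_set) == 0);
	return (abs_mask);
}
```

With a jail on CPUs 2-5 (set 0x3c), the relative mask 0x3 translates to the
absolute mask 0xc. If the administrator later moves the jail to CPUs 4-7
(set 0xf0), the same relative mask translates to 0x30, so nothing inside
the jail has to change.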

Robert N M Watson
Computer Laboratory
University of Cambridge

From owner-freebsd-arch@FreeBSD.ORG Sat Jan 12 19:58:48 2008
Return-Path:
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id 6669A16A417
    for ; Sat, 12 Jan 2008 19:58:48 +0000 (UTC)
    (envelope-from qingli@speakeasy.net)
Received: from mail2.sea5.speakeasy.net (mail2.sea5.speakeasy.net [69.17.117.4])
    by mx1.freebsd.org (Postfix) with ESMTP id 2AE5213C459
    for ; Sat, 12 Jan 2008 19:58:48 +0000 (UTC)
    (envelope-from qingli@speakeasy.net)
Received: (qmail 21629 invoked from network); 12 Jan 2008 19:58:47 -0000
Received: from dsl081-051-141.sfo1.dsl.speakeasy.net (HELO SAINTS)
    (qingli@[64.81.51.141]) (envelope-sender )
    by mail2.sea5.speakeasy.net (qmail-ldap-1.03) with SMTP
    for ; 12 Jan 2008 19:58:47 -0000
From: "Qing Li"
To: "'Randall Stewart'"
References: <30834.1199989743@speakeasy.net> <4788F1AE.4030502@cisco.com>
Date: Sat, 12 Jan 2008 11:58:50 -0800
Message-ID: <001801c85555$8d3bd970$8d335140@SAINTS>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
X-Mailer: Microsoft Office Outlook 11
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3198
In-Reply-To: <4788F1AE.4030502@cisco.com>
Thread-Index: AchVPL3ASJ2hq5FcTPWmsHhU+MvMtwAGJwog
Cc: qingli@freebsd.org, freebsd-arch@freebsd.org
Subject: RE: Routing in the network :-)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 12 Jan 2008 19:58:48 -0000

Okay, I will pull the branch and start the work on it.

Let me get all of my changes in place, and after testing the features out
I will ping you and you can let me know if you require additional support;
we'll just go from there.

-- Qing

> -----Original Message-----
> From: owner-freebsd-arch@freebsd.org
> [mailto:owner-freebsd-arch@freebsd.org] On Behalf Of Randall Stewart
> Sent: Saturday, January 12, 2008 8:58 AM
> To: qingli@speakeasy.net
> Cc: qingli@freebsd.org; freebsd-arch@freebsd.org
> Subject: Re: Routing in the network :-)
>
> Qing Li:
>
> Ok, the branch is created :-)
>
> //depot/user/rrs/alt_route
>
> Please go ahead and pull a copy and add your changes.. unless
> you would like me to resurrect my old patches.. (let me know)..
>
> If you put your patches in ping me to tell me they are in ..
> and then I will resurrect my old alternate route lookup patch
> and the changes to SCTP to use it :-)
>
> Ping me either way
>
> R
>
> Qing Li wrote:
> > Interesting you are bringing this up ... I actually sent a similar
> > email to freebsd-net@ about 2 years ago and had one response back
> > (it was a polite no).
> >
> > I back ported and integrated the radix_mpath changes from KAME into
> > FreeBSD 5.4 and the changes are working good right now in a
> > production environment. Changes were also necessary in quite a few
> > places throughout the netinet/ files, e.g., address initialization
> > functions such as in_ifinit().
> >
> > I actually discussed what I have done with itojun back in August
> > of 2007.
> >
> >>On Thu Jan 10 6:55 , Randall Stewart sent:
> >>
> >>Hi all:
> >>
> >>A number of years ago, Itojun and I had played off and on with some
> >>modifications to both the routing table and to a "new" interface that
> >>could be used by transports to gain routing information.
> >>
> >>I am contemplating digging back in my archives and building a p4
> >>branch that would have the changes for folks to look at..
> >>But before I go to all that trouble I want to have a discussion about
> >>this here ;-)
> >>
> >>This will be a longish email so if you get bored easily or just don't
> >>care about routing/networks and all that fun, you have been warned :-)
> >>
> >>The basic concept:
> >>
> >>So say I am at home and have purchased two DSL's. One from AT&T (don't
> >>you love the new ma-bell) and the other from SpeakEasy (Note I had
> >>this until I moved out to the country now I am lucky to have one DSL..
> >>but many can do this if they want)... So my home looked like:
> >>
> >>
> >>  IP-A       IP-S
> >>   |          |
> >>   |          |
> >>   |          |
> >>,__|__________|___
> >>|                 |
> >>|                 |
> >>|  lakerest.net   |
> >>|                 |
> >>|_________________|
> >>
> >>Now life is good, I have some degree of fault tolerance right?
> >>
> >>So AT&T (IP-A) gives me the default route to IP-A1 and Speak Easy
> >>gives me the default route to IP-S1.
> >>Life is not so good... how do I plumb these in the routing table?
> >>
> >>I can say
> >>
> >>route add default IP-A1
> >>or
> >>route add default IP-S1
> >>
> >>But I cannot have both. And worse if I had a connection up to
> >>FreeBSD.net and AT&T's network went down.. and I happened to have put
> >>the first command in.. my network connection would stop...
> >>
> >>What would be nice is if I had a way to add BOTH routes into the kernel..
> >>and when Layer 4 realized there was some major problems going on it
> >>could "use" the alternate route (i.e. via IP-S1) and life would once
> >>again be good..
> >>
> >>Ok, yes, the observant person out there will say.. wait
> >>IP-S1 will NEVER allow your packets through since they probably do
> >>ingress filtering.. yes I am aware of this.. but this would *NOT* hold
> >>true for some device in the network talking to some other device in
> >>the network.. *OR* for speakeasy.. at least not circa 2004.. since
> >>speakeasy did *NOT* do ingress filtering and my way back former
> >>employer (AT&T) *DID* do ingress filtering..
> >>
> >>So the idea is rather simple:
> >>
> >>1) Allow multiple routes on any level of the kernel patricia trees.
> >>
> >
> > This is done.
> >
> >>2) Add an additional interface to the routing code so that a transport
> >>protocol could query the routing table for additional support... i.e.
> >>excuse me, the route that I had no longer seems to be working, do you
> >>have an alternate gateway?
> >>
> >
> > There was an inp_route field in the in_pcb{} structure but
> > that field was later removed by Andre in 5.5. I never quite
> > understood why but I did find that field to be rather useful.
> >
> > union {
> >         /* placeholder for routing entry */
> >         struct route inc4_route;
> > #if 1 /* def NEW_STRUCT_ROUTE */
> >         struct route inc6_route;
> > #else
> >         struct route_in6 inc6_route;
> > #endif
> > } inc_dependroute;
> >
> > I used this field for caching and it gets flushed when
> > there is a routing table change. Works out good.
> >
> >>Now I admit for TCP these API's would have limited use..
> >>
> >
> > That depends ... :-)
> >
> >>but for SCTP these are golden.. since both sides know about all
> >>addresses and thus you get a form of true network diversity out of
> >>this little software change.
> >>
> >>
> >>Now yes, this does not help you if both your DSL's go out to the same
> >>pole outside your house, and a truck hits the pole... but it *DOES*
> >>help you if your network provider dies somewhere back in the CO
> >>running across your carpet to AT&T's DSL and it thinks chewing on it
> >>would be fun :-)
> >>
> >>So what was required way back in 4.x when Itojun and I did this work..
> >>(note that Itojun called his changes RADIX_MPATH which did NOT include
> >>my alternate routing lookup code).
> >>
> >>a) For radix.c there were just a few simple changes that removed the
> >>restriction that prevents duplicate routes at any level of the tree.
> >>
> >>b) For route.c a new method is added.. this is a bit of code not huge
> >>but some.
> >>
> >
> > The rtrequest1() function needed a bit of work but not so huge.
> >
> >>c) One thing I added but took back out, was some changes to the "route
> >>delete" api... can't remember exactly where.. but basically the delete
> >>does not look at the destination ... i.e.
> >>with the changes Itojun and I had cooked up if you said:
> >>route add default IP-1
> >>route add default IP-2
> >>route add default IP-3
> >>
> >>and then when.. oops.. I don't want IP-2, you could NOT say route
> >>delete default IP-2.. well you could but it did no good.. it removed
> >>the first one (IP-1). I had a fix for this but Itojun thought it was
> >>too radical since it changed an interface to one of the routing
> >>routines... so we just settled for the fact that if you did that you
> >>got to have the pleasure of using:
> >>route delete default
> >>3 times.. and then starting again...
> >>
> >
> > I have been enhancing the code for some time now ...
> >
> > I can do both route delete and even route modification (I added
> > route preferences in addition to ECMP).
> >
> > I have 7 fundamental test cases to perform on the implementation to
> > ensure both correctness and compatibility.
> >
> >>So is it worth my time resurrecting these patches for 8.0? Objections
> >>(being in a routing company I know there will be a lot of them..
> >>gee the routing system is supposed to do that.. etc etc).
> >>
> >>Comments would be welcome before I dust off the patches..
> >>
> >
> > I would like to get these changes made into 8.0.
> >
> > If there is enough interest out there, I'd be happy to share my
> > implementation and we probably can collaborate on this effort if that
> > works for you.
> >
> > -- Qing
> >
>
> R
>
> --
> Randall Stewart
> NSSTG - Cisco Systems Inc.
> 803-345-0369 803-317-4952 (cell)
> _______________________________________________
> freebsd-arch@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-arch
> To unsubscribe, send any mail to
> "freebsd-arch-unsubscribe@freebsd.org"
>
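
The behaviour discussed in this thread (several routes to one destination,
deleting one specific gateway rather than the first match, and a transport
asking for an alternate gateway) can be modelled in a few lines of userland
C. This is only a sketch under stated assumptions: `struct mroute`,
`mroute_add()`, `mroute_delete()` and `mroute_alternate()` are hypothetical
names, not the RADIX_MPATH code or any kernel interface, and a flat array
stands in for the radix tree.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define	MAX_GW	8

/* Hypothetical toy model: one destination with several gateways. */
struct mroute {
	const char *dst;		/* e.g. "default" */
	const char *gw[MAX_GW];		/* gateways, in insertion order */
	int ngw;			/* number of gateways in use */
};

/* Add a gateway; an exact duplicate or a full table is rejected with -1. */
static int
mroute_add(struct mroute *r, const char *gw)
{
	int i;

	assert(r != NULL && gw != NULL);
	for (i = 0; i < r->ngw; i++)
		if (strcmp(r->gw[i], gw) == 0)
			return (-1);
	if (r->ngw == MAX_GW)
		return (-1);
	r->gw[r->ngw++] = gw;
	return (0);
}

/* Delete the entry whose gateway matches, not simply the first entry. */
static int
mroute_delete(struct mroute *r, const char *gw)
{
	int i;

	for (i = 0; i < r->ngw; i++) {
		if (strcmp(r->gw[i], gw) != 0)
			continue;
		/* Close the gap left by the removed entry. */
		memmove(&r->gw[i], &r->gw[i + 1],
		    (size_t)(r->ngw - i - 1) * sizeof(r->gw[0]));
		r->ngw--;
		return (0);
	}
	return (-1);
}

/*
 * What a transport would ask when its current gateway stops working:
 * return the first gateway other than 'failed', or NULL if none is left.
 */
static const char *
mroute_alternate(const struct mroute *r, const char *failed)
{
	int i;

	for (i = 0; i < r->ngw; i++)
		if (failed == NULL || strcmp(r->gw[i], failed) != 0)
			return (r->gw[i]);
	return (NULL);
}
```

With IP-1, IP-2 and IP-3 added for "default", `mroute_delete(&r, "IP-2")`
removes only IP-2, and a later `mroute_alternate(&r, "IP-1")` returns IP-3.
A real implementation would also have to decide how route preferences or
ECMP weighting choose among the remaining gateways; this sketch simply
takes the first one.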