From owner-freebsd-arch@FreeBSD.ORG Sat Dec 15 18:10:22 2007 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 634AC16A418 for ; Sat, 15 Dec 2007 18:10:22 +0000 (UTC) (envelope-from kip.macy@gmail.com) Received: from wa-out-1112.google.com (wa-out-1112.google.com [209.85.146.181]) by mx1.freebsd.org (Postfix) with ESMTP id 27E3413C4EE for ; Sat, 15 Dec 2007 18:10:22 +0000 (UTC) (envelope-from kip.macy@gmail.com) Received: by wa-out-1112.google.com with SMTP id k17so2244225waf.3 for ; Sat, 15 Dec 2007 10:10:21 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; bh=z7Iw+pvwm5uxTOJW2rs+OaFu4bUSZ2tC9EURox+qH9U=; b=aR37tjXjQLeS9mgZ4qbDN9IuhmzbwIAAQ+p/DTgmy+9/Odt91up2cRUbJLInubkMLjBewzQB2eiaRMJVnJ/QQZeSAJliWWuLq+n5yz5UEzsE0mZUMxRIaMfzJvd4jAFrOJZACpSPn/JZHetBQnSLStbFAAL/J+UibkkQQmKxeXU= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=rz8/jCZ9cVEJubQLAdBIYl9HjbmnykfpfGV7ItsAWUaaNm3itA7gHS7ULhFhracF/BAdYr5Mna59QhSpgA19Eh0V/40d3++aKVp70IntzfNbvxIQrGvcSMqf/+lw2E/uGMJZr5QiNwJyQn3bbYicUuw6W/9sd3citUmM/zBTVqU= Received: by 10.114.78.1 with SMTP id a1mr324244wab.102.1197742221471; Sat, 15 Dec 2007 10:10:21 -0800 (PST) Received: by 10.114.255.11 with HTTP; Sat, 15 Dec 2007 10:10:21 -0800 (PST) Message-ID: Date: Sat, 15 Dec 2007 10:10:21 -0800 From: "Kip Macy" To: "Robert Watson" In-Reply-To: <20071215152959.V85668@fledge.watson.org> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <20071215100351.Q70617@fledge.watson.org> <20071215152959.V85668@fledge.watson.org> Cc: FreeBSD Current , freebsd-arch@freebsd.org Subject: Re: pending changes for TOE support X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 15 Dec 2007 18:10:22 -0000 On Dec 15, 2007 8:23 AM, Robert Watson wrote: > > On Sat, 15 Dec 2007, Robert Watson wrote: > > > More comments to follow. > > Some more, back to tcp_offload.c which I didn't look at last time around: > > + ifp = rt->rt_ifp; > + tdev = TOEDEV(ifp); > + if (tdev == NULL) > + return (EINVAL); > + > + if (tdev->tod_can_offload(tdev, so) == 0) > + return (EINVAL); > > I sort of expected to see a if_capenable check for TOE here, but I guess it > doesn't make much difference if the flag is going to be checked at the lower > level. However, in the case of things like TCP checksum offload, we do do the > check at the higher level, so it wouldn't be inconsistent to do that. Yes, IFCAP_TOE should be checked here. > > BTW, could you prepare similar comments for toedev.h? Ok, toedev.h isn't actually originally from me. However, it is a part of the API and so should be documented. > > +struct toe_usrreqs tcp_offload_usrreqs = { > + .tu_send = tcp_output, > + .tu_rcvd = tcp_output, > + .tu_disconnect = tcp_output, > + .tu_abort = tcp_output, > + .tu_detach = tcp_offload_detach_noop, > + .tu_syncache_event = tcp_offload_syncache_event_noop, > +}; > > This structure seems to introduce quite a bit more indirection for > non-offloaded cases than before, especially to do things like this: > > +static void > +tcp_offload_syncache_event_noop(int event, void *toepcb) > +{ > +} This should never be called as sc_tu will only be set by the TOE interface to the syncache. Also bear in mind that this and detach are only called once per connection The real overhead is in calling tcp_output via an indirect function call versus a direct function call. Considering the level of gratuitous overhead in the output path I would be surprised if this were measurable. > > The compiler can't compile these out because it likely doesn't do much in the > way of function pointer analysis, and probably shouldn't. However, it leaves > quite a few code paths heavier weight than before. I think I'd prefer a model > in which a TF_OFFLOAD flag is set on the tcpcb and then we conditionally > invoke tu_foo as a result. I.e.,: > > static __inline int > tcp_gen_send(struct tcpcb *tp) > { > > #ifndef TCP_OFFLOAD_DISABLE > if (tcp->f_flag & TF_OFFLOADED) > return (tp->t_tu->tu_send(tp)); > else > #endif > return (tcp_output(tp)); > } > > This would compile to a straight call to tcp_output() when offloading isn't > compiled in, and when it is compiled in and offloading isn't enabled, we do a > simple flag check rather than invoking a function via a series of pointers. This is what the current code does. See tcp_ofld.h in CVS. The indirection was suggested by Sam as a cleaner abstraction. > > Back to tcp_offload.h: > > +#define SC_ENTRY_PRESENT 1 /* 4-tuple already > present */ > +#define SC_DROP 2 /* connection was > timed out */ > > I think you should give these a different prefix, since they're part of the > interface between the syncache and offload, and SC_ generally are flags used > internal to the syncache. Later you use TCP_OFFLOAD_ as the prefix, so that > may be appropriate here also. > > +#define TCP_OFFLOAD_LISTEN_OPEN 1 > +#define TCP_OFFLOAD_LISTEN_CLOSE 2 > + > +typedef void (*tcp_offload_listen_fn)(void *, int, struct tcpcb *); > +EVENTHANDLER_DECLARE(tcp_offload_listen, tcp_offload_listen_fn); > > Here and with the syncache interface, it seems like you're adding new > operations as modes to functions, whereas elsewhere you're adding multiple > functions. There isn't too much difference, but I think I'd rather see a > slightly wider interface (i.e., more functions) and avoid flags that change > the semantics of particular functions. That is to say: tu_syncache_add and > tu_syncache_drop, and likewise tcp_offload_listen and tcp_offload_unlisten (or > something similar). I've kept the interface to as few functions as possible. I've expanded tcp_output to 4 functions because it was necessary semantically. I'll widen the interface if Sam agrees. > > +int tcp_offload_connect(struct socket *so, struct sockaddr *nam); > > This prototype appear not to be documented. Ok, it appears fairly self-documenting to me =-D > > +/* > + * The tcp_gen_* routines are wrappers around the toe_usrreqs calls, > + * in the non-offloaded case they translate to tcp_output. > > Not really sure about the _gen_ naming, but not sure I have a better > suggestion in mind just yet -- maybe _output_ since they control output. I'm > not entirely thrilled that all of these become inline functions in include > files, although since a couple are used in multiple tcp*.c files, it may be > the least bad of the options. My comments above about possibly restructuring > this to avoid lots of indirection for non-offloaded case strengthen the > argument for inlining, however. > > +#ifndef TCP_OFFLOAD_DISABLE > > I notice you've renamed the option, and I like the new name a lot better. > However, I also notice that it doesn't seem to appear in conf/options or > conf/files. Could you make sure to add it? > > +/* > + * The socket has not been marked as "do not offload" > + */ > +#define SO_OFFLOADABLE(so) ((so->so_options & SO_NO_OFFLOAD) == 0) > > I find myself wondering if the offload option should be a TCP socket option > rather than a socket-layer option, but don't have strong feelings about this. The assumption is that, if you a) pay the extra for a version of the card that supports TOE and b) you load the toe module (it isn't part of the NIC driver) that you want your existing software to have its connections offloaded. I don't know of any customers that want to modify their user applications to selectively enable TOE. > > What do you intend the policy model to be for enabling TOE, in general? That > if the TOE capability is available and the TOE capability is enabled, all TCP > sockets via the enabled interface will be offloaded unless SO_NO_OFFLOAD is > set? That is the current usage model. At some point we may tie it into pf/ipf/ipfw to provide for offload policy. However, currently we offload all connections on an interface until we run out of TCAM entries. >Are you currently anticipating enabling the capability by default? Yes, *if the TOE driver is loaded*. > > Nitpick on style: > > + /* disconnect offload device, if any */ > > Comments that are sentences should begin with an upper case letter and end > with a period. :-) And another one: > > +#ifdef TCP_OFFLOAD_DISABLE > +#define TOEPCB_SET(sc) (0) > +#else > +#define TOEPCB_SET(sc) ((sc)->sc_toepcb != NULL) > +#endif > + > + > > Too many blank lines there. > > I've noticed that in quite a few places, we prefer X_ISSET() to test for a set > flag or other condition, in order to prevent confusion with X_SET() assigning > a value. I think that's pretty sensible, and should do it more myself, so > encourage you to do the same That sounds reasonable. -Kip