From owner-freebsd-arch@FreeBSD.ORG Sat Dec 15 16:23:46 2007 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AB4AA16A418; Sat, 15 Dec 2007 16:23:46 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42]) by mx1.freebsd.org (Postfix) with ESMTP id 6F3D113C467; Sat, 15 Dec 2007 16:23:46 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by cyrus.watson.org (Postfix) with ESMTP id EECC947E90; Sat, 15 Dec 2007 11:23:45 -0500 (EST) Date: Sat, 15 Dec 2007 16:23:45 +0000 (GMT) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Kip Macy In-Reply-To: <20071215100351.Q70617@fledge.watson.org> Message-ID: <20071215152959.V85668@fledge.watson.org> References: <20071215100351.Q70617@fledge.watson.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: FreeBSD Current , freebsd-arch@freebsd.org Subject: Re: pending changes for TOE support X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 15 Dec 2007 16:23:46 -0000 On Sat, 15 Dec 2007, Robert Watson wrote: > More comments to follow. Some more, back to tcp_offload.c which I didn't look at last time around: + ifp = rt->rt_ifp; + tdev = TOEDEV(ifp); + if (tdev == NULL) + return (EINVAL); + + if (tdev->tod_can_offload(tdev, so) == 0) + return (EINVAL); I sort of expected to see a if_capenable check for TOE here, but I guess it doesn't make much difference if the flag is going to be checked at the lower level. However, in the case of things like TCP checksum offload, we do do the check at the higher level, so it wouldn't be inconsistent to do that. BTW, could you prepare similar comments for toedev.h? +struct toe_usrreqs tcp_offload_usrreqs = { + .tu_send = tcp_output, + .tu_rcvd = tcp_output, + .tu_disconnect = tcp_output, + .tu_abort = tcp_output, + .tu_detach = tcp_offload_detach_noop, + .tu_syncache_event = tcp_offload_syncache_event_noop, +}; This structure seems to introduce quite a bit more indirection for non-offloaded cases than before, especially to do things like this: +static void +tcp_offload_syncache_event_noop(int event, void *toepcb) +{ +} The compiler can't compile these out because it likely doesn't do much in the way of function pointer analysis, and probably shouldn't. However, it leaves quite a few code paths heavier weight than before. I think I'd prefer a model in which a TF_OFFLOAD flag is set on the tcpcb and then we conditionally invoke tu_foo as a result. I.e.,: static __inline int tcp_gen_send(struct tcpcb *tp) { #ifndef TCP_OFFLOAD_DISABLE if (tcp->f_flag & TF_OFFLOADED) return (tp->t_tu->tu_send(tp)); else #endif return (tcp_output(tp)); } This would compile to a straight call to tcp_output() when offloading isn't compiled in, and when it is compiled in and offloading isn't enabled, we do a simple flag check rather than invoking a function via a series of pointers. Back to tcp_offload.h: +#define SC_ENTRY_PRESENT 1 /* 4-tuple already present */ +#define SC_DROP 2 /* connection was timed out */ I think you should give these a different prefix, since they're part of the interface between the syncache and offload, and SC_ generally are flags used internal to the syncache. Later you use TCP_OFFLOAD_ as the prefix, so that may be appropriate here also. +#define TCP_OFFLOAD_LISTEN_OPEN 1 +#define TCP_OFFLOAD_LISTEN_CLOSE 2 + +typedef void (*tcp_offload_listen_fn)(void *, int, struct tcpcb *); +EVENTHANDLER_DECLARE(tcp_offload_listen, tcp_offload_listen_fn); Here and with the syncache interface, it seems like you're adding new operations as modes to functions, whereas elsewhere you're adding multiple functions. There isn't too much difference, but I think I'd rather see a slightly wider interface (i.e., more functions) and avoid flags that change the semantics of particular functions. That is to say: tu_syncache_add and tu_syncache_drop, and likewise tcp_offload_listen and tcp_offload_unlisten (or something similar). +int tcp_offload_connect(struct socket *so, struct sockaddr *nam); This prototype appear not to be documented. +/* + * The tcp_gen_* routines are wrappers around the toe_usrreqs calls, + * in the non-offloaded case they translate to tcp_output. Not really sure about the _gen_ naming, but not sure I have a better suggestion in mind just yet -- maybe _output_ since they control output. I'm not entirely thrilled that all of these become inline functions in include files, although since a couple are used in multiple tcp*.c files, it may be the least bad of the options. My comments above about possibly restructuring this to avoid lots of indirection for non-offloaded case strengthen the argument for inlining, however. +#ifndef TCP_OFFLOAD_DISABLE I notice you've renamed the option, and I like the new name a lot better. However, I also notice that it doesn't seem to appear in conf/options or conf/files. Could you make sure to add it? +/* + * The socket has not been marked as "do not offload" + */ +#define SO_OFFLOADABLE(so) ((so->so_options & SO_NO_OFFLOAD) == 0) I find myself wondering if the offload option should be a TCP socket option rather than a socket-layer option, but don't have strong feelings about this. What do you intend the policy model to be for enabling TOE, in general? That if the TOE capability is available and the TOE capability is enabled, all TCP sockets via the enabled interface will be offloaded unless SO_NO_OFFLOAD is set? Are you currently anticipating enabling the capability by default? Nitpick on style: + /* disconnect offload device, if any */ Comments that are sentences should begin with an upper case letter and end with a period. :-) And another one: +#ifdef TCP_OFFLOAD_DISABLE +#define TOEPCB_SET(sc) (0) +#else +#define TOEPCB_SET(sc) ((sc)->sc_toepcb != NULL) +#endif + + Too many blank lines there. I've noticed that in quite a few places, we prefer X_ISSET() to test for a set flag or other condition, in order to prevent confusion with X_SET() assigning a value. I think that's pretty sensible, and should do it more myself, so encourage you to do the same Robert N M Watson Computer Laboratory University of Cambridge