From owner-freebsd-arch@FreeBSD.ORG Sun Dec 9 06:53:06 2007 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B20B916A421 for ; Sun, 9 Dec 2007 06:53:06 +0000 (UTC) (envelope-from dougb@FreeBSD.org) Received: from mail2.fluidhosting.com (mx21.fluidhosting.com [204.14.89.4]) by mx1.freebsd.org (Postfix) with SMTP id 4C0DD13C455 for ; Sun, 9 Dec 2007 06:53:05 +0000 (UTC) (envelope-from dougb@FreeBSD.org) Received: (qmail 6813 invoked by uid 399); 9 Dec 2007 06:53:05 -0000 Received: from localhost (HELO lap.dougb.net) (dougb@dougbarton.us@127.0.0.1) by localhost with ESMTP; 9 Dec 2007 06:53:05 -0000 X-Originating-IP: 127.0.0.1 Message-ID: <475B90CF.8040909@FreeBSD.org> Date: Sat, 08 Dec 2007 22:53:03 -0800 From: Doug Barton Organization: http://www.FreeBSD.org/ User-Agent: Thunderbird 2.0.0.9 (X11/20071119) MIME-Version: 1.0 To: Remko Lodder References: <4759DC08.9070600@FreeBSD.org> <20071208163857.GC91919@lor.one-eyed-alien.net> <475B2BD1.7000303@FreeBSD.org> <475B2C7C.40903@FreeBSD.org> In-Reply-To: <475B2C7C.40903@FreeBSD.org> X-Enigmail-Version: 0.95.5 OpenPGP: id=D5B2F0FB Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: Gordon M Tetlow , Brooks Davis , freebsd-arch@freebsd.org Subject: Re: Should libgssapi be hidden behind the MK_KERBEROS knob? X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 09 Dec 2007 06:53:06 -0000 Remko Lodder wrote: > The attached patch looks good to me, and the proposals as well.. Thanks! 
There is one tiny typo in the patch though, +.if ${MK_GSSAPI} = "yes" should be +.if ${MK_GSSAPI} == "yes" Doug -- This .signature sanitized for your protection From owner-freebsd-arch@FreeBSD.ORG Sun Dec 9 20:06:01 2007 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E2BCA16A419; Sun, 9 Dec 2007 20:06:00 +0000 (UTC) (envelope-from brooks@lor.one-eyed-alien.net) Received: from lor.one-eyed-alien.net (cl-162.ewr-01.us.sixxs.net [IPv6:2001:4830:1200:a1::2]) by mx1.freebsd.org (Postfix) with ESMTP id 88F2813C448; Sun, 9 Dec 2007 20:06:00 +0000 (UTC) (envelope-from brooks@lor.one-eyed-alien.net) Received: from lor.one-eyed-alien.net (localhost [127.0.0.1]) by lor.one-eyed-alien.net (8.14.1/8.13.8) with ESMTP id lB9K5xtI002558; Sun, 9 Dec 2007 14:05:59 -0600 (CST) (envelope-from brooks@lor.one-eyed-alien.net) Received: (from brooks@localhost) by lor.one-eyed-alien.net (8.14.1/8.13.8/Submit) id lB9K5xcq002557; Sun, 9 Dec 2007 14:05:59 -0600 (CST) (envelope-from brooks) Date: Sun, 9 Dec 2007 14:05:59 -0600 From: Brooks Davis To: Doug Barton Message-ID: <20071209200559.GA2444@lor.one-eyed-alien.net> References: <4759DC08.9070600@FreeBSD.org> <20071208163857.GC91919@lor.one-eyed-alien.net> <475B2BD1.7000303@FreeBSD.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="4Ckj6UjgE2iN1+kY" Content-Disposition: inline In-Reply-To: <475B2BD1.7000303@FreeBSD.org> User-Agent: Mutt/1.5.16 (2007-06-09) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-3.0 (lor.one-eyed-alien.net [127.0.0.1]); Sun, 09 Dec 2007 14:05:59 -0600 (CST) Cc: Gordon M Tetlow , freebsd-arch@freebsd.org Subject: Re: Should libgssapi be hidden behind the MK_KERBEROS knob? 
X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 09 Dec 2007 20:06:01 -0000

--4Ckj6UjgE2iN1+kY
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Sat, Dec 08, 2007 at 03:42:09PM -0800, Doug Barton wrote:
> Gordon M Tetlow wrote:
> >
> > On Dec 8, 2007, at 8:38 AM, Brooks Davis wrote:
> >
> >> On Fri, Dec 07, 2007 at 03:49:28PM -0800, Doug Barton wrote:
> >>> If there is a better list for this, don't hesitate to let me know.
> >>>
> >>> I use WITHOUT_KERBEROS=true in /etc/{make|src}.conf, since I don't
> >>> need or use it. However, this leads to a problem with building the
> >>> kdelibs3 port. The configure script looks for the presence of
> >>> libgssapi and the associated headers, and takes that to mean that
> >>> kerberos is available, and sets things up accordingly. This causes
> >>> the build to fail when it tries to actually link something to a
> >>> kerberos library.
> >>>
> >>> I realize that GSS can be used for other things besides kerberos, but
> >>> are we really losing anything by hiding them both under the same knob?
> >>> If the answer to that is yes, is there any objection to a WITHOUT_GSS
> >>> knob?
> >>
> >> We wouldn't lose anything today, but a without GSS knob makes more
> >> sense to me. There's at least one other GSS system in fairly wide use
> >> in the high performance computing world today.
> >
> > How about WITHOUT_KERBEROS implies WITHOUT_GSSAPI unless people
> > specifically ask for GSSAPI? Is that too obscure?
>
> That sounds totally reasonable. How does the attached look?

Seems fine to me.
-- Brooks

> Index: lib/Makefile
> ===================================================================
> RCS file: /usr/local/ncvs/src/lib/Makefile,v
> retrieving revision 1.226
> diff -u -r1.226 Makefile
> --- lib/Makefile 17 Nov 2007 21:29:02 -0000 1.226
> +++ lib/Makefile 8 Dec 2007 23:24:47 -0000
> @@ -31,7 +31,7 @@
>   libbegemot ${_libbluetooth} libbsnmp libbz2 \
>   libcalendar libcam libcompat libdevinfo libdevstat libdisk \
>   libedit libexpat libfetch libftpio libgeom ${_libgpib} \
> - libgssapi libipsec \
> + ${_libgssapi} libipsec \
>   ${_libipx} libkiconv libmagic libmemstat ${_libmilter} ${_libmp} \
>   ${_libncp} ${_libngatm} libopie libpam libpcap \
>   libpmc ${_libkse} librt ${_libsdp} ${_libsm} ${_libsmb} \
> @@ -62,6 +62,14 @@
>  _libsdp= libsdp
>  .endif
>
> +.if ${MK_KERBEROS} != "no"
> +_libgssapi= libgssapi
> +.else
> +.if ${MK_GSSAPI} = "yes"
> +_libgssapi= libgssapi
> +.endif
> +.endif
> +
>  .if ${MK_IPX} != "no"
>  _libipx= libipx
>  .endif
> Index: share/man/man5/src.conf.5
> ===================================================================
> RCS file: /usr/local/ncvs/src/share/man/man5/src.conf.5,v
> retrieving revision 1.20
> diff -u -r1.20 src.conf.5
> --- share/man/man5/src.conf.5 19 Oct 2007 14:03:05 -0000 1.20
> +++ share/man/man5/src.conf.5 8 Dec 2007 23:40:23 -0000
> @@ -288,6 +288,10 @@
>  .\" from FreeBSD: src/tools/build/options/WITHOUT_GROFF,v 1.1 2006/03/21 07:50:49 ru Exp
>  Set to not build
>  .Xr groff 1 .
> +.It Va WITH_GSSAPI
> +Set to build libgssapi when
> +.Va WITHOUT_KERBEROS
> +is set.
>  .It Va WITH_HESIOD
>  .\" from FreeBSD: src/tools/build/options/WITH_HESIOD,v 1.1 2006/03/21 07:50:50 ru Exp
>  Set to build Hesiod support.
> @@ -347,6 +351,10 @@
>  .Bl -item -compact
>  .It
>  .Va WITHOUT_KERBEROS_SUPPORT
> +.It
> +.Va WITHOUT_GSSAPI
> +(unless overridden by
> +.Va WITH_GSSAPI )
>  .El
>  .It Va WITHOUT_KERBEROS_SUPPORT
>  .\" from FreeBSD: src/tools/build/options/WITHOUT_KERBEROS_SUPPORT,v 1.1 2006/03/21 07:50:50 ru Exp
> Index: share/mk/bsd.own.mk
> ===================================================================
> RCS file: /usr/local/ncvs/src/share/mk/bsd.own.mk,v
> retrieving revision 1.69
> diff -u -r1.69 bsd.own.mk
> --- share/mk/bsd.own.mk 20 Oct 2007 19:01:49 -0000 1.69
> +++ share/mk/bsd.own.mk 8 Dec 2007 23:29:05 -0000
> @@ -381,6 +381,7 @@
>  #
>  .for var in \
>      BIND_LIBS \
> +    GSSAPI \
>      HESIOD \
>      IDEA
>  .if defined(WITH_${var}) && defined(WITHOUT_${var})
> _______________________________________________
> freebsd-arch@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-arch
> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"

--4Ckj6UjgE2iN1+kY
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (FreeBSD)

iD8DBQFHXEqmXY6L6fI4GtQRAmgCAKDgfZeVIiAlZSbDZ7nHdlxlFjk71wCfSAvS
OTnBXQDdudGTsxpAH9y+XGc=
=Zz2M
-----END PGP SIGNATURE-----

--4Ckj6UjgE2iN1+kY--

From owner-freebsd-arch@FreeBSD.ORG Sun Dec 9 21:05:15 2007 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CBA0A16A417 for ; Sun, 9 Dec 2007 21:05:15 +0000 (UTC) (envelope-from dougb@FreeBSD.org) Received: from mail2.fluidhosting.com (mx21.fluidhosting.com [204.14.89.4]) by mx1.freebsd.org (Postfix) with SMTP id 644F613C4EB for ; Sun, 9 Dec 2007 21:05:15 +0000 (UTC) (envelope-from dougb@FreeBSD.org) Received: (qmail 6721 invoked by uid 399); 9 Dec 2007 21:05:07
-0000 Received: from localhost (HELO lap.dougb.net) (dougb@dougbarton.us@127.0.0.1) by localhost with ESMTP; 9 Dec 2007 21:05:07 -0000 X-Originating-IP: 127.0.0.1 Message-ID: <475C587D.9070902@FreeBSD.org> Date: Sun, 09 Dec 2007 13:05:01 -0800 From: Doug Barton Organization: http://www.FreeBSD.org/ User-Agent: Thunderbird 2.0.0.9 (X11/20071119) MIME-Version: 1.0 To: Brooks Davis References: <4759DC08.9070600@FreeBSD.org> <20071208163857.GC91919@lor.one-eyed-alien.net> <475B2BD1.7000303@FreeBSD.org> <20071209200559.GA2444@lor.one-eyed-alien.net> In-Reply-To: <20071209200559.GA2444@lor.one-eyed-alien.net> X-Enigmail-Version: 0.95.5 OpenPGP: id=D5B2F0FB Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: Gordon M Tetlow , freebsd-arch@freebsd.org Subject: Re: Should libgssapi be hidden behind the MK_KERBEROS knob? X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 09 Dec 2007 21:05:15 -0000 Brooks Davis wrote: > Seems fine to me. OK, I'll wait till Monday to commit it in order to give anyone else time to weigh in. I would like to get this in 7.0 if we can, if we can't, I won't lose sleep. 
Doug -- This .signature sanitized for your protection From owner-freebsd-arch@FreeBSD.ORG Sun Dec 9 22:45:51 2007 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id F2BEF16A418 for ; Sun, 9 Dec 2007 22:45:51 +0000 (UTC) (envelope-from edwin@mavetju.org) Received: from mail5out.barnet.com.au (mail5.barnet.com.au [202.83.178.78]) by mx1.freebsd.org (Postfix) with ESMTP id B5CDD13C461 for ; Sun, 9 Dec 2007 22:45:51 +0000 (UTC) (envelope-from edwin@mavetju.org) Received: by mail5out.barnet.com.au (Postfix, from userid 1001) id 673F12218A60; Mon, 10 Dec 2007 09:30:48 +1100 (EST) X-Viruscan-Id: <475C6C960000FB9A26B50F@BarNet> Received: from mail5auth.barnet.com.au (mail5.barnet.com.au [202.83.178.78]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "mail5auth.barnet.com.au", Issuer "*.barnet.com.au" (verified OK)) by mail5.barnet.com.au (Postfix) with ESMTP id 4CE8D21B174D for ; Mon, 10 Dec 2007 09:30:43 +1100 (EST) Received: from k7.mavetju (k7.mavetju.org [10.251.1.18]) by mail5auth.barnet.com.au (Postfix) with ESMTP id EF3DE2218A25 for ; Mon, 10 Dec 2007 09:30:42 +1100 (EST) Received: by k7.mavetju (Postfix, from userid 1001) id 3392012C; Mon, 10 Dec 2007 09:30:42 +1100 (EST) Date: Mon, 10 Dec 2007 09:30:42 +1100 From: Edwin Groothuis To: arch@freebsd.org Message-ID: <20071209223042.GA40965@k7.mavetju> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.2.3i Cc: Subject: bin/118292: Add support to remove all msg/shm/sem ids with ipcrm X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 09 Dec 2007 22:45:52 -0000 Hello, A friend of mine has submitted this PR and I promised him that I would see if I could 
get it implemented. I couldn't find anybody directly responsible
for the ipcs/ipcrm tools, so I'm throwing it in here for discussion.

>Description:

I've observed that Linux apps running under the linuxulator have a
habit of leaving behind shared memory segments which are unused, but
which eventually cause the system to run out of free segments, at
which point these apps stop working. ipcrm(1) currently only allows
removal of unused message queues, shared memory segments and
semaphores on an individual basis, or those having a matching
(non-zero) key. However, it would often be convenient to just do a
complete cleanup of everything, usually as root.

The attached patch allows removal of all message queues, shared
memory segments or semaphores by specifying an id of -1 (a la
kill(2)). The code to look up ids was taken from ipcs.

The patch is available at http://www.freebsd.org/cgi/query-pr.cgi?pr=118292

I will do it in two parts (according to the wishes of my mentor):
first style(9)ify ipcrm.c, then the patch.

If anybody has a good observation on this change, please speak up now.
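
For readers following the thread, the "id of -1 means everything" convention described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the patch from the PR: the names remove_fn, remove_matching, rm_shm and rm_dry_run are invented here, and the caller is assumed to have collected the list of ids already (the PR reuses the lookup code from ipcs for that step). The only real system interface used is shmctl(id, IPC_RMID, NULL).

```c
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical callback type: remove one IPC object, return 0 on success. */
typedef int (*remove_fn)(int id);

/* Real removal of one shared memory segment via the System V interface. */
int
rm_shm(int id)
{
    return (shmctl(id, IPC_RMID, NULL));
}

/* Dry run: only report what would be removed, and claim success. */
int
rm_dry_run(int id)
{
    printf("would remove id %d\n", id);
    return (0);
}

/*
 * Walk the ids the caller has already looked up. A target of -1 selects
 * every id (similar to what kill(2) does for processes); any other value
 * selects only the matching id. Returns the number of objects for which
 * the callback reported success.
 */
size_t
remove_matching(const int *ids, size_t n, int target, remove_fn rm)
{
    size_t i, removed = 0;

    for (i = 0; i < n; i++) {
        if (target != -1 && ids[i] != target)
            continue;
        if (rm(ids[i]) == 0)
            removed++;
    }
    return (removed);
}
```

Passing rm_dry_run first and rm_shm afterwards would give a cautious "list, then remove" workflow; since valid ids are non-negative, -1 cannot collide with a real id.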
Edwin -- Edwin Groothuis edwin@freebsd.org http://www.mavetju.org From owner-freebsd-arch@FreeBSD.ORG Mon Dec 10 14:37:01 2007 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 894ED16A41A; Mon, 10 Dec 2007 14:37:01 +0000 (UTC) (envelope-from gnn@neville-neil.com) Received: from mrout1.yahoo.com (mrout1.yahoo.com [216.145.54.171]) by mx1.freebsd.org (Postfix) with ESMTP id 7769513C442; Mon, 10 Dec 2007 14:37:01 +0000 (UTC) (envelope-from gnn@neville-neil.com) Received: from minion.local.neville-neil.com (proxy7.corp.yahoo.com [216.145.48.98]) by mrout1.yahoo.com (8.13.6/8.13.6/y.out) with ESMTP id lBAEQYTo030328; Mon, 10 Dec 2007 06:26:34 -0800 (PST) Date: Mon, 10 Dec 2007 09:26:34 -0500 Message-ID: From: gnn@freebsd.org To: Edwin Groothuis In-Reply-To: <20071209223042.GA40965@k7.mavetju> References: <20071209223042.GA40965@k7.mavetju> User-Agent: Wanderlust/2.15.5 (Almost Unreal) SEMI/1.14.6 (Maruoka) FLIM/1.14.8 (=?ISO-8859-4?Q?Shij=F2?=) APEL/10.7 Emacs/22.1.50 (i386-apple-darwin8.10.1) MULE/5.0 (SAKAKI) MIME-Version: 1.0 (generated by SEMI 1.14.6 - "Maruoka") Content-Type: text/plain; charset=US-ASCII Cc: arch@freebsd.org Subject: Re: bin/118292: Add support to remove all msg/shm/sem ids with ipcrm X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 10 Dec 2007 14:37:01 -0000 At Mon, 10 Dec 2007 09:30:42 +1100, Edwin Groothuis wrote: > > Hello, > > A friend of me has submitted this PR and I promised him that I would > see if I could get it implemented. I couldn't find anybody directly > responsible for the ips/iprcm tools, so I throw it in here for > discussion. 
> > >Description: > > I've observed that linux apps running under the linuxulator > have a habit of leaving behind shared memory segments which are > unused, but which eventually cause the system to run out of > free segments and these apps will stop working. ipcrm(1) currently > only allows removal of unused message queues, shared memory > segments and semaphores on an individual basis, or those having > a matching (non-zero) key. However it would often be convenient > to just do a complete cleanup of everything, usually as root. > > The attached patch allows removal of all message queues, shared > memory segments or semaphores by specifying an id of -1 (ala > kill(2)). The code to lookup ids was taken from ipcs. > > The patch is available in http://www.freebsd.org/cgi/query-pr.cgi?pr=118292 > > I will do it in two parts (according to the wishes of my mentor): > First style(9)ify ipcrm.c, then the patch. > > If anybody has a good observation on this change, please speak up now. > I have not read the patch in detail but I like the idea, we should be able to easily clean such things up. 
Best, George From owner-freebsd-arch@FreeBSD.ORG Mon Dec 10 19:36:36 2007 Return-Path: Delivered-To: freebsd-arch@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C4DEB16A419; Mon, 10 Dec 2007 19:36:36 +0000 (UTC) (envelope-from das@FreeBSD.ORG) Received: from VARK.MIT.EDU (VARK.MIT.EDU [18.95.3.179]) by mx1.freebsd.org (Postfix) with ESMTP id 8580813C4D1; Mon, 10 Dec 2007 19:36:36 +0000 (UTC) (envelope-from das@FreeBSD.ORG) Received: from VARK.MIT.EDU (localhost [127.0.0.1]) by VARK.MIT.EDU (8.14.2/8.14.1) with ESMTP id lBAJPXGj015795; Mon, 10 Dec 2007 14:25:33 -0500 (EST) (envelope-from das@FreeBSD.ORG) Received: (from das@localhost) by VARK.MIT.EDU (8.14.2/8.14.1/Submit) id lBAJPXVG015794; Mon, 10 Dec 2007 14:25:33 -0500 (EST) (envelope-from das@FreeBSD.ORG) Date: Mon, 10 Dec 2007 14:25:33 -0500 From: David Schultz To: Robert Watson Message-ID: <20071210192533.GA15728@VARK.MIT.EDU> Mail-Followup-To: Robert Watson , Brooks Davis , freebsd-arch@FreeBSD.ORG References: <20071128211022.GA74762@lor.one-eyed-alien.net> <20071128213947.Q7555@fledge.watson.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071128213947.Q7555@fledge.watson.org> Cc: Brooks Davis , freebsd-arch@FreeBSD.ORG Subject: Re: RFC: libkse*.a in 7.0 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 10 Dec 2007 19:36:36 -0000 On Wed, Nov 28, 2007, Robert Watson wrote: > It's worth noting that some other mainstream operating systems work hard to > disallow static linking for precisely this sort of reason -- when I last > checked, Mac OS X had only one statically linked binary, init, and it may > well be that launchd is dynamically linked. 
This is part of a very > explicit policy that the defined ABI for applications is *not* the system > call layer, but rather, the library interfaces, which gives greater > flexibility to modify the system call interface as needed. Solaris has done this for well over a decade, and as a consequence, they have a stable ABI without adding a bunch of compat garbage to the kernel. It's mostly done via symbol versioning in libc and other libraries. Note that it's possible to *provide* static libraries without *supporting* them. People can link their apps statically if they so desire, with the understanding that they will need to recompile when they upgrade to the next major release of FreeBSD. Apologies for replying to an old thread. I'm catching up on my email! From owner-freebsd-arch@FreeBSD.ORG Mon Dec 10 19:39:09 2007 Return-Path: Delivered-To: freebsd-arch@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 884BA16A417; Mon, 10 Dec 2007 19:39:09 +0000 (UTC) (envelope-from bright@elvis.mu.org) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.freebsd.org (Postfix) with ESMTP id 7236613C457; Mon, 10 Dec 2007 19:39:09 +0000 (UTC) (envelope-from bright@elvis.mu.org) Received: by elvis.mu.org (Postfix, from userid 1192) id AD4FC1A4D7C; Mon, 10 Dec 2007 11:38:29 -0800 (PST) Date: Mon, 10 Dec 2007 11:38:29 -0800 From: Alfred Perlstein To: Robert Watson , Brooks Davis , freebsd-arch@FreeBSD.ORG Message-ID: <20071210193829.GI61429@elvis.mu.org> References: <20071128211022.GA74762@lor.one-eyed-alien.net> <20071128213947.Q7555@fledge.watson.org> <20071210192533.GA15728@VARK.MIT.EDU> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071210192533.GA15728@VARK.MIT.EDU> User-Agent: Mutt/1.4.2.3i Cc: Subject: Re: RFC: libkse*.a in 7.0 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion 
related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 10 Dec 2007 19:39:09 -0000 * David Schultz [071210 11:36] wrote: > On Wed, Nov 28, 2007, Robert Watson wrote: > > It's worth noting that some other mainstream operating systems work hard to > > disallow static linking for precisely this sort of reason -- when I last > > checked, Mac OS X had only one statically linked binary, init, and it may > > well be that launchd is dynamically linked. This is part of a very > > explicit policy that the defined ABI for applications is *not* the system > > call layer, but rather, the library interfaces, which gives greater > > flexibility to modify the system call interface as needed. > > Solaris has done this for well over a decade, and as a > consequence, they have a stable ABI without adding a bunch of > compat garbage to the kernel. It's mostly done via symbol > versioning in libc and other libraries. Yup. > > Note that it's possible to *provide* static libraries without > *supporting* them. People can link their apps statically if they > so desire, with the understanding that they will need to recompile > when they upgrade to the next major release of FreeBSD. This is a very good point. It's very typical for vendors to statically link things though because of cluelessness over shared libs, we should discourage, _without overly penalizing_, them if they attempt to do so. > Apologies for replying to an old thread. I'm catching up on my email! Well, your comments are still insightful... 
:) -- - Alfred Perlstein From owner-freebsd-arch@FreeBSD.ORG Mon Dec 10 21:09:07 2007 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DAD4616A41B; Mon, 10 Dec 2007 21:09:07 +0000 (UTC) (envelope-from alexander@leidinger.net) Received: from redbull.bpaserver.net (redbullneu.bpaserver.net [213.198.78.217]) by mx1.freebsd.org (Postfix) with ESMTP id 794C013C43E; Mon, 10 Dec 2007 21:09:06 +0000 (UTC) (envelope-from alexander@leidinger.net) Received: from outgoing.leidinger.net (p54A57E1D.dip.t-dialin.net [84.165.126.29]) by redbull.bpaserver.net (Postfix) with ESMTP id 947CE2E31A; Mon, 10 Dec 2007 22:08:57 +0100 (CET) Received: from deskjail (deskjail.Leidinger.net [192.168.1.109]) by outgoing.leidinger.net (Postfix) with ESMTP id EBB6378401; Mon, 10 Dec 2007 22:08:54 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=Leidinger.net; s=outgoing-alex; t=1197320935; bh=3JqLcCmfpkq2aiSS2MrzvnlerDVuAmGMG bTcEfBXo10=; h=Date:From:To:Cc:Subject:Message-ID:In-Reply-To: References:X-Mailer:Mime-Version:Content-Type: Content-Transfer-Encoding; b=vzvtfovNPy/54WXrk/T+UTRTcSntQTsoHqDKr 9FDBlMRCIa72dJ43tF0fZcQJrlQUKAoiOqeFqaVZly976QFekuWdK4cSCxUD8y+xOi/ OIC/VDmXX7kf5TCv4aBbqm2zbm5tl14vxdnnblnO0EPyq+OBlY/WgOzJ6KX4TCmu+wT MqmW7WGPlLcNl/00VpHTBlMHIRGaU4//Br9syX0CVzfg7Gm3AFgXpfdiF9TKKuNNX/P p8hIpWHn2tCqC1Cv1WxX9xl7ByCq42XYMCQdI7GUZPZZ0VTxtmPMa79dRzhW2WtGgBF Iuvd6beMIKFQqTSvK4reDHweOMZK7EsS7FvVw== Date: Mon, 10 Dec 2007 22:08:54 +0100 From: Alexander Leidinger To: David Schultz Message-ID: <20071210220854.07e02f1f@deskjail> In-Reply-To: <20071210192533.GA15728@VARK.MIT.EDU> References: <20071128211022.GA74762@lor.one-eyed-alien.net> <20071128213947.Q7555@fledge.watson.org> <20071210192533.GA15728@VARK.MIT.EDU> X-Mailer: Claws Mail 3.0.1 (GTK+ 2.10.14; i686-portbld-freebsd7.0) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII 
Content-Transfer-Encoding: 7bit X-BPAnet-MailScanner-Information: Please contact the ISP for more information X-BPAnet-MailScanner: Found to be clean X-BPAnet-MailScanner-SpamCheck: not spam, SpamAssassin (not cached, score=-14.9, required 6, BAYES_00 -15.00, DKIM_SIGNED 0.00, DKIM_VERIFIED -0.00, RDNS_DYNAMIC 0.10) X-BPAnet-MailScanner-From: alexander@leidinger.net X-Spam-Status: No Cc: Brooks Davis , Robert Watson , freebsd-arch@FreeBSD.ORG Subject: Re: RFC: libkse*.a in 7.0 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 10 Dec 2007 21:09:07 -0000 Quoting David Schultz (Mon, 10 Dec 2007 14:25:33 -0500): > On Wed, Nov 28, 2007, Robert Watson wrote: > > It's worth noting that some other mainstream operating systems work hard to > > disallow static linking for precisely this sort of reason -- when I last > > checked, Mac OS X had only one statically linked binary, init, and it may > > well be that launchd is dynamically linked. This is part of a very > > explicit policy that the defined ABI for applications is *not* the system > > call layer, but rather, the library interfaces, which gives greater > > flexibility to modify the system call interface as needed. > > Solaris has done this for well over a decade, and as a > consequence, they have a stable ABI without adding a bunch of > compat garbage to the kernel. It's mostly done via symbol > versioning in libc and other libraries. Running Solaris 8/9 programs is not supported by SUN on Solaris 10. It works in some cases, but it doesn't work in some other cases. 
And now some people work on using BrandZ (if you know nothing about it,
it's sort of like our technology used to do our linuxulator or freebsd32
on amd64; that's not accurate, but is good enough for the point I want
to make) to provide a Solaris 10 container (think about it as a jail on
steroids) with a Solaris X (X < 10) image, so that people can install
a Solaris 10 host and run Solaris X in it (like our linuxulator in a
jail, but not as flexible as our linuxulator, theirs can not run on the
main system like ours can). So I would not say it is that favourable.
AFAIK there were major kernel changes and they didn't want to do some
compat shims.

I think we did a much better job so far in providing backward
compatibility in the kernel, and when we lose the ability to run e.g. a
complete FreeBSD X in a jail of a FreeBSD X+1 system, we would lose a
lot of users (portmgr included, as they run the 5.x and 6.x builds on a
-current system).

So as long as we can run an old system in a jail, do whatever you want
regarding static/dynamic libs, but if someone is on the way to destroy
our compatibility in the kernel... boy, think not only once, twice or
thrice, think more about this.

Bye,
Alexander.

-- There is a multi-legged creature crawling on your shoulder.
-- Spock, "A Taste of Armageddon", stardate 3193.9 http://www.Leidinger.net Alexander @ Leidinger.net: PGP ID = B0063FE7 http://www.FreeBSD.org netchild @ FreeBSD.org : PGP ID = 72077137 From owner-freebsd-arch@FreeBSD.ORG Mon Dec 10 22:39:09 2007 Return-Path: Delivered-To: freebsd-arch@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B130F16A41B; Mon, 10 Dec 2007 22:39:09 +0000 (UTC) (envelope-from das@FreeBSD.ORG) Received: from VARK.MIT.EDU (VARK.MIT.EDU [18.95.3.179]) by mx1.freebsd.org (Postfix) with ESMTP id 6A64113C474; Mon, 10 Dec 2007 22:39:09 +0000 (UTC) (envelope-from das@FreeBSD.ORG) Received: from VARK.MIT.EDU (localhost [127.0.0.1]) by VARK.MIT.EDU (8.14.2/8.14.1) with ESMTP id lBAMccAd016879; Mon, 10 Dec 2007 17:38:38 -0500 (EST) (envelope-from das@FreeBSD.ORG) Received: (from das@localhost) by VARK.MIT.EDU (8.14.2/8.14.1/Submit) id lBAMccvW016878; Mon, 10 Dec 2007 17:38:38 -0500 (EST) (envelope-from das@FreeBSD.ORG) Date: Mon, 10 Dec 2007 17:38:38 -0500 From: David Schultz To: Alexander Leidinger Message-ID: <20071210223838.GB16598@VARK.MIT.EDU> Mail-Followup-To: Alexander Leidinger , Brooks Davis , Robert Watson , freebsd-arch@FreeBSD.ORG References: <20071128211022.GA74762@lor.one-eyed-alien.net> <20071128213947.Q7555@fledge.watson.org> <20071210192533.GA15728@VARK.MIT.EDU> <20071210220854.07e02f1f@deskjail> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071210220854.07e02f1f@deskjail> Cc: Brooks Davis , Robert Watson , freebsd-arch@FreeBSD.ORG Subject: Re: RFC: libkse*.a in 7.0 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 10 Dec 2007 22:39:09 -0000 On Mon, Dec 10, 2007, Alexander Leidinger wrote: > Running Solaris 8/9 programs 
is not supported by SUN on Solaris 10. It
> works in some cases, but it doesn't work in some other cases.

That's not true. It is supported. See:
http://www.sun.com/software/solaris/programs/abi/
http://www.sun.com/software/solaris/programs/abi/sag.xml

In theory, a SunOS 5.0 app will still work in SunOS 5.10.

Of course, in practice, perfect binary compatibility is too much
to ask for. It's possible to write programs that notice that
different releases aren't bug-for-bug compatible, and if you
statically link your binary or use unsupported ABIs, you break
their guarantee. But that's orthogonal to my original point.

> And now
> some people work on using BrandZ (if you know nothing about it, it's
> sort of like our technology used to do our linuxulator or freebsd32 on
> amd64; that's not accurate, but is good enough for the point I want to
> make) to provide a Solaris 10 container (think about it as a jail on
> steroids) with a Solaris X (X < 10) image, so that people can install
> a Solaris 10 host and run Solaris X in it (like our linuxulator in a
> jail, but not as flexible as our linuxulator, theirs can not run on the
> main system like ours can).

Right, having the linuxulator in the kernel is all but
unavoidable. But for old FreeBSD apps running on newer versions of
FreeBSD, we can do better, and a library-based approach is easier
to maintain and less prone to security problems.
From owner-freebsd-arch@FreeBSD.ORG Tue Dec 11 07:02:37 2007 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3A79616A417; Tue, 11 Dec 2007 07:02:37 +0000 (UTC) (envelope-from alexander@leidinger.net) Received: from redbull.bpaserver.net (redbullneu.bpaserver.net [213.198.78.217]) by mx1.freebsd.org (Postfix) with ESMTP id BB49D13C4F3; Tue, 11 Dec 2007 07:02:36 +0000 (UTC) (envelope-from alexander@leidinger.net) Received: from outgoing.leidinger.net (p54A57E1D.dip.t-dialin.net [84.165.126.29]) by redbull.bpaserver.net (Postfix) with ESMTP id 55E3F2E0CB; Tue, 11 Dec 2007 08:02:21 +0100 (CET) Received: from webmail.leidinger.net (webmail.Leidinger.net [192.168.1.102]) by outgoing.leidinger.net (Postfix) with ESMTP id F0B9B7AD5F; Tue, 11 Dec 2007 08:02:17 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=Leidinger.net; s=outgoing-alex; t=1197356538; bh=buH54ErTHdhHS2MtHQvD7QbAXv4CL9SSR /4Q5bgdfwY=; h=Message-ID:X-Priority:Date:From:To:Cc:Subject: References:In-Reply-To:MIME-Version:Content-Type: Content-Disposition:Content-Transfer-Encoding:User-Agent; b=ZcJz7T 8m91CcPJXibReKQYOQH/Y8ANfw2Wv4FukmBYCW91hGv2FYKi0wagp6acekOlu/hDiM+ pLiTMSRNvP7UILCM5sr9BWfTtfmf1Nzxx4QV64hQ61VDZK4FUjLLiHv2BvpRlRnoLSR QT23pxG5pKlGc8/IK2KMKnRTeSMY4MwrBxRiBDk0RH/FCWKaRepPyUKSpFtUZW5y2/D jceaLfkVs9icJkcF2qABBncr626zY+2sQxjggeK1jDm9lryzkIQSPAM26Lg7YdfAkal EPa7AaxFayFnz3T/0mp7Hv6Hz+sBQ1EvaCkL1nc8WmhwTR/Vfxvwm3vbO/B/s/SI4A+ Q== Received: (from www@localhost) by webmail.leidinger.net (8.14.1/8.13.8/Submit) id lBB72H2c053923; Tue, 11 Dec 2007 08:02:17 +0100 (CET) (envelope-from Alexander@Leidinger.net) Received: from pslux.cec.eu.int (pslux.cec.eu.int [158.169.9.14]) by webmail.leidinger.net (Horde MIME library) with HTTP; Tue, 11 Dec 2007 08:02:16 +0100 Message-ID: <20071211080216.pb3b95teoggko00o@webmail.leidinger.net> X-Priority: 3 (Normal) Date: Tue, 11 Dec 2007 
08:02:16 +0100 From: Alexander Leidinger To: David Schultz References: <20071128211022.GA74762@lor.one-eyed-alien.net> <20071128213947.Q7555@fledge.watson.org> <20071210192533.GA15728@VARK.MIT.EDU> <20071210220854.07e02f1f@deskjail> <20071210223838.GB16598@VARK.MIT.EDU> In-Reply-To: <20071210223838.GB16598@VARK.MIT.EDU> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; DelSp="Yes"; format="flowed" Content-Disposition: inline Content-Transfer-Encoding: quoted-printable User-Agent: Internet Messaging Program (IMP) H3 (4.1.4) / FreeBSD-7.0 X-BPAnet-MailScanner-Information: Please contact the ISP for more information X-BPAnet-MailScanner: Found to be clean X-BPAnet-MailScanner-SpamCheck: not spam, SpamAssassin (not cached, score=-14.9, required 6, BAYES_00 -15.00, DKIM_SIGNED 0.00, DKIM_VERIFIED -0.00, RDNS_DYNAMIC 0.10) X-BPAnet-MailScanner-From: alexander@leidinger.net X-Spam-Status: No Cc: Brooks Davis , Robert Watson , freebsd-arch@FreeBSD.ORG Subject: Re: RFC: libkse*.a in 7.0 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 11 Dec 2007 07:02:37 -0000 Quoting David Schultz (from Mon, 10 Dec 2007 17:38:38 -0500): > On Mon, Dec 10, 2007, Alexander Leidinger wrote: >> Running Solaris 8/9 programs is not supported by SUN on Solaris 10. It >> works in some cases, but it doesn't work in some other cases. > > That's not true. It is supported. See: > http://www.sun.com/software/solaris/programs/abi/ > http://www.sun.com/software/solaris/programs/abi/sag.xml > > In theory, a SunOS 5.0 app will still work in SunOS 5.10 The important part is the "theory" word... I work in the office of SUN in Luxembourg, and one of our ideas for a client was to run a Solaris 8/9 in a zone of a Solaris 10 as a replacement for machines with Solaris 8/9.
As we have a service contract with our client, we have to take some business constraints into account. And one of those business constraints is that Solaris 8/9 in a zone of Solaris 10 is not supported, as the kernel interface (syscalls) changed in an incompatible way. > Of course, in practice, perfect binary compatibility is too much > to ask for. It's possible to write programs that notice that So far we have handled this well in FreeBSD. > different releases aren't bug-for-bug compatible, and if you > statically link your binary or use unsupported ABIs, you break > their guarantee. But that's orthogonal to my original point. > >> And now >> some people work on using BrandZ (if you know nothing about it, it's >> sort of like our technology used to do our linuxulator or freebsd32 on >> amd64; that's not accurate, but is good enough for the point I want to >> make) to provide a Solaris 10 container (think about it as a jail on >> steroids) with a Solaris X (X < 10) image, so that people can install >> a Solaris 10 host and run Solaris X in it (like our linuxulator in a >> jail, but not as flexible as our linuxulator, theirs can not run on the >> main system like ours can). > > Right, having the linuxulator in the kernel is all but > unavoidable. But for old FreeBSD apps running on newer versions of > FreeBSD, we can do better, and a library-based approach is easier > to maintain and less prone to security problems. It's not running only old apps on a new system, it's running the userland of an old system in a jail of a new system. That's what I'm concerned about (and works currently as we took care about maintaining compatibility in the kernel) and that's what you can not handle with a library-based approach. Bye, Alexander. -- You get what you pay for.
-- Gabriel Biel http://www.Leidinger.net Alexander @ Leidinger.net: PGP ID = B0063FE7 http://www.FreeBSD.org netchild @ FreeBSD.org : PGP ID = 72077137 From owner-freebsd-arch@FreeBSD.ORG Tue Dec 11 11:22:01 2007 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D879F16A418 for ; Tue, 11 Dec 2007 11:22:01 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from relay02.kiev.sovam.com (relay02.kiev.sovam.com [62.64.120.197]) by mx1.freebsd.org (Postfix) with ESMTP id 8CE3A13C455 for ; Tue, 11 Dec 2007 11:22:01 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from [212.82.216.226] (helo=deviant.kiev.zoral.com.ua) by relay02.kiev.sovam.com with esmtps (TLSv1:AES256-SHA:256) (Exim 4.67) (envelope-from ) id 1J23BT-000MNu-6G for freebsd-arch@freebsd.org; Tue, 11 Dec 2007 13:22:00 +0200 Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.1/8.14.1) with ESMTP id lBBBLp1R017585; Tue, 11 Dec 2007 13:21:51 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.2/8.14.2/Submit) id lBBBLoQO017584; Tue, 11 Dec 2007 13:21:50 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Tue, 11 Dec 2007 13:21:50 +0200 From: Kostik Belousov To: Alexander Leidinger Message-ID: <20071211112150.GA1214@deviant.kiev.zoral.com.ua> References: <20071128211022.GA74762@lor.one-eyed-alien.net> <20071128213947.Q7555@fledge.watson.org> <20071210192533.GA15728@VARK.MIT.EDU> <20071210220854.07e02f1f@deskjail> <20071210223838.GB16598@VARK.MIT.EDU> <20071211080216.pb3b95teoggko00o@webmail.leidinger.net> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="azLHFNyN32YCQGCU"
Content-Disposition: inline In-Reply-To: <20071211080216.pb3b95teoggko00o@webmail.leidinger.net> User-Agent: Mutt/1.4.2.3i X-Scanner-Signature: 69470ef176344869aa12c1af2e75e870 X-DrWeb-checked: yes X-SpamTest-Envelope-From: kostikbel@gmail.com X-SpamTest-Group-ID: 00000000 X-SpamTest-Info: Profiles 1866 [Dec 10 2007] X-SpamTest-Info: helo_type=3 X-SpamTest-Info: {received from trusted relay: not dialup} X-SpamTest-Method: none X-SpamTest-Method: Local Lists X-SpamTest-Rate: 0 X-SpamTest-Status: Not detected X-SpamTest-Status-Extended: not_detected X-SpamTest-Version: SMTP-Filter Version 3.0.0 [0255], KAS30/Release Cc: freebsd-arch@freebsd.org Subject: Re: RFC: libkse*.a in 7.0 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 11 Dec 2007 11:22:01 -0000 --azLHFNyN32YCQGCU Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Dec 11, 2007 at 08:02:16AM +0100, Alexander Leidinger wrote: > I work in the office of SUN in Luxembourg, and one of our ideas for a > client was to run a Solaris 8/9 in a zone of a Solaris 10 as a > replacement for machines with Solaris 8/9. As we have a service > contract with our client, we have to take some business constraints > into account. And one of those business constraints is that Solaris > 8/9 in a zone of Solaris 10 is not supported, as the kernel interface > (syscalls) changed in an incompatible way. Look at the project Etude.
--azLHFNyN32YCQGCU Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (FreeBSD) iD8DBQFHXnLOC3+MBN1Mb4gRAvIDAJ93rjhx0ixk+uQ3TKvoQLXzhD9EiACeMa4q MGHRrLsbQtAkvRPbWHW3HtE= =A/ut -----END PGP SIGNATURE----- --azLHFNyN32YCQGCU-- From owner-freebsd-arch@FreeBSD.ORG Tue Dec 11 13:52:32 2007 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 66D1E16A419 for ; Tue, 11 Dec 2007 13:52:32 +0000 (UTC) (envelope-from deischen@freebsd.org) Received: from mail.netplex.net (mail.netplex.net [204.213.176.10]) by mx1.freebsd.org (Postfix) with ESMTP id 3A95A13C4E3 for ; Tue, 11 Dec 2007 13:52:31 +0000 (UTC) (envelope-from deischen@freebsd.org) Received: from sea.ntplx.net (sea.ntplx.net [204.213.176.11]) by mail.netplex.net (8.14.2/8.14.2/NETPLEX) with ESMTP id lBBDqU1X029326; Tue, 11 Dec 2007 08:52:30 -0500 (EST) X-Virus-Scanned: by AMaViS and Clam AntiVirus (mail.netplex.net) X-Greylist: Message whitelisted by DRAC access database, not delayed by milter-greylist-4.0 (mail.netplex.net [204.213.176.10]); Tue, 11 Dec 2007 08:52:30 -0500 (EST) Date: Tue, 11 Dec 2007 08:52:30 -0500 (EST) From: Daniel Eischen X-X-Sender: eischen@sea.ntplx.net To: Alexander Leidinger In-Reply-To: <20071211112150.GA1214@deviant.kiev.zoral.com.ua> Message-ID: References: <20071128211022.GA74762@lor.one-eyed-alien.net> <20071128213947.Q7555@fledge.watson.org> <20071210192533.GA15728@VARK.MIT.EDU> <20071210220854.07e02f1f@deskjail> <20071210223838.GB16598@VARK.MIT.EDU> <20071211080216.pb3b95teoggko00o@webmail.leidinger.net> <20071211112150.GA1214@deviant.kiev.zoral.com.ua> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Kostik Belousov , freebsd-arch@freebsd.org Subject: Re: RFC: libkse*.a in 7.0 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Daniel 
Eischen List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 11 Dec 2007 13:52:32 -0000 On Tue, 11 Dec 2007, Kostik Belousov wrote: > On Tue, Dec 11, 2007 at 08:02:16AM +0100, Alexander Leidinger wrote: >> I work in the office of SUN in Luxembourg, and one of our ideas for a >> client was to run a Solaris 8/9 in a zone of a Solaris 10 as a >> replacement for machines with Solaris 8/9. As we have a service >> contract with our client, we have to take some business constraints >> into account. And one of those business constraints is that Solaris >> 8/9 in a zone of Solaris 10 is not supported, as the kernel interface >> (syscalls) changed in an incompatible way. > > Look at the project Etude. The syscalls are only exposed in an ABI compliant way through the libraries, which is what we should do also. But I think if you were to plop the Solaris 10 libraries (at least the symbol-versioned ones) over the Solaris 8/9 image, it might have a chance of working for you. Hmm, unless the Sun private symbols in Solaris 8/9 were not versioned and kept as compatible versions in Solaris 10. 
It might be interesting to try it and see what happens ;-) -- DE From owner-freebsd-arch@FreeBSD.ORG Tue Dec 11 17:24:07 2007 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A38A516A41B; Tue, 11 Dec 2007 17:24:07 +0000 (UTC) (envelope-from alexander@leidinger.net) Received: from redbull.bpaserver.net (redbullneu.bpaserver.net [213.198.78.217]) by mx1.freebsd.org (Postfix) with ESMTP id 129C813C469; Tue, 11 Dec 2007 17:24:06 +0000 (UTC) (envelope-from alexander@leidinger.net) Received: from outgoing.leidinger.net (p54A5600A.dip.t-dialin.net [84.165.96.10]) by redbull.bpaserver.net (Postfix) with ESMTP id 134782E0D7; Tue, 11 Dec 2007 18:23:59 +0100 (CET) Received: from deskjail (deskjail.Leidinger.net [192.168.1.109]) by outgoing.leidinger.net (Postfix) with ESMTP id 6F7D267C94; Tue, 11 Dec 2007 18:23:56 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=Leidinger.net; s=outgoing-alex; t=1197393836; bh=ZNqaFQEo2oFioKGM3ZMmGGlSlfLTevUnv 7vTXhXRiUE=; h=Date:From:To:Cc:Subject:Message-ID:In-Reply-To: References:X-Mailer:Mime-Version:Content-Type: Content-Transfer-Encoding; b=UiXTMociNS7jQtqk3guwF6+Ana6zb/HnP8DS0 dJgMWyY9tQc3kS5uEkgwGNLqZEwWQDCXAboNscZ41nPj/nUhbRDq3b1TDFK0L3XR1tX +btgn/Vhe4wmAiJXU3D+hHaws2zMWabpO5IEDKMUEyfQBITXsLoX6ARRtEDJibuaTha EDcmiM21dut35Zri9R0z7XwsLpg+XbQ+WevBouObcZMjvd7I+6rRHX7zd0lf5cwdBsz fFN7wQXALoFQByMmuxSFGo3x9HoOBXmU1Dz7D3xc1XP8alYEz62gSVWDNUwyBcJ4Z3h QLOXTWovCspOgjeS7zrbyzdHS7aIfr3EUycQg== Date: Tue, 11 Dec 2007 18:23:55 +0100 From: Alexander Leidinger To: Daniel Eischen Message-ID: <20071211182355.446668fb@deskjail> In-Reply-To: References: <20071128211022.GA74762@lor.one-eyed-alien.net> <20071128213947.Q7555@fledge.watson.org> <20071210192533.GA15728@VARK.MIT.EDU> <20071210220854.07e02f1f@deskjail> <20071210223838.GB16598@VARK.MIT.EDU> <20071211080216.pb3b95teoggko00o@webmail.leidinger.net> 
<20071211112150.GA1214@deviant.kiev.zoral.com.ua> X-Mailer: Claws Mail 3.0.1 (GTK+ 2.10.14; i686-portbld-freebsd7.0) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-BPAnet-MailScanner-Information: Please contact the ISP for more information X-BPAnet-MailScanner: Found to be clean X-BPAnet-MailScanner-SpamCheck: not spam, SpamAssassin (not cached, score=-15.4, required 6, autolearn=not spam, BAYES_00 -15.00, DKIM_SIGNED 0.00, DKIM_VERIFIED -0.00, RDNS_DYNAMIC 0.10, SMILEY -0.50) X-BPAnet-MailScanner-From: alexander@leidinger.net X-Spam-Status: No Cc: Kostik Belousov , freebsd-arch@freebsd.org Subject: Re: RFC: libkse*.a in 7.0 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 11 Dec 2007 17:24:07 -0000 Quoting Daniel Eischen (Tue, 11 Dec 2007 08:52:30 -0500 (EST)): > On Tue, 11 Dec 2007, Kostik Belousov wrote: > > > On Tue, Dec 11, 2007 at 08:02:16AM +0100, Alexander Leidinger wrote: > >> I work in the office of SUN in Luxembourg, and one of our ideas for a > >> client was to run a Solaris 8/9 in a zone of a Solaris 10 as a > >> replacement for machines with Solaris 8/9. As we have a service > >> contract with our client, we have to take some business constraints > >> into account. And one of those business constraints is that Solaris > >> 8/9 in a zone of Solaris 10 is not supported, as the kernel interface > >> (syscalls) changed in an incompatible way. > > > > Look at the project Etude. Until it is in an official Solaris release, our client can not be convinced to consider it. > The syscalls are only exposed in an ABI compliant way through the > libraries, which is what we should do also. But I think if you I don't think so. 
So far we have had no problems with keeping the compatibility in the kernel, and I haven't read on current, arch or hackers that it is a major roadblock for anything. As long as that remains the case, I think we should support the kernel compatibility (and I mean _really_ support it: not a half-baked yes that gives up on finding a compatible solution whenever one is not immediately obvious). > were to plop the Solaris 10 libraries (at least the symbol-versioned > ones) over the Solaris 8/9 image, it might have a chance of working > for you. Hmm, unless the Sun private symbols in Solaris 8/9 were > not versioned and kept as compatible versions in Solaris 10. > It might be interesting to try it and see what happens ;-) And then you lose support for Legato Networker, Oracle and other commercial software, as you don't run in a certified environment (which is not an argument for FreeBSD, but for our Solaris installation at work). Our client is shy. We all know this kind of management decision: some systems which can not be upgraded stay at this level "forever", even if in the end there is no support left at all (because it just works, and the other solution may cause a problem because it is not officially supported). Now, what if we abandon the official support we have had in this regard? Some manager asks if it is supported, the technician doesn't want to lie and says no, and the manager says: forget about FreeBSD, in Linux we don't have this compatibility either, but we get commercial support from companies, which outweighs the remaining features. Do we want that? I don't think having compatibility in the libs is a bad idea. I like this idea, as it allows running old programs without the need to install a compat package from ports. I just want to make sure that we all know that we have a very valuable feature by ensuring backwards compatibility in the kernel, and that we should not throw it away.
Having backward compatibility in the libs is an _additional_ feature I look forward to. Bye, Alexander. -- http://www.Leidinger.net Alexander @ Leidinger.net: PGP ID = B0063FE7 http://www.FreeBSD.org netchild @ FreeBSD.org : PGP ID = 72077137 From owner-freebsd-arch@FreeBSD.ORG Tue Dec 11 23:33:36 2007 Return-Path: Delivered-To: freebsd-arch@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7355616A41B for ; Tue, 11 Dec 2007 23:33:36 +0000 (UTC) (envelope-from das@FreeBSD.ORG) Received: from VARK.MIT.EDU (VARK.MIT.EDU [18.95.3.179]) by mx1.freebsd.org (Postfix) with ESMTP id 293FC13C468 for ; Tue, 11 Dec 2007 23:33:36 +0000 (UTC) (envelope-from das@FreeBSD.ORG) Received: from VARK.MIT.EDU (localhost [127.0.0.1]) by VARK.MIT.EDU (8.14.2/8.14.1) with ESMTP id lBBNX1uk022772; Tue, 11 Dec 2007 18:33:01 -0500 (EST) (envelope-from das@FreeBSD.ORG) Received: (from das@localhost) by VARK.MIT.EDU (8.14.2/8.14.1/Submit) id lBBNX19h022771; Tue, 11 Dec 2007 18:33:01 -0500 (EST) (envelope-from das@FreeBSD.ORG) Date: Tue, 11 Dec 2007 18:33:01 -0500 From: David Schultz To: Daniel Eischen Message-ID: <20071211233301.GA22692@VARK.MIT.EDU> Mail-Followup-To: Daniel Eischen , Alexander Leidinger , Kostik Belousov , freebsd-arch@FreeBSD.ORG References: <20071128211022.GA74762@lor.one-eyed-alien.net> <20071128213947.Q7555@fledge.watson.org> <20071210192533.GA15728@VARK.MIT.EDU> <20071210220854.07e02f1f@deskjail> <20071210223838.GB16598@VARK.MIT.EDU> <20071211080216.pb3b95teoggko00o@webmail.leidinger.net> <20071211112150.GA1214@deviant.kiev.zoral.com.ua> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Cc: Kostik Belousov , Alexander Leidinger , freebsd-arch@FreeBSD.ORG Subject: Re: RFC: libkse*.a in 7.0 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , 
List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 11 Dec 2007 23:33:36 -0000 On Tue, Dec 11, 2007, Daniel Eischen wrote: > On Tue, 11 Dec 2007, Kostik Belousov wrote: > > >On Tue, Dec 11, 2007 at 08:02:16AM +0100, Alexander Leidinger wrote: > >>I work in the office of SUN in Luxembourg, and one of our ideas for a > >>client was to run a Solaris 8/9 in a zone of a Solaris 10 as a > >>replacement for machines with Solaris 8/9. As we have a service > >>contract with our client, we have to take some business constraints > >>into account. And one of those business constraints is that Solaris > >>8/9 in a zone of Solaris 10 is not supported, as the kernel interface > >>(syscalls) changed in an incompatible way. > > > >Look at the project Etude. > > The syscalls are only exposed in an ABI compliant way through the > libraries, which is what we should do also. But I think if you > were to plop the Solaris 10 libraries (at least the symbol-versioned > ones) over the Solaris 8/9 image, it might have a chance of working > for you. Hmm, unless the Sun private symbols in Solaris 8/9 were > not versioned and kept as compatible versions in Solaris 10. > It might be interesting to try it and see what happens ;-) I'm not so sure about that. The truth is that they *do* make plenty of incompatible changes (referred to internally as "flag days"), but they ensure that apps linked against the new kernel and the new libc don't notice. They guarantee the stability of the ABI between the app and libc, not the stability of the ABI between libc and the kernel. There's an orthogonal issue of "branded zones," which are a new feature in Solaris 10; they're essentially jails with a different syscall vector, similar to what is done in the linuxulator. The original goal was to support Linux binaries and libraries running in Solaris, but it's also possible to have a Solaris 8 branded zone running in Solaris 10. 
In the latter case, you actually *are* using all the old Solaris 8 libraries in the jail, and there are a few more caveats in terms of what is supported, as I understand. This may be what Alex was referring to. From owner-freebsd-arch@FreeBSD.ORG Wed Dec 12 02:50:52 2007 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 283E516A419 for ; Wed, 12 Dec 2007 02:50:52 +0000 (UTC) (envelope-from qpadla@gmail.com) Received: from nf-out-0910.google.com (nf-out-0910.google.com [64.233.182.186]) by mx1.freebsd.org (Postfix) with ESMTP id AB28A13C43E for ; Wed, 12 Dec 2007 02:50:51 +0000 (UTC) (envelope-from qpadla@gmail.com) Received: by nf-out-0910.google.com with SMTP id b2so56449nfb.33 for ; Tue, 11 Dec 2007 18:50:50 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:from:reply-to:to:subject:date:user-agent:cc:references:in-reply-to:mime-version:content-type:content-transfer-encoding:message-id; bh=/1B4lMzY4MAY/Ze0AOBrbmwA6fazCdAF87UGlCu9r3U=; b=TMOLk/k9loVt4UCQncoIEokoXKKJeWSgAyx76M8xAkyXa7jXycG5fTVsZGyNIj6wi+k4UV0L7Hvmd0xyJaCV+S83G1YyGXfF+F8NTq3CJqvxAEE4dOXW73b8okJLyZISJfUSVxjVlD1twxgbvPRmsx9NM8vxjkZ6eOzuD8AboFo= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=from:reply-to:to:subject:date:user-agent:cc:references:in-reply-to:mime-version:content-type:content-transfer-encoding:message-id; b=WWSZ/kkvPRAJ1J6GR4Hk5E4qzHWhmfAH8DKdEILBztCXw4wrKkpNX4nkWndOJB6SQU8VuwIZhmWMHQYDa099+K3t1k5NaxpewWQ6E/Du/YETZU6ECrB3HaHLLF87TUU4gsHWDXHHGXtb2zpuOaMaslrzxa0llBOQWfuolSWPqE4= Received: by 10.86.66.1 with SMTP id o1mr147257fga.23.1197427850122; Tue, 11 Dec 2007 18:50:50 -0800 (PST) Received: from 77-109-39-227.dynamic.peoplenet.ua ( [77.109.39.227]) by mx.google.com with ESMTPS id g28sm7612500fkg.2007.12.11.18.50.45 (version=TLSv1/SSLv3 cipher=OTHER); Tue, 11 Dec 2007 18:50:49 
-0800 (PST) From: Nikolay Pavlov To: freebsd-arch@freebsd.org Date: Wed, 12 Dec 2007 04:50:47 +0200 User-Agent: KMail/1.9.6 (enterprise 0.20070907.709405) References: <20071128211022.GA74762@lor.one-eyed-alien.net> <20071210223838.GB16598@VARK.MIT.EDU> <20071211080216.pb3b95teoggko00o@webmail.leidinger.net> In-Reply-To: <20071211080216.pb3b95teoggko00o@webmail.leidinger.net> MIME-Version: 1.0 Content-Type: multipart/signed; boundary="nextPart1393227.jmfs6DThCv"; protocol="application/pgp-signature"; micalg=pgp-sha1 Content-Transfer-Encoding: 7bit Message-Id: <200712120450.52116.qpadla@gmail.com> Cc: Alexander Leidinger , David Schultz , Brooks Davis , Robert Watson Subject: Re: RFC: libkse*.a in 7.0 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: qpadla@gmail.com List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 12 Dec 2007 02:50:52 -0000 --nextPart1393227.jmfs6DThCv Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Content-Disposition: inline On Tuesday 11 December 2007 09:02:16 Alexander Leidinger wrote: > It's not running only old apps on a new system, it's running the > userland of an old system in a jail of a new system. That's what I'm > concerned about (and works currently as we took care about maintaining > compatibility in the kernel) and that's what you can not handle with a > library-based approach. This is true, and it is very handy. I am currently running a 6.2 environment in a jail on BETA4 because of some incompatible vendor binary files. Without this support I would have to go through the mess of a downgrade.
-- ====================================================================== - Best regards, Nikolay Pavlov. <<<----------------------------------- ====================================================================== --nextPart1393227.jmfs6DThCv Content-Type: application/pgp-signature; name=signature.asc Content-Description: This is a digitally signed message part. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQBHX0yM/2R6KvEYGaIRAndMAKC7rH92UDWoRB51/+BmlZQ5/utFjQCgipTJ +3BBvcwNVD4e3ACXCpxpG0o= =OM3U -----END PGP SIGNATURE----- --nextPart1393227.jmfs6DThCv-- From owner-freebsd-arch@FreeBSD.ORG Wed Dec 12 09:12:43 2007 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3A26D16A419; Wed, 12 Dec 2007 09:12:43 +0000 (UTC) (envelope-from alexander@leidinger.net) Received: from redbull.bpaserver.net (redbullneu.bpaserver.net [213.198.78.217]) by mx1.freebsd.org (Postfix) with ESMTP id 854DB13C4D3; Wed, 12 Dec 2007 09:12:42 +0000 (UTC) (envelope-from alexander@leidinger.net) Received: from outgoing.leidinger.net (p54A55EDA.dip.t-dialin.net [84.165.94.218]) by redbull.bpaserver.net (Postfix) with ESMTP id 473C12E2A1; Wed, 12 Dec 2007 10:12:30 +0100 (CET) Received: from webmail.leidinger.net (webmail.Leidinger.net [192.168.1.102]) by outgoing.leidinger.net (Postfix) with ESMTP id B38E57528A; Wed, 12 Dec 2007 10:12:27 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=Leidinger.net; s=outgoing-alex; t=1197450747; bh=UASD0fYPAGPZRGp8J0hSk+xkHAUzcP6o2 mopdFguwDk=; h=Message-ID:X-Priority:Date:From:To:Cc:Subject:
References:In-Reply-To:MIME-Version:Content-Type: Content-Disposition:Content-Transfer-Encoding:User-Agent; b=roHPdc lutmpyQuyRJ5VDcYHjXNC5Lp17jkBIEx59aULqSnhJYv1S6f0JQZIQqm6xKTIalVINK d+ikvJ1S9lYrH/PwKzpQwO6+T9jNvJGGd3tbJjnShZgt7KgMEpw+B5orKvf/AcPz774 vIF9Cb/PhGBXjBmPzKLi2I+gEpyVVyoPSEBtwtIvcaw0rJi+5cY+0+xzxO5sOes3yNX kRE1iHjIgFf276LueSXxdDjJjWGme7QaMwlVcCUfdmhmCAsL+rA3o0uKLKyAQFvv3Sv kVh82qLmiN4aTD0A/slZFjOTf6e9E3UxjLirI3WTB3Ye9yLLbdno61sE8EyfXCaCHOk A== Received: (from www@localhost) by webmail.leidinger.net (8.14.1/8.13.8/Submit) id lBC9CRjV024564; Wed, 12 Dec 2007 10:12:27 +0100 (CET) (envelope-from Alexander@Leidinger.net) Received: from pslux.cec.eu.int (pslux.cec.eu.int [158.169.9.14]) by webmail.leidinger.net (Horde MIME library) with HTTP; Wed, 12 Dec 2007 10:12:27 +0100 Message-ID: <20071212101227.jl3lqsypnowww044@webmail.leidinger.net> X-Priority: 3 (Normal) Date: Wed, 12 Dec 2007 10:12:27 +0100 From: Alexander Leidinger To: David Schultz References: <20071128211022.GA74762@lor.one-eyed-alien.net> <20071128213947.Q7555@fledge.watson.org> <20071210192533.GA15728@VARK.MIT.EDU> <20071210220854.07e02f1f@deskjail> <20071210223838.GB16598@VARK.MIT.EDU> <20071211080216.pb3b95teoggko00o@webmail.leidinger.net> <20071211112150.GA1214@deviant.kiev.zoral.com.ua> <20071211233301.GA22692@VARK.MIT.EDU> In-Reply-To: <20071211233301.GA22692@VARK.MIT.EDU> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; DelSp="Yes"; format="flowed" Content-Disposition: inline Content-Transfer-Encoding: quoted-printable User-Agent: Internet Messaging Program (IMP) H3 (4.1.4) / FreeBSD-7.0 X-BPAnet-MailScanner-Information: Please contact the ISP for more information X-BPAnet-MailScanner: Found to be clean X-BPAnet-MailScanner-SpamCheck: not spam, SpamAssassin (not cached, score=-15.4, required 6, autolearn=not spam, BAYES_00 -15.00, DKIM_SIGNED 0.00, DKIM_VERIFIED -0.00, RDNS_DYNAMIC 0.10, SMILEY -0.50) X-BPAnet-MailScanner-From: alexander@leidinger.net X-Spam-Status: No Cc: 
Daniel Eischen , Kostik Belousov , freebsd-arch@FreeBSD.ORG Subject: Re: RFC: libkse*.a in 7.0 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 12 Dec 2007 09:12:43 -0000 Quoting David Schultz (from Tue, 11 Dec 2007 18:33:01 -0500): > On Tue, Dec 11, 2007, Daniel Eischen wrote: >> On Tue, 11 Dec 2007, Kostik Belousov wrote: >> >> >On Tue, Dec 11, 2007 at 08:02:16AM +0100, Alexander Leidinger wrote: >> >>I work in the office of SUN in Luxembourg, and one of our ideas for a >> >>client was to run a Solaris 8/9 in a zone of a Solaris 10 as a >> >>replacement for machines with Solaris 8/9. As we have a service >> >>contract with our client, we have to take some business constraints >> >>into account. And one of those business constraints is that Solaris >> >>8/9 in a zone of Solaris 10 is not supported, as the kernel interface >> >>(syscalls) changed in an incompatible way. >> > >> >Look at the project Etude. >> >> The syscalls are only exposed in an ABI compliant way through the >> libraries, which is what we should do also. But I think if you >> were to plop the Solaris 10 libraries (at least the symbol-versioned >> ones) over the Solaris 8/9 image, it might have a chance of working >> for you. Hmm, unless the Sun private symbols in Solaris 8/9 were >> not versioned and kept as compatible versions in Solaris 10. >> It might be interesting to try it and see what happens ;-) > > I'm not so sure about that. The truth is that they *do* make > plenty of incompatible changes (referred to internally as "flag > days"), but they ensure that apps linked against the new kernel > and the new libc don't notice. They guarantee the stability of the > ABI between the app and libc, not the stability of the ABI between > libc and the kernel.
> > There's an orthogonal issue of "branded zones," which are a new > feature in Solaris 10; they're essentially jails with a different > syscall vector, similar to what is done in the linuxulator. The > original goal was to support Linux binaries and libraries running > in Solaris, but it's also possible to have a Solaris 8 branded > zone running in Solaris 10. In the latter case, you actually *are* > using all the old Solaris 8 libraries in the jail, and there are a > few more caveats in terms of what is supported, as I > understand. This may be what Alex was referring to. Yes and no. Yes regarding using old libraries on a new system. No regarding having this feature only in a jail. Currently we have this feature in the entire system, without the need to tag something as "old" (note: we don't brandelf linux libs, we just load them in the linuxulator, and playing with the syscall vector instead of keeping the current level of backward compatibility in the kernel opens up a can of worms; everyone thinking this is exaggerated is free to head over to the linuxulator and fix the corresponding problems we have with libs there). Bye, Alexander. -- Look before you leap.
-- Samuel Butler http://www.Leidinger.net Alexander @ Leidinger.net: PGP ID = B0063FE7 http://www.FreeBSD.org netchild @ FreeBSD.org : PGP ID = 72077137 From owner-freebsd-arch@FreeBSD.ORG Wed Dec 12 21:30:32 2007 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8FBA616A468 for ; Wed, 12 Dec 2007 21:30:32 +0000 (UTC) (envelope-from kip.macy@gmail.com) Received: from wa-out-1112.google.com (wa-out-1112.google.com [209.85.146.179]) by mx1.freebsd.org (Postfix) with ESMTP id 3702513C458 for ; Wed, 12 Dec 2007 21:30:32 +0000 (UTC) (envelope-from kip.macy@gmail.com) Received: by wa-out-1112.google.com with SMTP id k17so656742waf.3 for ; Wed, 12 Dec 2007 13:30:31 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:to:subject:mime-version:content-type:content-transfer-encoding:content-disposition; bh=A+++eRv8vz16fo7eUfqaRuCOFPwJkDvD3wCTsWTisjI=; b=Yo/ATQd24oepp9DkzncxioIhtDAsPt1grcVrMxiwwcqEWqmG1Zv+bmZxj+0RfMg9Nvqc5Cfg7S4DYx9oQhnTvpGlDPQSWSdXa5b51Z/YVK1fnUw6obZYfW1myNfk2cPLpoRe7nsxPhT7tgp4uwLLYrXCDRIUBZOBMelDM3q1Npw= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:to:subject:mime-version:content-type:content-transfer-encoding:content-disposition; b=naBVt7a8aKYHEYrKSTs2Luy9XktkCZ2jamgxGHW1+Nt0CMrtUNZmfDrYRPJI73k6sLCXgtCpc0w1um3Q32qujkdAonM2mQx6ZDViu8Jydv1DupsT8AoeCN8BLphCSraGZj/ww94ojN9aoVP8Ww/KCBVS+flD8dS2SIDaaHdB+Ro= Received: by 10.114.135.1 with SMTP id i1mr1288674wad.88.1197493416451; Wed, 12 Dec 2007 13:03:36 -0800 (PST) Received: by 10.114.255.11 with HTTP; Wed, 12 Dec 2007 13:03:36 -0800 (PST) Message-ID: Date: Wed, 12 Dec 2007 13:03:36 -0800 From: "Kip Macy" To: "Robert Watson" , "Sam Leffler" , freebsd-arch@freebsd.org, "FreeBSD Current" MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit Content-Disposition: inline Cc: Subject: pending changes for TOE support X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 12 Dec 2007 21:30:32 -0000 After review by Mike Silbersack I've committed the hooks that provide a driver independent interface to TCP offload. I would like to commit the changes to tcp_subr.c and tcp_usrreq.c to actually make use of the new interface. Please review the following: http://www.fsmware.com/freebsd/tcp/tcp_subr.c.diff http://www.fsmware.com/freebsd/tcp/tcp_usrreq.c.diff The new KPI is provided by the following 2 files: http://www.fsmware.com/freebsd/tcp/tcp_ofld.c http://www.fsmware.com/freebsd/tcp/tcp_ofld.h Thank you for taking the time to review and provide feedback. -Kip From owner-freebsd-arch@FreeBSD.ORG Thu Dec 13 06:51:19 2007 Return-Path: Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3C3AF16A420 for ; Thu, 13 Dec 2007 06:51:19 +0000 (UTC) (envelope-from rermilov@team.vega.ru) Received: from mail.vega.ru (infra.dev.vega.ru [90.156.167.14]) by mx1.freebsd.org (Postfix) with ESMTP id 00EF313C467 for ; Thu, 13 Dec 2007 06:51:18 +0000 (UTC) (envelope-from rermilov@team.vega.ru) Received: from [87.242.97.68] (port=63082 helo=edoofus.dev.vega.ru) by mail.vega.ru with esmtpsa (TLSv1:AES256-SHA:256) (Exim 4.68 (FreeBSD)) (envelope-from ) id 1J2heD-000J8t-F5; Thu, 13 Dec 2007 06:34:17 +0000 Received: from edoofus.dev.vega.ru (localhost [127.0.0.1]) by edoofus.dev.vega.ru (8.14.2/8.14.2) with ESMTP id lBD5Di5Y081435; Thu, 13 Dec 2007 08:13:59 +0300 (MSK) (envelope-from rermilov@team.vega.ru) Received: (from ru@localhost) by edoofus.dev.vega.ru (8.14.2/8.14.2/Submit) id lBD5DDn0081408; Thu, 13 Dec 2007 08:13:13 
+0300 (MSK) (envelope-from rermilov@team.vega.ru) X-Authentication-Warning: edoofus.dev.vega.ru: ru set sender to rermilov@team.vega.ru using -f Date: Thu, 13 Dec 2007 08:12:58 +0300 From: Ruslan Ermilov To: Doug Barton Message-ID: <20071213051258.GA81366@team.vega.ru> References: <4759DC08.9070600@FreeBSD.org> <20071208163857.GC91919@lor.one-eyed-alien.net> <475B2BD1.7000303@FreeBSD.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <475B2BD1.7000303@FreeBSD.org> X-Authentication-Warning: edoofus.dev.vega.ru: ru set sender to rermilov@team.vega.ru using -f User-Agent: Mutt/1.5.16 (2007-06-09) Cc: Gordon M Tetlow , Brooks Davis , freebsd-arch@FreeBSD.org Subject: Re: Should libgssapi be hidden behind the MK_KERBEROS knob? X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 13 Dec 2007 06:51:19 -0000 Hi, [ Sorry, I didn't notice this thread before the commit. ] On Sat, Dec 08, 2007 at 03:42:09PM -0800, Doug Barton wrote: > Gordon M Tetlow wrote: > > > > On Dec 8, 2007, at 8:38 AM, Brooks Davis wrote: > > > >> On Fri, Dec 07, 2007 at 03:49:28PM -0800, Doug Barton wrote: > >>> If there is a better list for this, don't hesitate to let me know. > >>> > >>> I use WITHOUT_KERBEROS=true in /etc/{make|src}.conf, since I don't > >>> need or use it. However, this leads to a problem with building the > >>> kdelibs3 port. The configure script looks for the presence of > >>> libgssapi and the associated headers, and takes that to mean that > >>> kerberos is available, and sets things up accordingly. This causes > >>> the build to fail when it tries to actually link something to a > >>> kerberos library.
> >>>
> >>> I realize that GSS can be used for other things besides kerberos, but
> >>> are we really losing anything by hiding them both under the same knob?
> >>> If the answer to that is yes, is there any objection to a WITHOUT_GSS
> >>> knob?
> >>
> >> We wouldn't lose anything today, but a without GSS knob makes more
> >> sense to me. There's at least one other GSS system in fairly wide use
> >> in the high performance computing world today.
> >
> > How about WITHOUT_KERBEROS implies WITHOUT_GSSAPI unless people
> > specifically ask for GSSAPI? Is that too obscure?
>
> That sounds totally reasonable. How does the attached look?
>

The new build options system was designed to be simple for makefiles
and users. It is simple for makefiles in that they can check a
particular MK_* variable against "no" (only!) and need not track
option interdependencies -- the latter is a task of bsd.own.mk.
This has been broken in this commit -- lib/Makefile looks like this:

	.if ${MK_KERBEROS} != "no"
	_libgssapi=	libgssapi
	.else
	.if ${MK_GSSAPI} == "yes"
	_libgssapi=	libgssapi
	.endif
	.endif

One of the goals of the new system was to keep conditions like this
from appearing in makefiles -- all the logic of setting the MK_*
variables (including tracking their interdependencies) should be in
bsd.own.mk. If MK_GSSAPI were set correctly, then lib/Makefile would
look like this:

	.if ${MK_GSSAPI} != "no"
	_libgssapi=	libgssapi
	.endif

Before this change, there were two types of MK_* variables, those
defaulting to "yes" (majority), and those defaulting to "no". There
are also several option dependencies -- they work by switching some
options off when another option is switched off. (This gets
automatically documented in src.conf(5)). Plus there's a small set
of MK_*_SUPPORT variables that *defaulted* to "yes" unless the
corresponding MK_* option evaluated to "no", in which case they are
*forced* to "no".
This allows for the src.conf(5) manpage to be automatically
generated, showing only non-default options, and documenting
option interdependencies by the script.

Now the logic in lib/Makefile and the manpage are broken because the
setting of MK_GSSAPI is broken -- it's set to "no" by default, while
the intent was to set its value to the value of ${MK_KERBEROS} unless
it's set explicitly by WITH_GSSAPI / WITHOUT_GSSAPI. The logic in
lib/Makefile is broken because WITHOUT_GSSAPI simply doesn't have any
effect, while all MK_* variables are controllable by corresponding
WITH_*/WITHOUT_* user-settable knobs, modulo forcing some variables
to "no" when their dependencies are "no". The manpage is broken
because it misses some interesting facts, e.g. that WITHOUT_CRYPT
sets MK_CRYPT=no which in turn sets MK_KERBEROS=no which in turn
should set MK_GSSAPI=no.

While I have a patch that fixes all of the above bugs (and it is
committed by the time you read this), I have a proposal: let's break
this artificial dependency of MK_GSSAPI on MK_KERBEROS, have a simple
MK_GSSAPI option which defaults to "yes", and have a WITHOUT_GSSAPI
knob to turn it off. Then we'd be back to a simple and normal
behavior.

By having MK_GSSAPI default to the value of MK_KERBEROS we introduce
another type of MK_* variable, one that defaults to the value of
another MK_* variable. This is: 1) not necessary at the moment,
2) harder for users to understand, and 3) harder to implement and
support (see my commit).
Cheers, -- Ruslan Ermilov ru@FreeBSD.org FreeBSD committer From owner-freebsd-arch@FreeBSD.ORG Sat Dec 15 08:57:30 2007 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 375BF16A421 for ; Sat, 15 Dec 2007 08:57:30 +0000 (UTC) (envelope-from kip.macy@gmail.com) Received: from hs-out-2122.google.com (hs-out-0708.google.com [64.233.178.241]) by mx1.freebsd.org (Postfix) with ESMTP id D527613C447 for ; Sat, 15 Dec 2007 08:57:29 +0000 (UTC) (envelope-from kip.macy@gmail.com) Received: by hs-out-2122.google.com with SMTP id j58so1733838hsj.11 for ; Sat, 15 Dec 2007 00:57:29 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:to:subject:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; bh=ryS8L/M2sdKR7XEu9LbiBGN65DlXsIXf4/PuLSk2vco=; b=nyPIN7I8wZMuxCNc8/LbyJPeB2YsdrHS8QHfg5UaREd0zroNd5VIUW77rO1oKNkZb66HWkRAo1St61VlmY+Pce92l7J73tXi5eItCbIxHsJj0buzmKBzF/TOt9+aeTT5fX/uLH+8MJy/SKxlSevxwYLiaRE/KTbAB6MQVfQ7Hlk= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:to:subject:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=YA/wz/B3JlbGK3a7d9q9eUnynhIcUoLW7xdWNYN9UKFDCO/j0WP20ogtzG9YC/iswnsLbbcsvNLaSPnL6s+IRF9bEfUBrUUZT5OKn45DCMg3JaC+pP7eVjLVFRzhKsFTg4DSk75JRzsswo4gGoS54MCi2+5cbtuuNg+gv/JnYbo= Received: by 10.150.124.2 with SMTP id w2mr1572036ybc.2.1197709047753; Sat, 15 Dec 2007 00:57:27 -0800 (PST) Received: by 10.150.200.17 with HTTP; Sat, 15 Dec 2007 00:57:27 -0800 (PST) Message-ID: Date: Sat, 15 Dec 2007 00:57:27 -0800 From: "Kip Macy" To: "Robert Watson" , "Sam Leffler" , freebsd-arch@freebsd.org, "FreeBSD Current" In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit 
Content-Disposition: inline References: Cc: Subject: Re: pending changes for TOE support X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 15 Dec 2007 08:57:30 -0000 The updated patch is at: http://www.fsmware.com/tcp/tcp_offload.diff I've only had structural feedback from one person. Please let me know if you intend to provide feedback; I'd like to get this in without much further delay. -Kip On 12/12/07, Kip Macy wrote: > After review by Mike Silbersack I've committed the hooks that provide > a driver independent interface to TCP offload. > > I would like to commit the changes to tcp_subr.c and tcp_usrreq.c to > actually make use of the new interface. Please review the following: > > http://www.fsmware.com/freebsd/tcp/tcp_subr.c.diff > > http://www.fsmware.com/freebsd/tcp/tcp_usrreq.c.diff > > The new KPI is provided by the following 2 files: > http://www.fsmware.com/freebsd/tcp/tcp_ofld.c > http://www.fsmware.com/freebsd/tcp/tcp_ofld.h > > > Thank you for taking the time to review and provide feedback.
> > -Kip > From owner-freebsd-arch@FreeBSD.ORG Sat Dec 15 09:02:37 2007 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E696716A419 for ; Sat, 15 Dec 2007 09:02:37 +0000 (UTC) (envelope-from kip.macy@gmail.com) Received: from hs-out-2122.google.com (hs-out-0708.google.com [64.233.178.241]) by mx1.freebsd.org (Postfix) with ESMTP id 8C40513C455 for ; Sat, 15 Dec 2007 09:02:37 +0000 (UTC) (envelope-from kip.macy@gmail.com) Received: by hs-out-2122.google.com with SMTP id j58so1735239hsj.11 for ; Sat, 15 Dec 2007 01:02:36 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:to:subject:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; bh=1F0mj4+hO7HRQbji9Jr+IxNt9hbOM5Ga9ssh/rkKp9w=; b=Ml+1dwZkXs5sOrZjEBCR2UKTjtbassUP8gzQ+pddM4G2Nfl7ChFv6vU6SCcHG+aShk3oNEqjzDK6T6qlJstE5Gmy6qNdNT7mtxK3/wGn2/uSd5Dj6Ay9ZZISnaI/FLAyQ9cO20A9O0tJ2pk4XS04MSFZV6LXkt3dtb0QXquX1KQ= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:to:subject:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=Lqc5i3Zt9sHIDGCkckyvfbSubaCO4GOsKU3CpLoWzB3r8+jIMGbII9mECXFFSLJu4iXi80KGqteURvECUjI54Q5YBt0NJ5zPILBBzUPYGBsp+wLCNW9sj1j/1b6P51zBxfSwlvTPgZleUsue7XK4yn0IDJWTDJKLd0kbOY4Bz4w= Received: by 10.150.192.7 with SMTP id p7mr1555696ybf.90.1197709356543; Sat, 15 Dec 2007 01:02:36 -0800 (PST) Received: by 10.150.200.17 with HTTP; Sat, 15 Dec 2007 01:02:36 -0800 (PST) Message-ID: Date: Sat, 15 Dec 2007 01:02:36 -0800 From: "Kip Macy" To: "Robert Watson" , "Sam Leffler" , freebsd-arch@freebsd.org, "FreeBSD Current" In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: Cc: Subject: Re: 
pending changes for TOE support X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 15 Dec 2007 09:02:38 -0000 http://www.fsmware.com/freebsd/tcp/tcp_offload.diff On 12/15/07, Kip Macy wrote: > The updated patch is at: > http://www.fsmware.com/tcp/tcp_offload.diff > > I've only had structural feedback from one person. Please let me know > if you intend to provide feedback I'd like to get this in without much > further delay. > > -Kip > > > On 12/12/07, Kip Macy wrote: > > After review by Mike Silbersack I've committed the hooks that provide > > a driver independent interface to TCP offload. > > > > I would like to commit the changes to tcp_subr.c and tcp_usrreq.c to > > actually make use of the new interface. Please review the following: > > > > http://www.fsmware.com/freebsd/tcp/tcp_subr.c.diff > > > > http://www.fsmware.com/freebsd/tcp/tcp_usrreq.c.diff > > > > The new KPI is provided by the following 2 files: > > http://www.fsmware.com/freebsd/tcp/tcp_ofld.c > > http://www.fsmware.com/freebsd/tcp/tcp_ofld.h > > > > > > Thank you for taking the time to review and provide feedback. 
> > > > -Kip > > > From owner-freebsd-arch@FreeBSD.ORG Sat Dec 15 10:56:47 2007 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A8D4716A419; Sat, 15 Dec 2007 10:56:47 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42]) by mx1.freebsd.org (Postfix) with ESMTP id 5D18113C45D; Sat, 15 Dec 2007 10:56:47 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by cyrus.watson.org (Postfix) with ESMTP id B17DB484F5; Sat, 15 Dec 2007 05:56:46 -0500 (EST) Date: Sat, 15 Dec 2007 10:56:46 +0000 (GMT) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Kip Macy In-Reply-To: Message-ID: <20071215100351.Q70617@fledge.watson.org> References: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: FreeBSD Current , freebsd-arch@freebsd.org Subject: Re: pending changes for TOE support X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 15 Dec 2007 10:56:47 -0000 On Sat, 15 Dec 2007, Kip Macy wrote: > The updated patch is at: http://www.fsmware.com/tcp/tcp_offload.diff > > I've only had structural feedback from one person. Please let me know if you > intend to provide feedback I'd like to get this in without much further > delay. Per our out-of-band communication, I won't have the opportunity to do any serious reviewing of this work until later next week, as I have other personal obligations in the mean time that preclude that. However, skimming the comments, I have a couple of areas where I'd like some clarification to help guide my reading. I've only got fifteen minutes so I'll cut off when I run out. 
+ * A driver publishes that it provides offload services
+ * by setting IFCAP_TOE in the ifnet. The offload connect
+ * will bypass any further work if the interface that a
+ * connection would use does not support TCP offload.

My initial feeling is that, even if an interface supports TOE, we shouldn't
enable the capability in the enabled vector by default, as TOE bypasses
firewall behavior, etc, and would certainly be a surprise if an admin swapped
a chelsio card for a non-TOE supporting card. What's your feeling on this?

+ * The TOE API assumes that the tcp offload engine can offload the
+ * entire connection from set up to teardown, with some provision
+ * being made to allowing the software stack to handle time wait. If
+ * the device does not meet these criteria, it is the driver's responsibility
+ * to overload the functions that it needs to in tcp_usrreqs and make
+ * its own calls to tcp_output if it needs to do so.

While I'm familiar with TCP, I'm less familiar with the scope of what cards
support for TOE. Do we know of any cards that are less capable than the
chelsio card in this respect, or are they all sort of on-par on that front?
I.e., do we think the above eventuality is likely? If we don't, then one of
the things I'd like to see us do is fairly carefully assert, at least for a
few months, that TCP never "slips" into any transmission-related paths that
could lead to truly odd and hard-to-diagnose behavior when running with TOE.
I.e., tcp_output, etc. If we do think it's likely, we don't need to address
this immediately, but we should make sure that before we ship TOE in a
release, we've thought somewhat more thoroughly about that case. As long as
TOE remains un-MFC'd, we don't find ourselves with an obligation to maintain
guarantees about the interfaces, and that includes dealing with
incompatibility :-).

Do we know if any of the current 10gbps vendors other than chelsio are
actively looking at TOE on FreeBSD, and could be engaged in discussion?
+ * The toe_usrreqs structure constitutes the TOE driver's
+ * interface to the TCP stack for functionality that doesn't
+ * interact directly with userspace. If one wants to provide
+ * (optional) functionality to do zero-copy to/from
+ * userspace one still needs to override soreceive/sosend
+ * with functions that fault in and pin the user buffers.

And this is an issue we also should work out in order to properly fix
ZERO_COPY_SOCKETS anyway.

I think it might be useful to add a couple of paragraphs here on three
topics:

(1) Clarify the way in which windows are updated between the device driver
    and the socket code, both for sending/receiving. You talk a bit about
    "credit", but introducing it up-front would be useful.

(2) One of the issues I've run into in the TCP and socket code generally is
    that there was significant lack of clarity on the "life cycle" of the
    set of related data structures. Could you write a bit of text about when
    drivers will allocate state and when they will free it? I.e., tu_attach
    allocates state, tu_{abort,detach} free it, and TCP promises not to call
    anything before attach or anything after abort/detach.

(3) Could you talk at a high level about the ways in which TOE drivers will
    interact with TCP? You do it a bit in each of the sections, but if
    there's a principle, pulling it out would be useful. Also, you should
    indicate whether the driver is allowed to drop the inpcb lock or not.

Doing this would address a few of the comments I have below also.

+ * + tu_send
+ *   - tells the driver that new data may have been added to the
+ *     socket's send buffer - the driver should not fail if the
+ *     buffer is in fact unchanged

I'm a bit confused by the description of the error condition here. Could you
clarify when a driver should return an error, and what the impact of an error
returned will be on the connection state?
In fact, it probably makes sense to have an up-front comment on conventions
for error-handling -- if TOE returns an error will that generally lead to a
TCP tear-down?

+ *   - The driver expects the inpcb lock to be held and

This comment is truncated -- is there an and? We should specify that drivers
are not allowed to drop the inpcb lock if that is the case, FYI.

+ * + tu_rcvd
+ *   - returns credits to the driver and triggers window updates
+ *     to the peer (a credit is a byte in the user's receive window)

Might begin with a sentence defining the notion of credit. Is it possible to
use tu_rcvd to reduce credit to the card if the socket buffer size is
changed, or just increase it?

+ *   - the driver is expected to determine how many bytes have been
+ *     consumed and credit that back to the card so that it can grow
+ *     the window again

Could you provide an example of how it is to do that -- i.e., is it just
going to inspect so_rcv in the same way native TCP does?

+ *   - this function needs to correctly handle being called any number of
+ *     times without any bytes being consumed from the receive buffer.
+ *   - the driver expects the inpcb lock to be held
+ *
+ * + tu_disconnect
+ *   - tells the driver to send FIN to peer
+ *   - driver is expected to send the remaining data and then do a clean half close
+ *   - disconnect implies at least half-close so only send, abort, and detach
+ *     are legal

Could you clarify this a bit? Do you mean that TCP guarantees that only
tu_send, tu_abort, and tu_detach will be delivered to the driver in the
future?

+ *   - the driver is expected to handle transition through the shutdown
+ *     state machine and allow the stack to support SO_LINGER.

Probably worth commenting that the device driver won't detach the toe state.
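
On the tu_rcvd question above (how a driver works out how many bytes have been consumed), one possible answer is plain bookkeeping against the receive buffer's current byte count. The following is only a user-space sketch under stated assumptions: `struct toe_rx_acct` and `toe_rx_new_credits` are hypothetical names that do not appear in the patch, `sb_cc` stands in for the byte count of so_rcv, and nothing here is claimed to be what the chelsio driver actually does. It only ever increases credit; shrinking it after a buffer resize is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical per-connection receive accounting kept by a TOE driver.
 * None of these names come from the patch under review.
 */
struct toe_rx_acct {
	size_t enqueued;	/* total bytes the driver appended to so_rcv */
	size_t credited;	/* total bytes already credited to the card */
};

/*
 * Return the number of new credits (bytes) to hand back to the card,
 * given how many bytes are still unread in the socket receive buffer.
 * Calling this repeatedly while the application reads nothing returns
 * 0 every time, which is the behaviour the tu_rcvd comment asks for.
 */
static size_t
toe_rx_new_credits(struct toe_rx_acct *acct, size_t sb_cc)
{
	size_t consumed, fresh;

	/* The buffer cannot hold more than the driver ever put into it. */
	assert(sb_cc <= acct->enqueued);
	consumed = acct->enqueued - sb_cc;

	/* Credit is only ever returned for bytes that were consumed. */
	assert(acct->credited <= consumed);
	fresh = consumed - acct->credited;
	acct->credited += fresh;
	return (fresh);
}
```

With 1000 bytes enqueued and 400 still unread, the first call yields 600 credits and an immediate second call yields none; once the buffer drains, the remaining 400 are returned.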
+ *
+ * + tu_abort
+ *   - closes the connection and sends a RST to peer
+ *   - driver is expected to trigger an RST and detach the toepcb

In regular TCP, the pru_abort method is only called on pending connections
while still in the listen queues of a listen socket. Is this true of
tu_abort, or is tu_abort a more general method to be used to cancel
connections? If so, probably worth commenting on that.

+ *   - no further calls are legal after abort
+ *   - the driver expects the inpcb lock to be held
+ *
+ * + tu_detach
+ *   - tells driver that the socket is going away so disconnect
+ *     the toepcb and free appropriate resources
+ *   - allows the driver to cleanly handle the case of connection state
+ *     outliving the socket
+ *   - no further calls are legal after detach
+ *   - the driver acquires the tcbinfo lock

For this call, you haven't specified whether the inpcb lock is held. If it
is, the driver acquiring the tcbinfo lock without first dropping the inpcb
lock would be a lock order reversal. Should the caller instead acquire/hold
it?

For the above calls, what guarantees does the TCP stack make about the
presence of the socket, if any? These interfaces all pass the tcpcb, but in
our regular TCP stack, the invariant is the existence of the inpcb, not the
tcpcb, which may be replaced with a tcptw (or in one edge case, inp_ppcb may
be NULL). If there will be drivers in the future that implement timewait,
perhaps we should be passing in the inpcb?
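
The call-ordering rules stated in the quoted comments (after tu_disconnect only send, abort, and detach are legal; nothing is legal after abort or detach) could be written down as a small checker, which is also the kind of thing a debug kernel might assert. This is a sketch only: the enum and function names are hypothetical, and it takes the strictest reading of the comments, which is exactly the reading the questions above ask to have confirmed.

```c
#include <assert.h>

/* Hypothetical names; the patch itself defines no such state tracker. */
enum toe_call { TOE_SEND, TOE_RCVD, TOE_DISCONNECT, TOE_ABORT, TOE_DETACH };
enum toe_state { TOE_OPEN, TOE_DISCONNECTED, TOE_DEAD };

/* Is this call legal in this state, per the quoted comments? */
static int
toe_call_legal(enum toe_state st, enum toe_call call)
{
	switch (st) {
	case TOE_OPEN:
		return (1);
	case TOE_DISCONNECTED:
		/* After disconnect only send, abort, and detach are legal. */
		return (call == TOE_SEND || call == TOE_ABORT ||
		    call == TOE_DETACH);
	case TOE_DEAD:
		/* No further calls are legal after abort or detach. */
		return (0);
	}
	return (0);
}

/* State reached after a legal call; asserts that the call was legal. */
static enum toe_state
toe_next_state(enum toe_state st, enum toe_call call)
{
	assert(toe_call_legal(st, call));
	switch (call) {
	case TOE_DISCONNECT:
		return (TOE_DISCONNECTED);
	case TOE_ABORT:
	case TOE_DETACH:
		return (TOE_DEAD);
	default:
		return (st);
	}
}
```

A driver or the stack could keep one `enum toe_state` per connection and run every entry point through `toe_next_state`, so that a stray call after detach trips an assertion instead of touching freed state.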
+ * + tu_syncache_event
+ *   - even if it is not actually needed, the driver is expected to
+ *     call syncache_add for the initial SYN and then syncache_expand
+ *     for the SYN,ACK
+ *   - tells driver that a connection either has not been added or has
+ *     been dropped from the syncache
+ *   - the driver is expected to maintain state that lives outside the
+ *     software stack so the syncache needs to be able to notify the
+ *     toe driver that the software stack is not going to create a connection
+ *     for a received SYN
+ *   - the driver is responsible for any synchronization required

Presumably tu_syncache_event is called from the syncache and locks will be
held when that happens...? How will the race between the syncache deciding
to drop a connection of its own accord and the hardware/driver deciding to
accept be addressed, generally speaking?

+
+extern struct toe_usrreqs tcp_offload_usrreqs;

What is the purpose of this global? Presumably we can have two drivers that
both implement offload at once?

More comments to follow.

Robert N M Watson
Computer Laboratory
University of Cambridge

>
> -Kip
>
>
> On 12/12/07, Kip Macy wrote:
>> After review by Mike Silbersack I've committed the hooks that provide
>> a driver independent interface to TCP offload.
>>
>> I would like to commit the changes to tcp_subr.c and tcp_usrreq.c to
>> actually make use of the new interface. Please review the following:
>>
>> http://www.fsmware.com/freebsd/tcp/tcp_subr.c.diff
>>
>> http://www.fsmware.com/freebsd/tcp/tcp_usrreq.c.diff
>>
>> The new KPI is provided by the following 2 files:
>> http://www.fsmware.com/freebsd/tcp/tcp_ofld.c
>> http://www.fsmware.com/freebsd/tcp/tcp_ofld.h
>>
>>
>> Thank you for taking the time to review and provide feedback.
>> >> -Kip >> > From owner-freebsd-arch@FreeBSD.ORG Sat Dec 15 16:23:46 2007 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AB4AA16A418; Sat, 15 Dec 2007 16:23:46 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42]) by mx1.freebsd.org (Postfix) with ESMTP id 6F3D113C467; Sat, 15 Dec 2007 16:23:46 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by cyrus.watson.org (Postfix) with ESMTP id EECC947E90; Sat, 15 Dec 2007 11:23:45 -0500 (EST) Date: Sat, 15 Dec 2007 16:23:45 +0000 (GMT) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Kip Macy In-Reply-To: <20071215100351.Q70617@fledge.watson.org> Message-ID: <20071215152959.V85668@fledge.watson.org> References: <20071215100351.Q70617@fledge.watson.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: FreeBSD Current , freebsd-arch@freebsd.org Subject: Re: pending changes for TOE support X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 15 Dec 2007 16:23:46 -0000 On Sat, 15 Dec 2007, Robert Watson wrote: > More comments to follow. Some more, back to tcp_offload.c which I didn't look at last time around: + ifp = rt->rt_ifp; + tdev = TOEDEV(ifp); + if (tdev == NULL) + return (EINVAL); + + if (tdev->tod_can_offload(tdev, so) == 0) + return (EINVAL); I sort of expected to see a if_capenable check for TOE here, but I guess it doesn't make much difference if the flag is going to be checked at the lower level. However, in the case of things like TCP checksum offload, we do do the check at the higher level, so it wouldn't be inconsistent to do that. 
BTW, could you prepare similar comments for toedev.h?

+struct toe_usrreqs tcp_offload_usrreqs = {
+	.tu_send = tcp_output,
+	.tu_rcvd = tcp_output,
+	.tu_disconnect = tcp_output,
+	.tu_abort = tcp_output,
+	.tu_detach = tcp_offload_detach_noop,
+	.tu_syncache_event = tcp_offload_syncache_event_noop,
+};

This structure seems to introduce quite a bit more indirection for
non-offloaded cases than before, especially to do things like this:

+static void
+tcp_offload_syncache_event_noop(int event, void *toepcb)
+{
+}

The compiler can't compile these out because it likely doesn't do much in the
way of function pointer analysis, and probably shouldn't. However, it leaves
quite a few code paths heavier weight than before. I think I'd prefer a model
in which a TF_OFFLOADED flag is set on the tcpcb and then we conditionally
invoke tu_foo as a result. I.e.,:

static __inline int
tcp_gen_send(struct tcpcb *tp)
{

#ifndef TCP_OFFLOAD_DISABLE
	if (tp->t_flags & TF_OFFLOADED)
		return (tp->t_tu->tu_send(tp));
	else
#endif
		return (tcp_output(tp));
}

This would compile to a straight call to tcp_output() when offloading isn't
compiled in, and when it is compiled in and offloading isn't enabled, we do a
simple flag check rather than invoking a function via a series of pointers.

Back to tcp_offload.h:

+#define SC_ENTRY_PRESENT	1	/* 4-tuple already present */
+#define SC_DROP		2	/* connection was timed out */

I think you should give these a different prefix, since they're part of the
interface between the syncache and offload, and SC_ generally are flags used
internal to the syncache. Later you use TCP_OFFLOAD_ as the prefix, so that
may be appropriate here also.
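
For readers who want to try the flag-check dispatch outside the kernel, here is a minimal user-space sketch of it. Everything except the general shape of tcp_gen_send is an assumption made for illustration: the cut-down `struct tcpcb`, the flag value of `TF_OFFLOADED`, the stubbed `tcp_output`, and the `toe_stub_*` names are all hypothetical, and the two counters exist only so that the path taken can be observed. The assertion on the offloaded path is in the spirit of the earlier suggestion to check that offloaded connections never slip into the software path unprepared.

```c
#include <assert.h>
#include <stddef.h>

#define TF_OFFLOADED	0x1	/* hypothetical flag value */

struct tcpcb;

struct toe_usrreqs {
	int (*tu_send)(struct tcpcb *);
};

/* Cut-down stand-in for the kernel's struct tcpcb. */
struct tcpcb {
	int t_flags;
	struct toe_usrreqs *t_tu;
	int sw_sends;		/* calls handled by the software stack */
	int hw_sends;		/* calls handed to the offload driver */
};

/* Stub for the software output path. */
static int
tcp_output(struct tcpcb *tp)
{
	tp->sw_sends++;
	return (0);
}

/* Stub for a driver's tu_send method. */
static int
toe_stub_send(struct tcpcb *tp)
{
	tp->hw_sends++;
	return (0);
}

static struct toe_usrreqs toe_stub_usrreqs = { .tu_send = toe_stub_send };

/*
 * One flag test on the common path; the indirect call is made only for
 * connections that are actually offloaded.  With TCP_OFFLOAD_DISABLE
 * defined this reduces to a direct call to tcp_output().
 */
static inline int
tcp_gen_send(struct tcpcb *tp)
{
#ifndef TCP_OFFLOAD_DISABLE
	if (tp->t_flags & TF_OFFLOADED) {
		assert(tp->t_tu != NULL && tp->t_tu->tu_send != NULL);
		return (tp->t_tu->tu_send(tp));
	}
#endif
	return (tcp_output(tp));
}
```

A connection with `t_flags` clear goes straight to `tcp_output`, while one with `TF_OFFLOADED` set and `t_tu` pointing at `toe_stub_usrreqs` reaches `toe_stub_send`.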
+#define TCP_OFFLOAD_LISTEN_OPEN	1
+#define TCP_OFFLOAD_LISTEN_CLOSE	2
+
+typedef void (*tcp_offload_listen_fn)(void *, int, struct tcpcb *);
+EVENTHANDLER_DECLARE(tcp_offload_listen, tcp_offload_listen_fn);

Here and with the syncache interface, it seems like you're adding new
operations as modes to functions, whereas elsewhere you're adding multiple
functions. There isn't too much difference, but I think I'd rather see a
slightly wider interface (i.e., more functions) and avoid flags that change
the semantics of particular functions. That is to say: tu_syncache_add and
tu_syncache_drop, and likewise tcp_offload_listen and tcp_offload_unlisten
(or something similar).

+int tcp_offload_connect(struct socket *so, struct sockaddr *nam);

This prototype appears not to be documented.

+/*
+ * The tcp_gen_* routines are wrappers around the toe_usrreqs calls,
+ * in the non-offloaded case they translate to tcp_output.

Not really sure about the _gen_ naming, but not sure I have a better
suggestion in mind just yet -- maybe _output_ since they control output. I'm
not entirely thrilled that all of these become inline functions in include
files, although since a couple are used in multiple tcp*.c files, it may be
the least bad of the options. My comments above about possibly restructuring
this to avoid lots of indirection for non-offloaded case strengthen the
argument for inlining, however.

+#ifndef TCP_OFFLOAD_DISABLE

I notice you've renamed the option, and I like the new name a lot better.
However, I also notice that it doesn't seem to appear in conf/options or
conf/files. Could you make sure to add it?

+/*
+ * The socket has not been marked as "do not offload"
+ */
+#define SO_OFFLOADABLE(so)	((so->so_options & SO_NO_OFFLOAD) == 0)

I find myself wondering if the offload option should be a TCP socket option
rather than a socket-layer option, but don't have strong feelings about this.
What do you intend the policy model to be for enabling TOE, in general?
That if the TOE capability is available and the TOE capability is enabled, all TCP sockets via the enabled interface will be offloaded unless SO_NO_OFFLOAD is set? Are you currently anticipating enabling the capability by default? Nitpick on style: + /* disconnect offload device, if any */ Comments that are sentences should begin with an upper case letter and end with a period. :-) And another one: +#ifdef TCP_OFFLOAD_DISABLE +#define TOEPCB_SET(sc) (0) +#else +#define TOEPCB_SET(sc) ((sc)->sc_toepcb != NULL) +#endif + + Too many blank lines there. I've noticed that in quite a few places, we prefer X_ISSET() to test for a set flag or other condition, in order to prevent confusion with X_SET() assigning a value. I think that's pretty sensible, and should do it more myself, so encourage you to do the same Robert N M Watson Computer Laboratory University of Cambridge From owner-freebsd-arch@FreeBSD.ORG Sat Dec 15 18:10:22 2007 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 634AC16A418 for ; Sat, 15 Dec 2007 18:10:22 +0000 (UTC) (envelope-from kip.macy@gmail.com) Received: from wa-out-1112.google.com (wa-out-1112.google.com [209.85.146.181]) by mx1.freebsd.org (Postfix) with ESMTP id 27E3413C4EE for ; Sat, 15 Dec 2007 18:10:22 +0000 (UTC) (envelope-from kip.macy@gmail.com) Received: by wa-out-1112.google.com with SMTP id k17so2244225waf.3 for ; Sat, 15 Dec 2007 10:10:21 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; bh=z7Iw+pvwm5uxTOJW2rs+OaFu4bUSZ2tC9EURox+qH9U=; b=aR37tjXjQLeS9mgZ4qbDN9IuhmzbwIAAQ+p/DTgmy+9/Odt91up2cRUbJLInubkMLjBewzQB2eiaRMJVnJ/QQZeSAJliWWuLq+n5yz5UEzsE0mZUMxRIaMfzJvd4jAFrOJZACpSPn/JZHetBQnSLStbFAAL/J+UibkkQQmKxeXU= 
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=rz8/jCZ9cVEJubQLAdBIYl9HjbmnykfpfGV7ItsAWUaaNm3itA7gHS7ULhFhracF/BAdYr5Mna59QhSpgA19Eh0V/40d3++aKVp70IntzfNbvxIQrGvcSMqf/+lw2E/uGMJZr5QiNwJyQn3bbYicUuw6W/9sd3citUmM/zBTVqU= Received: by 10.114.78.1 with SMTP id a1mr324244wab.102.1197742221471; Sat, 15 Dec 2007 10:10:21 -0800 (PST) Received: by 10.114.255.11 with HTTP; Sat, 15 Dec 2007 10:10:21 -0800 (PST) Message-ID: Date: Sat, 15 Dec 2007 10:10:21 -0800 From: "Kip Macy" To: "Robert Watson" In-Reply-To: <20071215152959.V85668@fledge.watson.org> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <20071215100351.Q70617@fledge.watson.org> <20071215152959.V85668@fledge.watson.org> Cc: FreeBSD Current , freebsd-arch@freebsd.org Subject: Re: pending changes for TOE support X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 15 Dec 2007 18:10:22 -0000 On Dec 15, 2007 8:23 AM, Robert Watson wrote: > > On Sat, 15 Dec 2007, Robert Watson wrote: > > > More comments to follow. > > Some more, back to tcp_offload.c which I didn't look at last time around: > > + ifp = rt->rt_ifp; > + tdev = TOEDEV(ifp); > + if (tdev == NULL) > + return (EINVAL); > + > + if (tdev->tod_can_offload(tdev, so) == 0) > + return (EINVAL); > > I sort of expected to see a if_capenable check for TOE here, but I guess it > doesn't make much difference if the flag is going to be checked at the lower > level. However, in the case of things like TCP checksum offload, we do do the > check at the higher level, so it wouldn't be inconsistent to do that. Yes, IFCAP_TOE should be checked here. 
> > BTW, could you prepare similar comments for toedev.h? Ok, toedev.h isn't actually originally from me. However, it is a part of the API and so should be documented. > > +struct toe_usrreqs tcp_offload_usrreqs = { > + .tu_send = tcp_output, > + .tu_rcvd = tcp_output, > + .tu_disconnect = tcp_output, > + .tu_abort = tcp_output, > + .tu_detach = tcp_offload_detach_noop, > + .tu_syncache_event = tcp_offload_syncache_event_noop, > +}; > > This structure seems to introduce quite a bit more indirection for > non-offloaded cases than before, especially to do things like this: > > +static void > +tcp_offload_syncache_event_noop(int event, void *toepcb) > +{ > +} This should never be called, as sc_tu will only be set by the TOE interface to the syncache. Also bear in mind that this and detach are only called once per connection. The real overhead is in calling tcp_output via an indirect function call versus a direct function call. Considering the level of gratuitous overhead in the output path, I would be surprised if this were measurable. > > The compiler can't compile these out because it likely doesn't do much in the > way of function pointer analysis, and probably shouldn't. However, it leaves > quite a few code paths heavier weight than before. I think I'd prefer a model > in which a TF_OFFLOAD flag is set on the tcpcb and then we conditionally > invoke tu_foo as a result. I.e.,: > > static __inline int > tcp_gen_send(struct tcpcb *tp) > { > > #ifndef TCP_OFFLOAD_DISABLE > if (tcp->f_flag & TF_OFFLOADED) > return (tp->t_tu->tu_send(tp)); > else > #endif > return (tcp_output(tp)); > } > > This would compile to a straight call to tcp_output() when offloading isn't > compiled in, and when it is compiled in and offloading isn't enabled, we do a > simple flag check rather than invoking a function via a series of pointers. This is what the current code does. See tcp_ofld.h in CVS. The indirection was suggested by Sam as a cleaner abstraction.
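For concreteness, the flag-check dispatch discussed above can be sketched in user space. This is only a minimal sketch under stated assumptions, not the code in tcp_ofld.h: the struct layouts, the TF_OFFLOADED value, the t_flags field name (the quoted example writes tcp->f_flag), the stub bodies of tcp_output() and toe_send(), and the demo_dispatch() helper are all stand-ins invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real definitions live in the kernel headers. */
#define TF_OFFLOADED 0x1	/* assumed flag value, for illustration only */

struct tcpcb;

struct toe_usrreqs {
	int (*tu_send)(struct tcpcb *tp);
};

struct tcpcb {
	int t_flags;			/* TF_* flags (assumed field name) */
	struct toe_usrreqs *t_tu;	/* set only for offloaded connections */
};

static int sw_calls;	/* times the software path ran */
static int toe_calls;	/* times the offload path ran */

/* Stub standing in for the software output path. */
static int
tcp_output(struct tcpcb *tp)
{
	(void)tp;
	sw_calls++;
	return (0);
}

/* Stub standing in for a driver's tu_send method. */
static int
toe_send(struct tcpcb *tp)
{
	(void)tp;
	toe_calls++;
	return (0);
}

static struct toe_usrreqs toe_usrreqs_stub = { .tu_send = toe_send };

/* One function header shared by the offload and non-offload builds. */
static inline int
tcp_gen_send(struct tcpcb *tp)
{
	assert(tp != NULL);
#ifndef TCP_OFFLOAD_DISABLE
	if (tp->t_flags & TF_OFFLOADED)
		return (tp->t_tu->tu_send(tp));
#endif
	return (tcp_output(tp));
}

/* Returns 1 if the offload path handled the send, 0 if tcp_output did. */
int
demo_dispatch(int offloaded)
{
	struct tcpcb tp = { 0, NULL };
	int before = toe_calls;

	if (offloaded) {
		tp.t_flags |= TF_OFFLOADED;
		tp.t_tu = &toe_usrreqs_stub;
	}
	(void)tcp_gen_send(&tp);
	return (toe_calls != before);
}
```

With TCP_OFFLOAD_DISABLE defined, the wrapper reduces to a direct call to tcp_output(); otherwise a connection that is not offloaded pays for one flag test rather than an indirect call through the method table.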
> > Back to tcp_offload.h: > > +#define SC_ENTRY_PRESENT 1 /* 4-tuple already > present */ > +#define SC_DROP 2 /* connection was > timed out */ > > I think you should give these a different prefix, since they're part of the > interface between the syncache and offload, and SC_ generally are flags used > internal to the syncache. Later you use TCP_OFFLOAD_ as the prefix, so that > may be appropriate here also. > > +#define TCP_OFFLOAD_LISTEN_OPEN 1 > +#define TCP_OFFLOAD_LISTEN_CLOSE 2 > + > +typedef void (*tcp_offload_listen_fn)(void *, int, struct tcpcb *); > +EVENTHANDLER_DECLARE(tcp_offload_listen, tcp_offload_listen_fn); > > Here and with the syncache interface, it seems like you're adding new > operations as modes to functions, whereas elsewhere you're adding multiple > functions. There isn't too much difference, but I think I'd rather see a > slightly wider interface (i.e., more functions) and avoid flags that change > the semantics of particular functions. That is to say: tu_syncache_add and > tu_syncache_drop, and likewise tcp_offload_listen and tcp_offload_unlisten (or > something similar). I've kept the interface to as few functions as possible. I've expanded tcp_output to 4 functions because it was necessary semantically. I'll widen the interface if Sam agrees. > > +int tcp_offload_connect(struct socket *so, struct sockaddr *nam); > > This prototype appear not to be documented. Ok, it appears fairly self-documenting to me =-D > > +/* > + * The tcp_gen_* routines are wrappers around the toe_usrreqs calls, > + * in the non-offloaded case they translate to tcp_output. > > Not really sure about the _gen_ naming, but not sure I have a better > suggestion in mind just yet -- maybe _output_ since they control output. I'm > not entirely thrilled that all of these become inline functions in include > files, although since a couple are used in multiple tcp*.c files, it may be > the least bad of the options. 
My comments above about possibly restructuring > this to avoid lots of indirection for non-offloaded case strengthen the > argument for inlining, however. > > +#ifndef TCP_OFFLOAD_DISABLE > > I notice you've renamed the option, and I like the new name a lot better. > However, I also notice that it doesn't seem to appear in conf/options or > conf/files. Could you make sure to add it? > > +/* > + * The socket has not been marked as "do not offload" > + */ > +#define SO_OFFLOADABLE(so) ((so->so_options & SO_NO_OFFLOAD) == 0) > > I find myself wondering if the offload option should be a TCP socket option > rather than a socket-layer option, but don't have strong feelings about this. The assumption is that, if you a) pay the extra for a version of the card that supports TOE and b) you load the toe module (it isn't part of the NIC driver) that you want your existing software to have its connections offloaded. I don't know of any customers that want to modify their user applications to selectively enable TOE. > > What do you intend the policy model to be for enabling TOE, in general? That > if the TOE capability is available and the TOE capability is enabled, all TCP > sockets via the enabled interface will be offloaded unless SO_NO_OFFLOAD is > set? That is the current usage model. At some point we may tie it into pf/ipf/ipfw to provide for offload policy. However, currently we offload all connections on an interface until we run out of TCAM entries. >Are you currently anticipating enabling the capability by default? Yes, *if the TOE driver is loaded*. > > Nitpick on style: > > + /* disconnect offload device, if any */ > > Comments that are sentences should begin with an upper case letter and end > with a period. :-) And another one: > > +#ifdef TCP_OFFLOAD_DISABLE > +#define TOEPCB_SET(sc) (0) > +#else > +#define TOEPCB_SET(sc) ((sc)->sc_toepcb != NULL) > +#endif > + > + > > Too many blank lines there. 
> > I've noticed that in quite a few places, we prefer X_ISSET() to test for a set > flag or other condition, in order to prevent confusion with X_SET() assigning > a value. I think that's pretty sensible, and should do it more myself, so > encourage you to do the same That sounds reasonable. -Kip From owner-freebsd-arch@FreeBSD.ORG Sat Dec 15 18:19:46 2007 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8672D16A5BB for ; Sat, 15 Dec 2007 18:19:46 +0000 (UTC) (envelope-from kip.macy@gmail.com) Received: from wa-out-1112.google.com (wa-out-1112.google.com [209.85.146.180]) by mx1.freebsd.org (Postfix) with ESMTP id 39F1413C46B for ; Sat, 15 Dec 2007 18:19:46 +0000 (UTC) (envelope-from kip.macy@gmail.com) Received: by wa-out-1112.google.com with SMTP id k17so2248973waf.3 for ; Sat, 15 Dec 2007 10:19:45 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; bh=m9RV1o99QgVaGtoRcEeT0SezTSp5AN5N7JH5VNAJPY4=; b=rq1E3UODOuy2Gr+0EkvZVKHblSVb6g+nTHhtUhsNOZ5IEVIgpK+xwcN9HkvMsrchyNVynnBDsQqMsdBkpp+1Xz5ocR9Yy24Qycikkt1Tku5vjUsBHSOgtxi8esDTAXpZai+/tBrpthilFWOPYh5DPACFeunFYOopv8bW4nG7xSU= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=lmTscsRcgdIJvX4uMX0qMvV5ZbFVkp8OGdrE7qu761PATOVEFQ0u4PevGt5lo0JuaREPvNgA+Y0gPY7uiKeUXrZ5dpIWZxuTK+gDLQduJOBGay+LBXssbMaLRK8idoaTiSdbjge91ltW1wOafb2cBGytTj3uUqUiB2WAfNVNkYc= Received: by 10.115.92.2 with SMTP id u2mr930461wal.139.1197742785662; Sat, 15 Dec 2007 10:19:45 -0800 (PST) Received: by 10.114.255.11 with HTTP; Sat, 15 Dec 2007 10:19:45 -0800 (PST) Message-ID: Date: Sat, 15 
Dec 2007 10:19:45 -0800 From: "Kip Macy" To: "Sam Leffler" In-Reply-To: <47641A4E.5050106@errno.com> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <20071215100351.Q70617@fledge.watson.org> <20071215152959.V85668@fledge.watson.org> <47641A4E.5050106@errno.com> Cc: FreeBSD Current , Robert Watson , freebsd-arch@freebsd.org Subject: Re: pending changes for TOE support X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 15 Dec 2007 18:19:46 -0000 On Dec 15, 2007 10:17 AM, Sam Leffler wrote: > > Robert Watson wrote: > > > > On Sat, 15 Dec 2007, Robert Watson wrote: > > > >> More comments to follow. > > > > Some more, back to tcp_offload.c which I didn't look at last time around: > > > > + ifp = rt->rt_ifp; > > + tdev = TOEDEV(ifp); > > + if (tdev == NULL) > > + return (EINVAL); > > + > > + if (tdev->tod_can_offload(tdev, so) == 0) > > + return (EINVAL); > > > > I sort of expected to see a if_capenable check for TOE here, but I > > guess it doesn't make much difference if the flag is going to be > > checked at the lower level. However, in the case of things like TCP > > checksum offload, we do do the check at the higher level, so it > > wouldn't be inconsistent to do that. > > > > BTW, could you prepare similar comments for toedev.h? 
> > > > +struct toe_usrreqs tcp_offload_usrreqs = { > > + .tu_send = tcp_output, > > + .tu_rcvd = tcp_output, > > + .tu_disconnect = tcp_output, > > + .tu_abort = tcp_output, > > + .tu_detach = tcp_offload_detach_noop, > > + .tu_syncache_event = tcp_offload_syncache_event_noop, > > +}; > > > > This structure seems to introduce quite a bit more indirection for > > non-offloaded cases than before, especially to do things like this: > > > > +static void > > +tcp_offload_syncache_event_noop(int event, void *toepcb) > > +{ > > +} > > > > The compiler can't compile these out because it likely doesn't do much > > in the way of function pointer analysis, and probably shouldn't. > > However, it leaves quite a few code paths heavier weight than before. > > I think I'd prefer a model in which a TF_OFFLOAD flag is set on the > > tcpcb and then we conditionally invoke tu_foo as a result. I.e.,: > > > > static __inline int > > tcp_gen_send(struct tcpcb *tp) > > { > > > > #ifndef TCP_OFFLOAD_DISABLE > > if (tcp->f_flag & TF_OFFLOADED) > > return (tp->t_tu->tu_send(tp)); > > else > > #endif > > return (tcp_output(tp)); > > } > > > > This would compile to a straight call to tcp_output() when offloading > > isn't compiled in, and when it is compiled in and offloading isn't > > enabled, we do a simple flag check rather than invoking a function via > > a series of pointers. > > I suggested Kip explore this technique as an alternative to having the > inlines that check whether a socket is marked for offload or not > (tradeoff indirect function call vs conditionals). My comment to him > was that I find it can make code more intuitive. But with the common > case being empty functions it's probably not a great option as the > compiler cannot optimize it out. Actually, in the datapath the common case is an indirect call to tcp_output. The only empty function that is called is detach, and that is only called once on shutdown. 
-Kip From owner-freebsd-arch@FreeBSD.ORG Sat Dec 15 18:35:50 2007 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 627DA16A41B for ; Sat, 15 Dec 2007 18:35:50 +0000 (UTC) (envelope-from sam@errno.com) Received: from ebb.errno.com (ebb.errno.com [69.12.149.25]) by mx1.freebsd.org (Postfix) with ESMTP id 2B76313C442 for ; Sat, 15 Dec 2007 18:35:48 +0000 (UTC) (envelope-from sam@errno.com) Received: from trouble.errno.com (trouble.errno.com [10.0.0.248]) (authenticated bits=0) by ebb.errno.com (8.13.6/8.12.6) with ESMTP id lBFIHoEO083518 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 15 Dec 2007 10:17:50 -0800 (PST) (envelope-from sam@errno.com) Message-ID: <47641A4E.5050106@errno.com> Date: Sat, 15 Dec 2007 10:17:50 -0800 From: Sam Leffler User-Agent: Thunderbird 2.0.0.9 (X11/20071125) MIME-Version: 1.0 To: Robert Watson References: <20071215100351.Q70617@fledge.watson.org> <20071215152959.V85668@fledge.watson.org> In-Reply-To: <20071215152959.V85668@fledge.watson.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-DCC-dcc-servers-Metrics: om; whitelist Cc: Kip Macy , FreeBSD Current , freebsd-arch@freebsd.org Subject: Re: pending changes for TOE support X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 15 Dec 2007 18:35:50 -0000 Robert Watson wrote: > > On Sat, 15 Dec 2007, Robert Watson wrote: > >> More comments to follow. 
> > Some more, back to tcp_offload.c which I didn't look at last time around: > > + ifp = rt->rt_ifp; > + tdev = TOEDEV(ifp); > + if (tdev == NULL) > + return (EINVAL); > + > + if (tdev->tod_can_offload(tdev, so) == 0) > + return (EINVAL); > > I sort of expected to see a if_capenable check for TOE here, but I > guess it doesn't make much difference if the flag is going to be > checked at the lower level. However, in the case of things like TCP > checksum offload, we do do the check at the higher level, so it > wouldn't be inconsistent to do that. > > BTW, could you prepare similar comments for toedev.h? > > +struct toe_usrreqs tcp_offload_usrreqs = { > + .tu_send = tcp_output, > + .tu_rcvd = tcp_output, > + .tu_disconnect = tcp_output, > + .tu_abort = tcp_output, > + .tu_detach = tcp_offload_detach_noop, > + .tu_syncache_event = tcp_offload_syncache_event_noop, > +}; > > This structure seems to introduce quite a bit more indirection for > non-offloaded cases than before, especially to do things like this: > > +static void > +tcp_offload_syncache_event_noop(int event, void *toepcb) > +{ > +} > > The compiler can't compile these out because it likely doesn't do much > in the way of function pointer analysis, and probably shouldn't. > However, it leaves quite a few code paths heavier weight than before. > I think I'd prefer a model in which a TF_OFFLOAD flag is set on the > tcpcb and then we conditionally invoke tu_foo as a result. I.e.,: > > static __inline int > tcp_gen_send(struct tcpcb *tp) > { > > #ifndef TCP_OFFLOAD_DISABLE > if (tcp->f_flag & TF_OFFLOADED) > return (tp->t_tu->tu_send(tp)); > else > #endif > return (tcp_output(tp)); > } > > This would compile to a straight call to tcp_output() when offloading > isn't compiled in, and when it is compiled in and offloading isn't > enabled, we do a simple flag check rather than invoking a function via > a series of pointers. 
I suggested Kip explore this technique as an alternative to having the inlines that check whether a socket is marked for offload or not (tradeoff indirect function call vs conditionals). My comment to him was that I find it can make code more intuitive. But with the common case being empty functions it's probably not a great option as the compiler cannot optimize it out. > > Back to tcp_offload.h: > > +#define SC_ENTRY_PRESENT 1 /* 4-tuple already present */ > +#define SC_DROP 2 /* connection was timed out */ > > I think you should give these a different prefix, since they're part > of the interface between the syncache and offload, and SC_ generally > are flags used internal to the syncache. Later you use TCP_OFFLOAD_ > as the prefix, so that may be appropriate here also. > > +#define TCP_OFFLOAD_LISTEN_OPEN 1 > +#define TCP_OFFLOAD_LISTEN_CLOSE 2 > + > +typedef void (*tcp_offload_listen_fn)(void *, int, struct tcpcb > *); > +EVENTHANDLER_DECLARE(tcp_offload_listen, tcp_offload_listen_fn); > > Here and with the syncache interface, it seems like you're adding new > operations as modes to functions, whereas elsewhere you're adding > multiple functions. There isn't too much difference, but I think I'd > rather see a slightly wider interface (i.e., more functions) and avoid > flags that change the semantics of particular functions. That is to > say: tu_syncache_add and tu_syncache_drop, and likewise > tcp_offload_listen and tcp_offload_unlisten (or something similar). > > +int tcp_offload_connect(struct socket *so, struct sockaddr *nam); > > This prototype appear not to be documented. > > +/* > + * The tcp_gen_* routines are wrappers around the toe_usrreqs calls, > + * in the non-offloaded case they translate to tcp_output. > > Not really sure about the _gen_ naming, but not sure I have a better > suggestion in mind just yet -- maybe _output_ since they control > output. 
I'm not entirely thrilled that all of these become inline > functions in include files, although since a couple are used in > multiple tcp*.c files, it may be the least bad of the options. My > comments above about possibly restructuring this to avoid lots of > indirection for non-offloaded case strengthen the argument for > inlining, however. > > +#ifndef TCP_OFFLOAD_DISABLE > > I notice you've renamed the option, and I like the new name a lot > better. However, I also notice that it doesn't seem to appear in > conf/options or conf/files. Could you make sure to add it? > > +/* > + * The socket has not been marked as "do not offload" > + */ > +#define SO_OFFLOADABLE(so) ((so->so_options & SO_NO_OFFLOAD) == 0) > > I find myself wondering if the offload option should be a TCP socket > option rather than a socket-layer option, but don't have strong > feelings about this. > > What do you intend the policy model to be for enabling TOE, in > general? That if the TOE capability is available and the TOE > capability is enabled, all TCP sockets via the enabled interface will > be offloaded unless SO_NO_OFFLOAD is set? Are you currently > anticipating enabling the capability by default? > > Nitpick on style: > > + /* disconnect offload device, if any */ > > Comments that are sentences should begin with an upper case letter and > end with a period. :-) And another one: > > +#ifdef TCP_OFFLOAD_DISABLE > +#define TOEPCB_SET(sc) (0) > +#else > +#define TOEPCB_SET(sc) ((sc)->sc_toepcb != NULL) > +#endif > + > + > > Too many blank lines there. > > I've noticed that in quite a few places, we prefer X_ISSET() to test > for a set flag or other condition, in order to prevent confusion with > X_SET() assigning a value. 
I think that's pretty sensible, and should > do it more myself, so encourage you to do the same > From owner-freebsd-arch@FreeBSD.ORG Sat Dec 15 18:40:40 2007 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9CAA516A421 for ; Sat, 15 Dec 2007 18:40:40 +0000 (UTC) (envelope-from kip.macy@gmail.com) Received: from wa-out-1112.google.com (wa-out-1112.google.com [209.85.146.183]) by mx1.freebsd.org (Postfix) with ESMTP id 612F613C465 for ; Sat, 15 Dec 2007 18:40:40 +0000 (UTC) (envelope-from kip.macy@gmail.com) Received: by wa-out-1112.google.com with SMTP id k17so2258855waf.3 for ; Sat, 15 Dec 2007 10:40:39 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; bh=O9fG3Ze5eAT0KUU5+5gwDTc7eXjVf3bY9kTVjyuyXbE=; b=mTaOajIJBxbVIjVcXl4m7dDH3f7mR/G25Z8OYmck5MXx/TuRKEwGr8Z/Vl0eyAVQg0t4/P5dpx+RM03eYEut05phNuDgyei7uSdCMag9zL1LHf/jPVQ5kSWKIKv5kVFF1hDa+jCNRnSFdFO5wW/nK9VCfkiyzXOxKNm2bW3QJbE= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=jcblXZlFNzb0cCt+U7X1tc9EaWA4GrPebkXpiCAy1Cg8uldMoeOtEIkISknA0ueclcJfU8VNyKuFJ//CN14PGk4ueAYxCWTbryFk3i8/w7kO28s4pijEgcSRmRP/Y1bcVXKAGdzH535OPfV+HPsyhTh4+J27U1Cchm4DwlYcA8M= Received: by 10.115.78.1 with SMTP id f1mr978726wal.100.1197744039662; Sat, 15 Dec 2007 10:40:39 -0800 (PST) Received: by 10.114.255.11 with HTTP; Sat, 15 Dec 2007 10:40:39 -0800 (PST) Message-ID: Date: Sat, 15 Dec 2007 10:40:39 -0800 From: "Kip Macy" To: "Robert Watson" In-Reply-To: <20071215100351.Q70617@fledge.watson.org> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 
Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <20071215100351.Q70617@fledge.watson.org> Cc: FreeBSD Current , freebsd-arch@freebsd.org Subject: Re: pending changes for TOE support X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 15 Dec 2007 18:40:40 -0000 > My initial feeling is that, even if an interface supports TOE, we shouldn't > enable the capability in the enabled vector by default, as TOE bypasses > firewall behavior, etc, and would certainly be a surprise if an admin swapped > a chelsio card for a non-TOE supporting card. What's your feeling on this? The current implementation bypasses the firewall. This and likely other hardware has extensive filtering support, so it isn't necessarily intrinsic. The usage model at this moment is that the customer makes a conscious decision to load the TOE driver and understands the implications. I think this is quite adequate for 10GigE cards currently. However, this will need to be revisited when these chips start showing up on mainstream motherboards. > + * The TOE API assumes that the tcp offload engine can offload the > + * the entire connection from set up to teardown, with some provision > + * being made to allowing the software stack to handle time wait. If > + * the device does not meet these criteria, it is the driver's responsibility > + * to overload the functions that it needs to in tcp_usrreqs and make > + * its own calls to tcp_output if it needs to do so. > > While I'm familiar with TCP, I'm less familiar with the scope of what cards > support for TOE. Do we know of any cards that are less capable than the > chelsio card in this respect, or are they all sort of on-par on that front? > I.e., do we think the above eventuality is likely? I don't have any way of knowing.
I think it is probably safe to say that any vendors that don't meet that criteria now will in the future as transistor density increases. > > If we don't, then one of the things I'd like to see us do is fairly carefully > assert, at least for a few months, that TCP never "slips" into any > transmission-related paths that could lead to truly odd and hard-to-diagnose > behavior when runnning with TOE. I.e., tcp_output, etc. I'm happy to do that. However, I see problems introduced by offloading connections as being driver bugs much the same as problems caused by the driver's TCP segmentation offload or checksum offload. The problems will be isolated to connections using a specific interface. > > If we do think it's likely, we don't need to address this immediately, but we > should make sure that before we ship TOE in a release, we've thought somewhat > more thoroughly about that case. As long as TOE remains un-MFC'd, we don't > find ourselves with an obligation to maintain guarantees about the interfaces, > and that includes dealing with incompatibility :-). Do we know if any of the > current 10gbps vendors other than chelsio are actively looking at TOE on > FreeBSD, and could be engaged in discussion? As with most vendors and FreeBSD, I suspect that the two that I know of will have zero interest until they have a prospective customer. At which point they'll want it done yesterday. > I think it might be useful to add a couple of paragraphs here on three topics: > > (1) Clarify the way in which windows are updated between the device driver and > the socket code, both for sending/receiving. You talk a bit about > "credit", but introducing it up-front would be useful. I didn't realize a definition was necessary. To the best of my knowledge this is the common term used when discussing flow control. I've seen it used for Fibre Channel and IB. The one ambiguity that arises is whether or not it refers to bytes or segments. 
> (2) One of the issues I've run into in the TCP and socket code generally is > that there was significant lack of clarity on the "life cycle" of the set > of related data structures. Could you write a bit of text about when > drivers will allocate state and when they will free it? I.e., tu_attach > allocates state, tu_{abort,detach} free it, and TCP promises not to call > anything before attach or anything after abort/detach. > > (3) Could you talk at a high level about the ways in which TOE drivers will > interact with TCP? You do it a bit in each of the sections, but if > there's a principle, pulling it out would be useful. Also, you should > indicate whether the driver is allowed to drop the inpcb lock or not. I've done my best to minimize changes to TCP. It is safe to assume that the invariants are the same as those for tcp_output. I think we should ask the author of tcp_output to document the interface, expected state transitions, and its invariants (joke). > > Doing this would address a few of the comments I have below also. > > + * + tu_send > + * - tells the driver that new data may have been added to the > + * socket's send buffer - the driver should not fail if the > + * buffer is in fact unchanged > > I'm a bit confused by the description of the error condition here. Could you > clarify when a driver should return an error, and what the impact of an error > returned will be on the connection state? In fact, it probably makes sense to > have an up-front comment on conventions for error-handling -- if TOE returns > an error will that generally lead to a TCP tear-down? The offload routines are substituted for tcp_output and thus should interact with the stack in the same way. By extension they should have the same failure modes and invariants. > > + * - The driver expects the inpcb lock to be held and > > This comment is truncated -- is there an and? > > We should specify that drivers are not allowed to drop the inpcb lock if that > is the case, FYI. 
> > + * + tu_rcvd > + * - returns credits to the driver and triggers window updates > + * to the peer (a credit is a byte in the user's receive window) > > Might begin with a sentence defining the notion of credit. Is it possible to > use tu_rcvd to reduce credit to the card if the socket buffer size is changed, > or just increase it? > > + * - the driver is expected to determine how many bytes have been > + * consumed and credit that back to the card so that it can grow > + * the window again > > Could you provide an example of how it is to do that -- i.e., is it just going > to inspect so_rcv in the same way native TCP does? Correct. It is up to the driver to maintain any ancillary state needed to determine that. > > + * - this function needs to correctly handle being called any number of > + * times without any bytes being consumed from the receive buffer. > + * - the driver expects the inpcb lock to be held > + * > + * + tu_disconnect > + * - tells the driver to send FIN to peer > + * - driver is expected to send the remaining data and then do a clean half > close > + * - disconnect implies at least half-close so only send, abort, and detach > + * are legal > > Could you clarify this a bit? Do you mean that TCP guarangees that only > tu_send, tu_abort, and tu_detach will be delivered to the driver in the > future? Those are the only things that make sense, but the driver is not expected to break if TCP does. > > + * - the driver is expected to handle transition through the shutdown > + * state machine and allow the stack to support SO_LINGER. > > Probably worth commenting that the device driver won't detach the toe state. > > + * > + * + tu_abort > + * - closes the connection and sends a RST to peer > + * - driver is expectd to trigger an RST and detach the toepcb > > In regular TCP, the pru_abort method is only called on pending connections > while still in the listen queues of a listen socket. 
Is this true of > tu_abort, or is tu_abort a more general method to be used to cancel > connections? If so, probably worth commenting on that. tu_abort is called in place of tcp_output in pru_abort. > + * - no further calls are legal after abort > + * - the driver expects the inpcb lock to be held > + * > + * + tu_detach > + * - tells driver that the socket is going away so disconnect > + * the toepcb and free appropriate resources > + * - allows the driver to cleanly handle the case of connection state > + * outliving the socket > + * - no further calls are legal after detach > + * - the driver acquires the tcbinfo lock > > For this call, you haven't specified whether the inpcb lock is held. If it > is, the driver acquiring the tcbinfo lock without first dropping the inpcb > lock would be a lock order reversal. Should the caller instead acquire/hold > it? The inpcb lock no longer exists at this point. > For the above calls, what guarantees does the TCP stack make about the > presence of the socket, if any? The assumptions are the same as those of tcp_output except for syncache_event and detach, at which points the socket does not yet exist or no longer exists. > These interfaces all pass the tcpcb, but in our regular TCP stack, the > invariant is the existence of the inpcb, not the tcpcb, which may be replaced > with a tcptw (or in one edge case, inp_ppcb may be NULL). If there will be > drivers in the future that implement timewait, perhaps we should be passing in > the inpcb? The interface is intended to drop in the place of tcp_output. 
> > + * + tu_syncache_event > + * - even if it is not actually needed, the driver is expected to > + * call syncache_add for the initial SYN and then syncache_expand > + * for the SYN,ACK > + * - tells driver that a connection either has not been added or has > + * been dropped from the syncache > + * - the driver is expected to maintain state that lives outside the > + * software stack so the syncache needs to be able to notify the > + * toe driver that the software stack is not going to create a connection > + * for a received SYN > + * - the driver is responsible for any synchronization required > > Presumably tu_syncache_event is called from the syncache and locks will be > held when that happens...? The driver doesn't care what locks the syncache holds. The syncache event handler is responsible for acquiring any locks necessary to synchronize with the rest of the driver for the transition from SYN_RCVD -> ESTABLISHED. > > How will the race between the syncache deciding to drop a connection of its > own accord and the hardware/driver deciding to accept be addressed, generally > speaking? That is a driver implementation issue. The one case to avoid is a deadlock between the driver calling syncache_expand and the syncache calling syncache_event. > > +8 > +extern struct toe_usrreqs tcp_offload_usrreqs; > > What is the purpose of this global? Presumably we can have two drivers that > both implement offload at once? I think your follow-on reading of tcp_offload.c answers that question.
-Kip From owner-freebsd-arch@FreeBSD.ORG Sat Dec 15 19:01:10 2007 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 46C8616A41B; Sat, 15 Dec 2007 19:01:10 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42]) by mx1.freebsd.org (Postfix) with ESMTP id 126A913C447; Sat, 15 Dec 2007 19:01:09 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by cyrus.watson.org (Postfix) with ESMTP id 98DF2481B2; Sat, 15 Dec 2007 14:01:09 -0500 (EST) Date: Sat, 15 Dec 2007 19:01:09 +0000 (GMT) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Kip Macy In-Reply-To: Message-ID: <20071215184737.A85668@fledge.watson.org> References: <20071215100351.Q70617@fledge.watson.org> <20071215152959.V85668@fledge.watson.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: FreeBSD Current , freebsd-arch@freebsd.org Subject: Re: pending changes for TOE support X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 15 Dec 2007 19:01:10 -0000 On Sat, 15 Dec 2007, Kip Macy wrote: >> BTW, could you prepare similar comments for toedev.h? > > Ok, toedev.h isn't actually originally from me. However, it is a part of the > API and so should be documented. Thanks -- when designing a KPI, it's important to keep in mind that the goal of the author using it will not be to understand our TCP code, but instead, to implement the driver as quickly and cheaply as possible. Therefore we should make it as easy as possible for the authors of drivers to get it right. 
This comes up in a few other places in my requests for clarification or
additions -- imagine you weren't familiar with our TCP, what could you get
wrong?

>> This structure seems to introduce quite a bit more indirection for
>> non-offloaded cases than before, especially to do things like this:
>
> This should never be called as sc_tu will only be set by the TOE interface
> to the syncache. Also bear in mind that this and detach are only called once
> per connection. The real overhead is in calling tcp_output via an indirect
> function call versus a direct function call. Considering the level of
> gratuitous overhead in the output path I would be surprised if this were
> measurable.

Undoubtedly this is true for high-end systems sporting PCIe interfaces with
10gbps cards, but we also run in other configurations. A complaint we've had
a fair amount in the last few years is that our work has increasingly
targeted high-end systems where overhead is in cache misses, and decreasingly
targeted low-end systems where overhead is in instruction count. While you
offer the opportunity to compile out some of this, I think we should try to
make these things capable of being fast in both circumstances without
multiple compile paths where it's easy.

>> This would compile to a straight call to tcp_output() when offloading isn't
>> compiled in, and when it is compiled in and offloading isn't enabled, we do
>> a simple flag check rather than invoking a function via a series of
>> pointers.
>
> This is what the current code does. See tcp_ofld.h in CVS. The indirection
> was suggested by Sam as a cleaner abstraction.

I think I prefer the CVS version of the two, although I would like to
collapse the two ifdef parts per my example, so that ifdef and non-ifdef
cases are side-by-side, and so that function headers are shared by the two
cases.
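A rough sketch of the flag-check dispatch described above, written as
userspace C so it can be compiled on its own. Every name here is an assumption
made for illustration: OFFLOAD_SKETCH stands in for the real kernel option,
CONN_OFFLOADED for whatever per-connection flag the stack would use, and
sw_output() for tcp_output(). It is not the code in tcp_ofld.h. The point is
that one function header is shared by both configurations and the ifdef wraps
only the flag test.

```c
#include <stddef.h>

#define OFFLOAD_SKETCH  1     /* remove this line to compile offload out */
#define CONN_OFFLOADED  0x01  /* hypothetical per-connection flag */

struct conn {
	int flags;
	int (*offload_output)(struct conn *);  /* set only when offloaded */
	int sw_calls;  /* times the software path ran */
	int hw_calls;  /* times the offload path ran */
};

/* Stands in for the software stack's tcp_output(). */
static int
sw_output(struct conn *c)
{
	c->sw_calls++;
	return (0);
}

/* Demo offload hook; a real driver would hand the data to hardware. */
static int
hw_output(struct conn *c)
{
	c->hw_calls++;
	return (0);
}

/*
 * Single shared entry point.  With offload compiled out this collapses to
 * a direct call to sw_output(); with it compiled in, a non-offloaded
 * connection pays for one flag test and no indirect call.
 */
static inline int
conn_output(struct conn *c)
{
#ifdef OFFLOAD_SKETCH
	if (c->flags & CONN_OFFLOADED)
		return (c->offload_output(c));
#endif
	return (sw_output(c));
}
```

Keeping the conditional region this small is also what limits the bit-rot
risk of having two mutually exclusive code paths.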
Ideally, we'd make the ifdef very, very small, as it's well-known that when
we have two mutually exclusive code paths and one isn't compiled or used
frequently, it rots.

>> +int tcp_offload_connect(struct socket *so, struct sockaddr *nam);
>>
>> This prototype appears not to be documented.
>
> Ok, it appears fairly self-documenting to me =-D

I'm sure you can find some insight to express that's not self-obvious in the
code :-).

>> I find myself wondering if the offload option should be a TCP socket option
>> rather than a socket-layer option, but don't have strong feelings about
>> this.
>
> The assumption is that, if you a) pay the extra for a version of the card
> that supports TOE and b) you load the toe module (it isn't part of the NIC
> driver) that you want your existing software to have its connections
> offloaded. I don't know of any customers that want to modify their user
> applications to selectively enable TOE.

I think you misunderstood my question -- I was wondering whether it should
be selectively disabled by an IP-layer socket option rather than a
socket-layer socket option.

>> What do you intend the policy model to be for enabling TOE, in general?
>> That if the TOE capability is available and the TOE capability is enabled,
>> all TCP sockets via the enabled interface will be offloaded unless
>> SO_NO_OFFLOAD is set?
>
> That is the current usage model. At some point we may tie it into
> pf/ipf/ipfw to provide for offload policy. However, currently we offload all
> connections on an interface until we run out of TCAM entries.
>
>> Are you currently anticipating enabling the capability by default?
>
> Yes, *if the TOE driver is loaded*.

I have somewhat mixed feelings about this, and feel like there should be
something eloquent to say about having functionality administratively enabled
rather than a default when compiled in, but can't quite figure out a nice
expression of that.
I think it might have something to do with expecting vendors of 10gbps cards to like shipping two modules for every device rather than one, and having the right behavior out of the box if they do ship just one. Robert N M Watson Computer Laboratory University of Cambridge From owner-freebsd-arch@FreeBSD.ORG Sat Dec 15 19:11:26 2007 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 81DBC16A418 for ; Sat, 15 Dec 2007 19:11:26 +0000 (UTC) (envelope-from kip.macy@gmail.com) Received: from wa-out-1112.google.com (wa-out-1112.google.com [209.85.146.183]) by mx1.freebsd.org (Postfix) with ESMTP id 4C89813C448 for ; Sat, 15 Dec 2007 19:11:26 +0000 (UTC) (envelope-from kip.macy@gmail.com) Received: by wa-out-1112.google.com with SMTP id k17so2274371waf.3 for ; Sat, 15 Dec 2007 11:11:25 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; bh=woJZIPW/sJEZpIYU2RnPVUN9MHnCMSQBsV531Gd5S4o=; b=p2cAWTN4c2BkQ3tv1slPyMb3HqAwxyJ4/cmSvCjg5DJsvlWKAOo0Ee4ytPu/lFOAAy6J1A+Mig6jcC6hJtr4v+RgHxuL3xVI5kK94onWZ6Xq77JtLL74DpPLJ/EUDcZbQJXdqhC2dvTU73qB/vgijPLKn4GUP07a1nvMWwBfwSA= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=HG3zMnfHAQx5aLX2Qvdg3xRBLoHfKX4OfX5akEijahvszDrtaESGiCymez512aU8+jj/xQfANYG09Z/g91XJKWnH0n3IWx2CiGO2zrD4Mv6tFz2AGA7a8GD/2c0VQK3xWEtuWAaw+k7ztxmIsVnKnD9JRoGIEjIMaMVXzJDNrdI= Received: by 10.114.146.1 with SMTP id t1mr813439wad.20.1197745885372; Sat, 15 Dec 2007 11:11:25 -0800 (PST) Received: by 10.114.255.11 with HTTP; Sat, 15 Dec 2007 11:11:20 -0800 (PST) Message-ID: Date: Sat, 15 Dec 2007 11:11:20 -0800 
From: "Kip Macy" To: "Robert Watson" In-Reply-To: <20071215184737.A85668@fledge.watson.org> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <20071215100351.Q70617@fledge.watson.org> <20071215152959.V85668@fledge.watson.org> <20071215184737.A85668@fledge.watson.org> Cc: FreeBSD Current , freebsd-arch@freebsd.org Subject: Re: pending changes for TOE support X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 15 Dec 2007 19:11:26 -0000 > > Undoubtably this is true for high-end systems sporting PCIe interfaces with > 10gbps cards, but we also run in other configurations. A complaint we've had > a fair amount in the last few years is that our work has increasingly targeted > high-end systems where overhead is in cache misses, and decreasingly targeted > low-end systems where overhead is in instruction count. While you offer the > opportunity to compile out some of this, I think we should try to make these > things capable of being fast in both circumstances without multiple compile > paths where it's easy. Ok. That was the intent of the initial design. The design intent, however, was that it would be compiled out on slower platforms - sparc64, arm, mips etc. > Ideally, we'd make the ifdef very, very small, as it's well-known that when we > have two mutually exclusive code paths and one isn't compiled or used > frequently, it rots. Good point. > > >> I find myself wondering if the offload option should be a TCP socket option > >> rather than a socket-layer option, but don't have strong feelings about > >> this. 
> >
> > The assumption is that, if you a) pay the extra for a version of the card
> > that supports TOE and b) you load the toe module (it isn't part of the NIC
> > driver) that you want your existing software to have its connections
> > offloaded. I don't know of any customers that want to modify their user
> > applications to selectively enable TOE.
>
> I think you mis-understood my question -- I was wondering whether it should be
> selectively disabled by an IP-layer socket option rather than a socket-layer
> socket option.

I did misunderstand, and yes, a TCP_ socket option would probably be more
appropriate.

> > That is the current usage model. At some point we may tie it into
> > pf/ipf/ipfw to provide for offload policy. However, currently we offload all
> > connections on an interface until we run out of TCAM entries.
> >
> >> Are you currently anticipating enabling the capability by default?
> >
> > Yes, *if the TOE driver is loaded*.
>
> I have somewhat mixed feelings about this, and feel like there should be
> something eloquent to say about having functionality administratively enabled
> rather than a default when compiled in, but can't quite figure out a nice
> expression of that. I think it might have something to do with expecting
> vendors of 10gbps cards to like shipping two modules for every device rather
> than one, and having the right behavior out of the box if they do ship just
> one.

This is easy enough to arrange: just as we have TSO and jumbo frames that
some drivers default to on and others default to off, we can have the TOE
capability enabled or disabled by default by the driver. I think that if
vendors left TOE disabled by default in the single module case, this would
address your concerns, would it not?
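The per-driver default described above can be sketched as a pair of bit
masks: one for what the hardware supports and one for what is currently
switched on. This is a userspace illustration under stated assumptions, not
ifnet code. struct iface, the CAP_* values, iface_attach(), iface_set_cap()
and should_offload() are all hypothetical names chosen for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define CAP_TSO  0x1u  /* hypothetical capability bits */
#define CAP_TOE  0x2u

struct iface {
	uint32_t capabilities;  /* what the hardware can do */
	uint32_t capenable;     /* what is currently switched on */
};

/*
 * Driver attach: advertise the supported capabilities and apply the
 * driver's own defaults.  A default can never enable an unsupported bit.
 */
static void
iface_attach(struct iface *ifp, uint32_t caps, uint32_t defaults)
{
	ifp->capabilities = caps;
	ifp->capenable = caps & defaults;
}

/* Administrative toggle; fails if the hardware lacks the capability. */
static bool
iface_set_cap(struct iface *ifp, uint32_t cap, bool on)
{
	if ((ifp->capabilities & cap) != cap)
		return (false);
	if (on)
		ifp->capenable |= cap;
	else
		ifp->capenable &= ~cap;
	return (true);
}

/* Policy switch: offload only when TOE is enabled, not merely present. */
static bool
should_offload(const struct iface *ifp)
{
	return ((ifp->capenable & CAP_TOE) != 0);
}
```

With this shape, loading a module that supports TOE changes nothing for
existing connections until the enable bit is set, either by the driver's
default or by the administrator.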
-Kip From owner-freebsd-arch@FreeBSD.ORG Sat Dec 15 19:16:23 2007 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 69DD016A469; Sat, 15 Dec 2007 19:16:23 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42]) by mx1.freebsd.org (Postfix) with ESMTP id 34E2213C4E8; Sat, 15 Dec 2007 19:16:23 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by cyrus.watson.org (Postfix) with ESMTP id B08814700F; Sat, 15 Dec 2007 14:16:22 -0500 (EST) Date: Sat, 15 Dec 2007 19:16:22 +0000 (GMT) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Kip Macy In-Reply-To: Message-ID: <20071215190252.I85668@fledge.watson.org> References: <20071215100351.Q70617@fledge.watson.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: FreeBSD Current , freebsd-arch@freebsd.org Subject: Re: pending changes for TOE support X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 15 Dec 2007 19:16:23 -0000 On Sat, 15 Dec 2007, Kip Macy wrote: > The current implementation bypasses the firewall. This and likely other > hardware has extensive filtering support so it isn't neccessarily intrinsic. I'm not sure I agree when it comes to features like DUMMYNET, NAT, BPF, etc. TCP offload completely bypasses, by its very intent, most of the network stack. > The usage model at this moment is that the customer makes a conscious > decision to load the TOE driver and understands the implications. I think > this is quite adequate for 10GigE cards currently. However, this will need > to be revisited when these chips start showing up on mainstream > motherboards. 
I think I would prefer that our policy switch be the capenable flag, so that
compiling things in or out (or loading, which is the logical equivalent)
doesn't change functional behavior for existing interfaces.

>> While I'm familiar with TCP, I'm less familiar with the scope of what cards
>> support for TOE. Do we know of any cards that are less capable than the
>> chelsio card in this respect, or are they all sort of on-par on that front?
>> I.e., do we think the above eventuality is likely?
>
> I don't have any way of knowing. I think it is probably safe to say that any
> vendors that don't meet that criteria now will in the future as transistor
> density increases.

I think it behooves us to find out, given that we're designing a KPI for
those cards also. I agree with the transistor argument, and given that TOE is
a fairly undeployed technology at this point, it may quickly resolve itself
if it hasn't.

>> If we don't, then one of the things I'd like to see us do is fairly
>> carefully assert, at least for a few months, that TCP never "slips" into
>> any transmission-related paths that could lead to truly odd and
>> hard-to-diagnose behavior when running with TOE. I.e., tcp_output, etc.
>
> I'm happy to do that. However, I see problems introduced by offloading
> connections as being driver bugs much the same as problems caused by the
> driver's TCP segmentation offload or checksum offload. The problems will be
> isolated to connections using a specific interface.

Interesting point -- it's amazing how broken checksum processing is, and TCP
is many orders of magnitude more complex.

>> the socket code, both for sending/receiving. You talk a bit about
>> "credit", but introducing it up-front would be useful.
>
> I didn't realize a definition was necessary. To the best of my knowledge
> this is the common term used when discussing flow control. I've seen it used
> for Fibre Channel and IB.
> The one ambiguity that arises is whether or not it
> refers to bytes or segments.

I think a phrase wouldn't hurt; also, I notice you only addressed flow
control in one direction in the comments, which is why I mentioned both
sending and receiving. The clearer we make this, the happier we'll be. I
suspect we'll actually want to move a lot of this text from the include file
to the man page for the TOE interface...

>> (3) Could you talk at a high level about the ways in which TOE drivers will
>> interact with TCP? You do it a bit in each of the sections, but if
>> there's a principle, pulling it out would be useful. Also, you should
>> indicate whether the driver is allowed to drop the inpcb lock or not.
>
> I've done my best to minimize changes to TCP. It is safe to assume that the
> invariants are the same as those for tcp_output. I think we should ask the
> author of tcp_output to document the interface, expected state transitions,
> and its invariants (joke).

:-P

Documenting locking semantics such as "You can rely on lock X being held, but
do not drop it" takes an extra phrase and can save someone a lot of time.

>> I'm a bit confused by the description of the error condition here. Could
>> you clarify when a driver should return an error, and what the impact of an
>> error returned will be on the connection state? In fact, it probably makes
>> sense to have an up-front comment on conventions for error-handling -- if
>> TOE returns an error will that generally lead to a TCP tear-down?
>
> The offload routines are substituted for tcp_output and thus should interact
> with the stack in the same way. By extension they should have the same
> failure modes and invariants.

Most driver authors will not be intimately familiar with tcp_output()'s
subtleties, and documenting error-handling for a KPI is always a good idea.

> The interface is intended to drop in the place of tcp_output.
<"see what tcp_output does" repeated many times> tcp_output() was previously an internal function of the TCP code, and now the semantics are being exposed to device drivers. Let's not perpetuate poorly documented driver interfaces by adding another one. I think it would be a reasonable expectation of a driver author to have consistent documentation of the life cycle of data structures and objects, locking expectations and requirements, and the semantics for error values from functions. Certainly, they need to look at TCP a fair amount because they'll be pulling things out of inpcb, tcpcb, etc, but I'd rather we limit that requirement to simple things (addresses, socket options) that are relatively static and avoid it being for complex things (locking, error handling) that tend to be more subject to change. Also, if you document what you think the behavior is or should be, we can then check to see if we agree. Robert N M Watson Computer Laboratory University of Cambridge From owner-freebsd-arch@FreeBSD.ORG Sat Dec 15 19:44:37 2007 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A222316A417 for ; Sat, 15 Dec 2007 19:44:37 +0000 (UTC) (envelope-from kip.macy@gmail.com) Received: from wa-out-1112.google.com (wa-out-1112.google.com [209.85.146.180]) by mx1.freebsd.org (Postfix) with ESMTP id 6E5D613C44B for ; Sat, 15 Dec 2007 19:44:37 +0000 (UTC) (envelope-from kip.macy@gmail.com) Received: by wa-out-1112.google.com with SMTP id k17so2290152waf.3 for ; Sat, 15 Dec 2007 11:44:36 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; bh=7yysAaCrdon/nvxft4lKcfyuoNhIspPYfuMhAc3I8Dc=; 
b=rb0iT2CsvdhWeHg91kdvpqz0ObwDSqlhzqazwzO94yba5gvWWvk/z5RVfjE4Rz8EK6Wl0MMzbnArHyNecuGHqBuyLlTzUC/BCrsTpKlUN+IN3wqz0GnLtQ2mu7LxaFUG+AZRVw1BclwGP/X+gRndpvyzMh1iMZRQiABw7UEIw0Q= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=aKBBwWFoa8v72VqSZVLi/WWlrpHqVICruKNIdMyytV0RwiuJqMl6xdcDdzjnrgHZcue+STOMBNYrhT/agdTmpkPzHsIpb4Tes3DR0VOo1qWWCrQJU6LoDei87XaL2J5W7MvbEVhI3GHWNrmyU344ah7Xs1OVS/0SRFFMqoK7Y7s= Received: by 10.114.60.19 with SMTP id i19mr1003839waa.142.1197747876704; Sat, 15 Dec 2007 11:44:36 -0800 (PST) Received: by 10.114.255.11 with HTTP; Sat, 15 Dec 2007 11:44:36 -0800 (PST) Message-ID: Date: Sat, 15 Dec 2007 11:44:36 -0800 From: "Kip Macy" To: "Robert Watson" In-Reply-To: <20071215190252.I85668@fledge.watson.org> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <20071215100351.Q70617@fledge.watson.org> <20071215190252.I85668@fledge.watson.org> Cc: FreeBSD Current , freebsd-arch@freebsd.org Subject: Re: pending changes for TOE support X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 15 Dec 2007 19:44:37 -0000 On Dec 15, 2007 11:16 AM, Robert Watson wrote: > I think I would prefer that our policy switch be the capenable flag, so that > compiling things in or out (or loading, which is the logical equivilent) > doesn't change functional behavior for existing interfaces. I believe I said something similar to that in my recent mail. > I think it behooves us to find out, given that we're designing a KPI for those > cards also. 
> I agree with the transistor argument, and given that TOE is a
> fairly undeployed technology at this point, it may quickly resolve itself if
> it hasn't.

I certainly agree. However, where do you stand if they are unwilling to
cooperate?

> Interesting point -- it's amazing how broken checksum processing is, and TCP
> is many orders of magnitude more complex.

Correct, it isn't a given that a vendor's TCP implementation will work
correctly.

> I think a phrase wouldn't hurt; also, I notice you did only address flow
> control in one direction in the comments, which is why I mentioned both
> sending and receiving.

It is fairly implicit for the sending case. There the driver returns credits
to the stack via sbdrop(). I'll have to think about how to describe the two
in a uniform fashion.

> Documenting locking semantics such as "You can rely on lock X being held, but
> do not drop it" takes an extra phrase and can save someone a lot of time.

I *think* I've done that. I guess I could be clearer and say that the inpcb
lock is held and expected not to be dropped.

>
> Most driver authors will not be intimately familiar with tcp_output()'s
> subtleties, and documenting error-handling for a KPI is always a good idea.

I'll do my best. However, the TCP stack as it exists now really isn't very
modular at all. This is intended to allow developers to skip duplicating
large swaths of tcp code with small local changes the way they do on Linux.

> > The interface is intended to drop in the place of tcp_output.
> <"see what tcp_output does" repeated many times>
>
> tcp_output() was previously an internal function of the TCP code, and now the
> semantics are being exposed to device drivers. Let's not perpetuate poorly
> documented driver interfaces by adding another one.
> I think it would be a
> reasonable expectation of a driver author to have consistent documentation of
> the life cycle of data structures and objects, locking expectations and
> requirements, and the semantics for error values from functions. Certainly,
> they need to look at TCP a fair amount because they'll be pulling things out
> of inpcb, tcpcb, etc, but I'd rather we limit that requirement to simple
> things (addresses, socket options) that are relatively static and avoid it
> being for complex things (locking, error handling) that tend to be more
> subject to change. Also, if you document what you think the behavior is or
> should be, we can then check to see if we agree.

To the extent possible, yes. I'm not convinced that any one person knows what
all the existing invariants are in the stack as it is now. Do you feel that a
Stevens'-esque understanding of the environment around the calls is
necessary? I know it sounds like "other people beat their wives so I can
too", but even something as well documented as ifnet gives no indication of
what the locking conventions are - e.g. you can't sleep or acquire sx locks
in if_ioctl. The demands placed should be no greater than those placed on
existing subsystems and should take into account the hitherto somewhat black
box nature of TCP.
-Kip From owner-freebsd-arch@FreeBSD.ORG Sat Dec 15 21:51:36 2007 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D652316A417; Sat, 15 Dec 2007 21:51:36 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42]) by mx1.freebsd.org (Postfix) with ESMTP id 9F7A113C455; Sat, 15 Dec 2007 21:51:35 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by cyrus.watson.org (Postfix) with ESMTP id 6D36548C2A; Sat, 15 Dec 2007 16:51:35 -0500 (EST) Date: Sat, 15 Dec 2007 21:51:35 +0000 (GMT) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Kip Macy In-Reply-To: Message-ID: <20071215214253.N85668@fledge.watson.org> References: <20071215100351.Q70617@fledge.watson.org> <20071215190252.I85668@fledge.watson.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: FreeBSD Current , freebsd-arch@freebsd.org Subject: Re: pending changes for TOE support X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 15 Dec 2007 21:51:36 -0000 On Sat, 15 Dec 2007, Kip Macy wrote: >> I think it behooves us to find out, given that we're designing a KPI for >> those cards also. I agree with the transistor argument, and given that TOE >> is a fairly undeployed technology at this point, it may quickly resolve >> itself if it hasn't. > > I certainly agree. However, where do you stand if they are unwilling to > cooperate? I think we should make a best faith effort to figure out how to do the right thing. 
Employing harsh interrogation tactics is probably not called for, but we have several vendors who are regularly involved in the FreeBSD community and may be willing to lend their insight as to their requirements, even if an implementation isn't immediately forthcoming. >> tcp_output() was previously an internal function of the TCP code, and now >> the semantics are being exposed to device drivers. Let's not perpetuate >> poorly documented driver interfaces by adding another one. I think it >> would be a reasonable expectation of a driver author to have consistent >> documentation of the life cycle of data structures and objects, locking >> expectations and requirements, and the semantics for error values from >> functions. Certainly, they need to look at TCP a fair amount because >> they'll be pulling things out of inpcb, tcpcb, etc, but I'd rather we limit >> that requirement to simple things (addresses, socket options) that are >> relatively static and avoid it being for complex things (locking, error >> handling) that tend to be more subject to change. Also, if you document >> what you think the behavior is or should be, we can then check to see if we >> agree. > > To the extent possible, yes. I'm not convinced that anyone person knows what > all the existing invariants are in the stack as it is now. Do you feel that > a Stevens'-esque understanding of the environment around the calls is > necessary? I know it sounds like "other people beat their wives so I can > too", but even something as well documented as ifnet gives no indication of > what the locking conventions - e.g. you can't sleep or acquire sx locks in > if_ioctl. The demands placed should be no greater than those placed on > existing subsystems and should take into account the hitherto somewhat black > box nature of TCP. 
Actually, what I was asking for in the omitted context above was something along the lines of the following, adapted for whatever the reality may be: Returning a non-zero value will lead to the software stack beginning a disconnect. Or, say, Non-zero return values will be ignored. (*) This is not intended as a contrarian point. I'm not looking for a complete exposition of the behavior of the stack -- rather, basic information that we should be documenting about a KPI, such as what an error being returned will do. Robert N M Watson Computer Laboratory University of Cambridge (*) In which case perhaps it should return void. From owner-freebsd-arch@FreeBSD.ORG Sat Dec 15 22:02:29 2007 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3B57F16A550 for ; Sat, 15 Dec 2007 22:02:29 +0000 (UTC) (envelope-from kip.macy@gmail.com) Received: from wa-out-1112.google.com (wa-out-1112.google.com [209.85.146.183]) by mx1.freebsd.org (Postfix) with ESMTP id AE5F813C45B for ; Sat, 15 Dec 2007 22:02:28 +0000 (UTC) (envelope-from kip.macy@gmail.com) Received: by wa-out-1112.google.com with SMTP id k17so2357986waf.3 for ; Sat, 15 Dec 2007 14:02:28 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; bh=GJ80xv4ybbGGM2jFPYOWs0MttisbZ6W6ea2OH9I8h08=; b=su269A67sQ31FcV7k8jvjE1Bftek9Ppp8wrX+Fswi1ckLskwtP+Kk0wOdqM/hG3KrMWiJpfljfuY2brAvccEkygnf5WWpdZBHRSKmDQeqBACxPhpQhC6UFENE7zWOQyt2cT6VGtqKzbujpwbLDGP31gMbUzOahhwTK7DxKojW4w= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; 
b=rOCYY1vN5VXTd3LHU+Rnd7xzkO6FNLK7fat8o25Rb5EksHf+31asZ6xbn+bdcnW4TUgNjaEwpTcmNmLoHdhOitfsg/rfyrxxeoQqj1EpRlrAz/3jr+9htKCYXvzWqYnlmDShns85QfvTsTHf3IwWEDZok2AbJZ/oyTCeMvuaJPE= Received: by 10.115.48.12 with SMTP id a12mr838011wak.149.1197756148165; Sat, 15 Dec 2007 14:02:28 -0800 (PST) Received: by 10.114.255.11 with HTTP; Sat, 15 Dec 2007 14:02:28 -0800 (PST) Message-ID: Date: Sat, 15 Dec 2007 14:02:28 -0800 From: "Kip Macy" To: "Robert Watson" In-Reply-To: <20071215214253.N85668@fledge.watson.org> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <20071215100351.Q70617@fledge.watson.org> <20071215190252.I85668@fledge.watson.org> <20071215214253.N85668@fledge.watson.org> Cc: FreeBSD Current , freebsd-arch@freebsd.org Subject: Re: pending changes for TOE support X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 15 Dec 2007 22:02:29 -0000 > I think we should make a best faith effort to figure out how to do the right > thing. Employing harsh interrogation tactics is probably not called for, but > we have several vendors who are regularly involved in the FreeBSD community > and may be willing to lend their insight as to their requirements, even if an > implementation isn't immediately forthcoming. :-) - the two vendors that I know of have not been active in the community. Sam has initiated contact on my behalf with the one vendor that might be willing to talk to us. The other vendor has an established pattern of ignoring all requests. > > >> tcp_output() was previously an internal function of the TCP code, and now > >> the semantics are being exposed to device drivers. Let's not perpetuate > >> poorly documented driver interfaces by adding another one. 
I think it > >> would be a reasonable expectation of a driver author to have consistent > >> documentation of the life cycle of data structures and objects, locking > >> expectations and requirements, and the semantics for error values from > >> functions. Certainly, they need to look at TCP a fair amount because > >> they'll be pulling things out of inpcb, tcpcb, etc, but I'd rather we limit > >> that requirement to simple things (addresses, socket options) that are > >> relatively static and avoid it being for complex things (locking, error > >> handling) that tend to be more subject to change. Also, if you document > >> what you think the behavior is or should be, we can then check to see if we > >> agree. > > > > To the extent possible, yes. I'm not convinced that anyone person knows what > > all the existing invariants are in the stack as it is now. Do you feel that > > a Stevens'-esque understanding of the environment around the calls is > > necessary? I know it sounds like "other people beat their wives so I can > > too", but even something as well documented as ifnet gives no indication of > > what the locking conventions - e.g. you can't sleep or acquire sx locks in > > if_ioctl. The demands placed should be no greater than those placed on > > existing subsystems and should take into account the hitherto somewhat black > > box nature of TCP. > > Actually, what I was asking for in the omitted context above was something > along the lines of the following, adapted for whatever the reality may be: > > Returning a non-zero value will lead to the software stack beginning a > disconnect. > > Or, say, > > Non-zero return values will be ignored. (*) > > This is not intended as a contrarian point. I'm not looking for a complete > exposition of the behavior of the stack -- rather, basic information that we > should be documenting about a KPI, such as what an error being returned will > do. That is quite reasonable. 
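The first of the two conventions quoted above (a non-zero return begins a
disconnect) could be written down roughly as follows. This is only a sketch
of what a documented contract might look like, in userspace C. struct conn,
conn_send(), send_hook_t and the hook_* demo functions are hypothetical names,
and the contract in the comment is an assumption made for the example rather
than the behavior of the patch.

```c
#include <stddef.h>

enum conn_state { CONN_ESTABLISHED, CONN_DISCONNECTING };

struct conn {
	enum conn_state state;
	int last_error;  /* error that triggered the disconnect, if any */
};

typedef int (*send_hook_t)(struct conn *);

/*
 * conn_send() - hand queued data to the driver's send hook.
 *
 * Contract assumed for this sketch (not taken from the patch):
 *   - the hook returns 0 on success and the state is left unchanged;
 *   - a non-zero return is recorded in last_error and the stack begins
 *     a disconnect;
 *   - once the connection is disconnecting the hook is not called again.
 *
 * Returns the hook's value, or -1 if the connection was already
 * disconnecting when called.
 */
static int
conn_send(struct conn *c, send_hook_t hook)
{
	int error;

	if (c->state != CONN_ESTABLISHED)
		return (-1);
	error = hook(c);
	if (error != 0) {
		c->last_error = error;
		c->state = CONN_DISCONNECTING;
	}
	return (error);
}

/* Demo hooks: one that always succeeds, one that always fails with 5. */
static int hook_ok(struct conn *c)   { (void)c; return (0); }
static int hook_fail(struct conn *c) { (void)c; return (5); }
```

If the other convention were chosen instead (return values ignored), the hook
type would more naturally return void, as the footnote in the quoted message
suggests.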
I perceived your initial request as being entirely too open-ended. -Kip From owner-freebsd-arch@FreeBSD.ORG Sat Dec 15 22:41:04 2007 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id F21F016A418; Sat, 15 Dec 2007 22:41:03 +0000 (UTC) (envelope-from SRS0=175ffd958ccf631e3f50bc84e4580880c9a9c317=550=es.net=oberman@es.net) Received: from postal1.es.net (postal1.es.net [IPv6:2001:400:14:3::6]) by mx1.freebsd.org (Postfix) with ESMTP id 437A213C44B; Sat, 15 Dec 2007 22:41:03 +0000 (UTC) (envelope-from SRS0=175ffd958ccf631e3f50bc84e4580880c9a9c317=550=es.net=oberman@es.net) Received: from ptavv.es.net (ptavv.es.net [198.128.4.29]) by postal1.es.net (Postal Node 1) with ESMTP (SSL) id UIE67201; Sat, 15 Dec 2007 14:41:01 -0800 Received: from ptavv.es.net (ptavv.es.net [127.0.0.1]) by ptavv.es.net (Tachyon Server) with ESMTP id 6D1A045013; Sat, 15 Dec 2007 14:41:01 -0800 (PST) To: "Kip Macy" In-Reply-To: Your message of "Sat, 15 Dec 2007 14:02:28 PST." 
Mime-Version: 1.0 Content-Type: multipart/signed; boundary="==_Exmh_1197758461_12041P"; micalg=pgp-sha1; protocol="application/pgp-signature" Content-Transfer-Encoding: 7bit Date: Sat, 15 Dec 2007 14:41:01 -0800 From: "Kevin Oberman" Message-Id: <20071215224101.6D1A045013@ptavv.es.net> X-Sender-IP: 198.128.4.29 X-Sender-Domain: es.net X-Recipent: ; ; ; ; X-Sender: X-To_Name: Kip Macy X-To_Domain: gmail.com X-To: "Kip Macy" X-To_Email: kip.macy@gmail.com X-To_Alias: kip.macy Cc: FreeBSD Current , Robert Watson , freebsd-arch@freebsd.org Subject: Re: pending changes for TOE support X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 15 Dec 2007 22:41:04 -0000 --==_Exmh_1197758461_12041P Content-Type: text/plain; charset=us-ascii Content-Disposition: inline > Date: Sat, 15 Dec 2007 14:02:28 -0800 > From: "Kip Macy" > Sender: owner-freebsd-current@freebsd.org > > > I think we should make a best faith effort to figure out how to do the right > > thing. Employing harsh interrogation tactics is probably not called for, but > > we have several vendors who are regularly involved in the FreeBSD community > > and may be willing to lend their insight as to their requirements, even if an > > implementation isn't immediately forthcoming. > > :-) - the two vendors that I know of have not been active in the > community. Sam has initiated contact on my behalf with the one vendor > that might be willing to talk to us. The other vendor has an > established pattern of ignoring all requests. > > > > > >> tcp_output() was previously an internal function of the TCP code, and now > > >> the semantics are being exposed to device drivers. Let's not perpetuate > > >> poorly documented driver interfaces by adding another one. 
I think it > > >> would be a reasonable expectation of a driver author to have consistent > > >> documentation of the life cycle of data structures and objects, locking > > >> expectations and requirements, and the semantics for error values from > > >> functions. Certainly, they need to look at TCP a fair amount because > > >> they'll be pulling things out of inpcb, tcpcb, etc, but I'd rather we limit > > >> that requirement to simple things (addresses, socket options) that are > > >> relatively static and avoid it being for complex things (locking, error > > >> handling) that tend to be more subject to change. Also, if you document > > >> what you think the behavior is or should be, we can then check to see if we > > >> agree. > > > > > > To the extent possible, yes. I'm not convinced that any one person knows what > > > all the existing invariants are in the stack as it is now. Do you feel that > > > a Stevens'-esque understanding of the environment around the calls is > > > necessary? I know it sounds like "other people beat their wives so I can > > > too", but even something as well documented as ifnet gives no indication of > > > what the locking conventions are - e.g. you can't sleep or acquire sx locks in > > > if_ioctl. The demands placed should be no greater than those placed on > > > existing subsystems and should take into account the hitherto somewhat black > > > box nature of TCP. > > > > Actually, what I was asking for in the omitted context above was something > > along the lines of the following, adapted for whatever the reality may be: > > > > Returning a non-zero value will lead to the software stack beginning a > > disconnect. > > > > Or, say, > > > > Non-zero return values will be ignored. (*) > > > > This is not intended as a contrarian point. I'm not looking for a complete > > exposition of the behavior of the stack -- rather, basic information that we > > should be documenting about a KPI, such as what an error being returned will > > do.
> > That is quite reasonable. I perceived your initial request as being > entirely too open-ended. We certainly know who provides support for Intel and Myricom cards (unless there has been a recent change of which I am unaware) and I happen to work across the hall from the guy who has been upgrading the Neterion drivers for them and I suspect provided the recent new versions to them. Am I missing anyone? If they are contacted and express disinterest or don't respond, I think Kip has to proceed as best he can. We use three of the vendors I know of with FreeBSD, so can push as a customer, too. We will be ordering more 10GE cards from someone soon and support for TOE could be a significant issue in selection. Until now it has not been mentioned since there was no prospect of near-term FreeBSD support, but that is clearly no longer the case. -- R. Kevin Oberman, Network Engineer Energy Sciences Network (ESnet) Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab) E-mail: oberman@es.net Phone: +1 510 486-8634 Key fingerprint:059B 2DDF 031C 9BA3 14A4 EADA 927D EBB3 987B 3751 --==_Exmh_1197758461_12041P Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (FreeBSD) Comment: Exmh version 2.5 06/03/2002 iD8DBQFHZFf9kn3rs5h7N1ERAitpAJ9mP+hrmQPP71rmVJe+Hd4b22hkWgCgh7Lm zAZZ2t/fF9qf42P9u8mz3j0= =qjFn -----END PGP SIGNATURE----- --==_Exmh_1197758461_12041P-- From owner-freebsd-arch@FreeBSD.ORG Sat Dec 15 22:47:13 2007 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6613116A479; Sat, 15 Dec 2007 22:47:13 +0000 (UTC) (envelope-from sam@errno.com) Received: from ebb.errno.com (ebb.errno.com [69.12.149.25]) by mx1.freebsd.org (Postfix) with ESMTP id E719813C4F0; Sat, 15 Dec 2007 22:47:12 +0000 (UTC) (envelope-from sam@errno.com) Received: from trouble.errno.com (trouble.errno.com [10.0.0.248]) (authenticated bits=0) 
by ebb.errno.com (8.13.6/8.12.6) with ESMTP id lBFMl0Z7085175 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 15 Dec 2007 14:47:00 -0800 (PST) (envelope-from sam@errno.com) Message-ID: <47645963.40301@errno.com> Date: Sat, 15 Dec 2007 14:46:59 -0800 From: Sam Leffler User-Agent: Thunderbird 2.0.0.9 (X11/20071125) MIME-Version: 1.0 To: Kevin Oberman References: <20071215224101.6D1A045013@ptavv.es.net> In-Reply-To: <20071215224101.6D1A045013@ptavv.es.net> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-DCC-dcc-servers-Metrics: om; whitelist Cc: FreeBSD Current , freebsd-arch@freebsd.org Subject: Re: pending changes for TOE support X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 15 Dec 2007 22:47:13 -0000 Kevin Oberman wrote: >> Date: Sat, 15 Dec 2007 14:02:28 -0800 >> From: "Kip Macy" >> Sender: owner-freebsd-current@freebsd.org >> >> >>> I think we should make a best faith effort to figure out how to do the right >>> thing. Employing harsh interrogation tactics is probably not called for, but >>> we have several vendors who are regularly involved in the FreeBSD community >>> and may be willing to lend their insight as to their requirements, even if an >>> implementation isn't immediately forthcoming. >>> >> :-) - the two vendors that I know of have not been active in the >> community. Sam has initiated contact on my behalf with the one vendor >> that might be willing to talk to us. The other vendor has an >> established pattern of ignoring all requests. >> >> >>>>> tcp_output() was previously an internal function of the TCP code, and now >>>>> the semantics are being exposed to device drivers. Let's not perpetuate >>>>> poorly documented driver interfaces by adding another one. 
I think it >>>>> would be a reasonable expectation of a driver author to have consistent >>>>> documentation of the life cycle of data structures and objects, locking >>>>> expectations and requirements, and the semantics for error values from >>>>> functions. Certainly, they need to look at TCP a fair amount because >>>>> they'll be pulling things out of inpcb, tcpcb, etc, but I'd rather we limit >>>>> that requirement to simple things (addresses, socket options) that are >>>>> relatively static and avoid it being for complex things (locking, error >>>>> handling) that tend to be more subject to change. Also, if you document >>>>> what you think the behavior is or should be, we can then check to see if we >>>>> agree. >>>>> >>>> To the extent possible, yes. I'm not convinced that any one person knows what >>>> all the existing invariants are in the stack as it is now. Do you feel that >>>> a Stevens'-esque understanding of the environment around the calls is >>>> necessary? I know it sounds like "other people beat their wives so I can >>>> too", but even something as well documented as ifnet gives no indication of >>>> what the locking conventions are - e.g. you can't sleep or acquire sx locks in >>>> if_ioctl. The demands placed should be no greater than those placed on >>>> existing subsystems and should take into account the hitherto somewhat black >>>> box nature of TCP. >>>> >>> Actually, what I was asking for in the omitted context above was something >>> along the lines of the following, adapted for whatever the reality may be: >>> >>> Returning a non-zero value will lead to the software stack beginning a >>> disconnect. >>> >>> Or, say, >>> >>> Non-zero return values will be ignored. (*) >>> >>> This is not intended as a contrarian point. I'm not looking for a complete >>> exposition of the behavior of the stack -- rather, basic information that we >>> should be documenting about a KPI, such as what an error being returned will >>> do.
>>> >> That is quite reasonable. I perceived your initial request as being >> entirely too open-ended. >> > > We certainly know who provides support for Intel and Myricom cards > (unless there has been a recent change of which I am unaware) and I > happen to work across the hall from the guy who has been upgrading the > Neterion drivers for them and I suspect provided the recent new versions > to them. > > Am I missing anyone? > > If they are contacted and express disinterest or don't respond, I think > Kip has to proceed as best he can. > > We use three of the vendors I know of with FreeBSD, so can push as a > customer, too. We will be ordering more 10GE cards from someone soon and > support for TOE could be a significant issue in selection. Until now it > has not been mentioned since there was no prospect of near-term FreeBSD > support, but that is clearly no longer the case. > So far as I know none of Intel, Myricom, and Neterion support TOE. Sam