From owner-freebsd-current@FreeBSD.ORG Wed Dec 17 10:25:07 2008 Return-Path: Delivered-To: current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D11FD1065677; Wed, 17 Dec 2008 10:25:07 +0000 (UTC) (envelope-from bzeeb-lists@lists.zabbadoz.net) Received: from mail.cksoft.de (mail.cksoft.de [62.111.66.27]) by mx1.freebsd.org (Postfix) with ESMTP id 49CCF8FC1B; Wed, 17 Dec 2008 10:25:07 +0000 (UTC) (envelope-from bzeeb-lists@lists.zabbadoz.net) Received: from localhost (amavis.str.cksoft.de [192.168.74.71]) by mail.cksoft.de (Postfix) with ESMTP id E061F41C6BB; Wed, 17 Dec 2008 11:25:05 +0100 (CET) X-Virus-Scanned: amavisd-new at cksoft.de Received: from mail.cksoft.de ([62.111.66.27]) by localhost (amavis.str.cksoft.de [192.168.74.71]) (amavisd-new, port 10024) with ESMTP id jYrK+caOXMY5; Wed, 17 Dec 2008 11:25:05 +0100 (CET) Received: by mail.cksoft.de (Postfix, from userid 66) id 62C3541C64C; Wed, 17 Dec 2008 11:25:05 +0100 (CET) Received: from maildrop.int.zabbadoz.net (maildrop.int.zabbadoz.net [10.111.66.10]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.int.zabbadoz.net (Postfix) with ESMTP id B2B184448D5; Wed, 17 Dec 2008 10:23:17 +0000 (UTC) Date: Wed, 17 Dec 2008 10:23:16 +0000 (UTC) From: "Bjoern A. Zeeb" X-X-Sender: bz@maildrop.int.zabbadoz.net To: Peter Wemm , FreeBSD virtualization mailing list , FreeBSD current mailing list In-Reply-To: Message-ID: <20081217084354.T97918@maildrop.int.zabbadoz.net> References: <200812132159.mBDLxIQv040799@svn.freebsd.org> X-OpenPGP-Key: 0x14003F198FEFA3E77207EE8D2B58B8F83CCF1842 MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org Subject: HEADS UP: vimage ABI constraints on container structs [was: Re: svn commit: r186057 - head/sys/netinet] X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: FreeBSD virtualization mailing list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Dec 2008 10:25:08 -0000 On Tue, 16 Dec 2008, Peter Wemm wrote: Hi, let me Cc: virtualization and current@ for this reply (to have the explicit HEADS UP) for what Peter pointed out. The first to reply may want to trim the Cc: list again; possibly to only virtualization. > On Sat, Dec 13, 2008 at 1:59 PM, Bjoern A. Zeeb wrote: >> De-virtualize the MD5 context for TCP initial seq number generation >> and make it a function local variable like we do almost everywhere >> inside the kernel. > [..] >> --- head/sys/netinet/vinet.h Sat Dec 13 21:17:46 2008 (r186056) >> +++ head/sys/netinet/vinet.h Sat Dec 13 21:59:18 2008 (r186057) >> @@ -142,7 +142,6 @@ struct vnet_inet { >> int _isn_last_reseed; >> u_int32_t _isn_offset; >> u_int32_t _isn_offset_old; >> - MD5_CTX _isn_ctx; >> >> struct inpcbhead _udb; >> struct inpcbinfo _udbinfo; > > I'm bitterly unhappy with this. Every time these structs are touched, > either directly or indirectly, there is a guaranteed ABI breakage with > kernel modules. > > There needs to be a __FreeBSD_version bump (or something similar) > every time any of these structures change, and any kernel modules > *must* be prevented from loading. It can't be a >= some version, it > has to be an exact match. > > With the global variable method the linker calculates the offsets at > load time. With this abomination, the knowledge of the structure > layout is compiled into the generated code with no chance of a fixup. > There are no sanity checks. If a module that referenced _isn_ctx is > loaded the old way, there would be a link error. The new way will > have it silently trash _udb instead. > > There is a whole world of hurt being unleashed here. I suspect that > we might even be possible to run out of digits in our > __FreeBSD_version numbering scheme. > > I know we've talked about ABI-stable alternatives after the > infrastructure is done, but it needs to be absolutely clear that > touching this structure in the current form is a guaranteed ABI break, > and is currently undetected. If you boot kernel.old, you're hosed. > (Again, with -current, this might be ok for a while, but this scheme > won't wash with ports or other 3rd party modules) ... > In the mean time, I'd like to see some compile-time asserts in there > to make sure there are no accidental size changes of this structure. Ok, there are multiple things here: Size changes: ---------------------------------------- * I think catching pure size changes alone are not enough; any changes to the structs matter. Think of removing an int and adding an int (even at the same place). Something working on this will trash it. * The size changes of course would guard about non-obvious, indirect struct changes like (just an example) a change to struct inpcbinfo that make me worry even more (like they seem to worry you). More on structure changes: ---------------------------------------- * While this is on HEAD (refering to "Again, with -current, this might be ok for a while ..") I expected the following (major) changes coming with the continuing integration and testing: 1) Final passthrough on the set of virtualized variables. That might happen after Step #3 when people can actually test of SVN. 2) During Benchmarking - this would be about the same time - there might be shuffling. 3) Before we turn to a STABLE branch. *fear* * Again it's the indirect changes as said above (which are very worryingly). ABI problem: ---------------------------------------- * I agree that there are unaddressed challenges here. * Ideally we want version checking - at least at load time for modules - per container struct, as people do not want to change and recompile everything if say something in gif or ipsec changes which might not affect them. netgraph has done something like this but my feeling is that that would be the wrong way to go, especially wrt to vinet/vinet6. * I am not sure if padding as we had it before will work for stable branches. We need to think of this problem as well. To my understanding MAC has another really large structure with sufficient padding but it's a subsystem more or less living on its own on the side, not as heavily changes between branches and MFCed as the netstack and it's (function) pointers there and less direct members. * If you have suggestions or solutions, please share them. * .. Misc: ---------------------------------------- * I was aware of the problem and failed on two fronts here: 1) Doing my commit after Marko and forgetting about it going from the p4 branches to SVN. 2) Forgetting to mention this in the HEADs UP:( * The http://wiki.freebsd.org/Image/BeginnersGuideFAQ had a (somewhat hidden) "A single change would require complete recompiliation." I factored it out but will need to be more explicit there or refer to this thread. * Thanks a lot for sending this out to comitters and making them aware of the problem. I Cc:ed current@ and virtualization@ in the reply and change the subject. * I am happy there is another pair of eyes here now:) After losing the initial three other reviewers ... * I think the current mode (is?/)was "getting the infrastructre in" and get more hands (again) for the remaining parts when inevitable confronted with it and the huge pile of changes is gone and one might see more clear again. * We may want to always keep in mind that, not for 8 but maybe for 9, people may want to further virtualize more subsystems so whatever we do should possibly be general enough to work for other parts of the kernel as well. * As a very last resort at the moment (which would be a pitty) we can still do the rollback, keep the globals (as default) and only have virtualization optional. I would expect there to be some "keeping in sync" problems but it would be a lot easier to have both working and have people to adhere to the ``V_rules'' than on the side. It would be kind of hard for 3rd party vendors to supply (binary) modules with that then though. I really hope we won't need to go there. /bz -- Bjoern A. Zeeb The greatest risk is not taking one.