From: Robert Watson
Date: Fri, 17 Mar 2006 14:25:13 +0000 (GMT)
To: current@FreeBSD.org
Subject: HEADS UP: network stack and socket hackery over the next few weeks
Message-ID: <20060317141627.W2181@fledge.watson.org>

Over the next few weeks, I'll be doing a fairly serious reworking of the socket and protocol reference models, in order to clean up a number of long-standing race conditions and provide infrastructure for significant locking optimizations for several protocols (including TCP). This is high-risk work, in that this part of the socket code is very complex and there are a great many subtleties. Part of the goal of the work is to eliminate some of this complexity, and make the subtle a bit more obvious (and documented), so I think it's all for the good in the long term.
However, it will likely introduce significant instability in the short term, especially in the TCP code, where there will be substantial changes in the memory management model.

I've started merging minor parts of the patch over the last few days, but things will get serious around April 1, when the deadline for maintenance on the netatm stack expires (see arch@ and net@ posts about this), allowing me to bring in changes that are not known to work with netatm. As such, be warned that things may get a bit messy!

As a high-level outline of changes in the pipeline:

- Increase the strength of invariants relating to so_pcb pointers in protocols, including generally guaranteeing that as long as the socket hasn't been detached or aborted, those pointers will be non-NULL.
- Eliminate sotryfree(), leaving just sofree().
- Eliminate use of so_pcb checking in the socket layer as an implicit reference model for protocols that need to hold onto sockets. For now, replace it with SS_PROTOREF, but in the future, possibly just so_count. Generally normalize the socket reference count model.
- Eliminate the need to hold locks in order to follow so_pcb pointers throughout the protocol code, allowing us to avoid acquiring the pcbinfo lock in many important protocol entry paths from the socket code in TCP, especially the send and receive paths.
- This requires significant reworking of the memory management model of TCP so that it doesn't spontaneously discard PCB state all over the place. This is a high-risk change, but with significant payoffs.
- pru_abort and pru_detach will no longer be allowed to fail.

Things facilitated by these changes:

- Eliminate a number of known race conditions.
- Edge towards eliminating ACCEPT_LOCK().
- Avoid acquiring global TCP locks in common protocol paths from the socket layer, reducing tcbinfo lock contention.
- Eliminate many excessive and sometimes gratuitous checks relating to so_pcb, avoiding a lot of complicated and now unnecessary error handling in protocols.
- Pave the way for adding true references to TCP PCBs in the in-bound TCP path, further reducing contention on tcbinfo.

All good stuff, but it requires the groundwork to be laid first. For those with p4 access, you can track the current work branch in rwatson_sockref.

Oh, and despite my best efforts, and testing by a number of developers, it's likely TCP will become somewhat broken during this work. Be warned!

Robert N M Watson