From: "Robert N. M. Watson"
To: vasanth rao naik sabavat
Cc: "freebsd-hackers@freebsd.org"
Subject: Re: SMP: protocol control block protection for a multithreaded process (ex: udp).
Date: Tue, 29 May 2012 21:43:40 -0400
Message-Id: <02692B7F-02AD-49A5-A3D5-C92F03E7147C@freebsd.org>

On 29 May 2012, at 21:09, vasanth rao naik sabavat wrote:

> I am trying to understand the socket <--> protocol layer as part of our project. I was trying to understand why the sotoinpcb() is called before taking any locks. Also, I am trying to understand scenario of a multi-threaded process trying to do socket operations simultaneously on a multicore cpu.
>
> I have gone through the socket life cycle comments in the code and gave good understanding of the socket life cycle. Thank you for the reference.

Hi Vasanth:

Historically, the so->so_pcb pointer in BSD was protected by spl's, and could only be followed safely while at an elevated spl (probably splnet -- details forgotten at this point!).

In FreeBSD 6.x, I made substantial revisions to the semantics of the socket<->pcb relationship in order to reduce the amount of synchronisation required. Among other things, I made it so that the validity of the so->so_pcb pointer is entirely defined by the protocol, and also made it so that all protocols could safely follow so->so_pcb without locks held, by virtue of the reference model. This trades off slightly greater memory use (inpcbs are always allocated for sockets, even after they have closed) for reduced synchronisation overhead + improved stability (due to reduced complexity). The socket life cycle ensures that no access to so->so_pcb occurs before pru_attach() has returned, and also ensures that no socket access will occur from the moment pru_detach() is called. As pru_attach() and pru_detach() are responsible for allocating and freeing pcb state, this means that all other pru_method() calls can safely dereference so_pcb in all protocols.

Synchronisation is required to use the socket, but the nature of the synchronisation depends on the protocol, and different protocols use quite different locking strategies (e.g., netnatm vs unix domain sockets vs IPv4/IPv6). There are similar reference concerns in the other direction, which among other things allow TCP to hold a reference on the socket it represents until it's done with it, regardless of API-layer close operations. We universally place protocol locks before socket-layer locks in the lock order so that calls into the socket layer are safe from the protocol while holding locks required to stabilise pcbs -- this means that socket locks can't be held over calls down the stack, mandating a stronger reference model.
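
As a rough user-space illustration of the life cycle and lock order described above, the sketch below models a socket whose so_pcb is set in an attach step, followed without a lock in a send step, and freed in a detach step. It is not the kernel code: every toy_* name is hypothetical, pthread mutexes stand in for the kernel's protocol and socket locks, and the reference counting that the real stack relies on is left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an inpcb: protocol state plus the protocol lock. */
struct toy_pcb {
    pthread_mutex_t pcb_lock;
    int             pcb_sent;    /* count of "sent" datagrams */
};

/* Hypothetical stand-in for struct socket. */
struct toy_socket {
    pthread_mutex_t so_lock;     /* socket-layer lock */
    void           *so_pcb;      /* set by attach, cleared by detach */
    int             so_events;   /* socket-layer state updated by the protocol */
};

void toy_socket_init(struct toy_socket *so)
{
    pthread_mutex_init(&so->so_lock, NULL);
    so->so_pcb = NULL;
    so->so_events = 0;
}

/* Plays the role of pru_attach(): allocate pcb state before any other method runs. */
int toy_attach(struct toy_socket *so)
{
    struct toy_pcb *pcb = calloc(1, sizeof(*pcb));

    if (pcb == NULL)
        return -1;
    pthread_mutex_init(&pcb->pcb_lock, NULL);
    so->so_pcb = pcb;
    return 0;
}

/* Plays the role of a pru_send()-style method; returns the running send count. */
int toy_send(struct toy_socket *so)
{
    struct toy_pcb *pcb = so->so_pcb;    /* no lock: the life cycle keeps it valid */
    int sent;

    assert(pcb != NULL);
    pthread_mutex_lock(&pcb->pcb_lock);  /* protocol lock first */
    sent = ++pcb->pcb_sent;
    pthread_mutex_lock(&so->so_lock);    /* socket lock second */
    so->so_events++;
    pthread_mutex_unlock(&so->so_lock);
    pthread_mutex_unlock(&pcb->pcb_lock);
    return sent;
}

/* Plays the role of pru_detach(): no method may run on this socket afterwards. */
void toy_detach(struct toy_socket *so)
{
    struct toy_pcb *pcb = so->so_pcb;

    so->so_pcb = NULL;
    pthread_mutex_destroy(&pcb->pcb_lock);
    free(pcb);
}
```

A caller would run toy_socket_init(), then toy_attach(), then any number of toy_send() calls (possibly from several threads), and finally toy_detach(). Because toy_send() always takes the pcb lock before the socket lock, two concurrent senders cannot deadlock against each other on that pair of locks.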

None of this precludes bugs, of course, but the design is fairly coherent. The area of greatest weakness in synchronisation in the network stack is actually in the socket state machine (so_state and friends), where the stack is unclear whether the protocol or the socket layer is driving the state machine. I've been gradually pushing in the direction of the protocol driving state transitions, since that allows atomicity between layers due to protocol locks being held over socket locks when calling into the socket layer from the protocol.

Robert