Date:      Tue, 29 May 2012 21:43:40 -0400
From:      "Robert N. M. Watson" <rwatson@freebsd.org>
To:        vasanth rao naik sabavat <vasanth.raonaik@gmail.com>
Cc:        "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Subject:   Re: SMP: protocol control block protection for a multithreaded process (ex: udp).
Message-ID:  <02692B7F-02AD-49A5-A3D5-C92F03E7147C@freebsd.org>
In-Reply-To: <CAAuizBhaNp-%2Bf7FD=e0Sp_15yuXYnAViBsH5%2B7DbDuDNoJxfMA@mail.gmail.com>
References:  <CAAuizBjhGUUH3D3XN1t7WMnOPTq0vZjnV1QXGrR99qBOD34rGQ@mail.gmail.com> <CAAuizBhn_QT4WCh1ZRyc%2BHBkOYGaGivsVGm4oLj-i9VY7a5wxw@mail.gmail.com> <alpine.BSF.2.00.1205292204590.15505@fledge.watson.org> <CAAuizBjpLHoWwQ_CYrY9H5xrJ8_e48S_hVyU8Fif_J2pEyiq6Q@mail.gmail.com> <B6BB8E0D-F536-463E-B59C-A098038B8C1E@FreeBSD.org> <CAAuizBhaNp-%2Bf7FD=e0Sp_15yuXYnAViBsH5%2B7DbDuDNoJxfMA@mail.gmail.com>


On 29 May 2012, at 21:09, vasanth rao naik sabavat wrote:

> I am trying to understand the socket <--> protocol layer as part of
> our project. I was trying to understand why sotoinpcb() is called
> before taking any locks. I am also trying to understand the scenario
> of a multi-threaded process performing socket operations
> simultaneously on a multicore CPU.
>
> I have gone through the socket life cycle comments in the code, and
> they gave me a good understanding of the socket life cycle. Thank you
> for the reference.

Hi Vasanth:

Historically, the so->so_pcb pointer in BSD was protected by spls, and
could only be followed safely while at an elevated spl (probably splnet
-- details forgotten at this point!).

In FreeBSD 6.x, I made substantial revisions to the semantics of the
socket<->pcb relationship in order to reduce the amount of
synchronisation required. Among other things, I made it so that the
validity of the so->so_pcb pointer is entirely defined by the protocol,
and also made it so that all protocols could safely follow so->so_pcb
without locks held, by virtue of the reference model. This trades off
slightly greater memory use (inpcbs are always allocated for sockets,
even after they have closed) for reduced synchronisation overhead and
improved stability (due to reduced complexity). The socket life cycle
ensures that no access to so->so_pcb occurs before pru_attach() has
returned, and also ensures that no socket access will occur from the
moment pru_detach() is called. As pru_attach() and pru_detach() are
responsible for allocating and freeing pcb state, this means that all
other pru_method() calls can safely dereference so_pcb in all protocols.
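To make the life-cycle argument concrete, here is a minimal userspace sketch in C with pthreads. It is a simplified model under stated assumptions, not FreeBSD's kernel code: struct sock, struct pcb, proto_attach(), proto_send() and proto_detach() are hypothetical stand-ins for a socket, an inpcb, pru_attach(), a pru_send()-style method and pru_detach(). It shows why a method can follow so_pcb (as sotoinpcb() does) before taking any lock: the pointer is valid for the whole window between attach and detach, so the lock is needed only to stabilise the pcb's contents.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical, simplified model; not FreeBSD's real structures. */
struct pcb {
    pthread_mutex_t lock;   /* protocol-defined lock protecting pcb state */
    int             nsent;  /* example piece of protocol state */
};

struct sock {
    struct pcb *so_pcb;     /* valid from attach() return until detach() */
};

/* Analogue of pru_attach(): allocate pcb state before any other method runs. */
int proto_attach(struct sock *so)
{
    struct pcb *pcb = calloc(1, sizeof(*pcb));
    if (pcb == NULL)
        return -1;
    pthread_mutex_init(&pcb->lock, NULL);
    so->so_pcb = pcb;
    return 0;
}

/* Analogue of a pru_send()-style method: follows so_pcb with no lock held. */
int proto_send(struct sock *so)
{
    struct pcb *pcb = so->so_pcb;   /* cf. sotoinpcb(): no lock needed here */
    assert(pcb != NULL);            /* life cycle guarantees attach() ran */
    pthread_mutex_lock(&pcb->lock); /* lock only to stabilise pcb contents */
    int n = ++pcb->nsent;
    pthread_mutex_unlock(&pcb->lock);
    return n;
}

/* Analogue of pru_detach(): called only once no further methods can run. */
void proto_detach(struct sock *so)
{
    struct pcb *pcb = so->so_pcb;
    so->so_pcb = NULL;
    pthread_mutex_destroy(&pcb->lock);
    free(pcb);
}
```

The NULL assertion plays the role a kernel assertion would: if the life cycle is honoured, it can never fire, which is what lets every method skip locking just to read the pointer.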

Synchronisation is required to use the socket, but the nature of the
synchronisation depends on the protocol, and different protocols use
quite different locking strategies (e.g., netnatm vs unix domain sockets
vs IPv4/IPv6). There are similar reference concerns in the other
direction; among other things, the reference model allows TCP to hold a
reference on the socket it represents until it's done with it,
regardless of API-layer close operations. We universally place protocol
locks before socket-layer locks in the lock order, so that the protocol
can safely call into the socket layer while holding the locks required
to stabilise pcbs. This means that socket locks can't be held over calls
down the stack, which in turn mandates a stronger reference model.
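The lock order and the reverse reference can be sketched in the same hypothetical userspace model (pthreads; all names are illustrative, not FreeBSD APIs). The protocol input path holds the pcb lock while calling up into the socket layer, which then takes the socket lock (pcb before socket). The send path therefore has to drop the socket lock before calling down. The refcount models the protocol keeping its own reference on the socket, so an API-layer close alone does not free it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

struct sock;

/* Hypothetical protocol control block; its lock ranks BEFORE socket locks. */
struct pcb {
    pthread_mutex_t lock;
    struct sock    *sock;   /* the protocol's own reference to its socket */
};

/* Hypothetical socket; its lock ranks AFTER the pcb lock. */
struct sock {
    pthread_mutex_t lock;
    int             refs;   /* held by the API layer and by the protocol */
    int             rcv_cc; /* bytes queued for the application */
    int             snd_cc; /* bytes accepted for transmission */
    struct pcb     *so_pcb;
};

/* Socket-layer upcall: safe to call with the pcb lock held (pcb -> socket). */
static void sock_deliver(struct sock *so, int len)
{
    pthread_mutex_lock(&so->lock);
    so->rcv_cc += len;
    pthread_mutex_unlock(&so->lock);
}

/* Protocol input path: holds the pcb lock across the call up the stack. */
void proto_input(struct pcb *pcb, int len)
{
    pthread_mutex_lock(&pcb->lock);
    sock_deliver(pcb->sock, len);
    pthread_mutex_unlock(&pcb->lock);
}

/* Protocol output path: takes only the pcb lock. */
static void proto_output(struct pcb *pcb, int len)
{
    pthread_mutex_lock(&pcb->lock);
    (void)len;              /* ... transmit len bytes here ... */
    pthread_mutex_unlock(&pcb->lock);
}

/* Socket-layer send path: the socket lock must be dropped before calling
 * down, because taking the pcb lock while holding it would reverse the order. */
int sock_send(struct sock *so, int len)
{
    pthread_mutex_lock(&so->lock);
    so->snd_cc += len;
    pthread_mutex_unlock(&so->lock);
    proto_output(so->so_pcb, len);
    return len;
}

/* Drop one reference; the socket is freed only when the last one goes. */
bool sock_rele(struct sock *so)
{
    pthread_mutex_lock(&so->lock);
    int left = --so->refs;
    pthread_mutex_unlock(&so->lock);
    if (left != 0)
        return false;
    pthread_mutex_destroy(&so->lock);
    free(so);
    return true;
}
```

For example, with refs initialised to 2 (API layer plus protocol), the first sock_rele() from a close returns false and the protocol can keep delivering data until it drops its own reference.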

None of this precludes bugs, of course, but the design is fairly
coherent. The area of greatest weakness in synchronisation in the
network stack is actually the socket state machine (so_state and
friends), where it is unclear whether the protocol or the socket layer
is driving the state machine. I've been gradually pushing in the
direction of the protocol driving state transitions, since that allows
atomicity between layers: protocol locks are held over socket locks when
the protocol calls into the socket layer.
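A protocol-driven transition might look like the following sketch, again a hypothetical userspace model rather than FreeBSD code. The flag names and functions are illustrative assumptions, not the real so_state bits. The protocol updates its own state and then calls up to update the socket state while still holding the pcb lock, so any thread that takes the pcb lock sees the two states change together.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical state flags; FreeBSD's real so_state bits differ in detail. */
#define SOCK_CONNECTING 0x1
#define SOCK_CONNECTED  0x2

enum pcb_state { PCB_SYN_SENT, PCB_ESTABLISHED };

struct sock {
    pthread_mutex_t lock;
    int             so_state;
};

struct pcb {
    pthread_mutex_t lock;
    enum pcb_state  state;
    struct sock    *sock;
};

/* Socket-layer helper: only ever invoked by the protocol (pcb lock held). */
static void sock_set_connected(struct sock *so)
{
    pthread_mutex_lock(&so->lock);
    so->so_state &= ~SOCK_CONNECTING;
    so->so_state |= SOCK_CONNECTED;
    pthread_mutex_unlock(&so->lock);
}

/* Protocol drives the transition: because the pcb lock is held across the
 * upcall, no thread that takes the pcb lock can observe a pcb that is
 * ESTABLISHED while the socket is still marked CONNECTING. */
void proto_connect_done(struct pcb *pcb)
{
    pthread_mutex_lock(&pcb->lock);
    pcb->state = PCB_ESTABLISHED;
    sock_set_connected(pcb->sock);
    pthread_mutex_unlock(&pcb->lock);
}

/* A reader taking both locks in the documented order sees a consistent pair. */
int states_consistent(struct pcb *pcb)
{
    pthread_mutex_lock(&pcb->lock);
    pthread_mutex_lock(&pcb->sock->lock);
    int connected = (pcb->sock->so_state & SOCK_CONNECTED) != 0;
    int ok = connected == (pcb->state == PCB_ESTABLISHED);
    pthread_mutex_unlock(&pcb->sock->lock);
    pthread_mutex_unlock(&pcb->lock);
    return ok;
}
```

If the socket layer instead drove the transition under only the socket lock, it could not reach down to update the pcb without inverting the lock order, which is the ambiguity the paragraph above describes.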

Robert


