From owner-freebsd-net@FreeBSD.ORG  Sun Jul 22 17:55:15 2012
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 63B72106564A
	for <freebsd-net@freebsd.org>; Sun, 22 Jul 2012 17:55:15 +0000 (UTC)
	(envelope-from rwatson@FreeBSD.org)
Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42])
	by mx1.freebsd.org (Postfix) with ESMTP id 1C1878FC0A
	for <freebsd-net@freebsd.org>; Sun, 22 Jul 2012 17:55:15 +0000 (UTC)
Received: from fledge.watson.org (fledge.watson.org [65.122.17.41])
	by cyrus.watson.org (Postfix) with ESMTPS id 71B0E46B0D;
	Sun, 22 Jul 2012 13:55:14 -0400 (EDT)
Date: Sun, 22 Jul 2012 18:55:14 +0100 (BST)
From: Robert Watson <rwatson@FreeBSD.org>
X-X-Sender: robert@fledge.watson.org
To: Vijay Singh <vijju.singh@gmail.com>
In-Reply-To: <CALCNsJSqc+4FTZqFJVhhGffVjxzaJaQf=FASrah_=a=WFB_r2w@mail.gmail.com>
Message-ID: <alpine.BSF.2.00.1207221842010.83786@fledge.watson.org>
References: <CALCNsJSqc+4FTZqFJVhhGffVjxzaJaQf=FASrah_=a=WFB_r2w@mail.gmail.com>
User-Agent: Alpine 2.00 (BSF 1167 2008-08-23)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-net@freebsd.org
Subject: Re: Use of V_tcbinfo lock in tcp_do_segment()
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 22 Jul 2012 17:55:15 -0000


On Thu, 24 May 2012, Vijay Singh wrote:

> Folks, I am trying to understand the need to hold the V_tcbinfo lock in part 
> of this function [code included below for reference].
>
> At present this causes the socket upcall to be called with the V_tcbinfo 
> lock held, which I'd like to avoid. We release this lock right after this 
> section.
>
> Looking at the code, it seems the lock is needed if we were in FIN_WAIT_2 
> and tcp_close() the connection. The lock also seems to be protecting 
> V_twq_2msl.
>
> Would it be an acceptable solution if we deferred calling socantrecvmore() 
> till after the lock can be dropped (after the swtich statement). 
> tcp_twstart() can be changed to return tp if the connections survives, or 
> NULL if it doesn't, much like what tcp_close() does. Also a new lock could 
> be added to protect the V_rwq_2msl queue.
>
> If this sounds acceptable, I can generate a patch against -CURRENT. I would 
> appreciate feedback.

Hi Vijay:

Sorry for responding to your query with such a delay.

A quick glance at tcp_twstart() appears to confirm your description -- I read 
it as requiring the incpbinfo lock for several reasons:

1) Potential call to tcp_close().

2) Potential call to tcp_tw_2msl_scan() which may in turn call tcp_twclose(),
    which requires the pcbinfo lock (see 5 below).

3) Direct use of V_twq_2msl.

4) Potential call to tcp_tw_2msl_scan(), which uses V_twq_2msl, and may also
    call tcp_tw_close (see 5).

5) In the tcp_twclose() callgraph subtree, the pcbinfo lock is used for
    several things, including tcp_tw_2msl_stop() and in_pcbfree().  The call to
    sofree() may recurse, under some circumstances, back into pru_detach(), and
    then into tcp_detach() which requires the lock.

Especially with the pcbgroup reworking so that the pcbhash rather than pcbinfo 
lock is used in a number of places, there are increasing numbers of places 
where this sort of optimisation could be made.

One caution to ponder: socantrecvmore() can trigger an upcall (per your 
comment) in the receive path.  It used to be that upcalls did worrying things, 
such as call back down the stack -- I think we've largely eliminated those 
(NFS UDP was one such example, now fixed?) and therefore they are probably OK, 
but worth checking the so_upcall consumers to see what the implications might 
be, if any.

Robert

>
> -vijay
>
>
> relevant code from tcp_do_segment():
>
>        if (thflags & TH_FIN) {
>                if (TCPS_HAVERCVDFIN(tp->t_state) == 0) {
>                        socantrcvmore(so);
>                        /*
>                         * If connection is half-synchronized
>                         * (ie NEEDSYN flag on) then delay ACK,
>                         * so it may be piggybacked when SYN is sent.
>                         * Otherwise, since we received a FIN then no
>                         * more input can be expected, send ACK now.
>                         */
>                        if (tp->t_flags & TF_NEEDSYN)
>                                tp->t_flags |= TF_DELACK;
>                        else
>                                tp->t_flags |= TF_ACKNOW;
>                        tp->rcv_nxt++;
>                }
>                switch (tp->t_state) {
>
>                /*
>                 * In SYN_RECEIVED and ESTABLISHED STATES
>                 * enter the CLOSE_WAIT state.
>                 */
>                case TCPS_SYN_RECEIVED:
>                        tp->t_starttime = ticks;
>                        /* FALLTHROUGH */
>                case TCPS_ESTABLISHED:
>                        tp->t_state = TCPS_CLOSE_WAIT;
>                        break;
>
>                /*
>                 * If still in FIN_WAIT_1 STATE FIN has not been acked so
>                 * enter the CLOSING state.
>                 */
>                case TCPS_FIN_WAIT_1:
>                        tp->t_state = TCPS_CLOSING;
>                        break;
>
>                /*
>                 * In FIN_WAIT_2 state enter the TIME_WAIT state,
>                 * starting the time-wait timer, turning off the other
>                 * standard timers.
>                 */
>                case TCPS_FIN_WAIT_2:
>                        INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
>                        KASSERT(ti_locked == TI_WLOCKED, ("%s: dodata "
>                            "TCP_FIN_WAIT_2 ti_locked: %d", __func__,
>                            ti_locked));
>
>                        tcp_twstart(tp);
>                        INP_INFO_WUNLOCK(&V_tcbinfo);
>                        return;
>                }
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
>