From owner-freebsd-net@FreeBSD.ORG Wed Dec 19 23:40:46 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 70D1A5BB for ; Wed, 19 Dec 2012 23:40:46 +0000 (UTC) (envelope-from vijju.singh@gmail.com) Received: from mail-ee0-f45.google.com (mail-ee0-f45.google.com [74.125.83.45]) by mx1.freebsd.org (Postfix) with ESMTP id ED6088FC12 for ; Wed, 19 Dec 2012 23:40:45 +0000 (UTC) Received: by mail-ee0-f45.google.com with SMTP id d49so1355017eek.18 for ; Wed, 19 Dec 2012 15:40:39 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=cdxT3Uv36Ca4ItHcuCmTNKwLKxq/hcKIBlLsfijXGI4=; b=xnMVlnnlxr7kQuJwZCzvrA5vtJu1FYbn+7qQYzYmG1wa4D9nwEH8gfIGvgA84cPUbQ UL9MOsCibVDAmtXTq0wiSrteMhn5nqAqZ1SLjV9sMYjgby8E1HsrPCsu7+aPtMmM8a81 yt4sLirlWyScIgJGjdyGAsy2fTEvwiaer3ak1hYTXOpUUa1DItRkJLcr0kalQmq/nem4 ol9PG5wYDmDN4QqRFfZzR3JCMfJYgnGG8Wfuvh7A6ef6J63gPYukGl3bDaBLzU0D5iKp fZ5hNamQ4SPmpqL733wsPjcm3rjalKuvWTqijK2cADLnCnqtFilKs9G8lck/0RnrzLqJ SFeA== MIME-Version: 1.0 Received: by 10.14.1.195 with SMTP id 43mr18099531eed.31.1355960439432; Wed, 19 Dec 2012 15:40:39 -0800 (PST) Received: by 10.223.101.77 with HTTP; Wed, 19 Dec 2012 15:40:39 -0800 (PST) In-Reply-To: <50D24774.80800@gmail.com> References: <50D218BA.7080301@FreeBSD.org> <50D21C3B.8020803@gmail.com> <50D24774.80800@gmail.com> Date: Wed, 19 Dec 2012 15:40:39 -0800 Message-ID: Subject: Re: use of V_tcbinfo lock for TCP syncache From: Vijay Singh To: Karim Fodil-Lemelin Content-Type: text/plain; charset=ISO-8859-1 Cc: freebsd-net@freebsd.org, Navdeep Parhar X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Dec 2012 23:40:46 -0000 > Sure but syncache_expand() is entered with the tcbinfo already write locked > which also protects the unlocking of the listening connection and the > locking of the newly created socket. Around this part: > > /* > * Socket is created in state SYN_RECEIVED. > * Unlock the listen socket, lock the newly > * created socket and update the tp variable. > */ > INP_WUNLOCK(inp); /* listen socket */ > inp = sotoinpcb(so); > INP_WLOCK(inp); /* new connection */ > tp = intotcpcb(inp); > > Without the tcbinfo lock the new socket could be closed (getting a reset) > which would put it in INP_TIMEWAIT or INP_DROPPED _after_ the check is made > in tcp_usr_accept since there is a period of time where tcbinfo is not > locked and the new socket inp is not locked either. > > I could be wrong but it seems that without the tcbinfo lock a lot could > happen between the unlocking of the listen socket and the locking of the new Hopefully Robert will chime in. I am sorry that I was not clear. In my experiment, syncache_expand() is still entered with the V_tcbinfo lock held. In my (limited) view, sonewconn() is overleaded. It [1] allocates a new socket [2] initializes it using the listener socket [3] invokes the pru_attach routine, where the inp is allocated [4] it inserts the socket in the listeners queue [5] it, optionally, notifies the listener of the new connection When sonewconn returns, we do a bunch of things, [6] such as call in_pcbconnect() and set state in tp etc. What I am experimenting with is to separate out [4] & [5] from the list above, and move those to AFTER we do the inp processing in [6]. At that point I do not think that the pcbinfo lock should be required to be held. -vijay