From: Karim Fodil-Lemelin
Date: Thu, 20 Dec 2012 11:12:11 -0500
To: Vijay Singh
Subject: Re: use of V_tcbinfo lock for TCP syncache
Message-ID: <50D338DB.5070907@gmail.com>
Cc: Karim Fodil-Lemelin, freebsd-net@freebsd.org, Navdeep Parhar
List-Id: Networking and TCP/IP with FreeBSD

On 19/12/2012 6:40 PM, Vijay Singh wrote:
>> Sure but syncache_expand() is entered with the tcbinfo already write locked
>> which also protects the unlocking of the listening connection and the
>> locking of the newly created socket. Around this part:
>>
>>     /*
>>      * Socket is created in state SYN_RECEIVED.
>>      * Unlock the listen socket, lock the newly
>>      * created socket and update the tp variable.
>>      */
>>     INP_WUNLOCK(inp);       /* listen socket */
>>     inp = sotoinpcb(so);
>>     INP_WLOCK(inp);         /* new connection */
>>     tp = intotcpcb(inp);
>>
>> Without the tcbinfo lock the new socket could be closed (getting a reset)
>> which would put it in INP_TIMEWAIT or INP_DROPPED _after_ the check is made
>> in tcp_usr_accept since there is a period of time where tcbinfo is not
>> locked and the new socket inp is not locked either.
>>
>> I could be wrong but it seems that without the tcbinfo lock a lot could
>> happen between the unlocking of the listen socket and the locking of the new
> Hopefully Robert will chime in.
>
> I am sorry that I was not clear. In my experiment, syncache_expand()
> is still entered with the V_tcbinfo lock held.
>
> In my (limited) view, sonewconn() is overloaded. It
>
> [1] allocates a new socket
> [2] initializes it using the listener socket
> [3] invokes the pru_attach routine, where the inp is allocated
> [4] inserts the socket in the listener's queue
> [5] optionally notifies the listener of the new connection
>
> When sonewconn() returns, we do a bunch of things [6], such as call
> in_pcbconnect() and set state in tp etc.
>
> What I am experimenting with is to separate out [4] & [5] from the
> list above, and move those to AFTER we do the inp processing in [6].
> At that point I do not think that the pcbinfo lock should be required
> to be held.

How do you plan to handle the fact that most of tcp_input() and
tcp_do_segment() require at least a read lock on pcbinfo?

Is your goal to reduce the amount of code that runs under the pcbinfo
write lock, or to get rid of the lock altogether?

Karim.