From owner-freebsd-smp Tue May 14 3:17:53 2002 Delivered-To: freebsd-smp@freebsd.org Received: from rina.r.dl.itc.u-tokyo.ac.jp (cvsup2.r.dl.itc.u-tokyo.ac.jp [133.11.199.247]) by hub.freebsd.org (Postfix) with ESMTP id 372B337B403; Tue, 14 May 2002 03:17:44 -0700 (PDT) Received: from rina.r.dl.itc.u-tokyo.ac.jp (localhost [127.0.0.1]) by rina.r.dl.itc.u-tokyo.ac.jp (8.12.3+3.5Wbeta/3.7W-rina.r-Nankai-Koya) with ESMTP id g4EAHf3i037251 ; Tue, 14 May 2002 19:17:42 +0900 (JST) Message-Id: <200205141017.g4EAHf3i037251@rina.r.dl.itc.u-tokyo.ac.jp> Date: Tue, 14 May 2002 19:17:40 +0900 From: Seigo Tanimura To: Alfred Perlstein Cc: Seigo Tanimura , current@FreeBSD.org, smp@FreeBSD.org Subject: Re: The updated socket patch and axing sotryfree() (Re: Locking down a socket, milestone 1) In-Reply-To: <20020508161656.GV36741@elvis.mu.org> References: <200204241110.g3OB8u8t006194@bunko> <200205081159.g48Bx63i045654@rina.r.dl.itc.u-tokyo.ac.jp> <20020508161656.GV36741@elvis.mu.org> User-Agent: Wanderlust/2.8.1 (Something) SEMI/1.14.3 (Ushinoya) FLIM/1.14.3 (=?ISO-8859-1?Q?Unebigory=F2mae?=) APEL/10.3 MULE XEmacs/21.1 (patch 14) (Cuyahoga Valley) (i386--freebsd) Organization: Digital Library Research Division, Information Techinology Centre, The University of Tokyo MIME-Version: 1.0 (generated by SEMI 1.14.3 - "Ushinoya") Content-Type: text/plain; charset=US-ASCII Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org On Wed, 8 May 2002 09:16:56 -0700, Alfred Perlstein said: bright> * Seigo Tanimura [020508 04:59] wrote: >> >> I would like to commit this patch in one or two weeks to start working >> on a possible race between a user process and a netisr kthread, >> prevented by only the Giant lock at the moment. >> >> When a user process calls sofree() for a listening socket, it attempts >> to free the sockets in the connection queues by soabort(). If the >> connection of an aborting socket gets dropped by a remote host (eg by >> TCP RST), a netisr kthread also attempts to free the socket. Since >> the reference count of a socket in a connection queue is zero, this >> would resust in doubly freeing a socket. >> >> To solve that problem, I would like to axe sotryfree(). The PCB of a >> socket and a connection queue should hold a reference to the >> socket. This should make the reference count of an alive socket always >> be >= 1, and ensure that there is only one referer to a socket to be >> freed. >> >> Comments? bright> I'm not sure exactly how this solves the problem, there may need to bright> be a global socket mutex, perhaps putting this sort of operation under bright> that may do what you want. Yes, at least, we should introduce a global lock to protect the relation between sockets and PCBs. bright> Off the top of my head... bright> I think one way of doing it is storing the hashlist that the socket bright> belongs to (inpcb_hash) inside the sockets. That way after a lookup bright> you will have the lock on the parent structure, a user process will bright> have to follow the same locking paradym, basically look at the head bright> socket, lock the hashlist, then manipulate the incomplete queue. bright> Basically, protect this sort of operation via the hashlist because bright> you pretty much need to anyway. :) In order to solve the issue of deallocation race by a hashlist lock, we *always* have to obtain a socket or a PCB by looking up a hashlist. This is quite problematic because: 1. the lock order between a socket and a PCB gets tangled, 2. a hashlist introduces an overhead of calculating a hash index value, and 3. a hashlint lock cannot be per-socket or per-PCB, resulting in a contention under a huge number of socket operations or incoming packets. In order to make our lock strategy readable and comprehensible, we should keep a lock order as simple as the following: 1. a lock to protect the relation between/among objects, (eg the proctree lock or the allproc lock) and 2. a lock dedicated to a single object. (eg a proc lock) A reference count allows us a flexible way to keep a lock order clean. Once we grabbed a reference to an object, we can unlock it completely to restart with locking any lock protecting a relation. For instance, in the interrupt handler of a protocol stack, you lock a hashlist to look up the PCB appropriate to an incoming packet. You then lock the PCB to do some work. If you have to modify the socket corresponding to the PCB, hold a reference to the PCB and unlock it. Now you can lock the relation between sockets and PCBs to grab the socket safely. This strategy should be applicable to a socket operation initiated by a user process as well. We will not have to worry about the lock order between a socket and a PCB. Another advantage of a reference count is its cost. Provided that we hold an appropriate lock, we can simply follow a pointer to obtain an object. This is much cheaper than we calculate a hash index. We can also reduce the contention over a lock because the lock of a reference count is per-socket or per-PCB. bright> Other than that, have you looked at what BSD/os does and what Linux bright> does? Do they get it wrong or have any particular drawbacks? BSD/OS seems to ensure the existence of a PCB by locking the hashlist of PCBs. I am worrying about the fact that they lock the hashlist for quite a long duration. (about a half of udp_input() hold a read lock, and almost all of tcp_input() hold a *write* lock.) -- Seigo Tanimura To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message