From: Rui Paulo
To: Mateusz Guzik
Cc: freebsd-arch@freebsd.org
Date: Sun, 23 Nov 2014 18:15:58 -0800
Subject: Re: rarely changing process-wide data vs threads
Message-id: <7683D4D1-9458-48D1-A4DF-602E2C4D13C2@me.com>
In-reply-to: <20141123231435.GA32084@dft-labs.eu>
References: <20141123231435.GA32084@dft-labs.eu>
List-Id: Discussion related to FreeBSD architecture
On Nov 23, 2014, at 15:14, Mateusz Guzik wrote:
>
> Currently we have some things that are frequently accessed and require
> locking, even though they very rarely change.
>
> This includes:
> - cwd, root, jdir vnodes
> - resource limits
>
> File lookup typically requires us to vref and unref the cwd and root dir
> and to lock the filedesc lock shared, which competes with fd open/close
> in other threads.
>
> Any resource limit check requires taking PROC_LOCK, which is an
> exclusive lock.
>
> It turns out we already have a nice solution which only needs some
> minor refinement; it is already used to manage credentials:
>
> Each thread holds a reference on the active credentials through its own
> pointer. When credentials are updated, a new structure is allocated and
> threads check that they have the right pointer at the syscall boundary.
> If they have a stale one, they lock PROC_LOCK and update it.
>
> We can generalize this to suit other needs by introducing a
> 'generation' counter and optionally an rwlock instead of PROC_LOCK.
> If 'generation' is unequal to what is set in the process, at least one
> of creds/dirs/rlimits/$something needs updating, and we can take the
> lock and iterate over the structs.

Right, this is the same model used by the routing table.

> This may pose some concern, since it may seem to introduce a window
> where a given thread uses stale data while a concurrently executing
> thread uses the new data.

Likewise, there's a small race in the networking stack.

> This window is already present for all users that I can see.
>
> During file lookups the filedesc lock is only held temporarily (and the
> current code even has a possible use-after-free, since it does not
> start by refing the root vnode while fdp is locked, so it can be
> freed/recycled).
>
> Resource limits are inherently racy anyway.
> The proc lock is held only for a short time to read them, that's it.
>
> As such, I don't believe this approach introduces any new windows
> (although it extends already existing ones).

I agree.

> When it comes to implementing this concept for the dir vnodes, one
> would need to split the current struct filedesc. chdir in threaded
> processes would be more expensive, since a new struct would have to be
> allocated and the vnodes vref'ed, but chdirs are far less frequent than
> lookups, so it should be worth it anyway.

I agree. A lookup is a different operation, but most of the time a chdir
is followed by a lookup, so if we optimise the lookup case the end
result might still be better.

> There is also a note on filedescs shared between processes. In such
> cases we would abandon this optimisation (the dir struct can have a
> flag to note that COW is not suitable, and lookups need to vref like
> they do now).

Are you talking about your optimisation or something that's already
there?

--
Rui Paulo