From owner-freebsd-arch@FreeBSD.ORG Fri Jul 18 13:06:37 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id DD7E1922 for ; Fri, 18 Jul 2014 13:06:36 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 66373216F for ; Fri, 18 Jul 2014 13:06:36 +0000 (UTC) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id s6ID6UdI051389 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 18 Jul 2014 16:06:30 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua s6ID6UdI051389 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id s6ID6TKx051388; Fri, 18 Jul 2014 16:06:29 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Fri, 18 Jul 2014 16:06:29 +0300 From: Konstantin Belousov To: Mateusz Guzik Subject: Re: current fd allocation idiom Message-ID: <20140718130629.GJ93733@kib.kiev.ua> References: <20140717235538.GA15714@dft-labs.eu> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="fkEewJHTsXZZ0w48" Content-Disposition: inline In-Reply-To: <20140717235538.GA15714@dft-labs.eu> User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.0 X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home Cc: freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Jul 2014 13:06:37 -0000 --fkEewJHTsXZZ0w48 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, Jul 18, 2014 at 01:55:38AM +0200, Mateusz Guzik wrote: > The kernel has to deal with concurrent attempts to open, close, dup2 and > use the same file descriptor. >=20 > I start with stating a rather esoteric bug affecting this area, then I > follow with a short overview of what is happening in general and a > proposal how to change to get rid of the bug and get some additional > enchancements. Interestingly enough turns out Linux is doing pretty much > the same thing. >=20 > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D > THE BUG: > /* > * Initialize the file pointer with the specified properties. > * > * The ops are set with release semantics to be certain that the flags, t= ype, > * and data are visible when ops is. This is to prevent ops methods from= being > * called with bad data. > */ > void > finit(struct file *fp, u_int flag, short type, void *data, struct fileops= *ops) > { > fp->f_data =3D data; > fp->f_flag =3D flag; > fp->f_type =3D type; > atomic_store_rel_ptr((volatile uintptr_t *)&fp->f_ops, (uintptr_t= )ops); > } >=20 > This itself is fine, but it assumes all code obtaining fp from fdtable pl= aces > a read memory barrier after reading f_ops and before reading anything els= e. > As you could guess no code does that and I don't believe placing rmb's > in several new places is the way to go. I think your analysis is correct for all cases except kern_openat(). For kern_openat(), we install a file into fdtable only after the struct file is fully initialized, see kern_openat(). The file is allocated with falloc_noinstall(), then file is initialized, then finstall() does FILEDESC_LOCK/UNLOCK, which ensures the full barrier. Other file accessors do fget_unlocked(), which does acquire (of fp->f_count, but this does not matter). The only critical constraint is that other accessors must not see f_ops !=3D badfileops while struct file is not fully visible. IMO, the full barrier in finstall() and acquire in fget*() guarantee that we see badfileops until writes to other members are visible. For falloc(), indeed the write to f_ops could become visible too early (but not on x86). >=20 > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D > GENERAL OVERVIEW OF CURRENT STATE: >=20 > fps are obtained and installed as follows: >=20 > struct file *fp; > int fd; >=20 > if (error =3D falloc(&fp, &fd)) > return (error); > if (error =3D something(....)) { > fdclose(fp, fd); > fdrop(fp); > return (error); > } >=20 > finit(fp, ....); > fdrop(fp); > return (0); >=20 > After falloc fp is installed in fdtable, it has refcount 2 and ops set to > 'badfileops'. >=20 > if something() failed: > fdclose() checks if it has anything to do. if so, it cleares fd and fdrop= s fp > fdrop() clears the second reference, everything is cleared up >=20 > if something() succeeded: > finit() finishes initialization of fp > fdrop() cleares the second reference. fp now has expected refcount of 1. >=20 > Now a little complication: > parallel close() execution: > fd is recognizes as used. it is cleared and fdrop(fp) is called. >=20 > if something() succeeded after close: > fdrop() kills fp >=20 > if something() failed after close: > fdclose() concludes nothing to do > fdrop() kill fp >=20 > Same story with dup2. >=20 > What readers need to do: > - rmb() after reading fp_ops > - check fp_ops for badfileops How can readers see badfileops ? >=20 > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D > PROPOSAL: >=20 > struct file *fp; > int fd; >=20 > if (error =3D falloc(&fp, &fd)) Use falloc_noinstall() there. > return (error); > if (error =3D something(....)) { > fdclose(fp, fd); > return (error); > } >=20 > finit(fp, ....); > factivate(fd, fp); This function is spelled finstall(). It seems that all what is needed is conversion of places using falloc() to falloc_noinstall()/finstall(). > return (0); >=20 > After falloc fd is only marked as used, fp is NOT installed. > fp is returned with refcount of 1 and is invisible to anyone but > curthread. >=20 > if something() failed: > fdclose() marks fd as unused and kills fp >=20 > if something() succeeded: > finit() finishes initialization of fp > factivate() sets fp to non-null with a barrier >=20 > Now a little complication: > parallel close() execution: > since fp is null, fd is recognized as unused. EBADF is returned. >=20 > The only problem is with dup2 and I believe it is actually a step > forward. >=20 > Let's assume fd was marked as used by falloc, but fp was not installed ye= t. > dup2(n, fd) will see that fd is used. With current code there is no > problem since there is fp to fdrop and it can proceed. With the proposal > however, there is nothing to fdrop. Linux returns EBADF in this case > which deals with the problem and does not seem to provide any drawbacks > for behaving processes. >=20 > So, differences to current approach: > 1. fewer barriers and atomic operations > 2. no need to check for f_ops type > 3. new case when dup2 can return an error >=20 > Note that 3 should not be a problem since Linux is doing this already. >=20 > Also note current approach is not implemented correctly at the moment as > it misses rmbs, although I'm unsure how much this matters in practice. >=20 > Thoughts? > --=20 > Mateusz Guzik > _______________________________________________ > freebsd-arch@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-arch > To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" --fkEewJHTsXZZ0w48 Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAEBAgAGBQJTyRvVAAoJEJDCuSvBvK1BZ0oP/jcXsr1uN2vcD85ED6NlFOvs I/PCpLyndYxOdMEGhUPiHw7nMvmpcHknGjxyPrzr64fMCk1WUiySA7gY5kqNxV/w UZB61O9O57A9m3c+3uIHMWQdhZYPk7WC0d1H59rCcwlnMrFAZXgy+EcFvd4MNAgN gTR4Ss2mb8sqjVpl34Sal4MRV2AuprGsWJfViomcGYOp02pu8pNbvq2hySmoTu/Y HGsx9Pxv9Tq5P4pWBbb6RG6nEB2wiggv2qQa3MWy4YSf+e5GI5b0aeu9mZO9RD2/ t5iL8skUp4TTt/TZIM7KOpbvzqfKU30nFfB+b8ZNB2bKMkXgvWU4KS6vqhIcCX4z DWq0XbEoQ63myLqcdzv8FSxuWhA2Djv8BoslpeIZeGltM+aMAhBw1htI5kP44aWo IvDmW//O5QrUay1cct1Nnn170sRF8pjzeazQECiP3c8GVHYdUXqy4HWuLLs1lqGg LVgruvQMcuzxYuB4gQtttvKYl7M5gS12H0zj/ALn+YoW9i4VHvjk31GvvZ4NAVw2 ahHohWU7KJSLukmgrYt5Cb33CyCd5obNpm4YLR/iQo6rjhxN4m9Lg5wTek8Rwz03 pjqP0ZKJJPX4lSX0CzKPStcO5Oz/gIVgoUGNhdRWHa1WxRFE0nBHLGfeYizU2Bsq nfX+H5kv4pa7+pR0+SJz =YyP8 -----END PGP SIGNATURE----- --fkEewJHTsXZZ0w48--