Date: Fri, 18 Jul 2014 16:06:29 +0300 From: Konstantin Belousov <kostikbel@gmail.com> To: Mateusz Guzik <mjguzik@gmail.com> Cc: freebsd-arch@freebsd.org Subject: Re: current fd allocation idiom Message-ID: <20140718130629.GJ93733@kib.kiev.ua> In-Reply-To: <20140717235538.GA15714@dft-labs.eu> References: <20140717235538.GA15714@dft-labs.eu>
next in thread | previous in thread | raw e-mail | index | archive | help
--fkEewJHTsXZZ0w48 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, Jul 18, 2014 at 01:55:38AM +0200, Mateusz Guzik wrote: > The kernel has to deal with concurrent attempts to open, close, dup2 and > use the same file descriptor. >=20 > I start with stating a rather esoteric bug affecting this area, then I > follow with a short overview of what is happening in general and a > proposal how to change to get rid of the bug and get some additional > enchancements. Interestingly enough turns out Linux is doing pretty much > the same thing. >=20 > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D > THE BUG: > /* > * Initialize the file pointer with the specified properties. > * > * The ops are set with release semantics to be certain that the flags, t= ype, > * and data are visible when ops is. This is to prevent ops methods from= being > * called with bad data. > */ > void > finit(struct file *fp, u_int flag, short type, void *data, struct fileops= *ops) > { > fp->f_data =3D data; > fp->f_flag =3D flag; > fp->f_type =3D type; > atomic_store_rel_ptr((volatile uintptr_t *)&fp->f_ops, (uintptr_t= )ops); > } >=20 > This itself is fine, but it assumes all code obtaining fp from fdtable pl= aces > a read memory barrier after reading f_ops and before reading anything els= e. > As you could guess no code does that and I don't believe placing rmb's > in several new places is the way to go. I think your analysis is correct for all cases except kern_openat(). For kern_openat(), we install a file into fdtable only after the struct file is fully initialized, see kern_openat(). The file is allocated with falloc_noinstall(), then file is initialized, then finstall() does FILEDESC_LOCK/UNLOCK, which ensures the full barrier. Other file accessors do fget_unlocked(), which does acquire (of fp->f_count, but this does not matter). The only critical constraint is that other accessors must not see f_ops !=3D badfileops while struct file is not fully visible. IMO, the full barrier in finstall() and acquire in fget*() guarantee that we see badfileops until writes to other members are visible. For falloc(), indeed the write to f_ops could become visible too early (but not on x86). >=20 > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D > GENERAL OVERVIEW OF CURRENT STATE: >=20 > fps are obtained and installed as follows: >=20 > struct file *fp; > int fd; >=20 > if (error =3D falloc(&fp, &fd)) > return (error); > if (error =3D something(....)) { > fdclose(fp, fd); > fdrop(fp); > return (error); > } >=20 > finit(fp, ....); > fdrop(fp); > return (0); >=20 > After falloc fp is installed in fdtable, it has refcount 2 and ops set to > 'badfileops'. >=20 > if something() failed: > fdclose() checks if it has anything to do. if so, it cleares fd and fdrop= s fp > fdrop() clears the second reference, everything is cleared up >=20 > if something() succeeded: > finit() finishes initialization of fp > fdrop() cleares the second reference. fp now has expected refcount of 1. >=20 > Now a little complication: > parallel close() execution: > fd is recognizes as used. it is cleared and fdrop(fp) is called. >=20 > if something() succeeded after close: > fdrop() kills fp >=20 > if something() failed after close: > fdclose() concludes nothing to do > fdrop() kill fp >=20 > Same story with dup2. >=20 > What readers need to do: > - rmb() after reading fp_ops > - check fp_ops for badfileops How can readers see badfileops ? >=20 > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D > PROPOSAL: >=20 > struct file *fp; > int fd; >=20 > if (error =3D falloc(&fp, &fd)) Use falloc_noinstall() there. > return (error); > if (error =3D something(....)) { > fdclose(fp, fd); > return (error); > } >=20 > finit(fp, ....); > factivate(fd, fp); This function is spelled finstall(). It seems that all what is needed is conversion of places using falloc() to falloc_noinstall()/finstall(). > return (0); >=20 > After falloc fd is only marked as used, fp is NOT installed. > fp is returned with refcount of 1 and is invisible to anyone but > curthread. >=20 > if something() failed: > fdclose() marks fd as unused and kills fp >=20 > if something() succeeded: > finit() finishes initialization of fp > factivate() sets fp to non-null with a barrier >=20 > Now a little complication: > parallel close() execution: > since fp is null, fd is recognized as unused. EBADF is returned. >=20 > The only problem is with dup2 and I believe it is actually a step > forward. >=20 > Let's assume fd was marked as used by falloc, but fp was not installed ye= t. > dup2(n, fd) will see that fd is used. With current code there is no > problem since there is fp to fdrop and it can proceed. With the proposal > however, there is nothing to fdrop. Linux returns EBADF in this case > which deals with the problem and does not seem to provide any drawbacks > for behaving processes. >=20 > So, differences to current approach: > 1. fewer barriers and atomic operations > 2. no need to check for f_ops type > 3. new case when dup2 can return an error >=20 > Note that 3 should not be a problem since Linux is doing this already. >=20 > Also note current approach is not implemented correctly at the moment as > it misses rmbs, although I'm unsure how much this matters in practice. >=20 > Thoughts? > --=20 > Mateusz Guzik <mjguzik gmail.com> > _______________________________________________ > freebsd-arch@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-arch > To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" --fkEewJHTsXZZ0w48 Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAEBAgAGBQJTyRvVAAoJEJDCuSvBvK1BZ0oP/jcXsr1uN2vcD85ED6NlFOvs I/PCpLyndYxOdMEGhUPiHw7nMvmpcHknGjxyPrzr64fMCk1WUiySA7gY5kqNxV/w UZB61O9O57A9m3c+3uIHMWQdhZYPk7WC0d1H59rCcwlnMrFAZXgy+EcFvd4MNAgN gTR4Ss2mb8sqjVpl34Sal4MRV2AuprGsWJfViomcGYOp02pu8pNbvq2hySmoTu/Y HGsx9Pxv9Tq5P4pWBbb6RG6nEB2wiggv2qQa3MWy4YSf+e5GI5b0aeu9mZO9RD2/ t5iL8skUp4TTt/TZIM7KOpbvzqfKU30nFfB+b8ZNB2bKMkXgvWU4KS6vqhIcCX4z DWq0XbEoQ63myLqcdzv8FSxuWhA2Djv8BoslpeIZeGltM+aMAhBw1htI5kP44aWo IvDmW//O5QrUay1cct1Nnn170sRF8pjzeazQECiP3c8GVHYdUXqy4HWuLLs1lqGg LVgruvQMcuzxYuB4gQtttvKYl7M5gS12H0zj/ALn+YoW9i4VHvjk31GvvZ4NAVw2 ahHohWU7KJSLukmgrYt5Cb33CyCd5obNpm4YLR/iQo6rjhxN4m9Lg5wTek8Rwz03 pjqP0ZKJJPX4lSX0CzKPStcO5Oz/gIVgoUGNhdRWHa1WxRFE0nBHLGfeYizU2Bsq nfX+H5kv4pa7+pR0+SJz =YyP8 -----END PGP SIGNATURE----- --fkEewJHTsXZZ0w48--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20140718130629.GJ93733>