Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 18 Jul 2014 16:06:29 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Mateusz Guzik <mjguzik@gmail.com>
Cc:        freebsd-arch@freebsd.org
Subject:   Re: current fd allocation idiom
Message-ID:  <20140718130629.GJ93733@kib.kiev.ua>
In-Reply-To: <20140717235538.GA15714@dft-labs.eu>

index | next in thread | previous in thread | raw e-mail

[-- Attachment #1 --]
On Fri, Jul 18, 2014 at 01:55:38AM +0200, Mateusz Guzik wrote:
> The kernel has to deal with concurrent attempts to open, close, dup2 and
> use the same file descriptor.
> 
> I start with stating a rather esoteric bug affecting this area, then I
> follow with a short overview of what is happening in general and a
> proposal how to change to get rid of the bug and get some additional
> enchancements. Interestingly enough turns out Linux is doing pretty much
> the same thing.
> 
> ============================
> THE BUG:
> /*
>  * Initialize the file pointer with the specified properties.
>  *
>  * The ops are set with release semantics to be certain that the flags, type,
>  * and data are visible when ops is.  This is to prevent ops methods from being
>  * called with bad data.
>  */
> void
> finit(struct file *fp, u_int flag, short type, void *data, struct fileops *ops)
> {
>         fp->f_data = data;
>         fp->f_flag = flag;
>         fp->f_type = type;
>         atomic_store_rel_ptr((volatile uintptr_t *)&fp->f_ops, (uintptr_t)ops);
> }
> 
> This itself is fine, but it assumes all code obtaining fp from fdtable places
> a read memory barrier after reading f_ops and before reading anything else.
> As you could guess no code does that and I don't believe placing rmb's
> in several new places is the way to go.
I think your analysis is correct for all cases except kern_openat().

For kern_openat(), we install a file into fdtable only after the struct
file is fully initialized, see kern_openat(). The file is allocated
with falloc_noinstall(), then file is initialized, then finstall() does
FILEDESC_LOCK/UNLOCK, which ensures the full barrier. Other file
accessors do fget_unlocked(), which does acquire (of fp->f_count, but
this does not matter).

The only critical constraint is that other accessors must not see
f_ops != badfileops while struct file is not fully visible.  IMO,
the full barrier in finstall() and acquire in fget*() guarantee
that we see badfileops until writes to other members are visible.

For falloc(), indeed the write to f_ops could become visible too early
(but not on x86).

> 
> ============================
> GENERAL OVERVIEW OF CURRENT STATE:
> 
> fps are obtained and installed as follows:
> 
> struct file *fp;
> int fd;
> 
> if (error = falloc(&fp, &fd))
> 	return (error);
> if (error = something(....)) {
> 	fdclose(fp, fd);
> 	fdrop(fp);
> 	return (error);
> }
> 
> finit(fp, ....);
> fdrop(fp);
> return (0);
> 
> After falloc fp is installed in fdtable, it has refcount 2 and ops set to
> 'badfileops'.
> 
> if something() failed:
> fdclose() checks if it has anything to do. if so, it cleares fd and fdrops fp
> fdrop() clears the second reference, everything is cleared up
> 
> if something() succeeded:
> finit() finishes initialization of fp
> fdrop() cleares the second reference. fp now has expected refcount of 1.
> 
> Now a little complication:
> parallel close() execution:
> fd is recognizes as used. it is cleared and fdrop(fp) is called.
> 
> if something() succeeded after close:
> fdrop() kills fp
> 
> if something() failed after close:
> fdclose() concludes nothing to do
> fdrop() kill fp
> 
> Same story with dup2.
> 
> What readers need to do:
> - rmb() after reading fp_ops
> - check fp_ops for badfileops
How can readers see badfileops ?

> 
> ============================
> PROPOSAL:
> 
> struct file *fp;
> int fd;
> 
> if (error = falloc(&fp, &fd))
Use falloc_noinstall() there.

> 	return (error);
> if (error = something(....)) {
> 	fdclose(fp, fd);
> 	return (error);
> }
> 
> finit(fp, ....);
> factivate(fd, fp);
This function is spelled finstall().

It seems that all what is needed is conversion of places using
falloc() to falloc_noinstall()/finstall().

> return (0);
> 
> After falloc fd is only marked as used, fp is NOT installed.
> fp is returned with refcount of 1 and is invisible to anyone but
> curthread.
> 
> if something() failed:
> fdclose() marks fd as unused and kills fp
> 
> if something() succeeded:
> finit() finishes initialization of fp
> factivate() sets fp to non-null with a barrier
> 
> Now a little complication:
> parallel close() execution:
> since fp is null, fd is recognized as unused. EBADF is returned.
> 
> The only problem is with dup2 and I believe it is actually a step
> forward.
> 
> Let's assume fd was marked as used by falloc, but fp was not installed yet.
> dup2(n, fd) will see that fd is used. With current code there is no
> problem since there is fp to fdrop and it can proceed. With the proposal
> however, there is nothing to fdrop. Linux returns EBADF in this case
> which deals with the problem and does not seem to provide any drawbacks
> for behaving processes.
> 
> So, differences to current approach:
> 1. fewer barriers and atomic operations
> 2. no need to check for f_ops type
> 3. new case when dup2 can return an error
> 
> Note that 3 should not be a problem since Linux is doing this already.
> 
> Also note current approach is not implemented correctly at the moment as
> it misses rmbs, although I'm unsure how much this matters in practice.
> 
> Thoughts?
> -- 
> Mateusz Guzik <mjguzik gmail.com>
> _______________________________________________
> freebsd-arch@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-arch
> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"

[-- Attachment #2 --]
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQIcBAEBAgAGBQJTyRvVAAoJEJDCuSvBvK1BZ0oP/jcXsr1uN2vcD85ED6NlFOvs
I/PCpLyndYxOdMEGhUPiHw7nMvmpcHknGjxyPrzr64fMCk1WUiySA7gY5kqNxV/w
UZB61O9O57A9m3c+3uIHMWQdhZYPk7WC0d1H59rCcwlnMrFAZXgy+EcFvd4MNAgN
gTR4Ss2mb8sqjVpl34Sal4MRV2AuprGsWJfViomcGYOp02pu8pNbvq2hySmoTu/Y
HGsx9Pxv9Tq5P4pWBbb6RG6nEB2wiggv2qQa3MWy4YSf+e5GI5b0aeu9mZO9RD2/
t5iL8skUp4TTt/TZIM7KOpbvzqfKU30nFfB+b8ZNB2bKMkXgvWU4KS6vqhIcCX4z
DWq0XbEoQ63myLqcdzv8FSxuWhA2Djv8BoslpeIZeGltM+aMAhBw1htI5kP44aWo
IvDmW//O5QrUay1cct1Nnn170sRF8pjzeazQECiP3c8GVHYdUXqy4HWuLLs1lqGg
LVgruvQMcuzxYuB4gQtttvKYl7M5gS12H0zj/ALn+YoW9i4VHvjk31GvvZ4NAVw2
ahHohWU7KJSLukmgrYt5Cb33CyCd5obNpm4YLR/iQo6rjhxN4m9Lg5wTek8Rwz03
pjqP0ZKJJPX4lSX0CzKPStcO5Oz/gIVgoUGNhdRWHa1WxRFE0nBHLGfeYizU2Bsq
nfX+H5kv4pa7+pR0+SJz
=YyP8
-----END PGP SIGNATURE-----
home | help

Want to link to this message? Use this
URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20140718130629.GJ93733>