Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 22 Oct 2014 01:41:12 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        arch@freebsd.org
Subject:   Re: RfC: fueword(9) and casueword(9)
Message-ID:  <20141022002825.H2080@besplex.bde.org>
In-Reply-To: <20141021094539.GA1877@kib.kiev.ua>
References:  <20141021094539.GA1877@kib.kiev.ua>

next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, 21 Oct 2014, Konstantin Belousov wrote:

> FreeBSD provides the fuword(9) family of functions to fetch a word from
> the userspace. Functions return the value read, or -1 on failure (i.e.
> when faulted on access). This KPI has flaw, which makes it impossible
> to distinguish -1 read from usermode vs. the fault. As John Baldwin
> pointed out, fuword(9) cannot be replaced by copyin(9), since fuword(9)
> is atomic for aligned data, while copyin(9) is typically implemented as
> byte copy.

copyin() is usually implemented as register-sized copy for aligned data,
but may copy in any order, or more than once, or differently to reach
alignment points.  It is thus unsuitable for copying volatile data.  It
is also slow for small data.

> I wanted to fix this wart for long time, below is the prototyped patch,
> which adds fueword(9) family of functions.  They take the address of
> variable where to put the value read, and return 0 on success, -1 on
> failure.  In similar way, casueword(9) fixes casuword(9).

This API is larger, slower, and harder to use.  No better fix is evident,
except for fubyte() and fuword16().  These never had a problem on any
supported arch, since bytes are only 8 bits on all supported arches,
and 16-bit ints are not supported on any arch, so -1 is always out of
band.  Not touching them is a better fix.  You didn't change any of
their callers, but pessimized their implementation to a wrapper around
fue*().  (BTW, fuword16() and fuswintr() are misdocumented as taking a
non-const void * arg.).

Profiling counters use fuswintr().  You didn't touch that.  In FreeBSD
it is unimplemented for most or all arches.  It returns -1 to indicated
unimplemented.  NetBSD implemented it long ago.  It shouldn't be touched
since it is 16 bits.   However, it is a bug that profiling counters
are only 16 bits.  16-bit counters overflow in 8 seconds at the default
stathz of 8192.

> The tricky part of the patch are the changes to kern_umtx.c, where the
> logic of the loops in the lock acquire routines is delicate and care
> must be taken to not obliterate possible errors from the suspension
> check or signal test on loop retry.

> ...
> diff --git a/sys/kern/kern_umtx.c b/sys/kern/kern_umtx.c
> index 7cfef38..9f58e67 100644
> --- a/sys/kern/kern_umtx.c
> +++ b/sys/kern/kern_umtx.c
> ...
> @@ -921,7 +933,9 @@ do_lock_normal(struct thread *td, struct umutex *m, uint32_t flags,
> 	 * can fault on any access.
> 	 */
> 	for (;;) {
> -		owner = fuword32(__DEVOLATILE(void *, &m->m_owner));
> +		rv = fueword32(__DEVOLATILE(void *, &m->m_owner), &owner);
> +		if (error == -1)
> +			return (EFAULT);
> 		if (mode == _UMUTEX_WAIT) {
> 			if (owner == UMUTEX_UNOWNED || owner == UMUTEX_CONTESTED)
> 				return (0);

A new API should try to fix these __DEVOLATILE() abominations.  I think it
is safe, and even correct, to declare the pointers as volatile const void *,
since the functions really can handle volatile data, unlike copyin().

Atomic op functions are declared as taking pointers to volatile for
similar reasons.  Often they are applied to non-volatile data, but
adding a qualifier is type-safe and doesn't cost efficiency since the
pointer access is is not known to the compiler.  (The last point is not
so clear -- the compiler can see things in the functions since they are
inline asm.  fueword() isn't inline so its (in)efficiency is not changed.)

The atomic read functions are not declared as taking pointers to const.
The __DECONST() abomination might be used to work around this bug.

umtx is indeed complicated.

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20141022002825.H2080>