Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 1 Feb 2012 02:20:06 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        svn-src-head@freebsd.org, svn-src-all@freebsd.org, Gleb Smirnoff <glebius@freebsd.org>, src-committers@freebsd.org, Bruce Evans <brde@optusnet.com.au>
Subject:   Re: svn commit: r230583 - head/sys/kern
Message-ID:  <20120201013621.V2889@besplex.bde.org>
In-Reply-To: <20120131105431.GB3283@deviant.kiev.zoral.com.ua>
References:  <20120126153641.GA68112@FreeBSD.org> <20120127194612.H1547@besplex.bde.org> <20120127091244.GZ2726@deviant.kiev.zoral.com.ua> <20120127194221.GA25723@zim.MIT.EDU> <20120128123748.GD2726@deviant.kiev.zoral.com.ua> <20120129001225.GA32220@zim.MIT.EDU> <20120129062327.GK2726@deviant.kiev.zoral.com.ua> <20120129223904.GA37483@zim.MIT.EDU> <20120130063034.GU2726@deviant.kiev.zoral.com.ua> <20120130190703.GA44663@zim.MIT.EDU> <20120131105431.GB3283@deviant.kiev.zoral.com.ua>

next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, 31 Jan 2012, Konstantin Belousov wrote:

> On Mon, Jan 30, 2012 at 02:07:03PM -0500, David Schultz wrote:
>> That's why I'm glad I'm not committing it. :)  A more conservative
>> change (also known as "kicking the can down the road") would be to
>> add a VFS flag, e.g., VFCF_LONGIO, and only set it on file systems
>> that have been thoroughly reviewed.  The VFS layer could cap the size
>> at INT_MAX for file systems without the flag.
> At least I will get more mail after the commit, I hope.
>
> I disagree with the VFCF_LONGIO approach. It will cause much head-scratching
> for unsuspecting user who would try to use > 4GB transfers.

No one cared when this was broken for almost 10 years by 4.4BSD changing
the type of iov_len from int to size_t.  You can almost equally safely
break it again :(.  Previously (in FreeBSD-1 and presumably in Net/2)
iov_len had the same type as uio_resid, and the code in readv(), etc.,
depended on this and also on benign undefined behaviour on overflow
to detect overflow (after it gave undefined behaviour by occurring)
when adding up iov_len's to get uio_resid.  The bug didn't require
passing a buffer of size > INT_MAX to reach -- it just required a
single or multiple iov_len whose total size exceeded INT_MAX.  FreeBSD
fixed this in 2003.

size_t for iov_len is bogus even if the limit is SSIZE max.  It allows
a single iov to have a size that cannot work, and n iov's to have a
size almost 2*n times too large to work.  Unfortunately, POSIX
standardizes the type of iov_len as size_t.  It is interesting that
POSIX says "shall fail" for readv() when the sum of the iov_len values
overflowed [sic] an ssize_t.  For read(), the behaviour when the count
exceeds {SSIZE_MAX} is implementation-defined.  This gives the silly
possibility that read() can work for a size that is almost twice as
large as readv(), using a single count instead of an array of counts.
It is also strange that readv() refers to ssize_t while read() refers
to {SSIZE_MAX}.  POSIX standardizes the bug that {SSIZE_MAX} is the
limit of the type, although that may be unrelated to sizes that can
work.  If these APIs had been correctly designed, then the limit would
be {READ_MAX} and unrelated to any size type (except that the type
must be large enough to represent {READ_MAX}).  It would normally be
INT_MAX or much smaller.  readv() also has the bogus specification
that if iovcnt is <= 0 or > {IOV_MAX}, then failure is optional.
So failure is optional for a hard error like iovcnt > {IOV_MAX},
but is required for a condition that is not required to be an error
for a similar API (count > {SSIZE_MAX}).  I only checked a 2001
draft for this.  Maybe some of these bugs have been fixed.

> What I can do, is to commit all changes except removals of the checks
> for INT_MAX. After type changes settle, I can try to gather enough
> bravery to flip the checks in HEAD, possibly with temporary sysctl
> to return to old behaviour for emergency (AKA hole).
>
>>
>>> diff --git a/sys/kern/sys_pipe.c b/sys/kern/sys_pipe.c
>>> index 9edcb74..332ec37 100644
>>> --- a/sys/kern/sys_pipe.c
>>> +++ b/sys/kern/sys_pipe.c
>> [...]
>>> @@ -757,14 +757,14 @@ pipe_build_write_buffer(wpipe, uio)
>>>    struct pipe *wpipe;
>>>    struct uio *uio;
>>>  {
>>> -	u_int size;
>>> +	size_t size;
>>>  	int i;
>>>
>>> 	PIPE_LOCK_ASSERT(wpipe, MA_NOTOWNED);
>>>  	KASSERT(wpipe->pipe_state & PIPE_DIRECTW,
>>>  				  ("Clone attempt on non-direct write pipe!"));
>>>
>>> -	size = (u_int) uio->uio_iov->iov_len;
>>> +	size = uio->uio_iov->iov_len;
>>>  	if (size > wpipe->pipe_buffer.size)
>>>  	   size = wpipe->pipe_buffer.size;
>>
>> The transfer can't be bigger than the max pipe buffer size (64k),
>> so `size = (int)MIN(uio->uio_iov->iov_len, wpipe->pipe_buffer.size)'
>> should suffice.  The same comment applies elsewhere in the file.
>
> True. If you much prefer this version, I will change the patch. But I do
> think that my changes are cleaner.

size_t is a bogus type for `size', since it must still be a u_int to
work.  For example, when it is assigned to the u_int wpipe->pipe_map.cnt.
This shouldn't be "fixed" by bloating the type to size_t throughout
(any global change should be to int to give suitably undefined behaviour
on overflow).  There must be an implicit or explicit check on it
somewhere that it fits.  This check is provided by the comparison with
wpipe->pipe_buffer.size.  But the above is a bad way to write it --
it should be checked before assignment, to avoid any possibility of
overflow or sign extension of either operand before the check:

 	u_int size;

 	if (uio->uio_iov->iov_len > wpipe->pipe_buffer.size)
 		size = wpipe->pipe_buffer.size;
 	else
 		size = uio->uio_iov->iov_len;

This is the same as the MIN() expression (without a bogus cast).  I
think I want to write it out verbosely like this and not use MIN(),
even if MIN() were not a style bug, to make it clear that this is
a correct bounds check.

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20120201013621.V2889>