Date:      Thu, 2 Feb 2012 08:17:34 +0200
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Bruce Evans <brde@optusnet.com.au>, Gleb Smirnoff <glebius@freebsd.org>, svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org
Subject:   Re: svn commit: r230583 - head/sys/kern
Message-ID:  <20120202061734.GP3283@deviant.kiev.zoral.com.ua>
In-Reply-To: <20120131174849.GA50386@zim.MIT.EDU>
References:  <20120127091244.GZ2726@deviant.kiev.zoral.com.ua> <20120127194221.GA25723@zim.MIT.EDU> <20120128123748.GD2726@deviant.kiev.zoral.com.ua> <20120129001225.GA32220@zim.MIT.EDU> <20120129062327.GK2726@deviant.kiev.zoral.com.ua> <20120129223904.GA37483@zim.MIT.EDU> <20120130063034.GU2726@deviant.kiev.zoral.com.ua> <20120130190703.GA44663@zim.MIT.EDU> <20120131105431.GB3283@deviant.kiev.zoral.com.ua> <20120131174849.GA50386@zim.MIT.EDU>

On Tue, Jan 31, 2012 at 12:48:49PM -0500, David Schultz wrote:
> On Tue, Jan 31, 2012, Konstantin Belousov wrote:
> > On Mon, Jan 30, 2012 at 02:07:03PM -0500, David Schultz wrote:
> > > On Mon, Jan 30, 2012, Kostik Belousov wrote:
> > > > On Sun, Jan 29, 2012 at 05:39:04PM -0500, David Schultz wrote:
> > > > > On Sun, Jan 29, 2012, Kostik Belousov wrote:
> > > > > > On Sat, Jan 28, 2012 at 07:12:25PM -0500, David Schultz wrote:
> > > > > > > On Sat, Jan 28, 2012, Kostik Belousov wrote:
> > > > > > > > > On Fri, Jan 27, 2012 at 02:42:21PM -0500, David Schultz wrote:
> > > > > > > > > > The correct limit on the maximum size of a single read/write is
> > > > > > > > > > SSIZE_MAX, but FreeBSD uses INT_MAX.  It's not safe to raise the
> > > > > > > > > > limit yet, though, because of bugs in several filesystems.  For
> > > > > > > > > > example, FFS copies uio_resid into a local variable of type int.
> > > > > > > > > > I have some old patches that fix some of these issues for FFS and
> > > > > > > > > > cd9660, but surely there are more places I didn't notice.
> > > > > > > > > >
> > > > > > > > > Absolutely agree.
> > > > > > > > >
> > > > > > > > > http://people.freebsd.org/~kib/misc/uio_resid.5.patch
> > > > > > > >
> > > > > > > > Nice.  You found a lot more than I've got in my tree, and you even
> > > > > > > > fixed the return values.  There are at least a few more places to
> > > > > > > > fix.  For instance, cd9660 and the NFS client pass uio_resid or
> > > > > > > > iov_len to min(), which operates on ints.  (Incidentally, C11
> > > > > > > > generics ought to make it possible to write type-generic min()
> > > > > > > > and max() functions.)
> > > > > > >
> > > > > > > Thank you, http://people.freebsd.org/~kib/misc/uio_resid.6.patch
> > > > > > > changed them to MIN().
> > > > > >
> > > > > This looks good to me.  I tried to think of other places that you
> > > > > might have missed, and the only one that occurred to me is the
> > > > > Might?  I think this is a blatant understatement.
> > > > >
> > > > > pipe code.  sys_pipe.c has an `int orig_resid' and lots of bogus
> > > > > casts of iov_len and uio_resid to type u_int.  Some look harmless,
> > > > > although it appears that writing a multiple of 2^32 bytes might
> > > > > result in pipe_build_write_buffer() allocating a 0-length buffer.
> > > > >=20
> > > > > My only reservation is that raising the limit could unmask a
> > > > > kernel buffer overflow if we missed something, but I guess we have
> > > > > to cross that bridge some day anyway.
> > > > > Yes, and that is the obvious reason why I have been too chicken to
> > > > > commit this for so long.  One more place, if it is reasonable to
> > > > > count it as 'one' place, is the cdevsw methods.  devfs passes uio
> > > > > down to the drivers.
> > > >
> > > That's why I'm glad I'm not committing it. :)  A more conservative
> > > change (also known as "kicking the can down the road") would be to
> > > add a VFS flag, e.g., VFCF_LONGIO, and only set it on file systems
> > > that have been thoroughly reviewed.  The VFS layer could cap the size
> > > at INT_MAX for file systems without the flag.
> > At least I will get more mail after the commit, I hope.
> >
> > I disagree with the VFCF_LONGIO approach. It will cause much
> > head-scratching for unsuspecting users who try to use > 4GB transfers.
> >
> > What I can do is commit all changes except the removal of the checks
> > for INT_MAX. After the type changes settle, I can try to gather enough
> > bravery to flip the checks in HEAD, possibly with a temporary sysctl
> > to return to the old behaviour in an emergency (AKA hole).
>=20
> That sounds like a good plan to me.
>
> As an aside, I wonder if we could convince the clang folks to add
> a warning similar to `lint -a', which complains about every long->int
> narrowing conversion that doesn't have an explicit cast.  Modern
> languages such as Java and C# require casts for narrowing
> conversions, and I'd be a lot more confident about this change if
> we did the same for the FreeBSD kernel.
>
> > > > diff --git a/sys/kern/sys_pipe.c b/sys/kern/sys_pipe.c
> > > > index 9edcb74..332ec37 100644
> > > > --- a/sys/kern/sys_pipe.c
> > > > +++ b/sys/kern/sys_pipe.c
> > > [...]
> > > > @@ -757,14 +757,14 @@ pipe_build_write_buffer(wpipe, uio)
> > > >    struct pipe *wpipe;
> > > >    struct uio *uio;
> > > >  {
> > > > -	u_int size;
> > > > +	size_t size;
> > > >  	int i;
> > > >
> > > >  	PIPE_LOCK_ASSERT(wpipe, MA_NOTOWNED);
> > > >  	KASSERT(wpipe->pipe_state & PIPE_DIRECTW,
> > > >  				  ("Clone attempt on non-direct write pipe!"));
> > > >
> > > > -	size = (u_int) uio->uio_iov->iov_len;
> > > > +	size = uio->uio_iov->iov_len;
> > > >  	if (size > wpipe->pipe_buffer.size)
> > > >  	   size = wpipe->pipe_buffer.size;
> > >
> > > The transfer can't be bigger than the max pipe buffer size (64k),
> > > so `size = (int)MIN(uio->uio_iov->iov_len, wpipe->pipe_buffer.size)'
> > > should suffice.  The same comment applies elsewhere in the file.
> >
> > True. If you much prefer this version, I will change the patch. But I do
> > think that my changes are cleaner.
>
> I don't mind either way.  I haven't touched anything remotely
> close to that code in years.

I made the changes along the lines suggested by Bruce.

Also, I put the patch through a real test on UFS and the new NFS
client, reading and writing files whose sizes are multiples of INT_MAX
in a single transaction. This indeed revealed two more issues, one in
ktrace and the second in uiomove(). The latter resulted in quite
spectacular kernel stack corruption, because uio_iovcnt underflowed and
the iovec array was iterated past its end.
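
For illustration only, here is a minimal user-space sketch of the kind of
narrowing bug this patch is meant to remove, modelled on the `(u_int)' cast
in pipe_build_write_buffer() quoted above. It is not a reconstruction of the
ktrace or uiomove() problems. It assumes a 64-bit size_t; uint32_t stands in
for u_int, and the names clamp_narrow() and clamp_wide() are hypothetical,
not taken from the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Narrow first, clamp second.  uint32_t stands in for u_int, so the
 * cast silently drops the high 32 bits of a 64-bit length.
 */
size_t
clamp_narrow(size_t len, size_t bufsize)
{
	uint32_t size = (uint32_t)len;

	if (size > bufsize)
		size = (uint32_t)bufsize;
	return (size);
}

/* Clamp in the wide type first; the result then fits in an int. */
size_t
clamp_wide(size_t len, size_t bufsize)
{
	assert(bufsize <= (size_t)INT_MAX);	/* e.g. a 64k pipe buffer */
	return (len < bufsize ? len : bufsize);
}
```

With a 64k buffer and a length of exactly 2^32 bytes, clamp_narrow() yields
0 (the zero-length buffer case mentioned earlier in the thread), while
clamp_wide() yields 65536.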

The updated patch keeps i/o larger than INT_MAX (up to SSIZE_MAX)
disabled by default, and includes a sysctl knob, debug.iosize_max_clamp,
which removes the clamp when set to zero.
I consider this patch a commit candidate.

http://people.freebsd.org/~kib/misc/uio_resid.9.patch
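
As a rough user-space sketch of how such a clamp can behave (the patch
itself may implement it differently): only the knob name
debug.iosize_max_clamp comes from this message, while iosize_max() and
check_io_size() are hypothetical helpers, and SIZE_MAX / 2 is used as a
stand-in for SSIZE_MAX.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the debug.iosize_max_clamp knob; nonzero keeps the clamp. */
int iosize_max_clamp = 1;

/*
 * SIZE_MAX / 2 is used in place of SSIZE_MAX; the two are equal on
 * platforms where ssize_t and size_t have the same width.
 */
size_t
iosize_max(void)
{
	return (iosize_max_clamp != 0 ? (size_t)INT_MAX : SIZE_MAX / 2);
}

/* Returns 0 if a single transfer of len bytes is allowed, else EINVAL. */
int
check_io_size(size_t len)
{
	size_t limit = iosize_max();

	/* Either limit must fit in a signed byte count. */
	assert(limit <= SIZE_MAX / 2);
	return (len > limit ? EINVAL : 0);
}
```

With the knob at its default, a request of INT_MAX + 1 bytes is rejected;
after setting the stand-in variable to zero, the same request passes and
only lengths above the SSIZE_MAX stand-in are refused.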
