From owner-svn-src-all@freebsd.org  Tue Jul 16 09:48:11 2019
Return-Path: <owner-svn-src-all@freebsd.org>
Delivered-To: svn-src-all@mailman.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.nyi.freebsd.org (Postfix) with ESMTP id 4C016AAE96;
 Tue, 16 Jul 2019 09:48:11 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail104.syd.optusnet.com.au (mail104.syd.optusnet.com.au
 [211.29.132.246])
 by mx1.freebsd.org (Postfix) with ESMTP id 2230D749D9;
 Tue, 16 Jul 2019 09:48:08 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from [192.168.0.102] (c110-21-101-228.carlnfd1.nsw.optusnet.com.au
 [110.21.101.228])
 by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id 3449743C2CD;
 Tue, 16 Jul 2019 19:48:04 +1000 (AEST)
Date: Tue, 16 Jul 2019 19:48:02 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: John Baldwin <jhb@freebsd.org>
cc: cem@freebsd.org, Alan Somers <asomers@freebsd.org>, 
 src-committers <src-committers@freebsd.org>, 
 svn-src-all <svn-src-all@freebsd.org>, 
 svn-src-head <svn-src-head@freebsd.org>
Subject: Re: svn commit: r349391 - head/sys/kern
In-Reply-To: <3b2ade4e-d19d-31e4-7ee2-c0085a00e53e@FreeBSD.org>
Message-ID: <20190716180618.C962@besplex.bde.org>
References: <201906251944.x5PJiNaA093352@repo.freebsd.org>
 <CAG6CVpUscojbGvA0=BmZPhSR+5Yt5ah8SQKrBaouUNDpu9tw9A@mail.gmail.com>
 <3b2ade4e-d19d-31e4-7ee2-c0085a00e53e@FreeBSD.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.2 cv=P6RKvmIu c=1 sm=1 tr=0 cx=a_idp_d
 a=PalzARQSbocsUSjMRkwAPg==:117 a=PalzARQSbocsUSjMRkwAPg==:17
 a=jpOVt7BSZ2e4Z31A5e1TngXxSK0=:19 a=kj9zAlcOel0A:10 a=6I5d2MoRAAAA:8
 a=h1vcgeBvbb_nU-wzG2IA:9 a=CjuIK1q_8ugA:10 a=IjZwj45LgO3ly-622nXo:22
X-Rspamd-Queue-Id: 2230D749D9
X-Spamd-Bar: -----
Authentication-Results: mx1.freebsd.org;
 spf=pass (mx1.freebsd.org: domain of brde@optusnet.com.au designates
 211.29.132.246 as permitted sender) smtp.mailfrom=brde@optusnet.com.au
X-Spamd-Result: default: False [-5.75 / 15.00]; ARC_NA(0.00)[];
 NEURAL_HAM_MEDIUM(-1.00)[-0.999,0];
 RCVD_IN_DNSWL_LOW(-0.10)[246.132.29.211.list.dnswl.org : 127.0.5.1];
 FROM_HAS_DN(0.00)[]; TO_DN_SOME(0.00)[];
 R_SPF_ALLOW(-0.20)[+ip4:211.29.132.0/23];
 FREEMAIL_FROM(0.00)[optusnet.com.au];
 MIME_GOOD(-0.10)[text/plain]; TO_MATCH_ENVRCPT_ALL(0.00)[];
 DMARC_NA(0.00)[optusnet.com.au]; RCPT_COUNT_FIVE(0.00)[6];
 NEURAL_HAM_LONG(-1.00)[-1.000,0];
 IP_SCORE(-2.54)[ip: (-7.21), ipnet: 211.28.0.0/14(-3.20), asn: 4804(-2.32),
 country: AU(0.01)]; 
 MX_GOOD(-0.01)[cached: extmail.optusnet.com.au];
 NEURAL_HAM_SHORT(-0.89)[-0.894,0]; RCVD_NO_TLS_LAST(0.10)[];
 FROM_EQ_ENVFROM(0.00)[]; R_DKIM_NA(0.00)[];
 FREEMAIL_ENVFROM(0.00)[optusnet.com.au];
 ASN(0.00)[asn:4804, ipnet:211.28.0.0/14, country:AU];
 MIME_TRACE(0.00)[0:+]; RCVD_COUNT_TWO(0.00)[2]
X-BeenThere: svn-src-all@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "SVN commit messages for the entire src tree \(except for &quot;
 user&quot; and &quot; projects&quot; \)" <svn-src-all.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/svn-src-all>,
 <mailto:svn-src-all-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/svn-src-all/>
List-Post: <mailto:svn-src-all@freebsd.org>
List-Help: <mailto:svn-src-all-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/svn-src-all>,
 <mailto:svn-src-all-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 16 Jul 2019 09:48:11 -0000

On Mon, 15 Jul 2019, John Baldwin wrote:

> On 7/14/19 12:08 PM, Conrad Meyer wrote:
>>
>> This change restores the possible overflow beyond IO_SEQMAX that the
>> removed conditional prevented.
>>
>> On Tue, Jun 25, 2019 at 12:44 PM Alan Somers <asomers@freebsd.org> wrote:

I thought the same for a while on Tue, Jan 25, then didn't reply since I
saw a non-problem while writing the reply.  Actually, all versions are
broken.

>>> Author: asomers
>>> Date: Tue Jun 25 19:44:22 2019
>>> New Revision: 349391
>>> URL: https://svnweb.freebsd.org/changeset/base/349391
>>>
>>> --- head/sys/kern/vfs_vnops.c   Tue Jun 25 19:36:01 2019        (r349390)
>>> +++ head/sys/kern/vfs_vnops.c   Tue Jun 25 19:44:22 2019        (r349391)
>>> @@ -499,10 +499,8 @@ sequential_heuristic(struct uio *uio, struct file *fp)
>>>                  * closely related to the best I/O size for real disks than
>>>                  * to any block size used by software.
>>>                  */
>>> -               fp->f_seqcount += MIN(IO_SEQMAX,
>>> +               fp->f_seqcount += lmin(IO_SEQMAX,
>>>                     howmany(uio->uio_resid, 16384));
>>> -               if (fp->f_seqcount > IO_SEQMAX)
>>> -                       fp->f_seqcount = IO_SEQMAX;
>>>                 return (fp->f_seqcount << IO_SEQSHIFT);
>>>         }

Overflow occurs in all versions in howmany() when uio_resid is near
its limit of SSIZE_MAX.  Then adding (65536 - 1) to uio_resid overflows.
The behaviour is undefined, but usually the addition gives a large
negative value and howmany gives a not so large negative value after
dividing by 65536.

In the previous version, MIN() preserves signedness and the type of
howmany(), which is always long (since ssize_t happens to be long on all
supported arches).  The large negative value is less than IO_SEQMAX sinc
that has a correct type (signed int) and is non-negative.  So MIN() preserves
the large negative value.  This is blindly added to f_seqcount, giving
another overflow on 64-bit arches and a garbage negative value on 32-bit
arches if f_seqcount was not garbage.  The removed conditional only fixes
up large positive values, so has no effect in this case.  Finally, left
shifting the garbage gives further undefined behaviour.

The second overflow on 64-bit arches is because the not so large negative
value is still much larger than can be represented in f_seqcount.
Values near -SSIZE_MAX have 63 value bits, so dividing by 16384 leaves
47 value bits, but f_seqcount has only 31 value bits.

In this version, lmin() happens to give the same result as MIN() on all
supported machines, because ssize_t happens to have the same representation
on all supported machines.  So overflows occur as before for large uio_resid.

So this change makes no differences for the 2 overflows for uio_resid near
SSIZE_MAX.  But removing the conditional restores an overflow that actually
was fixed in a previous recent commit.

>> Perhaps instead this should be:
>>
>> fp->f_seqcount = lmin(IO_SEQMAX,
>>   fp->f_seqcount + howmany(...));

This works to restore the check in the removed conditional, but is a bit
subtle and leaves the 2 old overflow bugs unfixed.

> While I agree with your assessment of the removed conditional, I think the
> suggestion above can have a funky overflow where f_seqcount + howmany(...) ends
> up less than IO_SEQMAX and this expression would then be used instead of IO_SEQMAX?

This only happens when howmany() itself overflows.  Otherwise, dividing by
16384 reduces the value enough to add f_seqcount to howmany(...) without
overflow.

> The first lmin seems designed to minimize what we add in the hope that since the
> existing value is bounded to '0...IO_SEQMAX' the result of the += can only be in
> the range '0...2*IO_SEQMAX'.

One addition doesn't overflow, and the shift could be clamped later, but
overflow still occurs after approx. INT_MAX / IO_SEQMAX additions, so it
is better to clamp earlier.

> I'm not sure which variants are most readable:
>
> A)
>    fp->f_seqcount += lmin(IO_SEQMAX,
>        howmany(resid, 16384));
>    if (fp->f_seqcount > IO_SEQMAX)
>            fp->f_seqcount = IO_SEQMAX;

This change is because I said that using the unsafe macro MIN() is a style
bug.  I also said that using the lmin() family is too hard, since it is
not type generic.  I didn't notice the overflow problem in howmany().

This has another style bug -- using lmin() and then open-coding the clamp
f->f_seqcount = imax(f->f_seqcount, IO_SEQMAX).  This is sort of backwards --
we use open code for the simple case because it is simple with either
spelling, but we use lmin() for the complicated case to avoid spelling out
the howmany() expression twice.  This obscures the 2 overflow bugs in the
howmany() expression and many delicate type promotions and type equivalences
that are needed for lmin() to work.

> B)
>    fp->f_seqcount += lmin(IO_SEQMAX,
>        howmany(resid, 16384));
>    fp->f_seqcount = lmin(fp->f_seqcount, IO_SEQMAX);

Consistent style, but another has another type error.  f_seqcount has type
int.  Of course, lmin() handles ints too, but to see that it works you
still have to check that f_seqcount is representable as long and once you
do that it is easier to see that imin() is courrect.

> C)
>    if (fp->f_seqcount + howmany(resid, 16384) < fp->f_seqcount)
>            fp->f_seqcount = IO_SEQMAX;
>    else
>            fp->f_seqcount = lmin(IO_SEQMAX,
>                fp->f_seqcount + howmany(resid, 16384));
>
> I'm probably not a fan of C).

On supported arches, it recovers from overflow in howmany(), but only if
the compiler doesn't optimize away the recovery.

The first part can be done better:

 	if (uio->uio_resid >= IO_SEQMAX * 16384)
 		fp->f_seqcount = IO_SEQMAX;

Then for the second part, any version works since all values are small and
non-negative, but I prefer to restore the the version that I rewrote 15-20
years ago and committed 11 years ago (r175106):

 		fp->f_seqcount += howmany(uio->uio_resid, 16384);
 		if (fp->f_seqcount > IO_SEQMAX)
 			fp->f_seqcount = IO_SEQMAX;

My changes to this were to use 16384 instead of BKVASIZE and howmany()
instead of its expansion.  Changing to howmany() was a mistake since
it hides the first overflow.  When I rewrote it, uio_resid had type
int so it was small enough after division by 16384, but not small enough
to add (16384 - 1) to.  I didn't change the open-coded clamp on f_seqcount
in this because it was good enough.

Bruce