From owner-svn-src-head@freebsd.org Tue Jul 16 21:00:06 2019 Return-Path: Delivered-To: svn-src-head@mailman.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.nyi.freebsd.org (Postfix) with ESMTP id 3E731BA609; Tue, 16 Jul 2019 21:00:06 +0000 (UTC) (envelope-from asomers@gmail.com) Received: from mail-lf1-f68.google.com (mail-lf1-f68.google.com [209.85.167.68]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) server-signature RSA-PSS (4096 bits) client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smtp.gmail.com", Issuer "GTS CA 1O1" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 78A5174E1C; Tue, 16 Jul 2019 21:00:04 +0000 (UTC) (envelope-from asomers@gmail.com) Received: by mail-lf1-f68.google.com with SMTP id b29so7511302lfq.1; Tue, 16 Jul 2019 14:00:04 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=q1h+bP7JnosNtW0UWQC/5MMsyPa3GKhEoc60xHo0bO0=; b=qHmHU3iRyEAI746m8jWMEmD2QX+dzZjG+vNxQzeCvFgjpfsGOOfx92QT385u/8Jbyr N/fK44r5qt6iWKy1JCUzeoR/xq16c8fmObMg9musmLZJKI7X/IbxqLKvdkOU8pYcqVew QZ037oM+Mo7/K72yRIOwHRAB2mxycrZ3f8FLbueQeHTcmhlTGh0o4O3W27TZCM5ElGOj Bg8ghNeRM9n66ST2fL8MUWmCPUrXQ5wKZFi8v8VfYvn0Vn9TpoEPxNR0twCWPWILyift 0j6FplnZLRs/vz79vlmYPeRCAxIbZOZSOYS+DB6XNEPwBnZekYBFuChL54FgBlGxdvQR DB9Q== X-Gm-Message-State: APjAAAWMCvviJia+jJuQw/laRKRiKk+l2k8WbpyYuMmrsfaC7GJi+6ia WMVByv4KvNygyOxPT869CR6qND8H9yZe5SEipNA= X-Google-Smtp-Source: APXvYqzJwSSQ2KQauw13pRligqgstrAHKNhmeSwwsGQsqAxNkeN1Dx18lT/hWgwYqAP6dllXYtuk6HmKssyOXMaZnNs= X-Received: by 2002:a19:6904:: with SMTP id e4mr16082230lfc.156.1563310370481; Tue, 16 Jul 2019 13:52:50 -0700 (PDT) MIME-Version: 1.0 References: <201906251944.x5PJiNaA093352@repo.freebsd.org> <3b2ade4e-d19d-31e4-7ee2-c0085a00e53e@FreeBSD.org> <20190716180618.C962@besplex.bde.org> In-Reply-To: <20190716180618.C962@besplex.bde.org> From: Alan Somers Date: Tue, 16 Jul 2019 14:52:38 -0600 Message-ID: Subject: Re: svn commit: r349391 - head/sys/kern To: Bruce Evans Cc: John Baldwin , "Conrad E. Meyer" , src-committers , svn-src-all , svn-src-head Content-Type: text/plain; charset="UTF-8" X-Rspamd-Queue-Id: 78A5174E1C X-Spamd-Bar: --- Authentication-Results: mx1.freebsd.org; spf=pass (mx1.freebsd.org: domain of asomers@gmail.com designates 209.85.167.68 as permitted sender) smtp.mailfrom=asomers@gmail.com X-Spamd-Result: default: False [-3.55 / 15.00]; ARC_NA(0.00)[]; NEURAL_HAM_MEDIUM(-0.89)[-0.886,0]; FROM_HAS_DN(0.00)[]; R_SPF_ALLOW(-0.20)[+ip4:209.85.128.0/17]; NEURAL_HAM_LONG(-1.00)[-1.000,0]; MIME_GOOD(-0.10)[text/plain]; MIME_TRACE(0.00)[0:+]; DMARC_NA(0.00)[freebsd.org]; RCPT_COUNT_FIVE(0.00)[6]; TO_MATCH_ENVRCPT_SOME(0.00)[]; TO_DN_ALL(0.00)[]; MX_GOOD(-0.01)[cached: alt3.gmail-smtp-in.l.google.com]; NEURAL_HAM_SHORT(-0.49)[-0.491,0]; RCVD_IN_DNSWL_NONE(0.00)[68.167.85.209.list.dnswl.org : 127.0.5.0]; RCVD_TLS_LAST(0.00)[]; FORGED_SENDER(0.30)[asomers@freebsd.org,asomers@gmail.com]; FREEMAIL_TO(0.00)[optusnet.com.au]; RWL_MAILSPIKE_POSSIBLE(0.00)[68.167.85.209.rep.mailspike.net : 127.0.0.17]; R_DKIM_NA(0.00)[]; FREEMAIL_ENVFROM(0.00)[gmail.com]; ASN(0.00)[asn:15169, ipnet:209.85.128.0/17, country:US]; FROM_NEQ_ENVFROM(0.00)[asomers@freebsd.org,asomers@gmail.com]; IP_SCORE(-1.16)[ip: (0.15), ipnet: 209.85.128.0/17(-3.45), asn: 15169(-2.45), country: US(-0.06)]; RCVD_COUNT_TWO(0.00)[2] X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Jul 2019 21:00:06 -0000 On Tue, Jul 16, 2019 at 3:48 AM Bruce Evans wrote: > > On Mon, 15 Jul 2019, John Baldwin wrote: > > > On 7/14/19 12:08 PM, Conrad Meyer wrote: > >> > >> This change restores the possible overflow beyond IO_SEQMAX that the > >> removed conditional prevented. > >> > >> On Tue, Jun 25, 2019 at 12:44 PM Alan Somers wrote: > > I thought the same for a while on Tue, Jan 25, then didn't reply since I > saw a non-problem while writing the reply. Actually, all versions are > broken. > > >>> Author: asomers > >>> Date: Tue Jun 25 19:44:22 2019 > >>> New Revision: 349391 > >>> URL: https://svnweb.freebsd.org/changeset/base/349391 > >>> > >>> --- head/sys/kern/vfs_vnops.c Tue Jun 25 19:36:01 2019 (r349390) > >>> +++ head/sys/kern/vfs_vnops.c Tue Jun 25 19:44:22 2019 (r349391) > >>> @@ -499,10 +499,8 @@ sequential_heuristic(struct uio *uio, struct file *fp) > >>> * closely related to the best I/O size for real disks than > >>> * to any block size used by software. > >>> */ > >>> - fp->f_seqcount += MIN(IO_SEQMAX, > >>> + fp->f_seqcount += lmin(IO_SEQMAX, > >>> howmany(uio->uio_resid, 16384)); > >>> - if (fp->f_seqcount > IO_SEQMAX) > >>> - fp->f_seqcount = IO_SEQMAX; > >>> return (fp->f_seqcount << IO_SEQSHIFT); > >>> } > > Overflow occurs in all versions in howmany() when uio_resid is near > its limit of SSIZE_MAX. Then adding (65536 - 1) to uio_resid overflows. > The behaviour is undefined, but usually the addition gives a large > negative value and howmany gives a not so large negative value after > dividing by 65536. > > In the previous version, MIN() preserves signedness and the type of > howmany(), which is always long (since ssize_t happens to be long on all > supported arches). The large negative value is less than IO_SEQMAX sinc > that has a correct type (signed int) and is non-negative. So MIN() preserves > the large negative value. This is blindly added to f_seqcount, giving > another overflow on 64-bit arches and a garbage negative value on 32-bit > arches if f_seqcount was not garbage. The removed conditional only fixes > up large positive values, so has no effect in this case. Finally, left > shifting the garbage gives further undefined behaviour. > > The second overflow on 64-bit arches is because the not so large negative > value is still much larger than can be represented in f_seqcount. > Values near -SSIZE_MAX have 63 value bits, so dividing by 16384 leaves > 47 value bits, but f_seqcount has only 31 value bits. > > In this version, lmin() happens to give the same result as MIN() on all > supported machines, because ssize_t happens to have the same representation > on all supported machines. So overflows occur as before for large uio_resid. > > So this change makes no differences for the 2 overflows for uio_resid near > SSIZE_MAX. But removing the conditional restores an overflow that actually > was fixed in a previous recent commit. > > >> Perhaps instead this should be: > >> > >> fp->f_seqcount = lmin(IO_SEQMAX, > >> fp->f_seqcount + howmany(...)); > > This works to restore the check in the removed conditional, but is a bit > subtle and leaves the 2 old overflow bugs unfixed. > > > While I agree with your assessment of the removed conditional, I think the > > suggestion above can have a funky overflow where f_seqcount + howmany(...) ends > > up less than IO_SEQMAX and this expression would then be used instead of IO_SEQMAX? > > This only happens when howmany() itself overflows. Otherwise, dividing by > 16384 reduces the value enough to add f_seqcount to howmany(...) without > overflow. > > > The first lmin seems designed to minimize what we add in the hope that since the > > existing value is bounded to '0...IO_SEQMAX' the result of the += can only be in > > the range '0...2*IO_SEQMAX'. > > One addition doesn't overflow, and the shift could be clamped later, but > overflow still occurs after approx. INT_MAX / IO_SEQMAX additions, so it > is better to clamp earlier. > > > I'm not sure which variants are most readable: > > > > A) > > fp->f_seqcount += lmin(IO_SEQMAX, > > howmany(resid, 16384)); > > if (fp->f_seqcount > IO_SEQMAX) > > fp->f_seqcount = IO_SEQMAX; > > This change is because I said that using the unsafe macro MIN() is a style > bug. I also said that using the lmin() family is too hard, since it is > not type generic. I didn't notice the overflow problem in howmany(). > > This has another style bug -- using lmin() and then open-coding the clamp > f->f_seqcount = imax(f->f_seqcount, IO_SEQMAX). This is sort of backwards -- > we use open code for the simple case because it is simple with either > spelling, but we use lmin() for the complicated case to avoid spelling out > the howmany() expression twice. This obscures the 2 overflow bugs in the > howmany() expression and many delicate type promotions and type equivalences > that are needed for lmin() to work. > > > B) > > fp->f_seqcount += lmin(IO_SEQMAX, > > howmany(resid, 16384)); > > fp->f_seqcount = lmin(fp->f_seqcount, IO_SEQMAX); > > Consistent style, but another has another type error. f_seqcount has type > int. Of course, lmin() handles ints too, but to see that it works you > still have to check that f_seqcount is representable as long and once you > do that it is easier to see that imin() is courrect. > > > C) > > if (fp->f_seqcount + howmany(resid, 16384) < fp->f_seqcount) > > fp->f_seqcount = IO_SEQMAX; > > else > > fp->f_seqcount = lmin(IO_SEQMAX, > > fp->f_seqcount + howmany(resid, 16384)); > > > > I'm probably not a fan of C). > > On supported arches, it recovers from overflow in howmany(), but only if > the compiler doesn't optimize away the recovery. > > The first part can be done better: > > if (uio->uio_resid >= IO_SEQMAX * 16384) > fp->f_seqcount = IO_SEQMAX; > > Then for the second part, any version works since all values are small and > non-negative, but I prefer to restore the the version that I rewrote 15-20 > years ago and committed 11 years ago (r175106): > > fp->f_seqcount += howmany(uio->uio_resid, 16384); > if (fp->f_seqcount > IO_SEQMAX) > fp->f_seqcount = IO_SEQMAX; > > My changes to this were to use 16384 instead of BKVASIZE and howmany() > instead of its expansion. Changing to howmany() was a mistake since > it hides the first overflow. When I rewrote it, uio_resid had type > int so it was small enough after division by 16384, but not small enough > to add (16384 - 1) to. I didn't change the open-coded clamp on f_seqcount > in this because it was good enough. > > Bruce Bruce, Is this how you want it? Index: sys/kern/vfs_vnops.c =================================================================== --- sys/kern/vfs_vnops.c (revision 349977) +++ sys/kern/vfs_vnops.c (working copy) @@ -499,8 +499,13 @@ * closely related to the best I/O size for real disks than * to any block size used by software. */ - fp->f_seqcount += lmin(IO_SEQMAX, - howmany(uio->uio_resid, 16384)); + if (uio->uio_resid >= IO_SEQMAX * 16384) + fp->f_seqcount = IO_SEQMAX; + else { + fp->f_seqcount += howmany(uio->uio_resid, 16384); + if (fp->f_seqcount > IO_SEQMAX) + fp->f_seqcount = IO_SEQMAX; + } return (fp->f_seqcount << IO_SEQSHIFT); } -Alan