From owner-svn-src-head@freebsd.org Fri Jan 26 11:36:41 2018 Return-Path: Delivered-To: svn-src-head@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5C0EEED1627; Fri, 26 Jan 2018 11:36:41 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail107.syd.optusnet.com.au (mail107.syd.optusnet.com.au [211.29.132.53]) by mx1.freebsd.org (Postfix) with ESMTP id 9AE866F51D; Fri, 26 Jan 2018 11:36:40 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from [192.168.0.102] (c110-21-101-228.carlnfd1.nsw.optusnet.com.au [110.21.101.228]) by mail107.syd.optusnet.com.au (Postfix) with ESMTPS id 8415DD4E3E2; Fri, 26 Jan 2018 22:36:38 +1100 (AEDT) Date: Fri, 26 Jan 2018 22:36:38 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Pedro Giffuni cc: Bruce Evans , src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: Re: svn commit: r328346 - in head/sys: fs/ext2fs ufs/ffs ufs/ufs In-Reply-To: <21b6bdda-65b7-89da-4dd6-bed64978eba8@FreeBSD.org> Message-ID: <20180126214948.C1040@besplex.bde.org> References: <201801241758.w0OHwm26063524@repo.freebsd.org> <20180126020540.B2181@besplex.bde.org> <8d5ddd06-14b2-e7e1-14dd-5e9d42f9b33c@FreeBSD.org> <20180126053133.R3207@besplex.bde.org> <21b6bdda-65b7-89da-4dd6-bed64978eba8@FreeBSD.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=LKgWeNe9 c=1 sm=1 tr=0 a=PalzARQSbocsUSjMRkwAPg==:117 a=PalzARQSbocsUSjMRkwAPg==:17 a=kj9zAlcOel0A:10 a=yrInW6LWO6Wa9mYlnkQA:9 a=CjuIK1q_8ugA:10 X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.25 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 26 Jan 2018 11:36:41 -0000 On Thu, 25 Jan 2018, Pedro Giffuni wrote: > On 25/01/2018 14:24, Bruce Evans wrote: >> ... >> This code only works because (if?) nfs is the only caller and nfs never >> passes insane values. >> > > I am starting to think that we should simply match uio_resid and set it to > ssize_t. > Returning the value to int is certainly not the solution. Of course using the correct type (int) is part of the solution. uio_must be checked before it is used for cookies, and after checking it, it is small so it fits easily in an int. It must also checked to be nonnegative, so that it doesn't suffer unsigned poisoning when it is promoted, so it would also fit in a u_int, but using u_int to store it is silly as using 1U instead of 1 for a count of 1. The bounds checking is something like: if (ap->uio_resid < 0) ap->uio_resid = 0; if (ap->a_ncookies != NULL) { if (ap->uio_resid >= 64 * 1024) ap->uio_resid = 64 * 1024; ncookies = ap->uio_resid; } This checks for negative values for all cases and converts to 0 (EOF) to preserve historical behaviour for the syscall case and to avoid overflow for the cookies case (in case the caller is buggy). The correct handling is to return EINVAL, but EOF is good enough. In the syscall case, uio_resid can be up to SSIZE_MAX, so don't check it or corrupt it by assigning it to an int or u_int. Limit uio_resid from above only in the cookies case. The final limit should be about 128K (whatever nfs uses) or maybe 1M. Don't return EINVAL above the limit, since nfs probably wouldn't know how to handle that (by retrying with a smaller size). Test its handling of short counts instead. It is expected than nfs asks for 128K and we supply at most 64K. The supply is always reduced at EOF. Hopefully nfs doesn't treat the short count as EOF. It should retry until we supply 0. After limiting uio_resid, assign it to the int ncookies. This doesn't fix the abuse of the ncookies counter to hold the size of the cookies array in bytes for this and the next couple of statements. Normally the bounds checking should be at the top level, with at most KASSERT()s at lower levels, but here the levels are mixed, and it isn't clear if kernel callers have already checked, and it doesn't cost much to do much the same checking for the kernel callers as for the syscall callers. Perhaps the 128K limit is good for all cases (this depends on callers not having buggy short count handling). Directories of this size are very rare (don't forget to create very large ones when you test this). Doing anything with directories of this size tends to be slow anyway, and the slowness has nothing to do with reading only 128K instead of SSIZE_MAX bytes at a time. readdir() in FreeBSD seems to use a read size of only PAGE_SIZE, except in the unionfs case it seems to try to read the whole direction. It malloc()s the buffer in both cases. Blindy malloc()ing or mmap()ing a buffer large enough for a whole file or directory is no good, since in theory even directory sizes can be much larger than memory. Bruce