Date: Wed, 22 Sep 1999 19:55:26 +1000 (EST) From: Bruce Evans <bde@zeta.org.au> To: Matthew Dillon <dillon@apollo.backplane.com> Cc: current@FreeBSD.ORG Subject: Re: request for review, patch to specfs to fix EOF condition alignment with buffer Message-ID: <Pine.BSF.4.10.9909221825570.52968-100000@alphplex.bde.org> In-Reply-To: <199909201919.MAA83623@apollo.backplane.com>
next in thread | previous in thread | raw e-mail | index | archive | help
> This is a request for a review. This patch fixes a bug in specfs > relating to dealing with the EOF condition of a block device. > > If the EOF occurs in the middle of a block, specfs was not > properly calculating the truncation for the I/O. I didn't have time to review this properly when you first asked. It is a bit over-commented, and returns wrong error codes for EOF. > Index: miscfs/specfs/spec_vnops.c > =================================================================== > RCS file: /home/ncvs/src/sys/miscfs/specfs/spec_vnops.c,v > retrieving revision 1.108 > diff -u -r1.108 spec_vnops.c > --- spec_vnops.c 1999/09/17 06:10:26 1.108 > +++ spec_vnops.c 1999/09/20 17:50:48 > @@ -311,19 +311,37 @@ > do { > bn = btodb(uio->uio_offset) & ~(bscale - 1); > on = uio->uio_offset % bsize; > - n = min((unsigned)(bsize - on), uio->uio_resid); > if (seqcount > 1) { > nextbn = bn + bscale; > error = breadn(vp, bn, (int)bsize, &nextbn, > (int *)&bsize, 1, NOCRED, &bp); > } else { > error = bread(vp, bn, (int)bsize, NOCRED, &bp); > + } > + > + /* > + * Figure out how much of the buffer is valid relative > + * to our offset into the buffer, which may be negative > + * if we are beyond the EOF. > + * > + * The valid size of the buffer is based on > + * bp->b_bcount (which may have been truncated by > + * dscheck or the device) minus bp->b_resid, which > + * may be indicative of an I/O error if non-zero. > + */ The short read must have been caused by EOF unless error != 0. Note that filesystems depend on this, i.e., the following handling in VOP_STRATEGY(): Case 1: do some i/o (possibly null), then hit an i/o error: Return the amount not done in b_resid, and set B_ERROR to indicate that i/o was truncated because of an error. Case 2: do some i/o (possibly null), then hit EOF: Return the amount not done in b_resid, and don't touch B_ERROR. Some drivers change b_bcount, but this is dubious, especially for writes where the data beyond b_bcount is still valid and something may be using it. ffs handles these cases as follows: Case 1: Throw away the partial block. Depend on VOP_STRATEGY() setting B_ERROR so that bread()/bwrite()/etc. return nonzero in this case. Case 2: Assume that this case can't happen for metadata (a reasonable assumption, since filesystems don't use blocks beyond EOF), and don't check for it. Similarly for VOP_WRITE(). Handle it in VOP_READ(), although I think it can't happen there either. specfs (for bdevs) currently handle these cases like ffs. Short counts are only handled for Case 2. physio() handles the cases like VOP_STRATEGY(). It passes the distinction bewtween EOF and error to its caller by returning nonzero only if there was an error. (Distinguishing the cases is especially important for write-once devices.) Control eventually reaches the level dofileread(), etc., where the distinction is screwed up :-( (`error' is only reset to 0 for nonzero short counts if it is ERESTART, EINTR or EWOULDBLOCK). > + n = bp->b_bcount - on; > + if (n < 0) { > + error = EINVAL; `error' shouldn't be changed here if it is already nonzero. EINVAL is a wrong value to return for EOF here (but neccessary to give bug for bug compatibility here). See the comment in my version. > + } else { > + n -= bp->b_resid; > + if (n < 0) > + error = EIO; EIO is a wronger value to return for EOF. Here are my fixes (relative to the old version) for spec_read(). They are over-commented to temporarily document the buggy EOF return code. --- diff -c2 spec_vnops.c~ spec_vnops.c *** spec_vnops.c~ Sat Sep 11 15:54:39 1999 --- spec_vnops.c Wed Sep 22 19:47:52 1999 *************** *** 309,313 **** bn = btodb(uio->uio_offset) & ~(bscale - 1); on = uio->uio_offset % bsize; - n = min((unsigned)(bsize - on), uio->uio_resid); if (vp->v_lastr + bscale == bn) { nextbn = bn + bscale; --- 311,314 ---- *************** *** 317,325 **** error = bread(vp, bn, (int)bsize, NOCRED, &bp); vp->v_lastr = bn; - n = min(n, bsize - bp->b_resid); if (error) { brelse(bp); return (error); } error = uiomove((char *)bp->b_data + on, n, uio); brelse(bp); --- 318,343 ---- error = bread(vp, bn, (int)bsize, NOCRED, &bp); vp->v_lastr = bn; if (error) { brelse(bp); return (error); } + + /* + * The amount of valid data in the buffer is + * bp->b_bcount - bp->b_resid. If this is smaller + * than `on', return EINVAL to give bug for bug + * compatibility with physio() and with ourself + * (most strategy routines erroneously return EINVAL + * for i/o completely beyond EOF). POSIX and POLA + * specify returning 0 for reads from all offsets + * beyond EOF. + */ + n = bp->b_bcount - bp->b_resid - on; + if (n < 0) { + brelse(bp); + return (EINVAL); + } + + n = imin(n, uio->uio_resid); error = uiomove((char *)bp->b_data + on, n, uio); brelse(bp); --- I was also concerned about what happens when buffers with a short b_bcount or a nonzero b_resid are left in the cache. It seems that nothing fatal happens, but they just get in the way. I observed the following farcial handling for rereading block 4 on /dev/vn0a where /dev/vn0a has size 6: spec_read(): call bread() bread(): call getblk() getblk(): call gbincore() gb_incore(): returns the buffer successfully getblk(): The buffer has the wrong size, so it is thrown away. Do a VOP_WRITE() as part of throwing it away, although the buffer is not dirty! :-( Return an empty buffer. bread(): do a VOP_STRATEGY() to read the buffer again. Bruce To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-current" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?Pine.BSF.4.10.9909221825570.52968-100000>