Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 22 Sep 1999 19:55:26 +1000 (EST)
From:      Bruce Evans <bde@zeta.org.au>
To:        Matthew Dillon <dillon@apollo.backplane.com>
Cc:        current@FreeBSD.ORG
Subject:   Re: request for review, patch to specfs to fix EOF condition alignment with buffer
Message-ID:  <Pine.BSF.4.10.9909221825570.52968-100000@alphplex.bde.org>
In-Reply-To: <199909201919.MAA83623@apollo.backplane.com>

next in thread | previous in thread | raw e-mail | index | archive | help
>     This is a request for a review.  This patch fixes a bug in specfs
>     relating to dealing with the EOF condition of a block device.
> 
>     If the EOF occurs in the middle of a block, specfs was not
>     properly calculating the truncation for the I/O.

I didn't have time to review this properly when you first asked.

It is a bit over-commented, and returns wrong error codes for EOF.

> Index: miscfs/specfs/spec_vnops.c
> ===================================================================
> RCS file: /home/ncvs/src/sys/miscfs/specfs/spec_vnops.c,v
> retrieving revision 1.108
> diff -u -r1.108 spec_vnops.c
> --- spec_vnops.c	1999/09/17 06:10:26	1.108
> +++ spec_vnops.c	1999/09/20 17:50:48
> @@ -311,19 +311,37 @@
>  		do {
>  			bn = btodb(uio->uio_offset) & ~(bscale - 1);
>  			on = uio->uio_offset % bsize;
> -			n = min((unsigned)(bsize - on), uio->uio_resid);
>  			if (seqcount > 1) {
>  				nextbn = bn + bscale;
>  				error = breadn(vp, bn, (int)bsize, &nextbn,
>  					(int *)&bsize, 1, NOCRED, &bp);
>  			} else {
>  				error = bread(vp, bn, (int)bsize, NOCRED, &bp);
> +			}
> +
> +			/*
> +			 * Figure out how much of the buffer is valid relative
> +			 * to our offset into the buffer, which may be negative
> +			 * if we are beyond the EOF.
> +			 *
> +			 * The valid size of the buffer is based on 
> +			 * bp->b_bcount (which may have been truncated by
> +			 * dscheck or the device) minus bp->b_resid, which
> +			 * may be indicative of an I/O error if non-zero.
> +			 */

The short read must have been caused by EOF unless error != 0.  Note that
filesystems depend on this, i.e., the following handling in VOP_STRATEGY():

    Case 1: do some i/o (possibly null), then hit an i/o error:
        Return the amount not done in b_resid, and set B_ERROR to indicate
        that i/o was truncated because of an error.
    Case 2: do some i/o (possibly null), then hit EOF:
        Return the amount not done in b_resid, and don't touch B_ERROR.
        Some drivers change b_bcount, but this is dubious, especially
        for writes where the data beyond b_bcount is still valid and
        something may be using it.

ffs handles these cases as follows:

    Case 1:
        Throw away the partial block.  Depend on VOP_STRATEGY() setting
	B_ERROR so that bread()/bwrite()/etc. return nonzero in this
	case.
    Case 2:
        Assume that this case can't happen for metadata (a reasonable
	assumption, since filesystems don't use blocks beyond EOF),
	and don't check for it.  Similarly for VOP_WRITE().  Handle it
	in VOP_READ(), although I think it can't happen there either.

specfs (for bdevs) currently handle these cases like ffs.  Short counts
are only handled for Case 2.

physio() handles the cases like VOP_STRATEGY().  It passes the distinction
bewtween EOF and error to its caller by returning nonzero only if there
was an error.  (Distinguishing the cases is especially important for
write-once devices.)  Control eventually reaches the level dofileread(),
etc., where the distinction is screwed up :-( (`error' is only reset to
0 for nonzero short counts if it is ERESTART, EINTR or EWOULDBLOCK).

> +			n = bp->b_bcount - on;
> +			if (n < 0) {
> +				error = EINVAL;

`error' shouldn't be changed here if it is already nonzero.

EINVAL is a wrong value to return for EOF here (but neccessary to give
bug for bug compatibility here).  See the comment in my version.

> +			} else {
> +				n -= bp->b_resid;
> +				if (n < 0)
> +					error = EIO;

EIO is a wronger value to return for EOF.

Here are my fixes (relative to the old version) for spec_read().  They are
over-commented to temporarily document the buggy EOF return code.

---
diff -c2 spec_vnops.c~ spec_vnops.c
*** spec_vnops.c~	Sat Sep 11 15:54:39 1999
--- spec_vnops.c	Wed Sep 22 19:47:52 1999
***************
*** 309,313 ****
  			bn = btodb(uio->uio_offset) & ~(bscale - 1);
  			on = uio->uio_offset % bsize;
- 			n = min((unsigned)(bsize - on), uio->uio_resid);
  			if (vp->v_lastr + bscale == bn) {
  				nextbn = bn + bscale;
--- 311,314 ----
***************
*** 317,325 ****
  				error = bread(vp, bn, (int)bsize, NOCRED, &bp);
  			vp->v_lastr = bn;
- 			n = min(n, bsize - bp->b_resid);
  			if (error) {
  				brelse(bp);
  				return (error);
  			}
  			error = uiomove((char *)bp->b_data + on, n, uio);
  			brelse(bp);
--- 318,343 ----
  				error = bread(vp, bn, (int)bsize, NOCRED, &bp);
  			vp->v_lastr = bn;
  			if (error) {
  				brelse(bp);
  				return (error);
  			}
+ 
+ 			/*
+ 			 * The amount of valid data in the buffer is
+ 			 * bp->b_bcount - bp->b_resid.  If this is smaller
+ 			 * than `on', return EINVAL to give bug for bug
+ 			 * compatibility with physio() and with ourself
+ 			 * (most strategy routines erroneously return EINVAL
+ 			 * for i/o completely beyond EOF).  POSIX and POLA
+ 			 * specify returning 0 for reads from all offsets
+ 			 * beyond EOF.
+ 			 */
+ 			n = bp->b_bcount - bp->b_resid - on;
+ 			if (n < 0) {
+ 				brelse(bp);
+ 				return (EINVAL);
+ 			}
+ 
+ 			n = imin(n, uio->uio_resid);
  			error = uiomove((char *)bp->b_data + on, n, uio);
  			brelse(bp);
---

I was also concerned about what happens when buffers with a short
b_bcount or a nonzero b_resid are left in the cache.  It seems that
nothing fatal happens, but they just get in the way.  I observed
the following farcial handling for rereading block 4 on /dev/vn0a
where /dev/vn0a has size 6:

    spec_read(): call bread()
      bread(): call getblk()
        getblk(): call gbincore()
          gb_incore(): returns the buffer successfully
        getblk(): The buffer has the wrong size, so it is thrown away.
                  Do a VOP_WRITE() as part of throwing it away, although
                  the buffer is not dirty! :-(
                  Return an empty buffer.
      bread(): do a VOP_STRATEGY() to read the buffer again.

Bruce



To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?Pine.BSF.4.10.9909221825570.52968-100000>