From owner-freebsd-emulation@FreeBSD.ORG Mon May 7 04:14:46 2007 Return-Path: X-Original-To: emulation@freebsd.org Delivered-To: freebsd-emulation@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 87EC416A400 for ; Mon, 7 May 2007 04:14:46 +0000 (UTC) (envelope-from bde@zeta.org.au) Received: from mailout2.pacific.net.au (mailout2-3.pacific.net.au [61.8.2.226]) by mx1.freebsd.org (Postfix) with ESMTP id 2901213C459 for ; Mon, 7 May 2007 04:14:46 +0000 (UTC) (envelope-from bde@zeta.org.au) Received: from mailproxy1.pacific.net.au (mailproxy1.pacific.net.au [61.8.2.162]) by mailout2.pacific.net.au (Postfix) with ESMTP id ECE4C111217; Mon, 7 May 2007 14:14:37 +1000 (EST) Received: from besplex.bde.org (katana.zip.com.au [61.8.7.246]) by mailproxy1.pacific.net.au (Postfix) with ESMTP id 4575A8C07; Mon, 7 May 2007 14:14:43 +1000 (EST) Date: Mon, 7 May 2007 14:14:42 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Scot Hetzel In-Reply-To: <790a9fff0705061005v1a1a883ehc155bac7f747c3eb@mail.gmail.com> Message-ID: <20070507124848.S47783@besplex.bde.org> References: <790a9fff0705021345j2ad9ae98o56aaf357d556fe27@mail.gmail.com> <790a9fff0705040004oab16ed8q1a1c476386379ea9@mail.gmail.com> <20070504190007.Y37951@besplex.bde.org> <790a9fff0705040819u24e4c2f0s5c9fc34b93770e13@mail.gmail.com> <790a9fff0705061005v1a1a883ehc155bac7f747c3eb@mail.gmail.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: emulation@freebsd.org Subject: Re: linuxolator: LTP lseek03 failure X-BeenThere: freebsd-emulation@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Development of Emulators of other operating systems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 May 2007 04:14:46 -0000 On Sun, 6 May 2007, Scot Hetzel wrote: > On 5/4/07, Scot Hetzel wrote: >> Would the best fix be to change the native lseek to return EINVAL when >> fo_ioctl returns ENOTTY? Probably. > I had a look at the OpenSolaris implementation of lseek and found that > we are returning the wrong error code for the ENOTTY case. When > ENOTTY is returned by fo_ioctl, we should be checking for the > following cases: > > SEEK_DATA - Is offset past end of file > SEEK_HOLE - Return virtual hole at end of file, if offset is valid > > on error, SEEK_DATA and SEEK_HOLE should be returning ENXIO for these > cases. OpenSolaris also checks offset > OFF_MAX, and returns > EOVERFLOW. > > When the lseek03 test is run, with these changes, it returns with ENXIO. Shouldn't it return with EOVERFLOW even for zfs? I guess the test doesn't check for that, so the result should be EINVAL for non-zfs and maybe ENXIO for zfs if the test didn't set up right for the seeks to work (presumably it sets the offset to something reasonable so as not to get spurious errors from args other than `whence', but it expects i/o to fail so it doesn't set up for i/o. > I sent the attached patch for kern/vfs_syscalls.c to pjd for review. The case of negative offsets also seems to be broken. POSIX requires returning EINVAL for negative offsets for files of type regular, block special and some others. We do this near the end of the the function in the noneg case, but: - block special files have rotted to just character special variants. Seeking to negative offsets on disk files is useless, so the noneg case should apply to cdevs that are disks like it must apply to bdevs that were disks. - the SEEK_DATA/SEEK_HOLE ioctl runs before the check for negative offsets (other cases fall through to the check before doing anything that will break the check). zfs_ioctl() does bogus bounds checking via zfs_holey(): - "off_t offset" has become "offset_t off" (offset_t == off_t so there is no type error yet) - "off" is assigned to uint64_t noff. This may overflow in theory, but doesn't in practive since off_t is int64_t. uoff_t (spelled u_offset_t in the opensolaris compat layer) should be used to avoid this logic error. - the bounds of noff are checked. If offset was negative, then noff is large unsugned so ENXIO is returned. I can't see anywhere that translates this to EOVERFLOW. % Index: vfs_syscalls.c % =================================================================== % RCS file: /home/ncvs/src/sys/kern/vfs_syscalls.c,v % retrieving revision 1.438 % diff -u -r1.438 vfs_syscalls.c % --- vfs_syscalls.c 4 May 2007 14:23:28 -0000 1.438 % +++ vfs_syscalls.c 6 May 2007 15:37:39 -0000 % @@ -1755,9 +1755,36 @@ % break; % case SEEK_DATA: % error = fo_ioctl(fp, FIOSEEKDATA, &offset, cred, td); % + if (error == ENOTTY) { % + /* % + * The ioctl is not supported. Is the offset % + * past the end of the file. % + */ % + error = VOP_GETATTR(vp, &vattr, fp->f_cred, td); % + if (error == 0 && % + (offset >= (uoff_t)vattr.va_size)) % + error = ENXIO; % + } I think this should just return EINVAL (or later, when all file systems are supposed to support SEEK_DATA, maybe EOPNOTSUPP). Detecting an error only at the end of a file (and still returning the impossible error ENOTTY otherwise) is not useful. % + if (noneg && (offset > OFF_MAX)) % + error = EOVERFLOW; This shouldn't happen. offset has type off_t, and OFF_MAX is the maximum value of an off_t. I think EOVERFLOW is only the appropriate error if the final offset would be unrepresentable, as can happen for relative seeks, but here the offset is either garbage or the same as the origina; offset, and if SEEK_DATA is actually implemented then the final offset could only be > OFF_MAX if the file system implements files with unreachable offsets. Strictly, `offset' is garbage on error, so we shouldn't check it. The case where the original offset is < 0 still falls through to the "offset < 0" check later, but that check is only done if errno == 0 (correctly, since the offset is garbage on error), so the above still returns ENOTTY for negative offsets. % break; % case SEEK_HOLE: % error = fo_ioctl(fp, FIOSEEKHOLE, &offset, cred, td); % + if (error == ENOTTY) { % + /* % + * The ioctl is not supported. Return virtual % + * hole at end of file, if offset is valid. % + */ % + error = VOP_GETATTR(vp, &vattr, fp->f_cred, td); % + if (error == 0) { % + if (uap->offset < (off_t)vattr.va_size) False positive if uap->offset < 0. % + offset = (uoff_t)vattr.va_size; % + else % + error = ENXIO; % + } % + } Skipping over possible holes to reach EOF is even less useful than detecting some errors since it has wrong sematics. However, this case has no effect, since error remains set at ENOTTY, so the rest of the function does nothing except clean up and return ENOTTY. % + if (noneg && (offset > OFF_MAX)) % + error = EOVERFLOW; % break; As above. % default: % error = EINVAL; Bruce