Date: Mon, 1 Feb 2016 11:22:18 -0800
From: Maxim Sobolev
To: Konstantin Belousov
Cc: freebsd-fs@freebsd.org, Kirk McKusick
Subject: Re: Inconsistency between lseek(SEEK_HOLE) and lseek(SEEK_DATA)
In-Reply-To: <20160201182257.GN91220@kib.kiev.ua>
References: <20160201165648.GM91220@kib.kiev.ua> <20160201182257.GN91220@kib.kiev.ua>

Well, it still seems to be quite obscure. At the very least, the lseek(2)
manual page needs to reflect that. Right now it says:

ERRORS
     [...]
     [ENXIO]    For SEEK_DATA, there are no more data regions past the
                supplied offset.  For SEEK_HOLE, there are no more holes
                past the supplied offset.

Which is not true: SEEK_HOLE returns st_size when there are no more holes
past the supplied offset, not ENXIO. It is also interesting that an empty
file is somehow a special case as well. Both SEEK_HOLE and SEEK_DATA
return -1 on those. Anybody who programs to that document would probably
get as confused as I was.

However, having said that, our cousin Linux behaves the same, i.e. it
returns EOF+1 on SEEK_HOLE and -1 on SEEK_DATA, and does the same for
empty files, so at least we are consistent with that.
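The cases above can be checked with a small probe. This is only a sketch:
make_data_file() and seek_probe() are hypothetical helper names, not part
of any library, and the _GNU_SOURCE define is an assumption needed only so
that glibc exposes SEEK_HOLE/SEEK_DATA on Linux. The first helper creates a
data-only temporary file; the second reports either the offset lseek()
returns or the negated errno, so both outcomes can be compared as numbers.

```c
#define _GNU_SOURCE     /* assumption: needed on Linux/glibc for SEEK_HOLE */
#include <sys/types.h>
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: create an already-unlinked temporary file that
 * holds `len` bytes of data.  Returns the descriptor, or -1 on failure.
 */
static int
make_data_file(size_t len)
{
    char path[] = "/tmp/seekhole.XXXXXX";
    int fd = mkstemp(path);

    if (fd == -1)
        return (-1);
    unlink(path);               /* the file disappears once fd is closed */
    for (size_t i = 0; i < len; i++) {
        if (write(fd, "x", 1) != 1) {
            close(fd);
            return (-1);
        }
    }
    return (fd);
}

/*
 * Hypothetical helper: call lseek() and return the new offset on
 * success, or the negated errno (e.g. -ENXIO) on failure.
 */
static off_t
seek_probe(int fd, off_t offset, int whence)
{
    off_t r;

    assert(fd >= 0);            /* caller must pass an open descriptor */
    r = lseek(fd, offset, whence);
    return (r == -1 ? -(off_t)errno : r);
}
```

If the behaviour is as described above, seek_probe(fd, 0, SEEK_HOLE) on the
4-byte file from make_data_file(4) should give 4 (st_size), the same call at
offset 4 should give -ENXIO, and both SEEK_HOLE and SEEK_DATA at offset 0 on
the empty file from make_data_file(0) should give -ENXIO. Filesystems that
do not report holes may behave differently.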
-Max

On Mon, Feb 1, 2016 at 10:22 AM, Konstantin Belousov wrote:

> On Mon, Feb 01, 2016 at 09:17:49AM -0800, Maxim Sobolev wrote:
> > Here it is:
> >
> > The expected outcome is return code 0; the failure condition is
> > lseek() returning 4 (i.e. sizeof(int)), not -1.
> >
> > ------
> > #include <sys/stat.h>
> > #include <fcntl.h>
> > #include <stdint.h>
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <unistd.h>
> >
> > int main(void)
> > {
> >     char tempname[] = "/tmp/temp.XXXXXX";
> >     char *fname;
> >     int fd;
> >     off_t hole;
> >
> >     fname = mktemp(tempname);
> >     if (fname == NULL) {
> >         exit (1);
> >     }
> >     fd = open(fname, O_WRONLY | O_CREAT | O_TRUNC, DEFFILEMODE);
> >     if (fd == -1) {
> >         exit (1);
> >     }
> >     /* Write sizeof(int) bytes so the file is data-only. */
> >     if (write(fd, &fd, sizeof(fd)) <= 0) {
> >         exit (1);
> >     }
> >     /* Expected: -1 (ENXIO), since the file has no holes. */
> >     hole = lseek(fd, 0, SEEK_HOLE);
> >     close(fd);
> >     unlink(fname);
> >     if (hole >= 0) {
> >         fprintf(stderr, "lseek() returned %jd, not -1\n",
> >             (intmax_t)hole);
> >         exit (1);
> >     }
> >     exit (0);
> > }
> > ------
>
> I tested your program on both UFS and ZFS, and the behaviour is
> identical: lseek(SEEK_HOLE) points to the end of file. In fact, when I
> did the UFS implementation, I most likely considered this case and
> tested ZFS compatibility, because the case is handled explicitly. Look
> at lines 2193-2197 in kern/vfs_vnops.c:vn_bmap_seekhole(), esp. the
> comment.
>
> For me, the results of the test are reasonable. There is no data
> after EOF, and the idea of an 'implicit hole' after EOF is one which
> is quite intuitive.
>
> > On Mon, Feb 1, 2016 at 8:56 AM, Konstantin Belousov wrote:
> >
> > > On Mon, Feb 01, 2016 at 07:57:40AM -0800, Maxim Sobolev wrote:
> > > > Hi,
> > > >
> > > > I've noticed that lseek() behaves inconsistently with regard to
> > > > the SEEK_HOLE and SEEK_DATA operations. SEEK_HOLE on a data-only
> > > > file returns st_size (i.e. EOF + 1), while SEEK_DATA on a
> > > > hole-only file returns -1 and sets errno to ENXIO.
> > > > The latter seems to be the documented way to indicate that the
> > > > file has no more data sections past this point.
> > > >
> > > > My first idea was that somehow most files have a hole attached
> > > > to their end to fill up the FS block, but that does not seem to
> > > > be the case. Trying to SEEK_HOLE past the end of any of those
> > > > data-only files produces an error
> > > > (i.e. lseek(fd, st_size, SEEK_HOLE) == -1).
> > > >
> > > > In short, for some reason I cannot get a proper ENXIO from
> > > > SEEK_HOLE. What is currently returned implies that there is a
> > > > 1-byte hole attached to each file past its EOF, and that does
> > > > not smell right.
> > > >
> > > > All tests are done on UFS, fairly recent 11-current.
> > >
> > > There are no 'hole-only' files on UFS: the last byte in a UFS file
> > > must be populated, either by an allocated fragment if the last
> > > byte is in the direct blocks range, or by a full block if it is in
> > > the indirect range.
> > >
> > > Please show an exact minimal test case which reproduces what you
> > > consider the bug, with a comment about the expected outcome at the
> > > failing location.