From owner-freebsd-current@freebsd.org Fri Mar 3 16:39:17 2017
From: Alan Somers
Date: Fri, 3 Mar 2017 09:39:15 -0700
Subject: Re: effect of strip(1) on du(1)
To: "Rodney W. Grimes"
Cc: Peter Jeremy, freebsd-hackers, Subbsd, freebsd-current Current, Ngie Cooper
In-Reply-To: <201703031411.v23EBUdM069969@pdx.rh.CN85.dnsmgr.net>
References: <20170303092143.GM4503@server.rulingia.com> <201703031411.v23EBUdM069969@pdx.rh.CN85.dnsmgr.net>
List-Id: Discussions about the use of FreeBSD-current

On Fri, Mar 3, 2017 at 7:11 AM, Rodney W. Grimes wrote:
>> On 2017-Mar-02 22:19:10 -0800, "Rodney W. Grimes" wrote:
>> >> du(1) is using fts_read(3), which is based on the stat(2) information.
>> >> The Open Group defines st_blocksize as "Number of blocks allocated for
>> >> this object." In the case of ZFS, a write(2) may return before any
>> >> blocks are actually allocated. And thanks to compression, gang
>> ...
>> >My gut tells me that this is gonna cause problems. Is it ONLY
>> >the st_blocksize data that is incorrect (then not such a big
>> >problem), or are we returning other metadata that is wrong?
>>
>> Note that it's st_blocks, not st_blocksize.
> Yes, I just ignore that digression, as well as the digression into fts_read
> being anything special about this, as it just ends up calling stat(2) in
> the end anyway.
>
>>
>> I did an experiment, writing a (roughly) 113MB file (some data I had
>> lying around), close()ing it and then stat()ing it in a loop. This is
>> FreeBSD 10.3 with ZFS and lz4 compression. Over the 26ms following the
>> close(), st_blocks gradually rose from 24169 to 51231. It then stayed
>> stable until 4.968s after the close, when st_blocks again started
>> increasing until it stabilized after a total of 5.031s at 87483. Based
>> on this, st_blocks reflects the actual number of blocks physically
>> written to disk. None of the other fields in the struct stat vary.
>                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> Thank you for doing the proper regression test; that satisfies me that
> we don't have a latent bug sitting here, and in fact what we have is
> exposure of the kernel caching, which I might be too thrilled about,
> is just how it's gonna have to be.
>
>>
>> The 5s delay is presumably the TXG delay (since this system is basically
>> unloaded). I'm not sure why it writes roughly ½ the data immediately
>> and the rest as part of the next TXG write.
>>
>> >My expectations of executing a stat(2) call on a file would
>> >be that the data returned is valid and stable. I think almost
>> >any program would expect that.
>>
>> I think a case could be made that st_blocks is a valid representation
>> of "the number of blocks allocated for this object" - with the number
>> increasing as the data is physically written to disk. As for it being
>> stable, consider a (hypothetical) filesystem that can transparently
>> migrate data between different storage media, with different compression
>> algorithms etc. (ZFS will be able to do this once the mythical block
>> rewrite code is written).
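Peter's measurement (close the file, then stat(2) it in a loop and watch st_blocks) can be reproduced with a small C helper along these lines. This is a minimal sketch assuming a POSIX system; `sample_st_blocks` and its parameters are hypothetical names chosen for illustration, not anything from the thread or from libc.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/*
 * Hypothetical helper: call stat(2) on `path` `n` times, `interval_ms`
 * milliseconds apart, recording st_blocks in out[0..n-1].
 * Returns 0 on success, or -1 (errno set by stat(2)) if any call fails.
 */
int sample_st_blocks(const char *path, long long *out, int n, long interval_ms)
{
    assert(path != NULL && out != NULL && n > 0 && interval_ms >= 0);

    struct timespec gap = {
        .tv_sec = interval_ms / 1000,
        .tv_nsec = (interval_ms % 1000) * 1000000L,
    };

    for (int i = 0; i < n; i++) {
        struct stat sb;
        if (stat(path, &sb) == -1)
            return -1;
        out[i] = (long long)sb.st_blocks;
        /* Sleep between samples; an interrupted sleep just shortens the gap. */
        if (i + 1 < n)
            nanosleep(&gap, NULL);
    }
    return 0;
}
```

For instance, after writing and closing a large file on a compressed ZFS dataset, `sample_st_blocks(path, samples, 600, 10)` would collect roughly six seconds of samples, enough to span one 5-second TXG interval like the one described above. The sketch samples instead of looping "until it stops changing" because, in the numbers above, st_blocks stayed flat for nearly five seconds before growing again, so a short stability window would stop too early.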
>
> I could counter-argue that st_blocks is:
>     st_blocks   The actual number of blocks allocated for the file in
>                 512-byte units.
>
> Nothing in that says anything about "on disk". So while this thing
> is sitting in memory on the TXG queue we should return the number of
> 512-byte blocks used by the memory holding the data.
> I think that would be the more correct thing than exposing to userland
> the fact that this thing is sitting in a write-back cache.
>
> --
> Rod Grimes                                                 rgrimes@freebsd.org

"Transparent" does not mean "undetectable". For example, ZFS's transparent compression will affect the st_blocks reported for a file. I think the only sane use of st_blocks is to treat it as advisory. I've seen a lot of bugs caused by programmers assuming a certain mathematical relationship between the numbers presented by "df", "zfs list", etc.

BTW, I've confirmed that ZFS on Illumos has the same behavior. A file's st_blocks doesn't stabilize until a few seconds after you write it, and it turns out that running fsync(1) on the file doesn't make it stabilize any sooner. This suggests that ZFS doesn't count blocks in the ZIL when it reports st_blocks.

-Alan
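One way to follow the advice to treat st_blocks as advisory is to report it next to st_size rather than derive anything exact from it. The sketch below is hypothetical (`write_and_measure` is not an existing API): it writes a file, calls fsync(2), which is what fsync(1) uses, and converts st_blocks to bytes assuming the 512-byte unit from the FreeBSD stat(2) text quoted above; POSIX itself does not define the unit.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed unit of st_blocks: 512 bytes, as documented in FreeBSD's stat(2). */
#define ST_BLOCKS_UNIT 512LL

/*
 * Hypothetical helper: create/truncate `path`, write `nbytes` copies of the
 * byte `fill`, fsync(2) and close it, then stat(2) it. On success stores
 * st_size in *logical and st_blocks * 512 in *allocated and returns 0;
 * on failure returns -1 with errno set by the call that failed.
 */
int write_and_measure(const char *path, size_t nbytes, int fill,
                      long long *logical, long long *allocated)
{
    assert(path != NULL && logical != NULL && allocated != NULL);

    char buf[4096];
    memset(buf, fill, sizeof buf);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;

    size_t left = nbytes;
    while (left > 0) {
        size_t chunk = left < sizeof buf ? left : sizeof buf;
        ssize_t n = write(fd, buf, chunk);
        if (n == -1)
            goto fail;
        left -= (size_t)n;
    }

    /* Force the data to stable storage; on ZFS this may go to the ZIL. */
    if (fsync(fd) == -1)
        goto fail;
    if (close(fd) == -1)
        return -1;

    struct stat sb;
    if (stat(path, &sb) == -1)
        return -1;
    *logical = (long long)sb.st_size;
    /* Advisory only: may not yet reflect the final allocation. */
    *allocated = (long long)sb.st_blocks * ST_BLOCKS_UNIT;
    return 0;

fail:
    {
        int saved = errno; /* keep the original error across close(2) */
        close(fd);
        errno = saved;
    }
    return -1;
}
```

On a compressed ZFS dataset the `allocated` figure may come out well below `logical`, and, given the behavior described in this thread, a plain stat(2) a few seconds later may report a different st_blocks even though fsync(2) already returned.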