From: Dag-Erling Smørgrav
To: Warner Losh
Cc: "freebsd-hackers@freebsd.org"
Subject: Re: Fwd: DELETE support in the VOP_STRATEGY(9)?
Date: Tue, 08 Dec 2015 17:42:58 +0100

Warner Losh writes:
> Dag-Erling Smørgrav writes:
> > Here are a few of our options for implementing FALLOC_FL_PUNCH_HOLE:
> >
> >  a) create a filesystem-level hole in the disk image;
> >  b) perform a), then issue a BIO_DELETE for the blocks that were
> >     released;
> >  c) perform a) or b), then zero the overspill if the requested range is
> >     unaligned;
> >  d) zero the entire range;
> >  e) perform d) followed by either a) or b);
> >  f) nothing at all.
> I don't think f is an option. Unless it is OK to have random contents
> after creating a file and seeking some ways into and writing a
> byte. When you punch a hole in the file, you should get the same
> semantics as if you'd written up to just before the hole originally,
> then skipped to the end of the punched range and written the rest of
> the file.

I didn't realize there was a spec, so I didn't know what the intended
semantics were.

> You are correct, though, that the decision to issue a BIO_DELETE is
> between the filesystem and the storage device. This makes a-e possible
> implementations, but some are stupider than others (which ones depend
> on the situation).

Each of them except f) is the optimal solution for at least one of the
36 cases I outlined, or 18 if you ignore the zvol and device points on
the first axis.

> > Discuss the advantages and drawbacks of each option I listed above
> > for each of the 36 points in the space defined by the following
> > axes:
> > [...]
> > If you think the answer is the same in all cases, you are deluded.
> That's why these decisions are left to the stack.

Define "stack".  Do you mean the entire food chain from the hardware to
the POSIX filesystem API?  By design, no element in the stack has any
knowledge of any other element, beyond the names and dimensions of its
immediate consumers and suppliers (I find "producer" ambiguous).

> The only semantic that is required by the punch hole operation is that
> the filesystem return 0's on reads to that range. What the filesystem
> does to ensure this is up to the filesystem.

That's easy to say, but each option has advantages and disadvantages
depending on information which is not necessarily available where it is
needed.

A filesystem-level hole results in fragmentation, which can have a huge
performance impact on electromechanical storage but is negligible on
solid-state storage.  But the filesystem does not know whether the
underlying storage is electromechanical or solid-state, nor does it know
whether the user cares much about seek times (unless we introduce the
heuristic "avoid creating holes unless the file already has them, in
which case the userland probably does not care").  Then again, either
the filesystem or the underlying storage *or both* may have
copy-on-write semantics, in which case zeroing is worse than creating a
hole.

BTW, writing zeroes to NAND flash does not require erasing the block.  I
don't know whether SSDs take advantage of that to avoid unnecessarily
reallocating or erasing a block, nor whether they automatically release
and erase blocks that end up being completely zeroed.

DES
-- 
Dag-Erling Smørgrav - des@des.no
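
For illustration only, option c) from the list above (release whole blocks, zero the unaligned overspill) comes down to a small piece of range arithmetic. The sketch below is not taken from any FreeBSD code; the function name `split_punch_range` and the fixed `block_size` parameter are assumptions made for the example, and a real filesystem would have its own notion of block or fragment size.

```python
def split_punch_range(offset, length, block_size):
    """Split the byte range [offset, offset + length) for a hole punch.

    Returns three half-open (start, end) byte ranges:
      head   - unaligned leading bytes that would have to be zeroed
      blocks - whole blocks that could be deallocated (and possibly
               passed down as a delete request)
      tail   - unaligned trailing bytes that would have to be zeroed
    An empty range is reported with start == end.
    Hypothetical helper for illustration, not an existing API.
    """
    if offset < 0 or length < 0 or block_size <= 0:
        raise ValueError("offset and length must be >= 0, block_size > 0")

    end = offset + length
    # Round the start up and the end down to block boundaries.
    first_aligned = -(-offset // block_size) * block_size
    last_aligned = (end // block_size) * block_size

    if first_aligned >= last_aligned:
        # No whole block lies inside the range: everything is overspill,
        # so this degenerates into option d) (zero the entire range).
        return (offset, end), (end, end), (end, end)

    return (offset, first_aligned), (first_aligned, last_aligned), (last_aligned, end)


if __name__ == "__main__":
    head, blocks, tail = split_punch_range(1000, 10000, 4096)
    print("zero head:  ", head)
    print("free blocks:", blocks)
    print("zero tail:  ", tail)
```

With a 4096-byte block size, punching 10000 bytes at offset 1000 leaves one whole block (4096 to 8192) that can be released, while the bytes on either side have to be zeroed in place. Whether releasing that block is actually preferable to zeroing it is exactly the question the message says cannot be answered without knowing more about the storage underneath.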