From: Pawel Jakub Dawidek
To: mdf@FreeBSD.org
Cc: FreeBSD Arch
Date: Fri, 15 Apr 2011 16:31:30 +0200
Subject: Re: posix_fallocate(2)
Message-ID: <20110415143130.GC4526@garage.freebsd.pl>

On Thu, Apr 14, 2011 at 12:35:34PM -0700, mdf@FreeBSD.org wrote:
> For work we need a
> functionality in our filesystem that is pretty much
> like posix_fallocate(2), so we're using the name and I've added a
> default VOP_ALLOCATE definition that does the right, but dumb, thing.
>
> The most recent mention of this function in FreeBSD was another thread
> lamenting its failure to exist:
> http://lists.freebsd.org/pipermail/freebsd-ports/2010-February/059268.html
>
> The attached files are the core of the kernel implementation of the
> syscall and a default VOP for any filesystem not supporting
> VOP_ALLOCATE, which allows the syscall to work as expected but in a
> non-performant manner. I didn't see this syscall in NetBSD or
> OpenBSD, so I plan to add it to the end of our syscall table.
>
> What I wanted to check with -arch about was:
>
> 1) is there still a desire for this syscall?
> 2) is this naive implementation useful enough to serve as a default
> for all filesystems until someone with more knowledge fills them in?
> 3) are there any obvious bugs or missing elements?

As I understand it, you have two cases to consider:

1. The caller wants to reserve space in a region that might be a hole, so
   we read and rewrite this region.
2. The caller wants to reserve space beyond the file size. We need to
   write zeros there.

For the first case I don't see the point of rewriting a block if it
contains data that is not all zeros. A hole can contain only zeros, so
there is room for an optimization right there: skip the write step if the
data is not all zeros. Of course, you need to know somehow what the
smallest block size used by the file system is.

In the case of ZFS, overwriting a hole with zeros won't reserve the space
if you have compression turned on. All-zero blocks are turned into holes
by ZFS internally when compression is on.

The first case would be better implemented using SEEK_HOLE/SEEK_DATA.
Those are not implemented in UFS yet, but they would let you find the
holes in the file and overwrite just those.
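
A rough userland sketch of the two cases, written in Python only to show
the control flow; it is not the kernel code from the attached patch. It
assumes the platform and file system implement SEEK_HOLE/SEEK_DATA for
lseek(2). The names find_holes, write_zeros and reserve, and the
ZERO_CHUNK size, are hypothetical and exist only in this example.

```python
import errno
import os

ZERO_CHUNK = 64 * 1024  # hypothetical write size, chosen only for this sketch


def find_holes(fd, start, end):
    """Return (offset, length) pairs for the holes inside [start, end)."""
    end = min(end, os.fstat(fd).st_size)
    holes = []
    pos = start
    while pos < end:
        # First hole at or after pos; a file without holes reports its size.
        hole = os.lseek(fd, pos, os.SEEK_HOLE)
        if hole >= end:
            break
        try:
            # First data at or after the hole, i.e. where the hole stops.
            data = os.lseek(fd, hole, os.SEEK_DATA)
        except OSError as err:
            if err.errno != errno.ENXIO:
                raise
            data = end  # no more data: the hole runs to end of file
        stop = min(data, end)
        if stop <= hole:
            break  # defensive: the file changed underneath us
        holes.append((hole, stop - hole))
        pos = stop
    return holes


def write_zeros(fd, offset, length):
    """Write `length` zero bytes at `offset` without reading anything first."""
    zeros = bytes(ZERO_CHUNK)
    done = 0
    while done < length:
        n = min(ZERO_CHUNK, length - done)
        done += os.pwrite(fd, zeros[:n], offset + done)
    return done


def reserve(fd, offset, length):
    """Zero-fill holes in [offset, offset + length) and extend past EOF."""
    size = os.fstat(fd).st_size
    end = offset + length
    written = 0
    # Case 1: inside the file, touch only the holes; existing data is
    # neither read nor rewritten.
    for hole_off, hole_len in find_holes(fd, offset, min(end, size)):
        written += write_zeros(fd, hole_off, hole_len)
    # Case 2: past end of file there is nothing to read, so just write zeros.
    if end > size:
        tail = max(offset, size)
        written += write_zeros(fd, tail, end - tail)
    return written
```

Calling reserve(fd, 0, n) on a sparse file should leave existing data
untouched and write zeros only into the holes and past the old end of
file. As noted above, on ZFS with compression enabled the zero writes
would not actually reserve space, so this approach is not sufficient
there.
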
You could entirely avoid the reads and most of the writes in a
general-purpose implementation.

You could also add a flag to VFS_SET(9) to mark file systems that support
holes. If a file system doesn't support holes, the first case can be
skipped.

For the second case, I find it a waste to first extend the file size and
then read those zeros back. Why can't you just write zeros and avoid the
read step when you are extending the file?

-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
FreeBSD committer                         http://www.FreeBSD.org
Am I Evil? Yes, I Am!                     http://yomoli.com