Date: Fri, 15 Apr 2011 13:54:09 +0300
From: Gleb Kurtsou
To: mdf@FreeBSD.org
Message-ID: <20110415105409.GA14344@tops>
References: <20110414213610.GB92382@tops>
Cc: FreeBSD Arch
Subject: Re: posix_fallocate(2)
List-Id: Discussion related to FreeBSD architecture

On (14/04/2011 15:41), mdf@FreeBSD.org wrote:
> On Thu, Apr 14, 2011 at 2:36 PM, Gleb Kurtsou wrote:
> > On (14/04/2011 12:35), mdf@FreeBSD.org wrote:
> >> For work we need a functionality in our filesystem that is pretty much
> >> like posix_fallocate(2), so we're using the name and I've added a
> >> default VOP_ALLOCATE definition that does the right, but dumb, thing.
> >>
> >> The most recent mention of this function in FreeBSD was another thread
> >> lamenting its failure to exist:
> >> http://lists.freebsd.org/pipermail/freebsd-ports/2010-February/059268.html
> >>
> >> The attached files are the core of the kernel implementation of the
> >> syscall and a default VOP for any filesystem not supporting
> >> VOP_ALLOCATE, which allows the syscall to work as expected but in a
> >> non-performant manner.  I didn't see this syscall in NetBSD or
> >> OpenBSD, so I plan to add it to the end of our syscall table.
> >>
> >> What I wanted to check with -arch about was:
> >>
> >> 1) is there still a desire for this syscall?
> > It looks not to play well architecturally with modern COW file systems
> > like ZFS and HAMMER. So potentially it can be implemented only for UFS.
>
> The syscall, or the dumb implementation?  I don't see why the syscall
> itself would be a problem; presumably ZFS can figure out whether an
> fallocate() block is worth COWing or not...

It is good to have if there is a chance of getting a real implementation
for UFS. Having only the dumb implementation will fool user software into
thinking that we support it.
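To make the concern about user software concrete, here is a minimal userland sketch of how a program typically calls posix_fallocate(2). The helper names reserve_space() and reserve_in_scratch_file() and the /tmp path are hypothetical and are not taken from the patch. The point it illustrates is that the caller only sees a return value: a success looks the same whether the filesystem preallocated natively or a fallback filled the range the slow way.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not from the patch: ask the system to back the
 * first `len` bytes of `fd` with real storage.  Returns 0 on success or
 * an error number; posix_fallocate() reports errors through its return
 * value rather than through errno.
 */
static int
reserve_space(int fd, off_t len)
{
	assert(len > 0);
	return (posix_fallocate(fd, 0, len));
}

/*
 * Hypothetical demo: reserve `len` bytes in a scratch file and return
 * the resulting file size, or -1 on any failure.
 */
static off_t
reserve_in_scratch_file(off_t len)
{
	char path[] = "/tmp/fallocXXXXXX";
	struct stat st;
	off_t size = -1;
	int fd;

	fd = mkstemp(path);
	if (fd == -1)
		return (-1);
	unlink(path);	/* the file goes away once fd is closed */
	if (reserve_space(fd, len) == 0 && fstat(fd, &st) == 0)
		size = st.st_size;
	close(fd);
	return (size);
}
```

Calling reserve_in_scratch_file(1 << 20) should report a 1 MiB file on success, but nothing in that result tells the program how expensive the reservation was.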
As far as I understand, ZFS caches a large chunk of changes and then
writes all of them at once, so I doubt blocks can be preallocated. You
preallocate a block, it is marked as used in the file system's metadata,
and the metadata changes are written to disk. That results in an
inconsistency, because the preallocated block is marked as "used" in the
metadata and thus can't be overwritten. I might be absolutely wrong; ZFS
experts are better placed to answer this. Grepping reveals no fallocate
support in ZFS.

> >> 2) is this naive implementation useful enough to serve as a default
> >> for all filesystems until someone with more knowledge fills them in?
> > Maillist ate the patch. Only man page attached.
>
> Whoops!
>
> http://people.freebsd.org/~mdf/bsd-fallocate.diff

What was the performance impact on copying large files? I had sparse file
support in PEFS implemented in a similar way. Performance was terrible: the
vm and buf caches were saturated first by writing huge chunks of zeros and
then by mmap'ing and writing the actual data. sched_yield() and/or vnode
lock/unlock didn't improve interactive performance either.

Why wouldn't you just call VOP_SETATTR(newsize) in the dumb implementation?
File systems expect such behavior; cp has been using mmap for a while
already.

>
> Cheers,
> matthew
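The two strategies contrasted in the reply can be sketched in userland C. This is only an analogy under stated assumptions, not the code from bsd-fallocate.diff: grow_by_zero_fill() stands in for a default that writes zeros over the range, grow_by_size_only() uses ftruncate(2) as the userland counterpart of setting a new size via VOP_SETATTR, and all function names, the CHUNK size, and the /tmp path are hypothetical. Both helpers assume the file starts out empty.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK 65536	/* arbitrary buffer size for the zero-fill loop */

/*
 * Grow an empty file to `len` bytes by writing zeros, so every byte of
 * the range passes through the write path.  Returns 0 or -1.
 */
static int
grow_by_zero_fill(int fd, off_t len)
{
	static const char zeros[CHUNK];
	off_t done = 0;

	while (done < len) {
		size_t want = (len - done) < CHUNK ?
		    (size_t)(len - done) : CHUNK;
		ssize_t n = pwrite(fd, zeros, want, done);

		if (n <= 0)
			return (-1);
		done += n;	/* short writes are retried by the loop */
	}
	return (0);
}

/* Grow the file by changing its size only; no data is written. */
static int
grow_by_size_only(int fd, off_t len)
{
	return (ftruncate(fd, len));
}

/*
 * Hypothetical demo: apply `grow` to a scratch file and return the
 * resulting size, or -1 on any failure.
 */
static off_t
scratch_size_after(int (*grow)(int, off_t), off_t len)
{
	char path[] = "/tmp/growXXXXXX";
	struct stat st;
	off_t size = -1;
	int fd;

	assert(len >= 0);
	fd = mkstemp(path);
	if (fd == -1)
		return (-1);
	unlink(path);	/* the file goes away once fd is closed */
	if (grow(fd, len) == 0 && fstat(fd, &st) == 0)
		size = st.st_size;
	close(fd);
	return (size);
}
```

Both helpers leave the file with the same reported size. The difference is cost and guarantee: the zero-fill variant pushes the whole range through the caches, while the size-only variant is cheap but, on filesystems that support sparse files, does not by itself reserve disk space, which is the guarantee POSIX attaches to posix_fallocate().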