From: John Baldwin
To: Kostik Belousov
Cc: mdf@freebsd.org, Gleb Kurtsou, FreeBSD Arch
Date: Fri, 15 Apr 2011 12:32:59 -0400
Subject: Re: posix_fallocate(2)
Message-Id: <201104151232.59770.jhb@freebsd.org>
In-Reply-To: <20110415093057.GJ48734@deviant.kiev.zoral.com.ua>

On Friday, April 15, 2011 5:30:57 am Kostik Belousov wrote:
> On Thu, Apr 14, 2011 at 03:41:30PM -0700, mdf@freebsd.org wrote:
> > On Thu, Apr 14, 2011 at 2:36 PM, Gleb Kurtsou wrote:
> > > On (14/04/2011 12:35), mdf@FreeBSD.org wrote:
> > >> For work we need a functionality in our filesystem that is pretty much
> > >> like posix_fallocate(2), so we're using the name and I've added a
> > >> default VOP_ALLOCATE definition that does the right, but dumb, thing.
> > >>
> > >> The most recent mention of this function in FreeBSD was another thread
> > >> lamenting its failure to exist:
> > >> http://lists.freebsd.org/pipermail/freebsd-ports/2010-February/059268.html
> > >>
> > >> The attached files are the core of the kernel implementation of the
> > >> syscall and a default VOP for any filesystem not supporting
> > >> VOP_ALLOCATE, which allows the syscall to work as expected but in a
> > >> non-performant manner.  I didn't see this syscall in NetBSD or
> > >> OpenBSD, so I plan to add it to the end of our syscall table.
> > >>
> > >> What I wanted to check with -arch about was:
> > >>
> > >> 1) is there still a desire for this syscall?
> > > It looks not to play well architecturally with modern COW file systems
> > > like ZFS and HAMMER. So potentially it can be implemented only for UFS.
> >
> > The syscall, or the dumb implementation?  I don't see why the syscall
> > itself would be a problem; presumably ZFS can figure out whether an
> > fallocate() block is worth COWing or not...
> >
> > >> 2) is this naive implementation useful enough to serve as a default
> > >> for all filesystems until someone with more knowledge fills them in?
> > > Maillist ate the patch. Only man page attached.
> >
> > Whoops!
> >
> > http://people.freebsd.org/~mdf/bsd-fallocate.diff
>
> New syscall symbols for 9.0 should go in under FBSD_1.2 version, not FBSD_1.0.
>
> You have inconsistent spacing in kern_posix_fallocate().
>
> I do not quite understand the locking for the vnode you did.
> You marked the vop as taking and returning an unlocked vnode. But you
> do call VOP_GETATTR in the vop std implementation before locking the vnode.
> Did you test with the DEBUG_VFS_LOCKS config?
>
> Usual (and proper) practice is to have such a vop require a locked vnode; in
> the case of VOP_ALLOCATE, an exclusive lock is appropriate. The Giant dance and
> vn_start_write() + vn_lock() go into kern_posix_fallocate() then.
> Also, you should call bwillwrite() before taking any vfs locks.
>
> Is the locking/unlocking of the vnode in the loop done to allow other callers
> to perform i/o on the vnode in between? In particular, to truncate it?
> I think this is not needed, and the previous suggestion would take care of it.
>
> Why do you need stdallocate_extend()? VOP_WRITE does the right thing
> with extending the vnode.
>
> You might find vn_rdwr easier to use than the bare vops. In particular,
> it would not omit the mac calls for read/write.

I agree with pretty much all of this, esp. as regards the locking, etc.

-- 
John Baldwin
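
For readers without the patch at hand, the "right, but dumb" fallback discussed in the thread can be approximated in userland as below. This is only a sketch under stated assumptions, not the code from bsd-fallocate.diff: naive_fallocate() and touch_byte() are hypothetical names, it touches one byte per st_blksize-sized step so the filesystem has to back every block in the range, and it takes no lock, so a concurrent writer or truncate can race with it (which is the kind of problem the vnode locking comments above are about in the kernel version).

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: touch one byte at pos.  Bytes inside the file as
 * it was when we started (old_size) are read and written back unchanged;
 * bytes past that point are written as zero.
 */
static int
touch_byte(int fd, off_t pos, off_t old_size)
{
	char byte = 0;
	ssize_t n;

	if (pos < old_size && pread(fd, &byte, 1, pos) == -1)
		return (errno);
	n = pwrite(fd, &byte, 1, pos);
	if (n == -1)
		return (errno);
	return (n == 1 ? 0 : EIO);
}

/*
 * Hypothetical userland stand-in for a naive allocate: returns 0 or an
 * errno value, like posix_fallocate(2).  Assumes offset + len does not
 * overflow off_t; a real implementation would check that.
 */
static int
naive_fallocate(int fd, off_t offset, off_t len)
{
	struct stat sb;
	off_t end, pos, step;
	int error;

	if (offset < 0 || len <= 0)
		return (EINVAL);
	if (fstat(fd, &sb) == -1)
		return (errno);
	end = offset + len;
	step = sb.st_blksize > 0 ? (off_t)sb.st_blksize : 512;
	assert(step > 0);

	/* One touch per step, starting at offset. */
	for (pos = offset; pos < end; pos += step) {
		error = touch_byte(fd, pos, sb.st_size);
		if (error != 0)
			return (error);
	}
	/* Touch the final byte so the last block and the new size are covered. */
	return (touch_byte(fd, end - 1, sb.st_size));
}
```

Calling naive_fallocate(fd, 0, 10000) on a 3-byte file should leave those 3 bytes intact and grow the file to 10000 bytes. It is slow for the same reason the default VOP is described as non-performant: every block costs a separate write.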