Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 18 Apr 2011 22:28:10 +0300
From:      Kostik Belousov <kostikbel@gmail.com>
To:        Matthew D Fleming <mdf@freebsd.org>
Cc:        svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org
Subject:   Re: svn commit: r220791 - in head: lib/libc/sys sys/compat/freebsd32 sys/kern sys/sys
Message-ID:  <20110418192810.GX48734@deviant.kiev.zoral.com.ua>
In-Reply-To: <201104181632.p3IGWM5v077720@svn.freebsd.org>
References:  <201104181632.p3IGWM5v077720@svn.freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help

--FT/vNgDLxVq4t/AU
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Mon, Apr 18, 2011 at 04:32:22PM +0000, Matthew D Fleming wrote:
> Author: mdf
> Date: Mon Apr 18 16:32:22 2011
> New Revision: 220791
> URL: http://svn.freebsd.org/changeset/base/220791
>=20
> Log:
>   Add the posix_fallocate(2) syscall.  The default implementation in
>   vop_stdallocate() is filesystem agnostic and will run as slow as a
>   read/write loop in userspace; however, it serves to correctly
>   implement the functionality for filesystems that do not implement a
>   VOP_ALLOCATE.
>  =20
>   Note that __FreeBSD_version was already bumped today to 900036 for any
>   ports which would like to use this function.
>  =20
>   Also reserve space in the syscall table for posix_fadvise(2).
>  =20
>   Reviewed by:	-arch (previous version)
Thank you for taking the remarks into consideration.

> =20
> +int
> +vop_stdallocate(struct vop_allocate_args *ap)
> +{
> +#ifdef __notyet__
> +	struct statfs sfs;
> +#endif
> +	struct iovec aiov;
> +	struct vattr vattr, *vap;
> +	struct uio auio;
> +	off_t len, cur, offset;
> +	uint8_t *buf;
> +	struct thread *td;
> +	struct vnode *vp;
> +	size_t iosize;
> +	int error, locked;
> +
> +	buf =3D NULL;
> +	error =3D 0;
> +	locked =3D 1;
> +	td =3D curthread;
> +	vap =3D &vattr;
> +	vp =3D ap->a_vp;
> +	len =3D ap->a_len;
> +	offset =3D ap->a_offset;
> +
> +	error =3D VOP_GETATTR(vp, vap, td->td_ucred);
> +	if (error !=3D 0)
> +		goto out;
> +	iosize =3D vap->va_blocksize;
> +	if (iosize =3D=3D 0)
> +		iosize =3D BLKDEV_IOSIZE;
> +	if (iosize > MAXPHYS)
> +		iosize =3D MAXPHYS;
> +	buf =3D malloc(iosize, M_TEMP, M_WAITOK);
> +
> +#ifdef __notyet__
> +	/*
> +	 * Check if the filesystem sets f_maxfilesize; if not use
> +	 * VOP_SETATTR to perform the check.
> +	 */
> +	error =3D VFS_STATFS(vp->v_mount, &sfs, td);
> +	if (error !=3D 0)
> +		goto out;
> +	if (sfs.f_maxfilesize) {
> +		if (offset > sfs.f_maxfilesize || len > sfs.f_maxfilesize ||
> +		    offset + len > sfs.f_maxfilesize) {
> +			error =3D EFBIG;
> +			goto out;
> +		}
> +	} else
> +#endif
> +	if (offset + len > vap->va_size) {
> +		VATTR_NULL(vap);
> +		vap->va_size =3D offset + len;
> +		error =3D VOP_SETATTR(vp, vap, td->td_ucred);
> +		if (error !=3D 0)
> +			goto out;
> +	}
I still do not see a reason to do VOP_SETATTR() there. VOP_WRITE() will
do auto-extend as needed. Also, see below.

> +
> +	while (len > 0) {
> +		if (should_yield()) {
> +			VOP_UNLOCK(vp, 0);
> +			locked =3D 0;
> +			kern_yield(-1);
Please note that, despite dropping the vnode lock, the snapshot creation
is still blocked at this point, due to previous vn_start_write().

If doing vn_finished_write() there, then bwillwrite() before
next iteration is desired.
> +			error =3D vn_lock(vp, LK_EXCLUSIVE);
> +			if (error !=3D 0)
> +				break;
> +			locked =3D 1;
> +			error =3D VOP_GETATTR(vp, vap, td->td_ucred);
> +			if (error !=3D 0)
> +				break;
> +		}
> +
> +		/*
> +		 * Read and write back anything below the nominal file
> +		 * size.  There's currently no way outside the filesystem
> +		 * to know whether this area is sparse or not.
> +		 */
> +		cur =3D iosize;
> +		if ((offset % iosize) !=3D 0)
> +			cur -=3D (offset % iosize);
> +		if (cur > len)
> +			cur =3D len;
> +		if (offset < vap->va_size) {
> +			aiov.iov_base =3D buf;
> +			aiov.iov_len =3D cur;
> +			auio.uio_iov =3D &aiov;
> +			auio.uio_iovcnt =3D 1;
> +			auio.uio_offset =3D offset;
> +			auio.uio_resid =3D cur;
> +			auio.uio_segflg =3D UIO_SYSSPACE;
> +			auio.uio_rw =3D UIO_READ;
> +			auio.uio_td =3D td;
> +			error =3D VOP_READ(vp, &auio, 0, td->td_ucred);
> +			if (error !=3D 0)
> +				break;
> +			if (auio.uio_resid > 0) {
> +				bzero(buf + cur - auio.uio_resid,
> +				    auio.uio_resid);
> +			}
> +		} else {
> +			bzero(buf, cur);
> +		}
Wouldn't VOP_SETATTR() at the start of the function mostly prevent
this bzero from executing ?

> +
> +		aiov.iov_base =3D buf;
> +		aiov.iov_len =3D cur;
> +		auio.uio_iov =3D &aiov;
> +		auio.uio_iovcnt =3D 1;
> +		auio.uio_offset =3D offset;
> +		auio.uio_resid =3D cur;
> +		auio.uio_segflg =3D UIO_SYSSPACE;
> +		auio.uio_rw =3D UIO_WRITE;
> +		auio.uio_td =3D td;
> +
> +		error =3D VOP_WRITE(vp, &auio, 0, td->td_ucred);
> +		if (error !=3D 0)
> +			break;
> +
> +		len -=3D cur;
> +		offset +=3D cur;
> +	}
> +
> + out:
> +	KASSERT(locked || error !=3D 0, ("How'd I get unlocked with no error?")=
);
> +	if (locked && error !=3D 0)
> +		VOP_UNLOCK(vp, 0);
> +	free(buf, M_TEMP);
> +	return (error);
> +}

I estimated what it would take to do the optimized implementation for UFS,
and I think that the following change would allow to lessen the code
duplication much.

What if the vnode lock drop and looping be handled by the syscall, instead
of the vop implementation ? In other words, allow the VOP_ALLOCATE()
to  allocate less then requested, and return the allocated amount to
the caller. The loop would be centralized then, freeing fs from doing
the dance. Also, if fs considers that suitable, it would do a whole
allocation in one run.

--FT/vNgDLxVq4t/AU
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (FreeBSD)

iEYEARECAAYFAk2skMoACgkQC3+MBN1Mb4jHIwCgpCqlFUXXeEEMX1WNBJN+ulTm
Rq0AnRKY8W0QQDL+ehilruivUg3J/Ot1
=+QkD
-----END PGP SIGNATURE-----

--FT/vNgDLxVq4t/AU--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20110418192810.GX48734>