From owner-freebsd-fs@FreeBSD.ORG  Sun Dec  2 03:57:13 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id BB2A816A417;
	Sun,  2 Dec 2007 03:57:11 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail13.syd.optusnet.com.au (mail13.syd.optusnet.com.au
	[211.29.132.194])
	by mx1.freebsd.org (Postfix) with ESMTP id 40B3313C4D5;
	Sun,  2 Dec 2007 03:57:11 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from c211-30-219-213.carlnfd3.nsw.optusnet.com.au
	(c211-30-219-213.carlnfd3.nsw.optusnet.com.au [211.30.219.213])
	by mail13.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	lB23v5b1010753
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Sun, 2 Dec 2007 14:57:08 +1100
Date: Sun, 2 Dec 2007 14:57:05 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@delplex.bde.org
To: Don Lewis <truckman@FreeBSD.org>
In-Reply-To: <200712012214.lB1MEl2Q015881@gw.catspoiler.org>
Message-ID: <20071202132955.M18602@delplex.bde.org>
References: <200712012214.lB1MEl2Q015881@gw.catspoiler.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-fs@FreeBSD.org
Subject: Re: File remove problem
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 02 Dec 2007 03:57:13 -0000

On Sat, 1 Dec 2007, Don Lewis wrote:

> On  2 Dec, Bruce Evans wrote:
>
>> Here is a non-hackish patch which explains why ignoring MNT_RDONLY in
>> the above or in ffs_mount() helps.  It just fixes the confusion between
>> IN_MODIFIED and IN_CHANGE in critical places.
>>
>> % Index: ffs_softdep.c

[All settings here, but not yet in ufs_inactive() and ffs_truncate(),
which are critical, and in other places which might not be critical.]

>> Without this change, soft updates depends on IN_CHANGE being converted
>> to IN_MODIFIED by ufs_itimes(), but this conversion doesn't happen
>> when MNT_RDONLY is set.  With soft updates, changes are often delayed
>> until sync time, and when the sync is for mount-update it is done after
>> setting MNT_RDONLY so the above doesn't work.
>
> ufs_itimes() should probably also be looking at fs_ronly instead of
> MNT_RDONLY, *but* all the paths leading from userland to ufs_itimes()
> would need to be checked to verify that they check MNT_RDONLY to prevent
> new file system write operations from happening while the remount is in
> progress.

Yes, that is probably why MNT_RDONLY is (ab)used now.

I found old (Y2002) private mail from mckusick that explains a previous
change in this area, a change that mostly avoided the problem but has
been lost:

% RCS file: /home/ncvs/src/sys/ufs/ufs/ufs_vnops.c,v
% Working file: ufs_vnops.c
% ----------------------------
% revision 1.182
% date: 2002/01/15 07:17:12;  author: mckusick;  state: Exp;  lines: +4 -5
% When downgrading a filesystem from read-write to read-only, operations
% involving file removal or file update were not always being fully
% committed to disk. The result was lost files or corrupted file data.
% This change ensures that the filesystem is properly synced to disk
% before the filesystem is down-graded.
% 
% This delta also fixes a long standing bug in which a file open for
% reading has been unlinked. When the last open reference to the file
% is closed, the inode is reclaimed by the filesystem. Previously,
% if the filesystem had been down-graded to read-only, the inode could
% not be reclaimed, and thus was lost and had to be later recovered
% by fsck.  With this change, such files are found at the time of the
% down-grade.  Normally they will result in the filesystem down-grade
% failing with `device busy'. If a forcible down-grade is done, then
% the affected files will be revoked causing the inode to be released
% and the open file descriptors to begin failing on attempts to read.
% 
% Submitted by:	"Sam Leffler" <sam@errno.com>
% ----------------------------
% 
% Index: ufs_vnops.c
% ===================================================================
% RCS file: /home/ncvs/src/sys/ufs/ufs/ufs_vnops.c,v
% retrieving revision 1.181
% retrieving revision 1.182
% diff -u -2 -r1.181 -r1.182
% --- ufs_vnops.c	22 Nov 2001 15:33:12 -0000	1.181
% +++ ufs_vnops.c	15 Jan 2002 07:17:12 -0000	1.182
% @@ -37,5 +37,5 @@
%   *
%   *	@(#)ufs_vnops.c	8.27 (Berkeley) 5/27/95
% - * $FreeBSD: src/sys/ufs/ufs/ufs_vnops.c,v 1.181 2001/11/22 15:33:12 guido Exp $
% + * $FreeBSD: src/sys/ufs/ufs/ufs_vnops.c,v 1.182 2002/01/15 07:17:12 mckusick Exp $
%   */
% 
% @@ -159,11 +159,10 @@
%  	if ((ip->i_flag & (IN_ACCESS | IN_CHANGE | IN_UPDATE)) == 0)
%  		return;
% +	if ((vp->v_type == VBLK || vp->v_type == VCHR) && !DOINGSOFTDEP(vp))
% +		ip->i_flag |= IN_LAZYMOD;
% +	else
% +		ip->i_flag |= IN_MODIFIED;
%  	if ((vp->v_mount->mnt_flag & MNT_RDONLY) == 0) {
%  		vfs_timestamp(&ts);
% -		if ((vp->v_type == VBLK || vp->v_type == VCHR) &&
% -		    !DOINGSOFTDEP(vp))
% -			ip->i_flag |= IN_LAZYMOD;
% -		else
% -			ip->i_flag |= IN_MODIFIED;
%  		if (ip->i_flag & IN_ACCESS) {
%  			ip->i_atime = ts.tv_sec;

This is in ufs_itimes().  Note that it moves the setting of the modified
flag before the check of MNT_RDONLY.  So if wrong or incomplete flags
were set earlier and were not converted to the modified flag before
MNT_RDONLY was set, we lose only the timestamps here, not critical
updates.
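
The effect of that reordering can be sketched as a small userland model.
This is only an illustration, not the kernel code: the sk_* names, the
flag values and the struct are made up for the example, and the
VBLK/VCHR (IN_LAZYMOD) case is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag bits only; these are not the kernel's values. */
enum {
	SK_IN_ACCESS	= 0x01,	/* access time update pending */
	SK_IN_CHANGE	= 0x02,	/* change time update pending */
	SK_IN_UPDATE	= 0x04,	/* modification time update pending */
	SK_IN_MODIFIED	= 0x08	/* inode must be written back */
};
#define	SK_MARKS	(SK_IN_ACCESS | SK_IN_CHANGE | SK_IN_UPDATE)

/* Hypothetical stand-in for struct inode. */
struct sk_inode {
	int	flags;
	long	atime_sec, mtime_sec, ctime_sec;
};

/* Convert pending update marks into timestamps. */
static void
sk_stamp(struct sk_inode *ip, long now)
{
	if (ip->flags & SK_IN_ACCESS)
		ip->atime_sec = now;
	if (ip->flags & SK_IN_UPDATE)
		ip->mtime_sec = now;
	if (ip->flags & SK_IN_CHANGE)
		ip->ctime_sec = now;
}

/* Ordering before rev.1.182: IN_MODIFIED is set only if !MNT_RDONLY. */
static void
sk_itimes_r181(struct sk_inode *ip, bool mnt_rdonly, long now)
{
	assert(ip != NULL);
	if ((ip->flags & SK_MARKS) == 0)
		return;
	if (!mnt_rdonly) {
		ip->flags |= SK_IN_MODIFIED;
		sk_stamp(ip, now);
	}
	ip->flags &= ~SK_MARKS;
}

/* Ordering in rev.1.182: IN_MODIFIED is set before the MNT_RDONLY check. */
static void
sk_itimes_r182(struct sk_inode *ip, bool mnt_rdonly, long now)
{
	assert(ip != NULL);
	if ((ip->flags & SK_MARKS) == 0)
		return;
	ip->flags |= SK_IN_MODIFIED;
	if (!mnt_rdonly)
		sk_stamp(ip, now);
	ip->flags &= ~SK_MARKS;
}
```

With only the change mark pending and MNT_RDONLY already set, the older
ordering clears the mark and leaves nothing behind, while the rev.1.182
ordering leaves the modified flag set so that the inode is still written.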

% Index: ffs_inode.c
% ===================================================================
% RCS file: /home/ncvs/src/sys/ufs/ffs/ffs_inode.c,v
% retrieving revision 1.73
% retrieving revision 1.74
% diff -u -2 -r1.73 -r1.74
% --- ffs_inode.c	13 Dec 2001 05:07:48 -0000	1.73
% +++ ffs_inode.c	15 Jan 2002 07:17:12 -0000	1.74
% @@ -32,5 +32,5 @@
%   *
%   *	@(#)ffs_inode.c	8.13 (Berkeley) 4/21/95
% - * $FreeBSD: src/sys/ufs/ffs/ffs_inode.c,v 1.73 2001/12/13 05:07:48 mckusick Exp $
% + * $FreeBSD: src/sys/ufs/ffs/ffs_inode.c,v 1.74 2002/01/15 07:17:12 mckusick Exp $
%   */
% 
% @@ -88,7 +88,7 @@
%  		return (0);
%  	ip->i_flag &= ~(IN_LAZYMOD | IN_MODIFIED);
% -	if (vp->v_mount->mnt_flag & MNT_RDONLY)
% -		return (0);
%  	fs = ip->i_fs;
% +	if (fs->fs_ronly)
% +		return (0);
%  	/*
%  	 * Ensure that uid and gid are correct. This is a temporary

This fixes loss of the critical updates a little later in ffs_update().

% @@ -153,4 +153,6 @@
%  	oip = VTOI(ovp);
%  	fs = oip->i_fs;
% +	if (fs->fs_ronly)
% +		panic("ffs_truncate: read-only filesystem");
%  	if (length < 0)
%  		return (EINVAL);

This is a sanity check in ffs_truncate().  I think all callers except the
one in ufs_inactive() automatically pass the check, since they are higher
level so they do a correct check of MNT_RDONLY.  The call in ufs_inactive()
used to be unconditional (except for the (i_nlink == 0) condition of course).
In -current it is conditional on MNT_RDONLY.  kib's patch changes it to be
conditional on fs_ronly, but I hope it can become unconditional again --
it should be an error to reach ufs_inactive() with a partially deleted
file after syncing and before changing fs_ronly to 1.

ffs_update() should panic instead of returning 0 when (fs->fs_ronly)
too, so that bugs get noticed.
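
A rough userland sketch of why the silent return hides bugs.  The um_*
names and flag values are hypothetical, and the early-return condition
is simplified compared with the real ffs_update().  The model returns a
distinct result where the kernel code returns 0, which is the spot where
a panic would make the lost modification visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag bits only; these are not the kernel's values. */
enum {
	UM_IN_MODIFIED	= 0x08,
	UM_IN_LAZYMOD	= 0x10
};

enum um_result {
	UM_CLEAN,	/* nothing to write */
	UM_WRITTEN,	/* the inode would be written to disk */
	UM_DROPPED	/* modification discarded because fs_ronly is set */
};

/*
 * Model of the start of ffs_update() after rev.1.74 of ffs_inode.c:
 * the modification flags are cleared first, then fs_ronly is checked.
 */
static enum um_result
um_update(int *flags, bool fs_ronly)
{
	assert(flags != NULL);
	if ((*flags & (UM_IN_MODIFIED | UM_IN_LAZYMOD)) == 0)
		return (UM_CLEAN);
	*flags &= ~(UM_IN_LAZYMOD | UM_IN_MODIFIED);
	if (fs_ronly)
		return (UM_DROPPED);	/* the kernel returns 0 here */
	return (UM_WRITTEN);
}
```

In the model a caller can tell UM_DROPPED from UM_CLEAN; in the kernel
both cases return 0, so a modification made in full r/o mode goes
unnoticed.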

mckusick's explanation says that "[fs_ronly is the only believable flag].
Thus the killing of IN_MODIFIED has to happen a few lines later [in]
ffs_update()".

It should be safe to blindly ignore all modification flags except IN_ACCESS
in ufs_itimes(), since ffs_update() will kill the completely invalid ones.

4.4BSD-Lite blindly ignored all modification flags in ITIMES(), and
checked the wrong read-only flag (MNT_RDONLY) in open code that duplicated
ITIMES() and added the wrong r/o check.  When I converted ITIMES() to
ufs_itimes(), I centralized this wrong r/o check.  This was mainly a
cleanup, but I think it fixed wrong setting of atimes (IN_ACCESS is set
without checking any r/o flags, and IN_ACCESS was sometimes converted
into a timestamp before ffs_update() killed it, so applications could
see atime changing even on purely r/o mounted file systems).

The fix in ufs_vnops.c was lost relatively recently as part of related
changes to fix IN_ACCESS for non-exclusively-locked reads:

% Index: ufs_vnops.c
% ===================================================================
% RCS file: /home/ncvs/src/sys/ufs/ufs/ufs_vnops.c,v
% retrieving revision 1.279
% retrieving revision 1.280
% diff -u -2 -r1.279 -r1.280
% --- ufs_vnops.c	2 Oct 2006 02:08:31 -0000	1.279
% +++ ufs_vnops.c	10 Oct 2006 09:20:54 -0000	1.280
% @@ -36,5 +36,5 @@
% 
%  #include <sys/cdefs.h>
% -__FBSDID("$FreeBSD: src/sys/ufs/ufs/ufs_vnops.c,v 1.279 2006/10/02 02:08:31 tegge Exp $");
% +__FBSDID("$FreeBSD: src/sys/ufs/ufs/ufs_vnops.c,v 1.280 2006/10/10 09:20:54 kib Exp $");
% 
%  #include "opt_mac.h"
% @@ -129,29 +129,47 @@
%  	struct inode *ip;
%  	struct timespec ts;
% +	int mnt_locked;
% 
%  	ip = VTOI(vp);
% +	mnt_locked = 0;
% +	if ((vp->v_mount->mnt_flag & MNT_RDONLY) != 0) {
% +		VI_LOCK(vp);
% +		goto out;
% +	}
% +	MNT_ILOCK(vp->v_mount);		/* For reading of mnt_kern_flags. */
% +	mnt_locked = 1;
% +	VI_LOCK(vp);
%  	if ((ip->i_flag & (IN_ACCESS | IN_CHANGE | IN_UPDATE)) == 0)
% -		return;
% +		goto out_unl;
% +
%  	if ((vp->v_type == VBLK || vp->v_type == VCHR) && !DOINGSOFTDEP(vp))
%  		ip->i_flag |= IN_LAZYMOD;
% -	else
% +	else if (((vp->v_mount->mnt_kern_flag &
% +		    (MNTK_SUSPENDED | MNTK_SUSPEND)) == 0) ||
% +		    (ip->i_flag & (IN_CHANGE | IN_UPDATE)))
%  		ip->i_flag |= IN_MODIFIED;
% -	if ((vp->v_mount->mnt_flag & MNT_RDONLY) == 0) {
% -		vfs_timestamp(&ts);
% -		if (ip->i_flag & IN_ACCESS) {
% -			DIP_SET(ip, i_atime, ts.tv_sec);
% -			DIP_SET(ip, i_atimensec, ts.tv_nsec);
% -		}
% -		if (ip->i_flag & IN_UPDATE) {
% -			DIP_SET(ip, i_mtime, ts.tv_sec);
% -			DIP_SET(ip, i_mtimensec, ts.tv_nsec);
% -			ip->i_modrev++;
% -		}
% -		if (ip->i_flag & IN_CHANGE) {
% -			DIP_SET(ip, i_ctime, ts.tv_sec);
% -			DIP_SET(ip, i_ctimensec, ts.tv_nsec);
% -		}
% +	else if (ip->i_flag & IN_ACCESS)
% +		ip->i_flag |= IN_LAZYACCESS;
% +	vfs_timestamp(&ts);
% +	if (ip->i_flag & IN_ACCESS) {
% +		DIP_SET(ip, i_atime, ts.tv_sec);
% +		DIP_SET(ip, i_atimensec, ts.tv_nsec);
% +	}
% +	if (ip->i_flag & IN_UPDATE) {
% +		DIP_SET(ip, i_mtime, ts.tv_sec);
% +		DIP_SET(ip, i_mtimensec, ts.tv_nsec);
% +		ip->i_modrev++;
% +	}
% +	if (ip->i_flag & IN_CHANGE) {
% +		DIP_SET(ip, i_ctime, ts.tv_sec);
% +		DIP_SET(ip, i_ctimensec, ts.tv_nsec);
%  	}
% +
% + out:
%  	ip->i_flag &= ~(IN_ACCESS | IN_CHANGE | IN_UPDATE);
% + out_unl:
% +	VI_UNLOCK(vp);
% +	if (mnt_locked)
% +		MNT_IUNLOCK(vp->v_mount);
%  }
%

Now we trust MNT_RDONLY again.  So if wrong or incomplete flags were set
earlier and were not converted to the modified flag before MNT_RDONLY
was set, we lose both timestamps and critical updates here.

I think the best quick fix would be to trust fs_ronly here, except for
IN_ACCESS.  Then wrong IN_CHANGE | IN_UPDATE flags would just cause
wrong updates of i_ctime and i_mtime, and an extra i/o to write these
changes, but missing IN_MODIFIED flags would be fixed up as in rev.1.182
and associated correct IN_CHANGE | IN_UPDATE flags wouldn't be incorrectly
discarded.  IN_ACCESS needs special handling even in the non-snapshot
cases so that read() doesn't race mount-update (at best, read()s might
keep dirtying inodes, so there would be a problem setting fs_ronly
atomically with completing the sync).
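
As a sketch of that quick fix (a userland model under stated
assumptions, not a tested kernel patch): IN_ACCESS follows MNT_RDONLY,
and everything else follows fs_ronly.  The qf_* names, flag values and
struct are invented for the illustration, and locking, IN_LAZYMOD and
the snapshot/suspend cases are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag bits only; these are not the kernel's values. */
enum {
	QF_IN_ACCESS	= 0x01,
	QF_IN_CHANGE	= 0x02,
	QF_IN_UPDATE	= 0x04,
	QF_IN_MODIFIED	= 0x08
};
#define	QF_MARKS	(QF_IN_ACCESS | QF_IN_CHANGE | QF_IN_UPDATE)

/* Hypothetical stand-in for struct inode. */
struct qf_inode {
	int	flags;
	long	atime_sec, mtime_sec, ctime_sec;
};

static void
qf_itimes(struct qf_inode *ip, bool mnt_rdonly, bool fs_ronly, long now)
{
	assert(ip != NULL);
	/*
	 * IN_ACCESS is discarded as soon as MNT_RDONLY is set, so that
	 * reads cannot keep dirtying inodes during the downgrade.
	 */
	if (mnt_rdonly)
		ip->flags &= ~QF_IN_ACCESS;
	if ((ip->flags & QF_MARKS) == 0)
		return;
	/* All other marks are honoured until fs_ronly is set. */
	if (!fs_ronly) {
		ip->flags |= QF_IN_MODIFIED;
		if (ip->flags & QF_IN_ACCESS)
			ip->atime_sec = now;
		if (ip->flags & QF_IN_UPDATE)
			ip->mtime_sec = now;
		if (ip->flags & QF_IN_CHANGE)
			ip->ctime_sec = now;
	}
	ip->flags &= ~QF_MARKS;
}
```

During the transition (MNT_RDONLY set, fs_ronly still clear) a pending
IN_CHANGE still becomes IN_MODIFIED plus a ctime update, while a pending
IN_ACCESS is dropped.  Once fs_ronly is set, every mark is discarded.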

Most other file systems are primitive or broken in this area.  ext2fs is
at the level of ufs_vnops.c 1.182.  msdosfs is at a level before 4.4BSD-Lite
(it still uses its clone of ITIMES() and looks more like the Net/2 ffs
than the 4.4BSD one).  But most other file systems are Giant-locked, so
they don't have the IN_ACCESS races.

Bruce

From owner-freebsd-fs@FreeBSD.ORG  Sun Dec  2 06:14:39 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 8372016A420;
	Sun,  2 Dec 2007 06:14:39 +0000 (UTC)
	(envelope-from kostikbel@gmail.com)
Received: from relay02.kiev.sovam.com (relay02.kiev.sovam.com [62.64.120.197])
	by mx1.freebsd.org (Postfix) with ESMTP id 2CC8C13C448;
	Sun,  2 Dec 2007 06:14:39 +0000 (UTC)
	(envelope-from kostikbel@gmail.com)
Received: from [212.82.216.226] (helo=deviant.kiev.zoral.com.ua)
	by relay02.kiev.sovam.com with esmtps (TLSv1:AES256-SHA:256)
	(Exim 4.67) (envelope-from <kostikbel@gmail.com>)
	id 1IyhrP-0004Dy-2L; Sun, 02 Dec 2007 07:59:23 +0200
Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1])
	by deviant.kiev.zoral.com.ua (8.14.1/8.14.1) with ESMTP id
	lB25xK8m085144; Sun, 2 Dec 2007 07:59:20 +0200 (EET)
	(envelope-from kostikbel@gmail.com)
Received: (from kostik@localhost)
	by deviant.kiev.zoral.com.ua (8.14.2/8.14.2/Submit) id lB25xJfN085143; 
	Sun, 2 Dec 2007 07:59:19 +0200 (EET)
	(envelope-from kostikbel@gmail.com)
X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to
	kostikbel@gmail.com using -f
Date: Sun, 2 Dec 2007 07:59:19 +0200
From: Kostik Belousov <kostikbel@gmail.com>
To: Don Lewis <truckman@freebsd.org>
Message-ID: <20071202055919.GR83121@deviant.kiev.zoral.com.ua>
References: <20071201215706.B12006@besplex.bde.org>
	<200712012207.lB1M7oNg015468@gw.catspoiler.org>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="+Yg8W10oK6rlW0RR"
Content-Disposition: inline
In-Reply-To: <200712012207.lB1M7oNg015468@gw.catspoiler.org>
User-Agent: Mutt/1.4.2.3i
Cc: freebsd-fs@freebsd.org
Subject: Re: File remove problem
X-List-Received-Date: Sun, 02 Dec 2007 06:14:39 -0000


--+Yg8W10oK6rlW0RR
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: 7bit

On Sat, Dec 01, 2007 at 02:07:50PM -0800, Don Lewis wrote:
> On  1 Dec, Bruce Evans wrote:
> > On Sat, 1 Dec 2007, Kostik Belousov wrote:
>
> > >> +static int
> > >> +ffs_isronly(struct ufsmount *ump)
> > >> +{
> > >> +	struct fs *fs = ump->um_fs;
> > >> +
> > >> +	return (fs->fs_ronly);
> > >> +}
> > >> +
> > >
> > > Could be ump->um_fs->fs_ronly.
>
> That's the change that I would have made.  A #include for <ufs/ffs/fs.h>
> would have to be added, which some might argue would be a layering
> violation.  I'd prefer to avoid the extra indirection.

I would argue that the ufs already knows too much about the ffs. But,
this seems to be the first explicit reference to the ffs from the ufs
code. With your approval, see below.

diff --git a/sys/ufs/ufs/ufs_inode.c b/sys/ufs/ufs/ufs_inode.c
index 448f436..22e29e9 100644
--- a/sys/ufs/ufs/ufs_inode.c
+++ b/sys/ufs/ufs/ufs_inode.c
@@ -60,6 +60,7 @@ __FBSDID("$FreeBSD: src/sys/ufs/ufs/ufs_inode.c,v 1.69 2007/06/22 13:22:37 kib E
 #ifdef UFS_GJOURNAL
 #include <ufs/ufs/gjournal.h>
 #endif
+#include <ufs/ffs/fs.h>
 
 /*
  * Last reference to an inode.  If necessary, write or delete it.
@@ -90,8 +91,7 @@ ufs_inactive(ap)
 	ufs_gjournal_close(vp);
 #endif
 	if ((ip->i_effnlink == 0 && DOINGSOFTDEP(vp)) ||
-	    (ip->i_nlink <= 0 &&
-	     (vp->v_mount->mnt_flag & MNT_RDONLY) == 0)) {
+	    (ip->i_nlink <= 0 && !VFSTOUFS(mp)->um_fs->fs_ronly)) {
 	loop:
 		if (vn_start_secondary_write(vp, &mp, V_NOWAIT) != 0) {
 			/* Cannot delete file while file system is suspended */
@@ -121,7 +121,7 @@ ufs_inactive(ap)
 	}
 	if (ip->i_effnlink == 0 && DOINGSOFTDEP(vp))
 		softdep_releasefile(ip);
-	if (ip->i_nlink <= 0 && (vp->v_mount->mnt_flag & MNT_RDONLY) == 0) {
+	if (ip->i_nlink <= 0 && !VFSTOUFS(mp)->um_fs->fs_ronly) {
 #ifdef QUOTA
 		if (!getinoquota(ip))
 			(void)chkiq(ip, -1, NOCRED, FORCE);

--+Yg8W10oK6rlW0RR
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (FreeBSD)

iD8DBQFHUkm3C3+MBN1Mb4gRAiKBAJ9eJ0tNa94jZv9Aav5edNLWaiQsdwCg3GCV
O1SqQzn7h0lN0eNywv/0qYg=
=L+Fh
-----END PGP SIGNATURE-----

--+Yg8W10oK6rlW0RR--

From owner-freebsd-fs@FreeBSD.ORG  Sun Dec  2 08:35:41 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id B3B5C16A419;
	Sun,  2 Dec 2007 08:35:41 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail01.syd.optusnet.com.au (mail01.syd.optusnet.com.au
	[211.29.132.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 3387113C458;
	Sun,  2 Dec 2007 08:35:41 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from besplex.bde.org (c211-30-219-213.carlnfd3.nsw.optusnet.com.au
	[211.30.219.213])
	by mail01.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	lB28ZaGK023561
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Sun, 2 Dec 2007 19:35:38 +1100
Date: Sun, 2 Dec 2007 19:35:36 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Kostik Belousov <kostikbel@gmail.com>
In-Reply-To: <20071202055919.GR83121@deviant.kiev.zoral.com.ua>
Message-ID: <20071202183809.I1560@besplex.bde.org>
References: <20071201215706.B12006@besplex.bde.org>
	<200712012207.lB1M7oNg015468@gw.catspoiler.org>
	<20071202055919.GR83121@deviant.kiev.zoral.com.ua>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-fs@freebsd.org, Don Lewis <truckman@freebsd.org>
Subject: Re: File remove problem
X-List-Received-Date: Sun, 02 Dec 2007 08:35:41 -0000

On Sun, 2 Dec 2007, Kostik Belousov wrote:

> On Sat, Dec 01, 2007 at 02:07:50PM -0800, Don Lewis wrote:
>> On  1 Dec, Bruce Evans wrote:
>>> On Sat, 1 Dec 2007, Kostik Belousov wrote:
>>
>>>> +static int
>>>> +ffs_isronly(struct ufsmount *ump)
>>>> +{
>>>> +	struct fs *fs = ump->um_fs;
>>>> +
>>>> +	return (fs->fs_ronly);
>>>> +}
>>>> +
>>>
>>> Could be ump->um_fs->fs_ronly.
>>
>> That's the change that I would have made.  A #include for <ufs/ffs/fs.h>
>> would have to be added, which some might argue would be a layering
>> violation.  I'd prefer to avoid the extra indirection.
>
> I would argue that the ufs already knows too much about the ffs. But,
> this seems to be the first explicit reference to the ffs from the ufs
> code. With your approval, see below.

It's more like the fourth:
- ufs_itimes() is a layering violation.  However, since both ffs and ufs
   need to set timestamps (for ffs, only in ffs_update()), and both need
   to set IN_* flags all over the place, it isn't clear which layer
   timestamps belong in.
- ufs_vnops.c now includes ffs_extern.h for some reason (5.2 didn't).
- ufs_gjournal.c includes both ffs_extern.h and fs.h.  It uses ip->i_fs
   a lot to access the superblock in ufs_gjournal_modref().

> diff --git a/sys/ufs/ufs/ufs_inode.c b/sys/ufs/ufs/ufs_inode.c
> index 448f436..22e29e9 100644
> --- a/sys/ufs/ufs/ufs_inode.c
> +++ b/sys/ufs/ufs/ufs_inode.c
> @@ -60,6 +60,7 @@ __FBSDID("$FreeBSD: src/sys/ufs/ufs/ufs_inode.c,v 1.69 2007/06/22 13:22:37 kib E
> #ifdef UFS_GJOURNAL
> #include <ufs/ufs/gjournal.h>
> #endif
> +#include <ufs/ffs/fs.h>
>
> /*
>  * Last reference to an inode.  If necessary, write or delete it.

ufs/ffs includes are conventionally separated from ufs/ufs includes by a
blank line.  About 2/3 of the files in ufs/ffs follow this convention.

> @@ -90,8 +91,7 @@ ufs_inactive(ap)
> 	ufs_gjournal_close(vp);
> #endif
> 	if ((ip->i_effnlink == 0 && DOINGSOFTDEP(vp)) ||
> -	    (ip->i_nlink <= 0 &&
> -	     (vp->v_mount->mnt_flag & MNT_RDONLY) == 0)) {
> +	    (ip->i_nlink <= 0 && !VFSTOUFS(mp)->um_fs->fs_ronly)) {
> 	loop:
> 		if (vn_start_secondary_write(vp, &mp, V_NOWAIT) != 0) {
> 			/* Cannot delete file while file system is suspended */
> @@ -121,7 +121,7 @@ ufs_inactive(ap)
> 	}
> 	if (ip->i_effnlink == 0 && DOINGSOFTDEP(vp))
> 		softdep_releasefile(ip);
> -	if (ip->i_nlink <= 0 && (vp->v_mount->mnt_flag & MNT_RDONLY) == 0) {
> +	if (ip->i_nlink <= 0 && !VFSTOUFS(mp)->um_fs->fs_ronly) {
> #ifdef QUOTA
> 		if (!getinoquota(ip))
> 			(void)chkiq(ip, -1, NOCRED, FORCE);
>

Should be ip->i_fs->fs_ronly.

The locking for fs_ronly is unclear.  It seems to be locked mainly by
vn_start_write(), and that is enough for everything except, probably,
access time changes.

I've now tested the following similar change in ufs_itimes() after
removing all other related fixes.

% Index: ufs_vnops.c
% ===================================================================
% RCS file: /home/ncvs/src/sys/ufs/ufs/ufs_vnops.c,v
% retrieving revision 1.293
% diff -u -2 -r1.293 ufs_vnops.c
% --- ufs_vnops.c	8 Nov 2007 17:21:51 -0000	1.293
% +++ ufs_vnops.c	2 Dec 2007 04:56:58 -0000
% @@ -89,4 +89,5 @@
%  #endif
% 
% +#include <ufs/ffs/fs.h>
%  #include <ufs/ffs/ffs_extern.h>
% 
% @@ -137,8 +138,38 @@
% 
%  	ip = VTOI(vp);
% +	/*
% +	 * MNT_RDONLY can barely be trusted here.  Full r/o mode is indicated
% +	 * by fs_ronly, and the MNT_RDONLY setting [should] differ from the
% +	 * fs_ronly setting only during transition from r/w mode to r/o mode.
% +	 * We set IN_ACCESS even in full r/o mode, so we must discard it
% +	 * unconditionally here.  During the transition, we must convert
% +	 * IN_CHANGE | IN_UPDATE to IN_MODIFIED, since some callers neglect
% +	 * to set IN_MODIFIED.  We also set the timestamps indicated by
% +	 * IN_CHANGE | IN_UPDATE normally during the transition, since the
% +	 * update marks may have been set correctly before the transition and
% +	 * not yet converted into timestamps.  Callers that set IN_CHANGE |
% +	 * IN_UPDATE during the transition are buggy, since userland writes
% +	 * are supposed to be denied (by MNT_RDONLY checks) during the
% +	 * transition, while kernel writes should only be for syncs
% +	 * and syncs should not touch timestamps except to convert old
% +	 * update marks to timestamps.  Callers that set any update mark or
% +	 * modification flag except IN_ACCESS while in full r/o mode are
% +	 * broken; we will panic for them later.
% +	 */
%  	if ((vp->v_mount->mnt_flag & MNT_RDONLY) != 0)
% -		goto out;
% +		ip->i_flag &= ~IN_ACCESS;
%  	if ((ip->i_flag & (IN_ACCESS | IN_CHANGE | IN_UPDATE)) == 0)
%  		return;
% +	if (ip->i_fs->fs_ronly) {	/* XXX locking? */
% +		vprint("ufs_itimes_locked: r/o mod", vp);
% +		/*
% +		 * Should panic here, or return and let ffs_update() panic.
% +		 * The fs_ronly check in ffs_update() is now almost redundant
% +		 * and should not succeed, so it should be replaced by a
% +		 * panic.  It detects more invariants failures than we detect
% +		 * panic.  It detects more invariant failures than we detect
% +		 */
% +		goto out;
% +	}
% 
%  	if ((vp->v_type == VBLK || vp->v_type == VCHR) && !DOINGSOFTDEP(vp))

The comments in this are too verbose.

This seems to work for "rm a; mount -u -o ro /f" and "mv a b; mount ...",
but since this is without your change to ufs_inactive(), I'm now surprised
that it works.  I think it works for the simple test cases as follows:

- there are no unlinked open files, so ufs_inactive() should have already
   set up all the needed i/o, and the sync for the mount-update should
   finish that i/o.

- however, ufs_inactive() seems to be called as part of the sync, and
   in ~5.2 where it doesn't do any read-only checks, it seems to call
   ffs_truncate().  This truncate should be null (in the simple test
   cases), but it seems to have the side effect of generating more i/o
   (I hope just to convert bogus settings of IN_CHANGE | IN_UPDATE into
   dinode changes).  Then other bugs cause an inconsistent fs if MNT_RDONLY
   is set.

   BTW, the other bugs don't affect plain 5.2, since it still has
   rev.1.182 of ufs_vnops.c to convert the bogus settings of IN_CHANGE |
   IN_UPDATE into IN_MODIFIED.  I was "lucky" to see almost the same
   bugs in ~5.2 as in -current because I have debugging code in
   ffs_update() instead of rev.1.182 in ufs_vnops.c, but the debugging
   code showed too many apparently-harmless problems so it was turned
   off.

In cases involving unlinked open files, the truncation has to be delayed
until the sync.  Things seem to work reasonably:  If a file on the fs
is open for read, then mount-update from rw to ro is allowed unless the
file is unlinked; if the file is unlinked then there is an EBUSY error
unless MNT_FORCE is used, but if MNT_FORCE is used, then the mount-update
must be allowed to complete and this involves truncating and otherwise
completing the removal of unlinked open files.  In -current, your patch
should make this work again, and with only my patch above the
"update error: blocks 32: files 1" is back because ufs_inactive() doesn't
do the truncation.
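
The rules described above for open files during a rw-to-ro mount-update
can be summarized in a small decision sketch.  It is a userland model
only: the dg_* names and the per-file struct are hypothetical, files
open for writing are not modelled, and the real work is done by the
vnode flush in the kernel rather than by a loop like this.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-file state for the model. */
struct dg_file {
	bool	open_for_read;
	bool	unlinked;
	bool	removed;	/* removal has been completed */
};

/*
 * Model of downgrading to read-only.  Returns 0 if the downgrade may
 * complete and EBUSY if an unlinked open file blocks it.
 */
static int
dg_downgrade(struct dg_file *files, size_t n, bool force)
{
	size_t i;

	assert(files != NULL || n == 0);
	if (!force) {
		/* Plain open-for-read files are fine; unlinked ones are not. */
		for (i = 0; i < n; i++)
			if (files[i].open_for_read && files[i].unlinked)
				return (EBUSY);
		return (0);
	}
	/* Forced: revoke unlinked open files and finish removing them. */
	for (i = 0; i < n; i++)
		if (files[i].open_for_read && files[i].unlinked) {
			files[i].open_for_read = false;
			files[i].removed = true;
		}
	return (0);
}
```

The point of the forced branch is that the truncation must actually
happen before the file system becomes read-only; skipping it is what
leaves the leftover blocks and files that the update error reports.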

I don't understand how WRITECLOSE inter-operates with this -- mount-update
always sets it but there is still an EBUSY error unless MNT_FORCE is
used, while MNT_FORCE should kill all opens.

Bruce

From owner-freebsd-fs@FreeBSD.ORG  Sun Dec  2 09:07:30 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 4F68816A420;
	Sun,  2 Dec 2007 09:07:30 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail14.syd.optusnet.com.au (mail14.syd.optusnet.com.au
	[211.29.132.195])
	by mx1.freebsd.org (Postfix) with ESMTP id C482213C44B;
	Sun,  2 Dec 2007 09:07:29 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from besplex.bde.org (c211-30-219-213.carlnfd3.nsw.optusnet.com.au
	[211.30.219.213])
	by mail14.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	lB297Hbr028642
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Sun, 2 Dec 2007 20:07:22 +1100
Date: Sun, 2 Dec 2007 20:07:07 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Bruce Evans <brde@optusnet.com.au>
In-Reply-To: <20071202183809.I1560@besplex.bde.org>
Message-ID: <20071202193924.P1745@besplex.bde.org>
References: <20071201215706.B12006@besplex.bde.org>
	<200712012207.lB1M7oNg015468@gw.catspoiler.org>
	<20071202055919.GR83121@deviant.kiev.zoral.com.ua>
	<20071202183809.I1560@besplex.bde.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-fs@freebsd.org, Don Lewis <truckman@freebsd.org>
Subject: Re: File remove problem
X-List-Received-Date: Sun, 02 Dec 2007 09:07:30 -0000

A second reply.  Sorry for so many.

On Sun, 2 Dec 2007, Bruce Evans wrote:

> On Sun, 2 Dec 2007, Kostik Belousov wrote:

>> I would argue that the ufs already knows too much about the ffs. But,
>> this seems to be the first explicit reference to the ffs from the ufs
>> code. With your approval, see below.

>> diff --git a/sys/ufs/ufs/ufs_inode.c b/sys/ufs/ufs/ufs_inode.c
>> index 448f436..22e29e9 100644
>> --- a/sys/ufs/ufs/ufs_inode.c
>> +++ b/sys/ufs/ufs/ufs_inode.c
>> @@ -60,6 +60,7 @@ __FBSDID("$FreeBSD: src/sys/ufs/ufs/ufs_inode.c,v 1.69 
>> 2007/06/22 13:22:37 kib E
>> #ifdef UFS_GJOURNAL
>> #include <ufs/ufs/gjournal.h>
>> #endif
>> +#include <ufs/ffs/fs.h>
>> 
>> /*
>>  * Last reference to an inode.  If necessary, write or delete it.
>
> ufs/ffs includes are conventionally separated from ufs/ufs includes by a
> blank line.  About 2/3 of the files in ufs/ffs follow this convention.
>
>> @@ -90,8 +91,7 @@ ufs_inactive(ap)
>> 	ufs_gjournal_close(vp);
>> #endif
>> 	if ((ip->i_effnlink == 0 && DOINGSOFTDEP(vp)) ||
>> -	    (ip->i_nlink <= 0 &&
>> -	     (vp->v_mount->mnt_flag & MNT_RDONLY) == 0)) {
>> +	    (ip->i_nlink <= 0 && !VFSTOUFS(mp)->um_fs->fs_ronly)) {
>> 	loop:
>> 		if (vn_start_secondary_write(vp, &mp, V_NOWAIT) != 0) {
>> 			/* Cannot delete file while file system is suspended 
>> */
>> @@ -121,7 +121,7 @@ ufs_inactive(ap)
>> 	}
>> 	if (ip->i_effnlink == 0 && DOINGSOFTDEP(vp))
>> 		softdep_releasefile(ip);
>> -	if (ip->i_nlink <= 0 && (vp->v_mount->mnt_flag & MNT_RDONLY) == 0) {
>> +	if (ip->i_nlink <= 0 && !VFSTOUFS(mp)->um_fs->fs_ronly) {
>> #ifdef QUOTA
>> 		if (!getinoquota(ip))
>> 			(void)chkiq(ip, -1, NOCRED, FORCE);
>> 
>
> Should be ip->i_fs->fs_ronly.
>
> The locking for fs_ronly is unclear.  It seems to be locked mainly by
> vn_start_write(), and that is enough for everything except, probably,
> access time changes.

Actually, I hope that this MNT_RDONLY check can just go away.  I now see
that it is part of previous attempts to fix the bugs in this area.

% RCS file: /home/ncvs/src/sys/ufs/ufs/ufs_inode.c,v
% Working file: ufs_inode.c
% head: 1.69
% ----------------------------
% revision 1.64
% date: 2005/09/23 20:49:57;  author: delphij;  state: Exp;  lines: +1 -1
% Restore a historical ufs_inactive behavior that has been changed
% in rev. 1.40 of ufs_inode.c, which allows an inode being truncated
% even when the filesystem itself is marked RDONLY.  A subsequent
% call of UFS_TRUNCATE (ffs_truncate) would panic the system as it
% asserts that it can only be called when the filesystem is mounted
% read-write (same changeset, rev. 1.74 of sys/ufs/ffs/ffs_inode.c).
% 
% Because ffs_mount() already takes care of sync'ing the filesystem
% to disk before being downgraded to readonly, it appears to be more
% desirable that we should not permit this sort of writes to disk.
% 
% This change would fix a panic that occours when read-only mounted
% a corrupted filesystem and doing some file operations.
% 
% MT6/5/4 candidate
% 
% Reviewed by:	mckusick
% ----------------------------
% ...
% ----------------------------
% revision 1.40
% date: 2002/01/15 07:17:12;  author: mckusick;  state: Exp;  lines: +2 -2
% When downgrading a filesystem from read-write to read-only, operations
% involving file removal or file update were not always being fully
% committed to disk. The result was lost files or corrupted file data.
% This change ensures that the filesystem is properly synced to disk
% before the filesystem is down-graded.
% 
% This delta also fixes a long standing bug in which a file open for
% reading has been unlinked. When the last open reference to the file
% is closed, the inode is reclaimed by the filesystem. Previously,
% if the filesystem had been down-graded to read-only, the inode could
% not be reclaimed, and thus was lost and had to be later recovered
% by fsck.  With this change, such files are found at the time of the
% down-grade.  Normally they will result in the filesystem down-grade
% failing with `device busy'. If a forcible down-grade is done, then
% the affected files will be revoked causing the inode to be released
% and the open file descriptors to begin failing on attempts to read.
% 
% Submitted by:	"Sam Leffler" <sam@errno.com>
% ----------------------------
% 
% Index: ufs_inode.c
% ===================================================================
% RCS file: /home/ncvs/src/sys/ufs/ufs/ufs_inode.c,v
% retrieving revision 1.63
% retrieving revision 1.64
% diff -u -2 -r1.63 -r1.64
% --- ufs_inode.c	17 Mar 2005 11:58:43 -0000	1.63
% +++ ufs_inode.c	23 Sep 2005 20:49:57 -0000	1.64
% @@ -36,5 +36,5 @@
% 
%  #include <sys/cdefs.h>
% -__FBSDID("$FreeBSD: src/sys/ufs/ufs/ufs_inode.c,v 1.63 2005/03/17 11:58:43 jeff Exp $");
% +__FBSDID("$FreeBSD: src/sys/ufs/ufs/ufs_inode.c,v 1.64 2005/09/23 20:49:57 delphij Exp $");
% 
%  #include "opt_quota.h"
% @@ -84,5 +84,5 @@
%  	if (ip->i_effnlink == 0 && DOINGSOFTDEP(vp))
%  		softdep_releasefile(ip);
% -	if (ip->i_nlink <= 0) {
% +	if (ip->i_nlink <= 0 && (vp->v_mount->mnt_flag & MNT_RDONLY) == 0) {
%  		(void) vn_write_suspend_wait(vp, NULL, V_WAIT);
%  #ifdef QUOTA

% Index: ufs_inode.c
% ===================================================================
% RCS file: /home/ncvs/src/sys/ufs/ufs/ufs_inode.c,v
% retrieving revision 1.39
% retrieving revision 1.40
% diff -u -2 -r1.39 -r1.40
% --- ufs_inode.c	11 Oct 2001 17:52:20 -0000	1.39
% +++ ufs_inode.c	15 Jan 2002 07:17:12 -0000	1.40
% @@ -37,5 +37,5 @@
%   *
%   *	@(#)ufs_inode.c	8.9 (Berkeley) 5/14/95
% - * $FreeBSD: src/sys/ufs/ufs/ufs_inode.c,v 1.39 2001/10/11 17:52:20 jhb Exp $
% + * $FreeBSD: src/sys/ufs/ufs/ufs_inode.c,v 1.40 2002/01/15 07:17:12 mckusick Exp $
%   */
% 
% @@ -85,5 +85,5 @@
%  	if (ip->i_effnlink == 0 && DOINGSOFTDEP(vp))
%  		softdep_releasefile(ip);
% -	if (ip->i_nlink <= 0 && (vp->v_mount->mnt_flag & MNT_RDONLY) == 0) {
% +	if (ip->i_nlink <= 0) {
%  		(void) vn_write_suspend_wait(vp, NULL, V_WAIT);
%  #ifdef QUOTA

Rev.1.40 of ufs_inode.c goes with rev.1.182 of ufs_vnops.c and rev.1.74 of
ffs_vnops.c to fix truncation of unlinked open files in mount-update.
Rev.1.64 breaks this case by backing out 1.40.  I think 1.64 is an attempt
to work around the other bugs.  It breaks the case of unlinked open files
more deterministically, but this case is relatively uncommon.  Again, I
was "lucky" to debug this partly under 5.2 which doesn't have 1.64, so
the extra (null?) truncations for closed files were relatively common.

So it should be safe to remove all the r/o checks in ufs_inactive() after
fixing the other bugs.  ffs_truncate already panics if fs_ronly, but only
in some cases.  In particular, it doesn't panic for truncations that don't
change the file size.  Such truncations aren't quite null, since standards
require [f]truncate(2) to mark the ctime and mtime for update.
ffs_truncate() sets the marks, which is correct for null truncations from
userland but not ones from syncer internals.  Setting of the marks when
fs_ronly is set should cause panics later (my patch has a vprint() for it).
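
To make the null-truncation case concrete, here is a toy userland model of
the behaviour described above.  It is not the kernel code: the names
toy_inode, toy_truncate() and toy_has_stale_marks() and the flag values are
invented for illustration.  The only things taken from the discussion are
that a size-changing truncation panics when fs_ronly is set, and that a
truncation which leaves the size alone still sets the change/update marks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values only; the real IN_* flags live in ufs/ufs/inode.h. */
#define TOY_IN_CHANGE	0x02u
#define TOY_IN_UPDATE	0x04u

/* Hypothetical stand-in for the few inode fields that matter here. */
struct toy_inode {
	uint64_t size;
	unsigned flag;
	bool	 fs_ronly;
};

enum toy_result { TOY_OK, TOY_WOULD_PANIC };

/*
 * Model of the behaviour described in the text: only a size-changing
 * truncation trips the read-only check; a null truncation quietly
 * sets the timestamp marks.
 */
static enum toy_result
toy_truncate(struct toy_inode *ip, uint64_t length)
{
	if (length != ip->size) {
		if (ip->fs_ronly)
			return (TOY_WOULD_PANIC);
		ip->size = length;
	}
	ip->flag |= TOY_IN_CHANGE | TOY_IN_UPDATE;
	return (TOY_OK);
}

/* The kind of invariant a later check (or vprint()) could look for. */
static bool
toy_has_stale_marks(const struct toy_inode *ip)
{
	return (ip->fs_ronly && ip->flag != 0);
}
```

With fs_ronly set, toy_truncate(&ip, ip.size) returns TOY_OK but leaves
marks behind that toy_has_stale_marks() then reports, which is the case
that slips past the existing panic.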

Bruce

From owner-freebsd-fs@FreeBSD.ORG  Sun Dec  2 12:02:57 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id BC58716A4CF;
	Sun,  2 Dec 2007 12:02:57 +0000 (UTC)
	(envelope-from johan@headweb.com)
Received: from core.stromnet.se (core.stromnet.se [83.218.84.131])
	by mx1.freebsd.org (Postfix) with ESMTP id 6889713C447;
	Sun,  2 Dec 2007 12:02:57 +0000 (UTC)
	(envelope-from johan@headweb.com)
Received: from localhost (core.stromnet.se [83.218.84.131])
	by core.stromnet.se (Postfix) with ESMTP id 48C01D472CE;
	Sun,  2 Dec 2007 13:03:03 +0100 (CET)
X-Virus-Scanned: amavisd-new at stromnet.se
Received: from core.stromnet.se ([83.218.84.131])
	by localhost (core.stromnet.se [83.218.84.131]) (amavisd-new,
	port 10024)
	with ESMTP id k2z1YjT9PCKU; Sun,  2 Dec 2007 13:03:00 +0100 (CET)
Received: from [172.28.1.102] (90-224-172-102-no129.tbcn.telia.com
	[90.224.172.102])
	by core.stromnet.se (Postfix) with ESMTP id 58B58D472CD;
	Sun,  2 Dec 2007 13:03:00 +0100 (CET)
In-Reply-To: <20071201113750.GA81186@eos.sc1.parodius.com>
References: <66A69F9D-E4C1-4647-AEE7-E6F18010A1A3@headweb.com>
	<20071201113750.GA81186@eos.sc1.parodius.com>
Mime-Version: 1.0 (Apple Message framework v752.3)
Content-Type: text/plain; charset=ISO-8859-1; delsp=yes; format=flowed
Message-Id: <79720A8B-4435-4D22-90F8-B39B11AED016@headweb.com>
Content-Transfer-Encoding: quoted-printable
From: =?ISO-8859-1?Q?Johan_Str=F6m?= <johan@headweb.com>
Date: Sun, 2 Dec 2007 13:02:24 +0100
To: Jeremy Chadwick <koitsu@FreeBSD.org>
X-Mailer: Apple Mail (2.752.3)
Cc: freebsd-fs@freebsd.org, freebsd-current@freebsd.org,
	freebsd-stable@freebsd.org
Subject: Re: scrambled (gmirror) dmesg output
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 02 Dec 2007 12:02:57 -0000

On Dec 1, 2007, at 12:37 , Jeremy Chadwick wrote:

> On Sat, Dec 01, 2007 at 12:16:45PM +0100, Johan Ström wrote:
>> Hello
>> I'm playing with a new box running RELENG_7.0 from yesterday. I got two
>> discs with gmirror on ad[6|14]s1a and zfs-mirror on s1d. When I do
>> atacontrol detach ata7 (detach ad14), I get this in dmesg:
>>
>> (first time)
>> subdisk14: detached
>> ad14: detached
>> GEOM_MIRROR: Device gm1b: provider ad14s1bG
>> dEiOsMc_oMnInReRcOtRe:d .De
>> vice gm1: provider ad14s1a disconnected.
>>
>> (second time, detaching again after reattach)
>> subdisk14: detached
>> ad14: detached
>> GEOMG_EMOIMR_RMOIRR:R ORD:e viDceev icgem 1bg:m 1p:r opvriodveird era
>> d1a4ds114bs 1dai sdciosncnoencnteecdt.ed.
>>
>> huh? :) Some print raceing or something?
>
> The problem isn't specific to GEOM or ZFS.  It's a known issue with two
> kernel printf()s being called simultaneously.  There are older threads
> discussing the issue.  I can dig up URLs if you want to read them, but I
> don't have them available quickly...
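
For illustration, the effect can be imitated in userland with a small
sketch that merges two messages one character at a time.  The helper name
interleave() is made up, and strict alternation is an assumption: real
scheduling gives no such guarantee, so this will not reproduce the exact
dmesg output above, only its general shape.

```c
#include <stddef.h>

/*
 * Hypothetical helper: copy characters from a and b alternately into
 * out, as if two writers were each emitting one character per turn
 * with no lock around the whole message.  out is always
 * NUL-terminated when outsz > 0.
 */
static void
interleave(const char *a, const char *b, char *out, size_t outsz)
{
	size_t n = 0;

	if (outsz == 0)
		return;
	while ((*a != '\0' || *b != '\0') && n + 1 < outsz) {
		if (*a != '\0')
			out[n++] = *a++;
		if (*b != '\0' && n + 1 < outsz)
			out[n++] = *b++;
	}
	out[n] = '\0';
}
```

Feeding it two similar "GEOM_MIRROR: Device ..." strings gives doubled-up
text of the same flavour as the second detach above.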

Just what I thought then. I have just never seen it on 6.x (where I use
gmirror), so I was a bit curious.

Btw, zfs doesn't seem to be very "chatty" in dmesg? I.e. losing discs,
starting to rebuild discs etc... Isn't that something one would want in
logs?

Thanks!

>
> -- 
> | Jeremy Chadwick                                    jdc at parodius.com |
> | Parodius Networking                           http://www.parodius.com/ |
> | UNIX Systems Administrator                      Mountain View, CA, USA |
> | Making life hard for others since 1977.                  PGP: 4BD6C0CB |
>


From owner-freebsd-fs@FreeBSD.ORG  Sun Dec  2 22:10:34 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 16A7B16A418
	for <freebsd-fs@freebsd.org>; Sun,  2 Dec 2007 22:10:34 +0000 (UTC)
	(envelope-from david.cecil@nokia.com)
Received: from mgw-mx03.nokia.com (smtp.nokia.com [192.100.122.230])
	by mx1.freebsd.org (Postfix) with ESMTP id 7C6C013C478
	for <freebsd-fs@freebsd.org>; Sun,  2 Dec 2007 22:10:33 +0000 (UTC)
	(envelope-from david.cecil@nokia.com)
Received: from esebh106.NOE.Nokia.com (esebh106.ntc.nokia.com [172.21.138.213])
	by mgw-mx03.nokia.com (Switch-3.2.6/Switch-3.2.6) with ESMTP id
	lB2MAEbh009445; Mon, 3 Dec 2007 00:10:31 +0200
Received: from esebh103.NOE.Nokia.com ([172.21.143.33]) by
	esebh106.NOE.Nokia.com with Microsoft SMTPSVC(6.0.3790.1830); 
	Mon, 3 Dec 2007 00:10:26 +0200
Received: from syebe101.NOE.Nokia.com ([172.30.128.65]) by
	esebh103.NOE.Nokia.com with Microsoft SMTPSVC(6.0.3790.1830); 
	Mon, 3 Dec 2007 00:10:26 +0200
Received: from [172.30.67.30] ([172.30.67.30]) by syebe101.NOE.Nokia.com with
	Microsoft SMTPSVC(6.0.3790.1830); Mon, 3 Dec 2007 09:10:21 +1100
Message-ID: <47532D4D.4020300@nokia.com>
Date: Mon, 03 Dec 2007 08:10:21 +1000
From: David Cecil <david.cecil@nokia.com>
User-Agent: Thunderbird 1.5.0.12 (Windows/20070509)
MIME-Version: 1.0
To: ext Bruce Evans <brde@optusnet.com.au>
References: <20071201215706.B12006@besplex.bde.org>	<200712012207.lB1M7oNg015468@gw.catspoiler.org>	<20071202055919.GR83121@deviant.kiev.zoral.com.ua>	<20071202183809.I1560@besplex.bde.org>
	<20071202193924.P1745@besplex.bde.org>
In-Reply-To: <20071202193924.P1745@besplex.bde.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-OriginalArrivalTime: 02 Dec 2007 22:10:21.0712 (UTC)
	FILETIME=[2184E900:01C83530]
X-Nokia-AV: Clean
Cc: freebsd-fs@freebsd.org, Don Lewis <truckman@freebsd.org>
Subject: Re: File remove problem
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 02 Dec 2007 22:10:34 -0000

What's the plan with these patches guys?  Are they likely to be 
committed to current?  I guess it's getting late in the game to commit 
to the 7.0 branch.

Sorry for the resend Bruce.

Thanks,
Dave

ext Bruce Evans wrote:
> A second reply.  Sorry for so many.
>
> On Sun, 2 Dec 2007, Bruce Evans wrote:
>
>> On Sun, 2 Dec 2007, Kostik Belousov wrote:
>
>>> I would argue that the ufs already knows too much about the ffs. But,
>>> this seems to be the first explicit reference to the ffs from the ufs
>>> code. With your approval, see below.
>
>>> diff --git a/sys/ufs/ufs/ufs_inode.c b/sys/ufs/ufs/ufs_inode.c
>>> index 448f436..22e29e9 100644
>>> --- a/sys/ufs/ufs/ufs_inode.c
>>> +++ b/sys/ufs/ufs/ufs_inode.c
>>> @@ -60,6 +60,7 @@ __FBSDID("$FreeBSD: src/sys/ufs/ufs/ufs_inode.c,v 1.69 2007/06/22 13:22:37 kib E
>>> #ifdef UFS_GJOURNAL
>>> #include <ufs/ufs/gjournal.h>
>>> #endif
>>> +#include <ufs/ffs/fs.h>
>>>
>>> /*
>>>  * Last reference to an inode.  If necessary, write or delete it.
>>
>> ufs/ffs includes are conventionally separated from ufs/ufs includes by a
>> blank line.  About 2/3 of the files in ufs/ffs follow this convention.
>>
>>> @@ -90,8 +91,7 @@ ufs_inactive(ap)
>>>     ufs_gjournal_close(vp);
>>> #endif
>>>     if ((ip->i_effnlink == 0 && DOINGSOFTDEP(vp)) ||
>>> -        (ip->i_nlink <= 0 &&
>>> -         (vp->v_mount->mnt_flag & MNT_RDONLY) == 0)) {
>>> +        (ip->i_nlink <= 0 && !VFSTOUFS(mp)->um_fs->fs_ronly)) {
>>>     loop:
>>>         if (vn_start_secondary_write(vp, &mp, V_NOWAIT) != 0) {
>>>             /* Cannot delete file while file system is suspended */
>>> @@ -121,7 +121,7 @@ ufs_inactive(ap)
>>>     }
>>>     if (ip->i_effnlink == 0 && DOINGSOFTDEP(vp))
>>>         softdep_releasefile(ip);
>>> -    if (ip->i_nlink <= 0 && (vp->v_mount->mnt_flag & MNT_RDONLY) == 0) {
>>> +    if (ip->i_nlink <= 0 && !VFSTOUFS(mp)->um_fs->fs_ronly) {
>>> #ifdef QUOTA
>>>         if (!getinoquota(ip))
>>>             (void)chkiq(ip, -1, NOCRED, FORCE);
>>>
>>
>> Should be ip->i_fs->fs_ronly.
>>
>> The locking for fs_ronly is unclear.  It seems to be locked mainly by
>> vn_start_write(), and that is enough for everything except probably access
>> time changes.
>
> Actually, I hope that this MNT_RDONLY check can just go away.  I now see
> that it is part of previous attempts to fix the bugs in this area.
>
> % RCS file: /home/ncvs/src/sys/ufs/ufs/ufs_inode.c,v
> % Working file: ufs_inode.c
> % head: 1.69
> % ----------------------------
> % revision 1.64
> % date: 2005/09/23 20:49:57;  author: delphij;  state: Exp;  lines: +1 -1
> % Restore a historical ufs_inactive behavior that has been changed
> % in rev. 1.40 of ufs_inode.c, which allows an inode being truncated
> % even when the filesystem itself is marked RDONLY.  A subsequent
> % call of UFS_TRUNCATE (ffs_truncate) would panic the system as it
> % asserts that it can only be called when the filesystem is mounted
> % read-write (same changeset, rev. 1.74 of sys/ufs/ffs/ffs_inode.c).
> %
> % Because ffs_mount() already takes care of sync'ing the filesystem
> % to disk before being downgraded to readonly, it appears to be more
> % desirable that we should not permit this sort of writes to disk.
> %
> % This change would fix a panic that occours when read-only mounted
> % a corrupted filesystem and doing some file operations.
> %
> % MT6/5/4 candidate
> %
> % Reviewed by:    mckusick
> % ----------------------------
> % ...
> % ----------------------------
> % revision 1.40
> % date: 2002/01/15 07:17:12;  author: mckusick;  state: Exp;  lines: +2 -2
> % When downgrading a filesystem from read-write to read-only, operations
> % involving file removal or file update were not always being fully
> % committed to disk. The result was lost files or corrupted file data.
> % This change ensures that the filesystem is properly synced to disk
> % before the filesystem is down-graded.
> %
> % This delta also fixes a long standing bug in which a file open for
> % reading has been unlinked. When the last open reference to the file
> % is closed, the inode is reclaimed by the filesystem. Previously,
> % if the filesystem had been down-graded to read-only, the inode could
> % not be reclaimed, and thus was lost and had to be later recovered
> % by fsck.  With this change, such files are found at the time of the
> % down-grade.  Normally they will result in the filesystem down-grade
> % failing with `device busy'. If a forcible down-grade is done, then
> % the affected files will be revoked causing the inode to be released
> % and the open file descriptors to begin failing on attempts to read.
> %
> % Submitted by:    "Sam Leffler" <sam@errno.com>
> % ----------------------------
> %
> % Index: ufs_inode.c
> % ===================================================================
> % RCS file: /home/ncvs/src/sys/ufs/ufs/ufs_inode.c,v
> % retrieving revision 1.63
> % retrieving revision 1.64
> % diff -u -2 -r1.63 -r1.64
> % --- ufs_inode.c    17 Mar 2005 11:58:43 -0000    1.63
> % +++ ufs_inode.c    23 Sep 2005 20:49:57 -0000    1.64
> % @@ -36,5 +36,5 @@
> %
> %  #include <sys/cdefs.h>
> % -__FBSDID("$FreeBSD: src/sys/ufs/ufs/ufs_inode.c,v 1.63 2005/03/17 11:58:43 jeff Exp $");
> % +__FBSDID("$FreeBSD: src/sys/ufs/ufs/ufs_inode.c,v 1.64 2005/09/23 20:49:57 delphij Exp $");
> %
> %  #include "opt_quota.h"
> % @@ -84,5 +84,5 @@
> %      if (ip->i_effnlink == 0 && DOINGSOFTDEP(vp))
> %          softdep_releasefile(ip);
> % -    if (ip->i_nlink <= 0) {
> % +    if (ip->i_nlink <= 0 && (vp->v_mount->mnt_flag & MNT_RDONLY) == 0) {
> %          (void) vn_write_suspend_wait(vp, NULL, V_WAIT);
> %  #ifdef QUOTA
>
> % Index: ufs_inode.c
> % ===================================================================
> % RCS file: /home/ncvs/src/sys/ufs/ufs/ufs_inode.c,v
> % retrieving revision 1.39
> % retrieving revision 1.40
> % diff -u -2 -r1.39 -r1.40
> % --- ufs_inode.c    11 Oct 2001 17:52:20 -0000    1.39
> % +++ ufs_inode.c    15 Jan 2002 07:17:12 -0000    1.40
> % @@ -37,5 +37,5 @@
> %   *
> %   *    @(#)ufs_inode.c    8.9 (Berkeley) 5/14/95
> % - * $FreeBSD: src/sys/ufs/ufs/ufs_inode.c,v 1.39 2001/10/11 17:52:20 jhb Exp $
> % + * $FreeBSD: src/sys/ufs/ufs/ufs_inode.c,v 1.40 2002/01/15 07:17:12 mckusick Exp $
> %   */
> %
> % @@ -85,5 +85,5 @@
> %      if (ip->i_effnlink == 0 && DOINGSOFTDEP(vp))
> %          softdep_releasefile(ip);
> % -    if (ip->i_nlink <= 0 && (vp->v_mount->mnt_flag & MNT_RDONLY) == 0) {
> % +    if (ip->i_nlink <= 0) {
> %          (void) vn_write_suspend_wait(vp, NULL, V_WAIT);
> %  #ifdef QUOTA
>
> Rev.1.40 of ufs_inode.c goes with rev.1.182 of ufs_vnops.c and rev.1.74 of
> ffs_vnops.c to fix truncation of unlinked open files in mount-update.
> Rev.1.64 breaks this case by backing out 1.40.  I think 1.64 is an attempt
> to work around the other bugs.  It breaks the case of unlinked open files
> more deterministically, but this case is relatively uncommon.  Again, I
> was "lucky" to debug this partly under 5.2 which doesn't have 1.64, so
> the extra (null?) truncations for closed files were relatively common.
>
> So it should be safe to remove all the r/o checks in ufs_inactive() after
> fixing the other bugs.  ffs_truncate already panics if fs_ronly, but only
> in some cases.  In particular, it doesn't panic for truncations that don't
> change the file size.  Such truncations aren't quite null, since standards
> require [f]truncate(2) to mark the ctime and mtime for update.
> ffs_truncate() sets the marks, which is correct for null truncations from
> userland but not ones from syncer internals.  Setting of the marks when
> fs_ronly is set should cause panics later (my patch has a vprint() for it).
>
> Bruce
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>

From owner-freebsd-fs@FreeBSD.ORG  Sun Dec  2 22:53:43 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 7FA3616A46E
	for <freebsd-fs@FreeBSD.org>; Sun,  2 Dec 2007 22:53:43 +0000 (UTC)
	(envelope-from truckman@FreeBSD.org)
Received: from gw.catspoiler.org (adsl-75-1-14-242.dsl.scrm01.sbcglobal.net
	[75.1.14.242]) by mx1.freebsd.org (Postfix) with ESMTP id 63E4913C457
	for <freebsd-fs@FreeBSD.org>; Sun,  2 Dec 2007 22:53:43 +0000 (UTC)
	(envelope-from truckman@FreeBSD.org)
Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2])
	by gw.catspoiler.org (8.13.3/8.13.3) with ESMTP id lB2MrYMq037092;
	Sun, 2 Dec 2007 14:53:38 -0800 (PST)
	(envelope-from truckman@FreeBSD.org)
Message-Id: <200712022253.lB2MrYMq037092@gw.catspoiler.org>
Date: Sun, 2 Dec 2007 14:53:34 -0800 (PST)
From: Don Lewis <truckman@FreeBSD.org>
To: brde@optusnet.com.au
In-Reply-To: <20071202193924.P1745@besplex.bde.org>
MIME-Version: 1.0
Content-Type: TEXT/plain; charset=us-ascii
Cc: freebsd-fs@FreeBSD.org
Subject: Re: File remove problem
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 02 Dec 2007 22:53:43 -0000

On  2 Dec, Bruce Evans wrote:

> So it should be safe to remove all the r/o checks in ufs_inactive() after
> fixing the other bugs.  ffs_truncate already panics if fs_ronly, but only
> in some cases.  In particular, it doesn't panic for truncations that don't
> change the file size.  Such truncations aren't quite null, since standards
> require [f]truncate(2) to mark the ctime and mtime for update.
> ffs_truncate() sets the marks, which is correct for null truncations from
> userland but not ones from syncer internals.  Setting of the marks when
> fs_ronly is set should cause panics later (my patch has a vprint() for it).

I think the MNT_RDONLY check in ufs_itimes_locked() should also be
changed to look at fs_ronly and panic if any marks are set.  This will
require some changes to add some early MNT_RDONLY checks.

In particular, ffs_read() and ffs_extread() need to check MNT_RDONLY in
addition to MNT_NOATIME (as is already done in vfs_mark_atime()).  This
also looks like it should be a reasonable optimization for read-only
file systems that should eliminate unnecessary work at the lower levels
of the code.
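
A minimal sketch of the early check being proposed here, as plain
userland C.  The function name toy_should_mark_atime() and the TOY_MNT_*
bit values are made up for illustration (the real MNT_* flags come from
<sys/mount.h>); this is not the actual ffs_read() code, only the shape of
the test.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values only; not the kernel's MNT_* definitions. */
#define TOY_MNT_RDONLY	0x1u
#define TOY_MNT_NOATIME	0x2u

/*
 * Decide up front whether a read should set the access-time mark.
 * Checking both flags means a read-only mount never sets the mark,
 * so lower layers have nothing to discard later.
 */
static bool
toy_should_mark_atime(uint64_t mnt_flag)
{
	return ((mnt_flag & (TOY_MNT_NOATIME | TOY_MNT_RDONLY)) == 0);
}
```

A caller would set its access mark only when this returns true.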

The early IN_ACCESS flag setting in ufs_setattr(), before the MNT_RDONLY
check, appears to be protected by the MNT_RDONLY check in
vfs_mark_atime().


From owner-freebsd-fs@FreeBSD.ORG  Sun Dec  2 23:49:52 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 6BD9216A417
	for <freebsd-fs@FreeBSD.org>; Sun,  2 Dec 2007 23:49:52 +0000 (UTC)
	(envelope-from julian@elischer.org)
Received: from outL.internet-mail-service.net (outL.internet-mail-service.net
	[216.240.47.235])
	by mx1.freebsd.org (Postfix) with ESMTP id 57E0113C45A
	for <freebsd-fs@FreeBSD.org>; Sun,  2 Dec 2007 23:49:52 +0000 (UTC)
	(envelope-from julian@elischer.org)
Received: from mx0.idiom.com (HELO idiom.com) (216.240.32.160)
	by out.internet-mail-service.net (qpsmtpd/0.40) with ESMTP;
	Sun, 02 Dec 2007 15:49:51 -0800
X-Client-Authorized: MaGic Cook1e
X-Client-Authorized: MaGic Cook1e
X-Client-Authorized: MaGic Cook1e
Received: from julian-mac.elischer.org (home.elischer.org [216.240.48.38])
	by idiom.com (Postfix) with ESMTP id 10F85126B7D;
	Sun,  2 Dec 2007 15:49:51 -0800 (PST)
Message-ID: <475344A0.7080801@elischer.org>
Date: Sun, 02 Dec 2007 15:49:52 -0800
From: Julian Elischer <julian@elischer.org>
User-Agent: Thunderbird 2.0.0.9 (Macintosh/20071031)
MIME-Version: 1.0
To: Bruce Evans <brde@optusnet.com.au>
References: <200712012214.lB1MEl2Q015881@gw.catspoiler.org>
	<20071202132955.M18602@delplex.bde.org>
In-Reply-To: <20071202132955.M18602@delplex.bde.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-fs@FreeBSD.org, Don Lewis <truckman@FreeBSD.org>
Subject: Re: File remove problem
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 02 Dec 2007 23:49:52 -0000

Bruce, since you are following this and most of us have long since dropped off 
the thread, can you make sure that whatever the answer is gets the follow-up 
needed to get into the tree?

Bruce Evans wrote:
> On Sat, 1 Dec 2007, Don Lewis wrote:
> 
>> On  2 Dec, Bruce Evans wrote:
>>
>>> Here is a non-hackish patch which explains why ignoring MNT_RDONLY in
>>> the above or in ffs_mount() helps.  It just fixes the confusion between
>>> IN_MODIFIED and IN_CHANGE in critical places.
>>>

[...]

From owner-freebsd-fs@FreeBSD.ORG  Mon Dec  3 04:03:43 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id CDF4E16A41A;
	Mon,  3 Dec 2007 04:03:43 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail34.syd.optusnet.com.au (mail34.syd.optusnet.com.au
	[211.29.133.218])
	by mx1.freebsd.org (Postfix) with ESMTP id 629A113C442;
	Mon,  3 Dec 2007 04:03:43 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from c211-30-219-213.carlnfd3.nsw.optusnet.com.au
	(c211-30-219-213.carlnfd3.nsw.optusnet.com.au [211.30.219.213])
	by mail34.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	lB343dma001270
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Mon, 3 Dec 2007 15:03:40 +1100
Date: Mon, 3 Dec 2007 15:03:39 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@delplex.bde.org
To: Don Lewis <truckman@freebsd.org>
In-Reply-To: <200712022253.lB2MrYMq037092@gw.catspoiler.org>
Message-ID: <20071203141557.P22038@delplex.bde.org>
References: <200712022253.lB2MrYMq037092@gw.catspoiler.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-fs@freebsd.org
Subject: Re: File remove problem
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 03 Dec 2007 04:03:43 -0000

On Sun, 2 Dec 2007, Don Lewis wrote:

> On  2 Dec, Bruce Evans wrote:
>
>> So it should be safe to remove all the r/o checks in ufs_inactive() after
>> fixing the other bugs.  ffs_truncate already panics if fs_ronly, but only
>> in some cases.  In particular, it doesn't panic for truncations that don't
>> change the file size.  Such truncations aren't quite null, since standards
>> require [f]truncate(2) to mark the ctime and mtime for update.
>> ffs_truncate() sets the marks, which is correct for null truncations from
>> userland but not ones from syncer internals.  Setting of the marks when
>> fs_ronly is set should cause panics later (my patch has a vprint() for it).
>
> I think the MNT_RDONLY check in ufs_itimes_locked() should also be
> changed to look at fs_ronly and panic if any marks are set.  This will
> require some changes to add some early MNT_RDONLY checks.

Yes, already done (except vprint() instead of panic).

> In particular, ffs_read() and ffs_extread() need to check MNT_RDONLY in
> addition to MNT_NOATIME (as is already done in vfs_mark_atime()).  This
> also looks like it should be a reasonable optimization for read-only
> file systems that should eliminate unnecessary work at the lower levels
> of the code.

But I let these happen and discard IN_ATIME marks if fs_ronly.  I
thought that the optimization went the other way -- unconditionally
setting the marks was very efficient, and discarding them in ufs_itimes()
was efficient too.  I think this is still true now with larger locking
overheads, and the marks should be discarded later in the MNT_NOATIME
case too.  It is expected that the marks are set much more often than
they are looked at by ufs_itimes(), since most calls to ufs_itimes()
are in close() and read() is much more common than close().  ufs_itimes()
is also called in stat() but I think that is less common than close()
(except for some tree walks).  With non-delayed marking, ufs_itimes()
would still have to check fs_ronly, and the only gain would be that
it could then skip checking the marks except as an invariants check.
But it can gain like that even with delayed setting -- just ignore any
old marks while fs_ronly (except as an invariants check), but clear them
at mount or unmount time so that there shouldn't be any.
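
As a rough userland sketch of the delayed scheme (hypothetical names
toy_node, toy_read() and toy_itimes(), illustrative flag value, not the
ufs code): every read sets the mark unconditionally, and the one place
that consumes marks decides whether to act on them or throw them away.

```c
#include <stdbool.h>

/* Illustrative value; the real IN_ACCESS is defined in ufs/ufs/inode.h. */
#define TOY_IN_ACCESS	0x01u

/* Hypothetical stand-in for an inode plus its fs_ronly state. */
struct toy_node {
	unsigned flag;
	bool	 fs_ronly;
	unsigned atime_updates;	/* counts timestamp writes */
};

/* Hot path: setting the mark is a single OR, with no mount checks. */
static void
toy_read(struct toy_node *np)
{
	np->flag |= TOY_IN_ACCESS;
}

/*
 * Cold path: consume the mark.  On a read-only file system the mark
 * is simply discarded.  Returns true if a timestamp was updated.
 */
static bool
toy_itimes(struct toy_node *np)
{
	if ((np->flag & TOY_IN_ACCESS) == 0)
		return (false);
	np->flag &= ~TOY_IN_ACCESS;
	if (np->fs_ronly)
		return (false);
	np->atime_updates++;
	return (true);
}
```

Many toy_read() calls followed by one toy_itimes() cost one check in
total, which is the argument for leaving the read path unconditional.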

> The early IN_ACCESS flag setting in ufs_setattr(), before the MNT_RDONLY
> check, appears to be protected by the MNT_RDONLY check in
> vfs_mark_atime().

Thanks, I had forgotten about that.  In vfs_mark_atime(), there is much
more efficiency to be gained by not setting marks that will be discarded,
since it takes a VOP to set them and many file systems don't support
this setting.  However, it is hard for vfs_mark_atime() to know when the
mark will be discarded without calling the fs:

- it already doesn't know which fs's support it
- it should be checking fs_ronly for ffs
- it seems to be missing locking for MNT_NOATIME and MNT_RDONLY

fs-level locking for MNT_NOATIME and MNT_RDONLY and fs_ronly is dubious
too.  Upper layers set the MNT flags before giving VOP_MOUNT() a chance
to adjust the marks.  This is automatically safe in one direction only
(e.g., setting MNT_NOATIME or MNT_RDONLY is fail-safe since it stops
changes), and always bad for strict invariants.

I now use the following fixes:

% Index: ufs_inode.c
% ===================================================================
% RCS file: /home/ncvs/src/sys/ufs/ufs/ufs_inode.c,v
% retrieving revision 1.69
% diff -u -2 -r1.69 ufs_inode.c
% --- ufs_inode.c	22 Jun 2007 13:22:37 -0000	1.69
% +++ ufs_inode.c	2 Dec 2007 13:51:12 -0000
% @@ -90,7 +90,5 @@
%  	ufs_gjournal_close(vp);
%  #endif
% -	if ((ip->i_effnlink == 0 && DOINGSOFTDEP(vp)) ||
% -	    (ip->i_nlink <= 0 &&
% -	     (vp->v_mount->mnt_flag & MNT_RDONLY) == 0)) {
% +	if (ip->i_effnlink <= 0) {
%  	loop:
%  		if (vn_start_secondary_write(vp, &mp, V_NOWAIT) != 0) {

Back out 1.64 == restore 1.40.

Always check i_effnlink so that there is no difference for the soft updates
case.

Use consistent style `<= 0' for things that should be >= 0.

% @@ -120,7 +118,7 @@
%  		}
%  	}
% -	if (ip->i_effnlink == 0 && DOINGSOFTDEP(vp))
% +	if (ip->i_effnlink <= 0 && DOINGSOFTDEP(vp))
%  		softdep_releasefile(ip);
% -	if (ip->i_nlink <= 0 && (vp->v_mount->mnt_flag & MNT_RDONLY) == 0) {
% +	if (ip->i_nlink <= 0) {
%  #ifdef QUOTA
%  		if (!getinoquota(ip))

Back out 1.64 == restore 1.40 (now it's duplicated).

Use consistent style `<= 0' for things that should be >= 0.

% @@ -147,17 +145,7 @@
%  		UFS_VFREE(vp, ip->i_number, mode);
%  	}
% -	if (ip->i_flag & (IN_ACCESS | IN_CHANGE | IN_MODIFIED | IN_UPDATE)) {
% -		if ((ip->i_flag & (IN_CHANGE | IN_UPDATE | IN_MODIFIED)) == 0 &&
% -		    mp == NULL &&
% -		    vn_start_secondary_write(vp, &mp, V_NOWAIT)) {
% -			mp = NULL;
% -			ip->i_flag &= ~IN_ACCESS;
% -		} else {
% -			if (mp == NULL)
% -				(void) vn_start_secondary_write(vp, &mp,
% -								V_WAIT);
% -			UFS_UPDATE(vp, 0);
% -		}
% -	}
% +	if (mp == NULL)
% +		(void) vn_start_secondary_write(vp, &mp, V_WAIT);
% +	UFS_UPDATE(vp, 0);
%  out:
%  	/*

Unrelated change: don't do extra work to break IN_ACCESS while busy
snapshotting.  This is now handled better by transferring IN_ACCESS
to IN_LAZYACCESS elsewhere.  Discarding IN_ACCESS here just breaks
this transfer.  Here I think it was only a hack to handle one call to
UFS_UPDATE(), but there are calls to UFS_UPDATE() all over and the
others caused bugs which were eventually fixed in a better way.

% Index: ufs_vnops.c
% ===================================================================
% RCS file: /home/ncvs/src/sys/ufs/ufs/ufs_vnops.c,v
% retrieving revision 1.293
% diff -u -2 -r1.293 ufs_vnops.c
% --- ufs_vnops.c	8 Nov 2007 17:21:51 -0000	1.293
% +++ ufs_vnops.c	2 Dec 2007 04:56:58 -0000
% @@ -89,4 +89,5 @@
%  #endif
% 
% +#include <ufs/ffs/fs.h>
%  #include <ufs/ffs/ffs_extern.h>
% 
% @@ -137,8 +138,38 @@
% 
%  	ip = VTOI(vp);
% +	/*
% +	 * MNT_RDONLY can barely be trusted here.  Full r/o mode is indicated
% +	 * by fs_ronly, and the MNT_RDONLY setting [should] differ from the
% +	 * fs_ronly setting only during transition from r/w mode to r/o mode.
% +	 * We set IN_ACCESS even in full r/o mode, so we must discard it
% +	 * unconditionally here.  During the transition, we must convert
% +	 * IN_CHANGE | IN_UPDATE to IN_MODIFIED, since some callers neglect
% +	 * to set IN_MODIFIED.  We also set the timestamps indicated by
% +	 * IN_CHANGE | IN_UPDATE normally during the transition, since the
% +	 * update marks may have been set correctly before the transition and
% +	 * not yet converted into timestamps.  Callers that set IN_CHANGE |
% +	 * IN_UPDATE during the transition are buggy, since userland writes
% +	 * are supposed to be denied (by MNT_RDONLY checks) during the
% +	 * transition, while kernel writes should only be for syncs
% +	 * and syncs should not touch timestamps except to convert old
% +	 * update marks to timestamps.  Callers that set any update mark or
% +	 * modification flag except IN_ACCESS while in full r/o mode are
% +	 * broken; we will panic for them later.
% +	 */
%  	if ((vp->v_mount->mnt_flag & MNT_RDONLY) != 0)
% -		goto out;
% +		ip->i_flag &= ~IN_ACCESS;
%  	if ((ip->i_flag & (IN_ACCESS | IN_CHANGE | IN_UPDATE)) == 0)
%  		return;
% +	if (ip->i_fs->fs_ronly) {	/* XXX locking? */
% +		vprint("ufs_itimes_locked: r/o mod", vp);
% +		/*
% +		 * Should panic here, or return and let ffs_update() panic.
% +		 * The fs_ronly check in ffs_update() is now almost redundant
% +		 * and should not succeed, so it should be replaced by a
% +		 * panic.  It detects more invariants failures than we detect
% +		 * here.
% +		 */
% +		goto out;
% +	}
% 
%  	if ((vp->v_type == VBLK || vp->v_type == VCHR) && !DOINGSOFTDEP(vp))

This essentially backs out 1.280 to restore 1.182.  This shouldn't be
needed (except for invariants checking), but is currently needed to
work around soft updates and other things setting IN_CHANGE when they
should set IN_MODIFIED (and IN_CHANGE only sometimes).
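
The decision made at the top of the patched function can be sketched
outside the kernel as follows.  This is only an illustration with
hypothetical names (toy_itimes_decide(), the TOY_* constants), and it
assumes that the code at the out: label drops the remaining marks:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mark bits; the real IN_* values differ. */
#define TOY_IN_ACCESS	0x01
#define TOY_IN_CHANGE	0x02
#define TOY_IN_UPDATE	0x04
#define TOY_MARKS	(TOY_IN_ACCESS | TOY_IN_CHANGE | TOY_IN_UPDATE)

enum toy_action {
	TOY_NOTHING,	/* no marks left: return early */
	TOY_COMPLAIN,	/* marks on a fully r/o fs: report, then drop them */
	TOY_CONVERT	/* convert the remaining marks to timestamps */
};

/*
 * mnt_rdonly is true from the start of a r/w to r/o transition;
 * fs_ronly is true only once the file system is fully r/o.
 */
static enum toy_action
toy_itimes_decide(int *flags, bool mnt_rdonly, bool fs_ronly)
{
	assert(flags != NULL);
	/* The access mark is discarded as soon as the mount flag says r/o. */
	if (mnt_rdonly)
		*flags &= ~TOY_IN_ACCESS;
	if ((*flags & TOY_MARKS) == 0)
		return (TOY_NOTHING);
	if (fs_ronly) {
		/* Assumed behaviour of the out: label: drop all marks. */
		*flags &= ~TOY_MARKS;
		return (TOY_COMPLAIN);
	}
	return (TOY_CONVERT);
}
```

During the transition (mnt_rdonly set, fs_ronly clear) a change or
update mark still yields TOY_CONVERT, which matches the comment in the
patch about old marks not yet converted into timestamps.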

With these patches, everything seems to work perfectly in -current
except for a bug in kqueue with the unlinked open file case which is
the main thing handled by the fix in ufs_inode.c:

     cp /etc/passwd /f/a
     tail -f /f/a &
     rm /f/a
     umount -f /f        # or mount -u -o ro /f after fixing the bugs

This leaves tail -f waiting in kqueue.

Everything seems to work perfectly in 5.2 without these patches.  kqueue
returns when its open file is forcibly closed in preparation for
forcibly removing it for the forced umount.

Bruce

From owner-freebsd-fs@FreeBSD.ORG  Mon Dec  3 05:17:51 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 8222016A418
	for <freebsd-fs@FreeBSD.org>; Mon,  3 Dec 2007 05:17:51 +0000 (UTC)
	(envelope-from truckman@FreeBSD.org)
Received: from gw.catspoiler.org (adsl-75-1-14-242.dsl.scrm01.sbcglobal.net
	[75.1.14.242]) by mx1.freebsd.org (Postfix) with ESMTP id 67EB913C46B
	for <freebsd-fs@FreeBSD.org>; Mon,  3 Dec 2007 05:17:51 +0000 (UTC)
	(envelope-from truckman@FreeBSD.org)
Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2])
	by gw.catspoiler.org (8.13.3/8.13.3) with ESMTP id lB35HgtK039158;
	Sun, 2 Dec 2007 21:17:46 -0800 (PST)
	(envelope-from truckman@FreeBSD.org)
Message-Id: <200712030517.lB35HgtK039158@gw.catspoiler.org>
Date: Sun, 2 Dec 2007 21:17:42 -0800 (PST)
From: Don Lewis <truckman@FreeBSD.org>
To: brde@optusnet.com.au
In-Reply-To: <20071203141557.P22038@delplex.bde.org>
MIME-Version: 1.0
Content-Type: TEXT/plain; charset=us-ascii
Cc: freebsd-fs@FreeBSD.org
Subject: Re: File remove problem

On  3 Dec, Bruce Evans wrote:
> On Sun, 2 Dec 2007, Don Lewis wrote:
> 
>> On  2 Dec, Bruce Evans wrote:
>>
>>> So it should be safe to remove all the r/o checks in ufs_inactive() after
>>> fixing the other bugs.  ffs_truncate already panics if fs_ronly, but only
>>> in some cases.  In particular, it doesn't panic for truncations that don't
>>> change the file size.  Such truncations aren't quite null, since standards
>>> require [f]truncate(2) to mark the ctime and mtime for update.
>>> ffs_truncate() sets the marks, which is correct for null truncations from
>>> userland but not ones from syncer internals.  Setting of the marks when
>>> fs_ronly is set should cause panics later (my patch has a vprint() for it).
>>
>> I think the MNT_RDONLY check in ufs_itimes_locked() should also be
>> changed to look at fs_ronly and panic if any marks are set.  This will
>> require some changes to add some early MNT_RDONLY checks.
> 
> Yes, already done (except vprint() instead of panic).
> 
>> In particular, ffs_read() and ffs_extread() need to check MNT_RDONLY in
>> addition to MNT_NOATIME (as is already done in vfs_mark_atime()).  This
>> also looks like it should be a reasonable optimization for read-only
>> file systems that should eliminate unnecessary work at the lower levels
>> of the code.
> 
> But I let these happen and discard IN_ACCESS marks if fs_ronly.  I
> thought that the optimization went the other way -- unconditionally
> setting the marks was very efficient, and discarding them in ufs_itimes()
> was efficient too.  I think this is still true now with larger locking
> overheads, and the marks should be discarded later in the MNT_NOATIME
> case too.  It is expected that the marks are set much more often than
> they are looked at by ufs_itimes(), since most calls to ufs_itimes()
> are in close() and read() is much more common than close().

ffs_read() and ffs_extread() already check MNT_NOATIME, so checking
MNT_RDONLY there as well is free.  Setting and clearing the mark will
consume a few instruction cycles, dirty a cache line, and increase main
memory write-back traffic, though the expense is likely to be small.

Preventing user reads from setting IN_ACCESS as soon as MNT_RDONLY is set
on a downgrade to read-only seems to be the right thing to do.
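
A small sketch of why the extra test is free, using hypothetical names
and bit values (toy_mark_access() and the TOY_* constants are not the
real ones).  Both mount flags are covered by one mask-and-compare, so
adding the second flag does not add a branch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit values; the real MNT_* and IN_* constants differ. */
#define TOY_MNT_RDONLY	0x1
#define TOY_MNT_NOATIME	0x2
#define TOY_IN_ACCESS	0x1

/*
 * Mark the inode for an access time update unless the mount forbids it.
 * Returns true if the mark was set.
 */
static bool
toy_mark_access(int *iflags, int mnt_flag)
{
	assert(iflags != NULL);
	/* One test covers both "no atime" and "read-only". */
	if ((mnt_flag & (TOY_MNT_NOATIME | TOY_MNT_RDONLY)) != 0)
		return (false);
	*iflags |= TOY_IN_ACCESS;
	return (true);
}
```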

> ufs_itimes()
> is also called in stat() but I think that is less common than close()
> (except for some tree walks).  With non-delayed marking, ufs_itimes()
> would still have to check fs_ronly, and the only gain would be that
> it could then skip checking the marks except as an invariants check.
> But it can gain like that even with delayed setting -- just ignore any
> old marks while fs_ronly (except as an invariants check), but clear them
> at mount or unmount time so that there shouldn't be any.

I think that setting the marks when the file system is read-only causes
the syncer to do extra work.  I think that ffs_sync() still gets called
if the file system is read-only, and if it encounters any inodes with
marks set, it calls ffs_syncvnode() on them.

>> The early IN_ACCESS flag setting in ufs_setattr(), before the MNT_RDONLY
>> check, appears to be protected by the MNT_RDONLY check in
>> vfs_mark_atime().
> 
> Thanks, I had forgotten about that.  In vfs_mark_atime(), there is much
> more efficiency to be gained by not setting marks that will be discarded,
> since it takes a VOP to set them and many file systems don't support
> this setting.  However, it is hard for vfs_mark_atime() to know when the
> mark will be discarded without calling the fs:
> 
> - it already doesn't know which fs's support it
> - it should be checking fs_ronly for ffs

I think that MNT_RDONLY is correct here.  We want to stop new atime
updates as soon as the downgrade starts, just like we stop new
user-initiated writes.

> - it seems to be missing locking for MNT_NOATIME and MNT_RDONLY
> 
> fs-level locking for MNT_NOATIME and MNT_RDONLY and fs_ronly is dubious
> too.  Upper layers set the MNT flags before giving VOP_MOUNT() a chance
> to adjust the marks.  This is automatically safe in one direction only
> (e.g., setting MNT_NOATIME or MNT_RDONLY is fail-safe since it stops
> changes), and always bad for strict invariants.

Maybe a reasonable way to handle this would be to set the
flags before calling VOP_MOUNT() when they are being changed from 0 to
1, and clear them after calling VOP_MOUNT() when changing them from 1 to
0. Adding explicit locking sounds painful ...

> I now use the following fixes:

> % Index: ufs_vnops.c
> % ===================================================================
> % RCS file: /home/ncvs/src/sys/ufs/ufs/ufs_vnops.c,v
> % retrieving revision 1.293
> % diff -u -2 -r1.293 ufs_vnops.c
> % --- ufs_vnops.c	8 Nov 2007 17:21:51 -0000	1.293
> % +++ ufs_vnops.c	2 Dec 2007 04:56:58 -0000
> % @@ -89,4 +89,5 @@
> %  #endif
> % 
> % +#include <ufs/ffs/fs.h>
> %  #include <ufs/ffs/ffs_extern.h>
> % 
> % @@ -137,8 +138,38 @@
> % 
> %  	ip = VTOI(vp);
> % +	/*
> % +	 * MNT_RDONLY can barely be trusted here.  Full r/o mode is indicated
> % +	 * by fs_ronly, and the MNT_RDONLY setting [should] differ from the
> % +	 * fs_ronly setting only during transition from r/w mode to r/o mode.
> % +	 * We set IN_ACCESS even in full r/o mode, so we must discard it
> % +	 * unconditionally here.  During the transition, we must convert
> % +	 * IN_CHANGE | IN_UPDATE to IN_MODIFIED, since some callers neglect
> % +	 * to set IN_MODIFIED.  We also set the timestamps indicated by
> % +	 * IN_CHANGE | IN_UPDATE normally during the transition, since the
> % +	 * update marks may have been set correctly before the transition and
> % +	 * not yet converted into timestamps.  Callers that set IN_CHANGE |
> % +	 * IN_UPDATE during the transition are buggy, since userland writes
> % +	 * are supposed to be denied (by MNT_RDONLY checks) during the
> % +	 * transition, while kernel writes should only be for syncs
> % +	 * and syncs should not touch timestamps except to convert old
> % +	 * update marks to timestamps.  Callers that set any update mark or
> % +	 * modification flag except IN_ACCESS while in full r/o mode are
> % +	 * broken; we will panic for them later.
> % +	 */
> %  	if ((vp->v_mount->mnt_flag & MNT_RDONLY) != 0)
> % -		goto out;
> % +		ip->i_flag &= ~IN_ACCESS;

IN_ACCESS might have been set before the downgrade request.  As written,
this change will toss out the timestamp update.  I think it would be
better to use fs_ronly here, but it would be more efficient to check
MNT_RDONLY in ffs_*read() and eliminate these two lines of code.
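
To illustrate the lost update, here is a sketch with hypothetical names
(toy_fs, toy_flush_access() and the gate enum are invented for this
example).  It models a mark that was set before the downgrade request
and is flushed while the downgrade is still in progress:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_IN_ACCESS	0x1	/* hypothetical value */

struct toy_fs {
	bool	mnt_rdonly;	/* set at the start of the downgrade */
	bool	fs_ronly;	/* set once the downgrade has finished */
};

/* Which r/o indicator causes a pending access mark to be thrown away. */
enum toy_gate { TOY_GATE_MNT_RDONLY, TOY_GATE_FS_RONLY };

/*
 * Consume a pending access mark.  Returns true if the timestamp would
 * be written, false if the mark was absent or discarded.
 */
static bool
toy_flush_access(int *iflags, const struct toy_fs *fs, enum toy_gate gate)
{
	bool discard, pending;

	assert(iflags != NULL && fs != NULL);
	discard = (gate == TOY_GATE_MNT_RDONLY) ? fs->mnt_rdonly :
	    fs->fs_ronly;
	pending = (*iflags & TOY_IN_ACCESS) != 0;
	*iflags &= ~TOY_IN_ACCESS;
	return (pending && !discard);
}
```

With mnt_rdonly set and fs_ronly still clear, gating on MNT_RDONLY
drops the pending mark, while gating on fs_ronly lets it be written.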


From owner-freebsd-fs@FreeBSD.ORG  Mon Dec  3 10:11:49 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 50FFE16A417;
	Mon,  3 Dec 2007 10:11:49 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail01.syd.optusnet.com.au (mail01.syd.optusnet.com.au
	[211.29.132.182])
	by mx1.freebsd.org (Postfix) with ESMTP id F1F6013C474;
	Mon,  3 Dec 2007 10:11:48 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from besplex.bde.org (c211-30-219-213.carlnfd3.nsw.optusnet.com.au
	[211.30.219.213])
	by mail01.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	lB3ABaa5006347
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Mon, 3 Dec 2007 21:11:46 +1100
Date: Mon, 3 Dec 2007 21:11:36 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Don Lewis <truckman@freebsd.org>
In-Reply-To: <200712030517.lB35HgtK039158@gw.catspoiler.org>
Message-ID: <20071203202947.N1698@besplex.bde.org>
References: <200712030517.lB35HgtK039158@gw.catspoiler.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-fs@freebsd.org
Subject: Re: File remove problem

On Sun, 2 Dec 2007, Don Lewis wrote:

> On  3 Dec, Bruce Evans wrote:
>> On Sun, 2 Dec 2007, Don Lewis wrote:
>>
>>> In particular, ffs_read() and ffs_extread() need to check MNT_RDONLY in
>>> addition to MNT_NOATIME (as is already done in vfs_mark_atime()).  This
>>> also looks like it should be a reasonable optimization for read-only
>>> file systems that should eliminate unnecessary work at the lower levels
>>> of the code.
>>
>> But I let these happen and discard IN_ACCESS marks if fs_ronly.  I
>> thought that the optimization went the other way -- unconditionally
>> setting the marks was very efficient, and discarding them in ufs_itimes()
>> was efficient too.  I think this is still true now with larger locking
>> overheads, and the marks should be discarded later in the MNT_NOATIME
>> case too.  It is expected that the marks are set much more often than
>> they are looked at by ufs_itimes(), since most calls to ufs_itimes()
>> are in close() and read() is much more common than close().
>
> ffs_read() and ffs_extread() already check MNT_NOATIME, so checking
> MNT_RDONLY there as well is free.  Setting and clearing the mark will
> consume a few instruction cycles, dirty a cache line, and increase main
> memory write-back traffic, though the expense is likely to be small.

The check can also avoid the new vnode locking for useless settings of
IN_ACCESS.  But what locks the MNT flags?  Nothing directly I think.
Here we must not care if we read a stale value, and ufs_itimes() must
not care if the value changed just after we read it.

> Preventing user reads from setting IN_ACCESS as soon as MNT_RDONLY is set
> on a downgrade to read-only seems to be the right thing to do.

Either reads or ufs_itimes() must use MNT_RDONLY to prevent changes
to the inode after MNT_RDONLY is set early in the r/w to r/o transition.
Checking MNT_RDONLY in the reads gives the more correct behaviour of not
having to discard even IN_ACCESS settings that were made before the
transition began.

I don't understand how unmount (apparently) works so well without
setting MNT_RDONLY to prevent further writes like the transition does.

>> ufs_itimes()
>> is also called in stat() but I think that is less common than close()
>> (except for some tree walks).  With non-delayed marking, ufs_itimes()
>> would still have to check fs_ronly, and the only gain would be that
>> it could then skip checking the marks except as an invariants check.
>> But it can gain like that even with delayed setting -- just ignore any
>> old marks while fs_ronly (except as an invariants check), but clear them
>> at mount or unmount time so that there shouldn't be any.
>
> I think that setting the marks when the file system is read-only causes
> the syncer to do extra work.  I think that ffs_sync() still gets called
> if the file system is read-only, and if it encounters any inodes with
> marks set, it calls ffs_syncvnode() on them.

I think VOP_SYNC() actually isn't called on r/o file systems.  Callers
check MNT_RDONLY or possibly dirty block list pointers.  msdosfs had
a bug that would have caused panics if msdosfs_sync() were called on
an fs that had ever been mounted r/w but is currently r/o.

>>> The early IN_ACCESS flag setting in ufs_setattr(), before the MNT_RDONLY
>>> check, appears to be protected by the MNT_RDONLY check in
>>> vfs_mark_atime().
>>
>> Thanks, I had forgotten about that.  In vfs_mark_atime(), there is much
>> more efficiency to be gained by not setting marks that will be discarded,
>> since it takes a VOP to set them and many file systems don't support
>> this setting.  However, it is hard for vfs_mark_atime() to know when the
>> mark will be discarded without calling the fs:
>>
>> - it already doesn't know which fs's support it
>> - it should be checking fs_ronly for ffs
>
> I think that MNT_RDONLY is correct here.  We want to stop new atime
> updates as soon as the downgrade starts, just like we stop new
> user-initiated writes.

Right.  Same as for normal read accesses or delayed killing of IN_ACCESS
in ufs_itimes().

>> - it seems to be missing locking for MNT_NOATIME and MNT_RDONLY
>>
>> fs-level locking for MNT_NOATIME and MNT_RDONLY and fs_ronly is dubious
>> too.  Upper layers set the MNT flags before giving VOP_MOUNT() a chance
>> to adjust the marks.  This is automatically safe in one direction only
>> (e.g., setting MNT_NOATIME or MNT_RDONLY is fail-safe since it stops
>> changes), and always bad for strict invariants.
>
> Maybe a reasonable way to handle this would be to set the
> flags before calling VOP_MOUNT() when they are being changed from 0 to
> 1, and clear them after calling VOP_MOUNT() when changing them from 1 to
> 0. Adding explicit locking sounds painful ...

This already happens for MNT_RDONLY.  ffs_mount() has dead code which
obfuscates this and other things by setting all the generic flags
again.  It gets the timing of the setting of MNT_RDONLY backwards by
delaying the setting until the end of the transition from r/w to r/o,
but this has no effect since MNT_RDONLY is set throughout the transition.
It only gets the timing of the clearing of MNT_RDONLY right by delaying
it until the end of the transition from r/o to r/w.

Some flags have the wrong sense for this to be right.  E.g., early
clearing of MNT_ASYNC is safe, but early setting of it is not.  tegge
fixed some races from this.
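
The ordering rule, including the flags with the opposite sense, can be
sketched like this.  All names and bit values are hypothetical, and
the split into "safe when set" and "safe when clear" follows only the
examples given above (RDONLY and NOATIME versus ASYNC):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits, for illustration only. */
#define TOY_RDONLY	0x1u	/* restrictive when set */
#define TOY_NOATIME	0x2u	/* restrictive when set */
#define TOY_ASYNC	0x4u	/* restrictive when clear */

/* Flags whose set state is the fail-safe one. */
#define TOY_SAFE_WHEN_SET	(TOY_RDONLY | TOY_NOATIME)

typedef void toy_hook_t(unsigned int visible_flags, void *arg);

/* Sample hook: record the flags that were visible during the call. */
static void
toy_record_hook(unsigned int visible_flags, void *arg)
{
	*(unsigned int *)arg = visible_flags;
}

/*
 * Move *flagsp to "wanted" around a call to hook().  Changes toward
 * the fail-safe state become visible before the hook runs; changes
 * away from it become visible only after the hook returns.
 */
static void
toy_update_flags(unsigned int *flagsp, unsigned int wanted,
    toy_hook_t *hook, void *arg)
{
	unsigned int changed, early, safe_now;

	assert(flagsp != NULL && hook != NULL);
	changed = *flagsp ^ wanted;
	/* Bits being set that are safe when set ... */
	safe_now = wanted & TOY_SAFE_WHEN_SET;
	/* ... plus bits being cleared that are safe when clear. */
	safe_now |= ~wanted & ~TOY_SAFE_WHEN_SET;
	early = changed & safe_now;

	*flagsp ^= early;	/* apply the fail-safe changes first */
	hook(*flagsp, arg);	/* the fs adjusts its own state */
	*flagsp = wanted;	/* then apply the remaining changes */
}
```

Going from ASYNC to RDONLY, the hook already sees RDONLY set and ASYNC
clear.  Going back, the hook still sees RDONLY set and ASYNC clear, and
both change only after it returns.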

Bruce

From owner-freebsd-fs@FreeBSD.ORG  Mon Dec  3 11:07:00 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 749E016A4EB
	for <freebsd-fs@hub.freebsd.org>; Mon,  3 Dec 2007 11:07:00 +0000 (UTC)
	(envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id 6485013C46A
	for <freebsd-fs@hub.freebsd.org>; Mon,  3 Dec 2007 11:07:00 +0000 (UTC)
	(envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.2/8.14.2) with ESMTP id lB3B70PT005570
	for <freebsd-fs@FreeBSD.org>; Mon, 3 Dec 2007 11:07:00 GMT
	(envelope-from owner-bugmaster@FreeBSD.org)
Received: (from gnats@localhost)
	by freefall.freebsd.org (8.14.2/8.14.1/Submit) id lB3B6xNS005566
	for freebsd-fs@FreeBSD.org; Mon, 3 Dec 2007 11:06:59 GMT
	(envelope-from owner-bugmaster@FreeBSD.org)
Date: Mon, 3 Dec 2007 11:06:59 GMT
Message-Id: <200712031106.lB3B6xNS005566@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: gnats set sender to
	owner-bugmaster@FreeBSD.org using -f
From: FreeBSD bugmaster <bugmaster@FreeBSD.org>
To: freebsd-fs@FreeBSD.org
Cc: 
Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org

Current FreeBSD problem reports
Critical problems
Serious problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/112658  fs         [smbfs] [patch] smbfs and caching problems (resolves b
o kern/114676  fs         [ufs] snapshot creation panics: snapacct_ufs2: bad blo
o kern/114856  fs         [ntfs] [patch] Bug in NTFS allows bogus file modes.
o kern/116170  fs         Kernel panic when mounting /tmp
o kern/118322  fs         [panic] Sometimes (seldom), "panic:page fault" happens

5 problems total.

Non-critical problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/114847  fs         [ntfs] [patch] dirmask support for NTFS ala MSDOSFS
o bin/118249   fs         mv(1): moving a directory changes its mtime

2 problems total.


From owner-freebsd-fs@FreeBSD.ORG  Tue Dec  4 17:23:27 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 6BECF16A41B
	for <freebsd-fs@freebsd.org>; Tue,  4 Dec 2007 17:23:27 +0000 (UTC)
	(envelope-from julien.bellang@free.fr)
Received: from smtp8-g19.free.fr (smtp8-g19.free.fr [212.27.42.65])
	by mx1.freebsd.org (Postfix) with ESMTP id 0CD9A13C4D3
	for <freebsd-fs@freebsd.org>; Tue,  4 Dec 2007 17:23:26 +0000 (UTC)
	(envelope-from julien.bellang@free.fr)
Received: from smtp8-g19.free.fr (localhost [127.0.0.1])
	by smtp8-g19.free.fr (Postfix) with ESMTP id 3725717F54D
	for <freebsd-fs@freebsd.org>; Tue,  4 Dec 2007 18:23:24 +0100 (CET)
Received: from imp7-g19.free.fr (imp7-g19.free.fr [212.27.42.38])
	by smtp8-g19.free.fr (Postfix) with ESMTP id F41EE17F5CE
	for <freebsd-fs@freebsd.org>; Tue,  4 Dec 2007 18:23:23 +0100 (CET)
Received: by imp7-g19.free.fr (Postfix, from userid 33)
	id 3FC9F3F77; Tue,  4 Dec 2007 18:15:52 +0100 (CET)
Received: from 194.3.231.254 ([194.3.231.254]) by imp.free.fr (IMP) with HTTP 
	for <julien.bellang@212.27.42.70>; Tue, 04 Dec 2007 18:15:51 +0100
Message-ID: <1196788551.47558b47ee317@imp.free.fr>
Date: Tue, 04 Dec 2007 18:15:51 +0100
From: julien.bellang@free.fr
To: freebsd-fs@freebsd.org
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
User-Agent: Internet Messaging Program (IMP) 3.2.5
X-Originating-IP: 194.3.231.254
Subject: FSCK doesn't correct file size when INCORRECT BLOCK COUNT is
	found


Hi,

I'm working on a system installed in an environment where power is cut off
many times a week. The system runs FreeBSD 6.2 on i386.

I'm using UFS2 with soft updates enabled.

After such a power loss, when I restart I get some corrupted files that
fsck_ufs doesn't entirely repair.

For these files fsck fixes the following error:
 /dev/ad0s1f: INCORRECT BLOCK COUNT I=3132417 (512992 should be 459392)
(CORRECTED)

But in my view these files are still inconsistent, since the file size
field doesn't reflect the number of blocks referenced in the inode.

Looking at the fsck_ffs sources, it seems that fsck checks the validity of
block pointers (!= 0) in the inode block list only for directory inodes, not
for regular files.
In my case, the number of block addresses to check in the inode is deduced
from the file size, and the file size is greater than the number of blocks
actually allocated, so I get many NULL block pointers.

Does anyone know why NULL pointers are accepted by fsck for regular files,
and why it doesn't try to adjust the file size?

file fsck_ffs/inode.c, iblock(), line 208

line 153: static int
     154: iblock(struct inodesc *idesc, long ilevel, off_t isize)
{
      .
      .
      .
197: if (IBLK(bp, i)) {
                        idesc->id_blkno = IBLK(bp, i);
                        if (ilevel == 0)
                                n = (*func)(idesc);
                        else
                                n = iblock(idesc, ilevel, isize);
                        if (n & STOP) {
                                bp->b_flags &= ~B_INUSE;
                                return (n);
                        }
                } else {
208:                        if (idesc->id_type == DATA && isize > 0) {
                                /* An empty block in a directory XXX */
                                getpathname(pathbuf, idesc->id_number,
                                                idesc->id_number);
                                pfatal("DIRECTORY %s: CONTAINS EMPTY BLOCKS",
                                        pathbuf);
                                if (reply("ADJUST LENGTH") == 1) {
                                        dp = ginode(idesc->id_number);
                                        DIP_SET(dp, di_size,
                                            DIP(dp, di_size) - isize);
                                        isize = 0;
                                        printf(
                                            "YOU MUST RERUN FSCK AFTERWARDS\n");
                                        rerun = 1;
                                        inodirty();
                                        bp->b_flags &= ~B_INUSE;
                                        return(STOP);
                                }
                        }
                }
      .
      .
      .
}

Thanks,

Julien


From owner-freebsd-fs@FreeBSD.ORG  Thu Dec  6 15:09:47 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E39BF16A41A
	for <freebsd-fs@freebsd.org>; Thu,  6 Dec 2007 15:09:47 +0000 (UTC)
	(envelope-from julien.bellang@free.fr)
Received: from smtp8-g19.free.fr (smtp8-g19.free.fr [212.27.42.65])
	by mx1.freebsd.org (Postfix) with ESMTP id 7F6A113C469
	for <freebsd-fs@freebsd.org>; Thu,  6 Dec 2007 15:09:45 +0000 (UTC)
	(envelope-from julien.bellang@free.fr)
Received: from smtp8-g19.free.fr (localhost [127.0.0.1])
	by smtp8-g19.free.fr (Postfix) with ESMTP id 7646A17F6C2
	for <freebsd-fs@freebsd.org>; Thu,  6 Dec 2007 16:09:44 +0100 (CET)
Received: from imp7-g19.free.fr (imp7-g19.free.fr [212.27.42.38])
	by smtp8-g19.free.fr (Postfix) with ESMTP id 689F717F6C1
	for <freebsd-fs@freebsd.org>; Thu,  6 Dec 2007 16:09:44 +0100 (CET)
Received: by imp7-g19.free.fr (Postfix, from userid 33)
	id 780353FD0; Thu,  6 Dec 2007 16:01:50 +0100 (CET)
Received: from 194.3.231.254 ([194.3.231.254]) by imp.free.fr (IMP) with HTTP 
	for <julien.bellang@212.27.42.70>; Thu, 06 Dec 2007 16:01:50 +0100
Message-ID: <1196953310.47580ede28676@imp.free.fr>
Date: Thu, 06 Dec 2007 16:01:50 +0100
From: julien.bellang@free.fr
To: freebsd-fs@freebsd.org
References: <1196788555.47558b4bab0ab@imp.free.fr>
In-Reply-To: <1196788555.47558b4bab0ab@imp.free.fr>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
User-Agent: Internet Messaging Program (IMP) 3.2.5
X-Originating-IP: 194.3.231.254
Subject: Re: FSCK doesn't correct file size when INCORRECT BLOCK COUNT is
	found
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 06 Dec 2007 15:09:48 -0000


Hi,

I finally understood why fsck_ufs fails to repair files that were being
written when the power was cut.

1) First, some information about the corrupted files:
In my case the file system has the following characteristics:
- the write cache is enabled on the hard drive
- soft updates are enabled
- the FS is mounted with the default option noasync

Under these conditions, when the power is cut while a file is being written,
the file is corrupted in the following way after restart:
- in the inode metadata, the FILESIZE and BLOCKCOUNT fields hold the final
values expected for the file
- in the inode, the list of BLOCKS is not up to date (which seems pretty
normal, as the write had not completed), and the list contains holes (EMPTY
BLOCKS, null block references) that are not necessarily at the end of the list.


2) How fsck treats these files:

When fsck processes such a file and finds a hole in a regular file's inode
block list, it skips the hole and doesn't increment the block count.
BUT if the inode doesn't belong to a directory, fsck DOESN'T consider the
hole an error. Indeed, I think an application may deliberately create such
a file.

However, after checking the inode block list, fsck's checkinode() function
finds that the computed block count doesn't match the inode's BLOCKCOUNT
field; it only proposes to correct that field and doesn't correct the SIZE
field.

The problem for the end user is that, as the file appears to have the right
size, he can't tell that the write did not actually complete normally
(I'm in exactly this situation), and so he will use a corrupted file.


3) A proposed solution:

I'm working on a workaround in fsck (it seems to work fine in my case) that
truncates a file with holes at the first hole discovered, and adjusts the
inode SIZE and BLOCKCOUNT fields accordingly.

However, I realize that such a patch may be a problem for files that were
deliberately created with holes.

Maybe the solution is to add a new option to fsck, or to truncate the file
only if the BLOCKCOUNT is inconsistent.


Is anyone interested in this work, and could you comment on this analysis?

In particular, I'd like to know in which known cases applications or the
system can generate regular files containing holes.


Thanks,

Julien Bellanger




From owner-freebsd-fs@FreeBSD.ORG  Thu Dec  6 17:29:55 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id DB9A016A417
	for <freebsd-fs@freebsd.org>; Thu,  6 Dec 2007 17:29:55 +0000 (UTC)
	(envelope-from bg@sics.se)
Received: from letter.sics.se (letter.sics.se [193.10.64.6])
	by mx1.freebsd.org (Postfix) with ESMTP id A338513C447
	for <freebsd-fs@freebsd.org>; Thu,  6 Dec 2007 17:29:55 +0000 (UTC)
	(envelope-from bg@sics.se)
Received: from sics.se (ibook.sics.se [193.10.66.104])
	by letter.sics.se (Postfix) with ESMTP id 0040A4022B;
	Thu,  6 Dec 2007 17:56:34 +0100 (CET)
Date: Thu, 6 Dec 2007 17:56:08 +0100
From: Bjorn Gronvall <bg@sics.se>
To: julien.bellang@free.fr
Message-ID: <20071206175608.594685d9@ibook.sics.se>
In-Reply-To: <1196953310.47580ede28676@imp.free.fr>
References: <1196788555.47558b4bab0ab@imp.free.fr>
	<1196953310.47580ede28676@imp.free.fr>
Organization: SICS.SE
X-Mailer: Claws Mail 2.9.1 (GTK+ 2.10.6; i386-portbld-freebsd6.2)
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
Cc: freebsd-fs@freebsd.org
Subject: Re: FSCK doesn't correct file size when INCORRECT BLOCK COUNT is
 found
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 06 Dec 2007 17:29:55 -0000

On Thu, 06 Dec 2007 16:01:50 +0100
julien.bellang@free.fr wrote:

Hi Julien,

> 1) First some information about the file corrupted :
> In my case the Files System has the following characteristics
> - the write cache is activated on the hard drive
> - the SoftUpdate option is activated
> - the FS is mount with the default option noasync

Filesystems in general and UFS with soft updates in particular rely on
disks providing accurate responses to writes. When write caching is
enabled the disk will lie and tell the operating system that the write
has completed successfully when in reality the data is only cached in
disk RAM. When the power disappears the data will be gone forever.
 
In order to avoid this problem you can turn off write caching; this
way the software knows whether the write completed successfully or
not. Alternatively you may power your disks from batteries, multiple
power supplies with UPSes, or come up with some other hardware
solution.

Cheers,
/b

-- 
  _     _                                           ,_______________.
Bjorn Gronvall (Björn Grönvall)                    /_______________/|
Swedish Institute of Computer Science              |               ||
PO Box 1263, S-164 29 Kista, Sweden                | Schroedingers ||
Email: bg@sics.se, Phone +46 -8 633 15 25          |      Cat      |/
Cellular +46 -70 768 06 35, Fax +46 -8 751 72 30   '---------------'

From owner-freebsd-fs@FreeBSD.ORG  Thu Dec  6 21:40:44 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 1D64216A418
	for <fs@freebsd.org>; Thu,  6 Dec 2007 21:40:44 +0000 (UTC)
	(envelope-from joe@tao.org.uk)
Received: from mailhost.tao.org.uk (tao.uscs.susx.ac.uk [139.184.131.101])
	by mx1.freebsd.org (Postfix) with ESMTP id DDFA313C461
	for <fs@freebsd.org>; Thu,  6 Dec 2007 21:40:43 +0000 (UTC)
	(envelope-from joe@tao.org.uk)
Received: by mailhost.tao.org.uk (Postfix, from userid 1000)
	id 85DD575A7; Thu,  6 Dec 2007 21:08:48 +0000 (GMT)
Date: Thu, 6 Dec 2007 21:08:48 +0000
From: Josef Karthauser <joe@tao.org.uk>
To: fs@freebsd.org
Message-ID: <20071206210848.GA63825@transwarp.tao.org.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.13 (2006-08-11)
X-taoresearch-MailScanner-Information: Please contact Tao Research for more
	information
X-taoresearch-MailScanner: Found to be clean
X-MailScanner-From: joe@tao.org.uk
Cc: 
Subject: -14% available on /tmp
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 06 Dec 2007 21:40:44 -0000

One of my servers is reporting:

# df | grep tmp
/dev/mirror/boot0e    507630    -64328  531348   -14%    /tmp

How weird is that?  I wonder what is going on.
The kernel is dated:

6.2-PRERELEASE FreeBSD 6.2-PRERELEASE #68: Mon Oct  2 14:36:13 BST 2006

Joe

From owner-freebsd-fs@FreeBSD.ORG  Thu Dec  6 21:53:54 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 4594316A417
	for <fs@freebsd.org>; Thu,  6 Dec 2007 21:53:54 +0000 (UTC)
	(envelope-from gpalmer@freebsd.org)
Received: from noop.in-addr.com (unknown [IPv6:2001:5c0:8fff:fffe::214d])
	by mx1.freebsd.org (Postfix) with ESMTP id 0EF2F13C461
	for <fs@freebsd.org>; Thu,  6 Dec 2007 21:53:54 +0000 (UTC)
	(envelope-from gpalmer@freebsd.org)
Received: from gjp by noop.in-addr.com with local (Exim 4.54 (FreeBSD))
	id 1J0OfI-0004Ja-H7; Thu, 06 Dec 2007 16:53:52 -0500
Date: Thu, 6 Dec 2007 16:53:52 -0500
From: Gary Palmer <gpalmer@freebsd.org>
To: Josef Karthauser <joe@tao.org.uk>
Message-ID: <20071206215352.GA986@in-addr.com>
References: <20071206210848.GA63825@transwarp.tao.org.uk>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20071206210848.GA63825@transwarp.tao.org.uk>
Cc: fs@freebsd.org
Subject: Re: -14% available on /tmp
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 06 Dec 2007 21:53:54 -0000

On Thu, Dec 06, 2007 at 09:08:48PM +0000, Josef Karthauser wrote:
> One of my servers is reporting:
> 
> # df | grep tmp
> /dev/mirror/boot0e    507630    -64328  531348   -14%    /tmp
> 
> How weird is that?  I wonder what is going on.
> The kernel is dated:
> 
> 6.2-PRERELEASE FreeBSD 6.2-PRERELEASE #68: Mon Oct  2 14:36:13 BST 2006

http://www.freebsd.org/doc/en_US.ISO8859-1/books/faq/disks.html#DISK-MORE-THAN-FULL

Not sure why it's -14% rather than the more normal -8%, but I suspect that's
what's happened.

Gary

From owner-freebsd-fs@FreeBSD.ORG  Thu Dec  6 22:10:40 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 797D716A417;
	Thu,  6 Dec 2007 22:10:40 +0000 (UTC)
	(envelope-from brooks@lor.one-eyed-alien.net)
Received: from lor.one-eyed-alien.net (cl-162.ewr-01.us.sixxs.net
	[IPv6:2001:4830:1200:a1::2])
	by mx1.freebsd.org (Postfix) with ESMTP id D3AD813C43E;
	Thu,  6 Dec 2007 22:10:39 +0000 (UTC)
	(envelope-from brooks@lor.one-eyed-alien.net)
Received: from lor.one-eyed-alien.net (localhost [127.0.0.1])
	by lor.one-eyed-alien.net (8.14.1/8.13.8) with ESMTP id lB6MAcjw070704; 
	Thu, 6 Dec 2007 16:10:38 -0600 (CST)
	(envelope-from brooks@lor.one-eyed-alien.net)
Received: (from brooks@localhost)
	by lor.one-eyed-alien.net (8.14.1/8.13.8/Submit) id lB6MAcEK070703;
	Thu, 6 Dec 2007 16:10:38 -0600 (CST) (envelope-from brooks)
Date: Thu, 6 Dec 2007 16:10:38 -0600
From: Brooks Davis <brooks@freebsd.org>
To: Gary Palmer <gpalmer@freebsd.org>
Message-ID: <20071206221038.GA70675@lor.one-eyed-alien.net>
References: <20071206210848.GA63825@transwarp.tao.org.uk>
	<20071206215352.GA986@in-addr.com>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="bg08WKrSYDhXBjb5"
Content-Disposition: inline
In-Reply-To: <20071206215352.GA986@in-addr.com>
User-Agent: Mutt/1.5.16 (2007-06-09)
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-3.0
	(lor.one-eyed-alien.net [127.0.0.1]);
	Thu, 06 Dec 2007 16:10:38 -0600 (CST)
Cc: Josef Karthauser <joe@tao.org.uk>, fs@freebsd.org
Subject: Re: -14% available on /tmp
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 06 Dec 2007 22:10:40 -0000


--bg08WKrSYDhXBjb5
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Thu, Dec 06, 2007 at 04:53:52PM -0500, Gary Palmer wrote:
> On Thu, Dec 06, 2007 at 09:08:48PM +0000, Josef Karthauser wrote:
> > One of my servers is reporting:
> >=20
> > # df | grep tmp
> > /dev/mirror/boot0e    507630    -64328  531348   -14%    /tmp
> >=20
> > How weird is that?  I wonder what is going on.
> > The kernel is dated:
> >=20
> > 6.2-PRERELEASE FreeBSD 6.2-PRERELEASE #68: Mon Oct  2 14:36:13 BST 2006
>=20
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/faq/disks.html#DISK-MORE=
-THAN-FULL
>=20
> Not sure why its -14% rather than the more normal -8%, but I suspect thats
> whats happened.

I've also seen the occasional corrupted fs where the counts were seriously =
out
of whack.  A fsck (or, since it's just /tmp, a newfs) might be in order.

-- Brooks

--bg08WKrSYDhXBjb5
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (FreeBSD)

iD8DBQFHWHNdXY6L6fI4GtQRAi67AJ0ZKpil+01oxGJScEujXafmUNRmswCfexDE
2EIgi5Imo6xxpNpyyWdNyp8=
=fKBU
-----END PGP SIGNATURE-----

--bg08WKrSYDhXBjb5--

From owner-freebsd-fs@FreeBSD.ORG  Thu Dec  6 22:25:41 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id D252A16A41A
	for <fs@freebsd.org>; Thu,  6 Dec 2007 22:25:41 +0000 (UTC)
	(envelope-from ticso@cicely12.cicely.de)
Received: from raven.bwct.de (raven.bwct.de [85.159.14.73])
	by mx1.freebsd.org (Postfix) with ESMTP id 6456713C46A
	for <fs@freebsd.org>; Thu,  6 Dec 2007 22:25:41 +0000 (UTC)
	(envelope-from ticso@cicely12.cicely.de)
Received: from cicely5.cicely.de ([10.1.1.7])
	by raven.bwct.de (8.13.4/8.13.4) with ESMTP id lB6M84Xm046882;
	Thu, 6 Dec 2007 23:08:04 +0100 (CET)
	(envelope-from ticso@cicely12.cicely.de)
Received: from cicely12.cicely.de (cicely12.cicely.de [10.1.1.14])
	by cicely5.cicely.de (8.13.4/8.13.4) with ESMTP id lB6M7utG024781
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Thu, 6 Dec 2007 23:07:57 +0100 (CET)
	(envelope-from ticso@cicely12.cicely.de)
Received: from cicely12.cicely.de (localhost [127.0.0.1])
	by cicely12.cicely.de (8.13.4/8.13.3) with ESMTP id lB6M7ua7018406;
	Thu, 6 Dec 2007 23:07:56 +0100 (CET)
	(envelope-from ticso@cicely12.cicely.de)
Received: (from ticso@localhost)
	by cicely12.cicely.de (8.13.4/8.13.3/Submit) id lB6M7uu4018405;
	Thu, 6 Dec 2007 23:07:56 +0100 (CET) (envelope-from ticso)
Date: Thu, 6 Dec 2007 23:07:56 +0100
From: Bernd Walter <ticso@cicely12.cicely.de>
To: Josef Karthauser <joe@tao.org.uk>
Message-ID: <20071206220755.GH10459@cicely12.cicely.de>
References: <20071206210848.GA63825@transwarp.tao.org.uk>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20071206210848.GA63825@transwarp.tao.org.uk>
X-Operating-System: FreeBSD cicely12.cicely.de 5.4-STABLE alpha
User-Agent: Mutt/1.5.9i
X-Spam-Status: No, score=-4.4 required=5.0 tests=ALL_TRUSTED=-1.8, AWL=0.026,
	BAYES_00=-2.599 autolearn=ham version=3.1.7
X-Spam-Checker-Version: SpamAssassin 3.1.7 (2006-10-05) on cicely12.cicely.de
Cc: fs@freebsd.org
Subject: Re: -14% available on /tmp
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: ticso@cicely.de
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 06 Dec 2007 22:25:41 -0000

On Thu, Dec 06, 2007 at 09:08:48PM +0000, Josef Karthauser wrote:
> One of my servers is reporting:
> 
> # df | grep tmp
> /dev/mirror/boot0e    507630    -64328  531348   -14%    /tmp
> 
> How weird is that?  I wonder what is going on.

I have seen this a few times as well.
It wasn't corrected automatically for whatever reason, but a manual
fsck always fixed it.
I'm not sure if it happened because of a crash or under normal load,
because I always noticed it after some kind of unintended reboot.
IIRC it was always a power failure, but I'm not 100% sure.

-- 
B.Walter                http://www.bwct.de      http://www.fizon.de
bernd@bwct.de           info@bwct.de            support@fizon.de

From owner-freebsd-fs@FreeBSD.ORG  Fri Dec  7 06:43:45 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 3F0F116A41A
	for <freebsd-fs@FreeBSD.org>; Fri,  7 Dec 2007 06:43:45 +0000 (UTC)
	(envelope-from truckman@FreeBSD.org)
Received: from gw.catspoiler.org (adsl-75-1-14-242.dsl.scrm01.sbcglobal.net
	[75.1.14.242]) by mx1.freebsd.org (Postfix) with ESMTP id 01F9613C455
	for <freebsd-fs@FreeBSD.org>; Fri,  7 Dec 2007 06:43:44 +0000 (UTC)
	(envelope-from truckman@FreeBSD.org)
Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2])
	by gw.catspoiler.org (8.13.3/8.13.3) with ESMTP id lB76hZYm063956;
	Thu, 6 Dec 2007 22:43:39 -0800 (PST)
	(envelope-from truckman@FreeBSD.org)
Message-Id: <200712070643.lB76hZYm063956@gw.catspoiler.org>
Date: Thu, 6 Dec 2007 22:43:35 -0800 (PST)
From: Don Lewis <truckman@FreeBSD.org>
To: julien.bellang@free.fr
In-Reply-To: <1196788551.47558b47ee317@imp.free.fr>
MIME-Version: 1.0
Content-Type: TEXT/plain; charset=us-ascii
Cc: freebsd-fs@FreeBSD.org
Subject: Re: FSCK failed does'nt correct file size when INCORRECT BLOCK
 COUNT is	found
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 07 Dec 2007 06:43:45 -0000

On  4 Dec, julien.bellang@free.fr wrote:
> 
> Hi,
> 
> I'm working on a system installed in an environnement where power is cut off
> many time a week. This system is based on i386 FreeBSD 6.2 OS.
> 
> I'm using FS UFS2 with SoftUpdate Activated.

If your disks have write caching enabled, you are likely to encounter
file system corruption caused by the loss of power that can't be fixed
automatically by fsck, which will require manual intervention.  The
reason is that soft updates attempts to write data to the disk in an
order that guarantees that the file system is always in a consistent
state so that it can always be properly cleaned up after a crash.  This
strategy is defeated by the write caching by the disk, which causes the
disk to immediately tell soft updates that data has been written, even
if the data is only saved to the disk's write cache.  This may allow
soft updates to write another set of data to disk that should not
actually be written before the previous set of data.  If the disk then
writes the second set of data to the media before the first set of data,
and a power failure occurs before the disk has written the first set of
data, the file system is then corrupted.

You can turn off write caching by putting the following into
/boot/loader.conf:
	hw.ata.wc=0
though it will greatly decrease your system's disk write performance.

Powering the system from a UPS that can initiate a clean system
shutdown on power failure may be a better option.

> After such a power loss, when I restart I find some corrupted files that
> fsck_ufs doesn't entirely repair.
> 
> For these files fsck reports and corrects the following error:
>  /dev/ad0s1f: INCORRECT BLOCK COUNT I=3132417 (512992 should be 459392)
> (CORRECTED)
> 
> But in my view these files are still inconsistent, because the file size
> field doesn't reflect the number of blocks referenced by the inode.
> 
> Looking at the fsck_ffs sources, it seems that fsck checks the validity of
> block pointers (!= 0) in the inode block list only for directory inodes,
> not for regular files.
> In my case, the number of block addresses to check in the inode is deduced
> from the file size, and the file size is greater than the number of blocks
> actually allocated, so I get many NULL block pointers.
> 
> Does anyone have an idea why NULL pointers are accepted by fsck for regular
> files, and why it doesn't try to adjust the file size?

Regular files are allowed to be sparse (have holes where no data is
stored and no blocks are allocated).  This is indicated by NULL block
pointers for the file offsets that correspond to the holes.

Sparse files are easy to create:

% dd if=/dev/zero of=/tmp/sparsefile bs=512 oseek=1000000 count=1
1+0 records in
1+0 records out
512 bytes transferred in 0.000132 secs (3876324 bytes/sec)
% ls -ls /tmp/sparsefile
 64 -rw-r--r--  1 dl  wheel  512000512 Dec  6 22:26 /tmp/sparsefile
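
The same thing can be done from C.  What follows is only a minimal
sketch: make_sparse() is a hypothetical helper name, not part of any
library.  It seeks past the start of a new file, writes one 512-byte
block, and reports the logical size next to the number of blocks
actually allocated (st_blocks, counted in 512-byte units on FreeBSD).

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a sparse file at `path` by seeking `hole`
 * bytes past the start and writing one 512-byte block of zeros there.
 * On success, stores the logical size and the allocated block count
 * and returns 0; returns -1 on error.
 */
int
make_sparse(const char *path, off_t hole, off_t *size, long long *blocks)
{
	char buf[512] = { 0 };
	struct stat sb;
	int fd;

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd == -1)
		return (-1);
	/* Nothing is written before `hole`, so no blocks are needed there. */
	if (lseek(fd, hole, SEEK_SET) == (off_t)-1 ||
	    write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf) ||
	    fsync(fd) == -1 || fstat(fd, &sb) == -1) {
		close(fd);
		return (-1);
	}
	close(fd);
	*size = sb.st_size;
	*blocks = (long long)sb.st_blocks;
	return (0);
}
```

Called with a hole of 1000000 * 512 bytes, this should report the same
512000512-byte size as the dd example, with far fewer blocks allocated
than the size would suggest, provided the file system supports holes.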


From owner-freebsd-fs@FreeBSD.ORG  Fri Dec  7 09:49:51 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 6EA2016A417
	for <freebsd-fs@freebsd.org>; Fri,  7 Dec 2007 09:49:51 +0000 (UTC)
	(envelope-from antik@bsd.ee)
Received: from zzz.ee (kalah.zzz.ee [194.204.30.253])
	by mx1.freebsd.org (Postfix) with ESMTP id 2ACEE13C465
	for <freebsd-fs@freebsd.org>; Fri,  7 Dec 2007 09:49:51 +0000 (UTC)
	(envelope-from antik@bsd.ee)
Received: by zzz.ee (Postfix, from userid 3019)
	id EB31219867F; Fri,  7 Dec 2007 11:31:40 +0200 (EET)
X-Spam-Checker-Version: SpamAssassin on spamassassin.zzz.ee
X-Spam-Level: 
X-Spam-Guessed-Language: 
X-Spam-Status: No, score=-4.0 required=5.0 tests=ALL_TRUSTED,BAYES_50
X-Spam-Checker-URL: http://info.zzz.ee
Received: from andrei.demo (adsl215.uninet.ee [194.204.62.215])
	by zzz.ee (Postfix) with ESMTP id D0139198685
	for <freebsd-fs@freebsd.org>; Fri,  7 Dec 2007 11:31:32 +0200 (EET)
From: Andrei Kolu <antik@bsd.ee>
To: freebsd-fs@freebsd.org
Date: Fri, 7 Dec 2007 11:31:31 +0200
User-Agent: KMail/1.9.7
MIME-Version: 1.0
Content-Type: text/plain;
  charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200712071131.31773.antik@bsd.ee>
Subject: raidtest: Cannot open 'raidtest.data' device: Operation not
	permitted
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 07 Dec 2007 09:49:51 -0000

# uname -a
FreeBSD test.demo 7.0-BETA4 FreeBSD 7.0-BETA4 #0: Sun Dec  2 16:34:41 UTC 2007     
root@myers.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  amd64

3ware device driver for 9000 series storage controllers, version: 3.70.05.001
twa0: <3ware 9000 series Storage Controller> port 0x3000-0x30ff mem 
0xd8000000-0xd9ffffff,0xda300000-0xda300fff irq 16 at device 0.0 on pci7
twa0: [ITHREAD]
twa0: INFO: (0x04: 0x0053): Battery capacity test is overdue:
twa0: INFO: (0x15: 0x1300): Controller details:: Model 9650SE-8LPML, 8 ports, 
Firmware FE9X 3.08.02.007, BIOS BE9X 3.08.00.002

raidtest-1.1                        =   up-to-date with port

# set mediasize=`diskinfo /dev/da0 | awk '{print $3}'`
# set sectorsize=`diskinfo /dev/da0 | awk '{print $2}'`
# raidtest genfile -s $mediasize -S $sectorsize -n 50000
# raidtest test -d /dev/da0 -n 10
raidtest: Cannot open 'raidtest.data' device: Operation not permitted

# echo $mediasize
1919932170240
# echo $sectorsize
512


Or can anyone recommend another RAID performance testing utility?


Andrei

From owner-freebsd-fs@FreeBSD.ORG  Fri Dec  7 11:34:41 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 1635B16A417
	for <freebsd-fs@freebsd.org>; Fri,  7 Dec 2007 11:34:41 +0000 (UTC)
	(envelope-from freebsd-fs@m.gmane.org)
Received: from ciao.gmane.org (main.gmane.org [80.91.229.2])
	by mx1.freebsd.org (Postfix) with ESMTP id C3B9013C43E
	for <freebsd-fs@freebsd.org>; Fri,  7 Dec 2007 11:34:40 +0000 (UTC)
	(envelope-from freebsd-fs@m.gmane.org)
Received: from list by ciao.gmane.org with local (Exim 4.43)
	id 1J0bTV-0005eL-4g
	for freebsd-fs@freebsd.org; Fri, 07 Dec 2007 11:34:33 +0000
Received: from lara.cc.fer.hr ([161.53.72.113])
	by main.gmane.org with esmtp (Gmexim 0.1 (Debian))
	id 1AlnuQ-0007hv-00
	for <freebsd-fs@freebsd.org>; Fri, 07 Dec 2007 11:34:33 +0000
Received: from ivoras by lara.cc.fer.hr with local (Gmexim 0.1 (Debian))
	id 1AlnuQ-0007hv-00
	for <freebsd-fs@freebsd.org>; Fri, 07 Dec 2007 11:34:33 +0000
X-Injected-Via-Gmane: http://gmane.org/
To: freebsd-fs@freebsd.org
From: Ivan Voras <ivoras@freebsd.org>
Date: Fri, 07 Dec 2007 12:39:56 +0100
Lines: 40
Message-ID: <fjbb3v$n60$1@ger.gmane.org>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature";
	boundary="------------enig417C87F845D917AA452969FD"
X-Complaints-To: usenet@ger.gmane.org
X-Gmane-NNTP-Posting-Host: lara.cc.fer.hr
User-Agent: Thunderbird 2.0.0.6 (X11/20070801)
X-Enigmail-Version: 0.95.3
Sender: news <news@ger.gmane.org>
Subject: readv: parallel or sequential?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 07 Dec 2007 11:34:41 -0000

This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
--------------enig417C87F845D917AA452969FD
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Hi,

I found this in readv(2):

     For readv() and preadv(), the iovec structure is defined as:

           struct iovec {
                   void   *iov_base;  /* Base address. */
                   size_t iov_len;    /* Length. */
           };

     Each iovec entry specifies the base address and length of an area
     in memory where data should be placed.  The readv() system call will
     always fill an area completely before proceeding to the next.

Does this mean that, in effect, readv() is just a loop of read() calls
(minus syscall overhead)?


--------------enig417C87F845D917AA452969FD
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.4-svn0 (GNU/Linux)

iD8DBQFHWTETldnAQVacBcgRAhuZAKDyooMshocd2zxSBR9RKgz8kMrQcgCdFAon
EGLoGWs7t9y9eUKAoojvAjw=
=vs5U
-----END PGP SIGNATURE-----

--------------enig417C87F845D917AA452969FD--


From owner-freebsd-fs@FreeBSD.ORG  Fri Dec  7 12:02:12 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id ADCD116A417
	for <freebsd-fs@freebsd.org>; Fri,  7 Dec 2007 12:02:12 +0000 (UTC)
	(envelope-from phk@critter.freebsd.dk)
Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222])
	by mx1.freebsd.org (Postfix) with ESMTP id 6ED4F13C458
	for <freebsd-fs@freebsd.org>; Fri,  7 Dec 2007 12:02:12 +0000 (UTC)
	(envelope-from phk@critter.freebsd.dk)
Received: from critter.freebsd.dk (unknown [192.168.61.3])
	by phk.freebsd.dk (Postfix) with ESMTP id 3DDAB17105;
	Fri,  7 Dec 2007 11:37:32 +0000 (UTC)
Received: from critter.freebsd.dk (localhost [127.0.0.1])
	by critter.freebsd.dk (8.14.1/8.14.1) with ESMTP id lB7BbVHt004110;
	Fri, 7 Dec 2007 11:37:31 GMT (envelope-from phk@critter.freebsd.dk)
To: Ivan Voras <ivoras@freebsd.org>
From: "Poul-Henning Kamp" <phk@phk.freebsd.dk>
In-Reply-To: Your message of "Fri, 07 Dec 2007 12:39:56 +0100."
	<fjbb3v$n60$1@ger.gmane.org> 
Date: Fri, 07 Dec 2007 11:37:31 +0000
Message-ID: <4109.1197027451@critter.freebsd.dk>
Sender: phk@critter.freebsd.dk
Cc: freebsd-fs@freebsd.org
Subject: Re: readv: parallel or sequential? 
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 07 Dec 2007 12:02:12 -0000

In message <fjbb3v$n60$1@ger.gmane.org>, Ivan Voras writes:


>Does this mean that, in effect, readv() is just a loop of read() calls
>(minus syscall overhead)?

It's more correct to say that read() is just a readv() with a single
iovec.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.

From owner-freebsd-fs@FreeBSD.ORG  Fri Dec  7 12:12:39 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 10EE416A420
	for <freebsd-fs@freebsd.org>; Fri,  7 Dec 2007 12:12:39 +0000 (UTC)
	(envelope-from dudu@dudu.ro)
Received: from nz-out-0506.google.com (nz-out-0506.google.com [64.233.162.235])
	by mx1.freebsd.org (Postfix) with ESMTP id CE2C413C4CC
	for <freebsd-fs@freebsd.org>; Fri,  7 Dec 2007 12:12:38 +0000 (UTC)
	(envelope-from dudu@dudu.ro)
Received: by nz-out-0506.google.com with SMTP id l8so190977nzf
	for <freebsd-fs@freebsd.org>; Fri, 07 Dec 2007 04:12:38 -0800 (PST)
Received: by 10.142.214.5 with SMTP id m5mr2078467wfg.1197028045435;
	Fri, 07 Dec 2007 03:47:25 -0800 (PST)
Received: by 10.143.12.4 with HTTP; Fri, 7 Dec 2007 03:47:25 -0800 (PST)
Message-ID: <ad79ad6b0712070347s4a5d5bb2rc7adfdc54b107dac@mail.gmail.com>
Date: Fri, 7 Dec 2007 13:47:25 +0200
From: "Vlad GALU" <dudu@dudu.ro>
To: "Ivan Voras" <ivoras@freebsd.org>
In-Reply-To: <fjbb3v$n60$1@ger.gmane.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <fjbb3v$n60$1@ger.gmane.org>
Cc: freebsd-fs@freebsd.org
Subject: Re: readv: parallel or sequential?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 07 Dec 2007 12:12:39 -0000

On 12/7/07, Ivan Voras <ivoras@freebsd.org> wrote:
> Hi,
>
> I found this in readv(2):
>
>      For readv() and preadv(), the iovec structure is defined as:
>
>            struct iovec {
>                    void   *iov_base;  /* Base address. */
>                    size_t iov_len;    /* Length. */
>            };
>
>      Each iovec entry specifies the base address and length of an area
>      in memory where data should be placed.  The readv() system call will
>      always fill an area completely before proceeding to the next.
>
> Does this mean that, in effect, readv() is just a loop of read() calls
> (minus syscall overhead)?
>

   read() is just a particular case of readv() (with only one iovec
struct, plus the full buffer size); they both call kern_readv(), so
the effect is the same. I assume the manpage means that the iovec
structures are filled sequentially rather than in parallel.

>
>


-- 
Mahnahmahnah!

From owner-freebsd-fs@FreeBSD.ORG  Fri Dec  7 12:48:22 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 968BA16A420
	for <freebsd-fs@freebsd.org>; Fri,  7 Dec 2007 12:48:22 +0000 (UTC)
	(envelope-from des@des.no)
Received: from tim.des.no (tim.des.no [194.63.250.121])
	by mx1.freebsd.org (Postfix) with ESMTP id 435D513C442
	for <freebsd-fs@freebsd.org>; Fri,  7 Dec 2007 12:48:22 +0000 (UTC)
	(envelope-from des@des.no)
Received: from tim.des.no (localhost [127.0.0.1])
	by spam.des.no (Postfix) with ESMTP id D406A2085;
	Fri,  7 Dec 2007 13:48:12 +0100 (CET)
X-Spam-Tests: AWL
X-Spam-Learn: disabled
X-Spam-Score: -0.1/3.0
X-Spam-Checker-Version: SpamAssassin 3.2.3 (2007-08-08) on tim.des.no
Received: from ds4.des.no (des.no [80.203.243.180])
	by smtp.des.no (Postfix) with ESMTP id 585322082;
	Fri,  7 Dec 2007 13:48:12 +0100 (CET)
Received: by ds4.des.no (Postfix, from userid 1001)
	id 4114584499; Fri,  7 Dec 2007 13:48:12 +0100 (CET)
From: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= <des@des.no>
To: Bjorn Gronvall <bg@sics.se>
References: <1196788555.47558b4bab0ab@imp.free.fr>
	<1196953310.47580ede28676@imp.free.fr>
	<20071206175608.594685d9@ibook.sics.se>
Date: Fri, 07 Dec 2007 13:48:12 +0100
In-Reply-To: <20071206175608.594685d9@ibook.sics.se> (Bjorn Gronvall's message
	of "Thu\, 6 Dec 2007 17\:56\:08 +0100")
Message-ID: <86hciuu0vn.fsf@ds4.des.no>
User-Agent: Gnus/5.110006 (No Gnus v0.6) Emacs/22.1 (berkeley-unix)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Cc: freebsd-fs@freebsd.org
Subject: Re: FSCK doesn't correct file size when INCORRECT BLOCK COUNT is
	found
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 07 Dec 2007 12:48:22 -0000

Bjorn Gronvall <bg@sics.se> writes:
> Filesystems in general and UFS with soft updates in particular rely on
> disks providing accurate response to writes. When write caching is
> enabled the disk will lie and tell the operating system that the write
> has completed successfully, in reality the data is only cached in disk
> RAM. When the power disappears the data will be gone forever.

No.  This used to be the case with some cheaper disks which ignored the
ATA "flush cache" command to score higher on benchmarks, but I doubt
you'll find any disks on the market that still do that (at least from
reputable manufacturers).  ZFS makes extensive use of the "flush cache"
command to ensure file system integrity (and in particular to ensure
that the intent log is written to disk so it can be replayed in case of
a crash).

DES
-- 
Dag-Erling Smørgrav - des@des.no

From owner-freebsd-fs@FreeBSD.ORG  Fri Dec  7 12:49:02 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 5ABD416A46B
	for <freebsd-fs@freebsd.org>; Fri,  7 Dec 2007 12:49:02 +0000 (UTC)
	(envelope-from des@des.no)
Received: from tim.des.no (tim.des.no [194.63.250.121])
	by mx1.freebsd.org (Postfix) with ESMTP id 1988713C4FB
	for <freebsd-fs@freebsd.org>; Fri,  7 Dec 2007 12:49:01 +0000 (UTC)
	(envelope-from des@des.no)
Received: from tim.des.no (localhost [127.0.0.1])
	by spam.des.no (Postfix) with ESMTP id D34F22084;
	Fri,  7 Dec 2007 13:48:53 +0100 (CET)
X-Spam-Tests: AWL
X-Spam-Learn: disabled
X-Spam-Score: -0.1/3.0
X-Spam-Checker-Version: SpamAssassin 3.2.3 (2007-08-08) on tim.des.no
Received: from ds4.des.no (des.no [80.203.243.180])
	by smtp.des.no (Postfix) with ESMTP id C430E2082;
	Fri,  7 Dec 2007 13:48:53 +0100 (CET)
Received: by ds4.des.no (Postfix, from userid 1001)
	id AE22384499; Fri,  7 Dec 2007 13:48:53 +0100 (CET)
From: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= <des@des.no>
To: julien.bellang@free.fr
References: <1196788551.47558b47ee317@imp.free.fr>
Date: Fri, 07 Dec 2007 13:48:53 +0100
In-Reply-To: <1196788551.47558b47ee317@imp.free.fr> (julien bellang's message
	of "Tue\, 04 Dec 2007 18\:15\:51 +0100")
Message-ID: <86d4tiu0ui.fsf@ds4.des.no>
User-Agent: Gnus/5.110006 (No Gnus v0.6) Emacs/22.1 (berkeley-unix)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Cc: freebsd-fs@freebsd.org
Subject: Re: FSCK failed does'nt correct file size when INCORRECT BLOCK
	COUNT is found
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 07 Dec 2007 12:49:02 -0000

julien.bellang@free.fr writes:
> I'm working on a system installed in an environment where power is cut off
> many times a week. This system is based on i386 FreeBSD 6.2.

Get a UPS.

DES
-- 
Dag-Erling Smørgrav - des@des.no

From owner-freebsd-fs@FreeBSD.ORG  Fri Dec  7 12:54:09 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 713FA16A591
	for <freebsd-fs@freebsd.org>; Fri,  7 Dec 2007 12:54:09 +0000 (UTC)
	(envelope-from des@des.no)
Received: from tim.des.no (tim.des.no [194.63.250.121])
	by mx1.freebsd.org (Postfix) with ESMTP id 26F1D13C4D1
	for <freebsd-fs@freebsd.org>; Fri,  7 Dec 2007 12:54:08 +0000 (UTC)
	(envelope-from des@des.no)
Received: from tim.des.no (localhost [127.0.0.1])
	by spam.des.no (Postfix) with ESMTP id 66A52207F;
	Fri,  7 Dec 2007 13:54:00 +0100 (CET)
X-Spam-Tests: AWL
X-Spam-Learn: disabled
X-Spam-Score: -0.1/3.0
X-Spam-Checker-Version: SpamAssassin 3.2.3 (2007-08-08) on tim.des.no
Received: from ds4.des.no (des.no [80.203.243.180])
	by smtp.des.no (Postfix) with ESMTP id 58755207E;
	Fri,  7 Dec 2007 13:54:00 +0100 (CET)
Received: by ds4.des.no (Postfix, from userid 1001)
	id 40AEF844A7; Fri,  7 Dec 2007 13:54:00 +0100 (CET)
From: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= <des@des.no>
To: "Vlad GALU" <dudu@dudu.ro>
References: <fjbb3v$n60$1@ger.gmane.org>
	<ad79ad6b0712070347s4a5d5bb2rc7adfdc54b107dac@mail.gmail.com>
Date: Fri, 07 Dec 2007 13:54:00 +0100
In-Reply-To: <ad79ad6b0712070347s4a5d5bb2rc7adfdc54b107dac@mail.gmail.com>
	(Vlad GALU's message of "Fri\, 7 Dec 2007 13\:47\:25 +0200")
Message-ID: <868x46u0lz.fsf@ds4.des.no>
User-Agent: Gnus/5.110006 (No Gnus v0.6) Emacs/22.1 (berkeley-unix)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Cc: freebsd-fs@freebsd.org, Ivan Voras <ivoras@freebsd.org>
Subject: Re: readv: parallel or sequential?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 07 Dec 2007 12:54:09 -0000

"Vlad GALU" <dudu@dudu.ro> writes:
> Ivan Voras <ivoras@freebsd.org> writes:
> > Does this mean that, in effect, readv() is just a loop of read()
> > calls (minus syscall overhead)?
> read() is just a particular case of readv() (with only one iovec
> struct, plus the full buffer size); they both call kern_readv(), so
> the effect is the same. I assume the manpage means that the iovec
> structures are filled sequentially rather than in parallel.

Interestingly, Linux does it the other way around - a device driver can
implement readv() and writev(), but if it doesn't, the kernel will fall
back to a default implementation which calls the driver's read() or
write() method once for each iov.

But to return to what Ivan was asking, I think what the man page is
trying to say is that you can't use readv() to e.g. read individual
network packets into separate buffers (unless each packet just happens
to fit exactly within each buffer).
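
To illustrate the fill order, here is a minimal sketch.  The name
read_hdr_body() is made up for this example; only readv() and struct
iovec come from the system.  It hands readv() two buffers, and the
second one receives data only once the first is full.

```c
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: read from fd into a header buffer and then a
 * body buffer with a single readv() call.  Returns whatever readv()
 * returns: the total number of bytes placed, or -1 on error.
 */
ssize_t
read_hdr_body(int fd, void *hdr, size_t hdr_len, void *body, size_t body_len)
{
	struct iovec iov[2];

	/* iov[0] is filled completely before anything lands in iov[1]. */
	iov[0].iov_base = hdr;
	iov[0].iov_len = hdr_len;
	iov[1].iov_base = body;
	iov[1].iov_len = body_len;
	return (readv(fd, iov, 2));
}
```

If 10 bytes are waiting in a pipe and the header buffer is 4 bytes
long, the first 4 bytes end up in hdr and the remaining 6 in body.
On a stream socket the call can still return short, so a caller that
needs the whole header and body has to check the count and loop.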

DES
-- 
Dag-Erling Smørgrav - des@des.no

From owner-freebsd-fs@FreeBSD.ORG  Fri Dec  7 12:58:53 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id B1F9C16A418
	for <freebsd-fs@freebsd.org>; Fri,  7 Dec 2007 12:58:53 +0000 (UTC)
	(envelope-from rees@citi.umich.edu)
Received: from citi.umich.edu (unknown [IPv6:2001:468:e9c:3060::4])
	by mx1.freebsd.org (Postfix) with ESMTP id 7B6ED13C44B
	for <freebsd-fs@freebsd.org>; Fri,  7 Dec 2007 12:58:53 +0000 (UTC)
	(envelope-from rees@citi.umich.edu)
Received: from citi.umich.edu (dsl093-001-248.det1.dsl.speakeasy.net
	[66.93.1.248])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(Client CN "Jim Rees", Issuer "CITI Production KCA" (verified OK))
	by citi.umich.edu (Postfix) with ESMTP id A1FBD44E2;
	Fri,  7 Dec 2007 07:58:52 -0500 (EST)
Date: Fri, 7 Dec 2007 07:58:53 -0500
From: Jim Rees <rees@freebsd.org>
To: Dag-Erling =?iso-8859-1?Q?Sm=F8rgrav?= <des@des.no>
Message-ID: <20071207125852.GC6665@citi.umich.edu>
References: <1196788551.47558b47ee317@imp.free.fr> <86d4tiu0ui.fsf@ds4.des.no>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <86d4tiu0ui.fsf@ds4.des.no>
Cc: freebsd-fs@freebsd.org
Subject: Re: FSCK failed does'nt correct file size when INCORRECT BLOCK
	COUNT is found
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 07 Dec 2007 12:58:53 -0000

Dag-Erling Smørgrav wrote:

  Get a UPS.

We should strive to prevent data corruption in the face of unexpected system
shutdown.  Having a user who loses power several times a week seems useful
for testing, especially when he is willing to delve into fsck sources and
figure out what's going on.  My recommendation would be to turn off caching
in the disk and report back in a couple of weeks.

From owner-freebsd-fs@FreeBSD.ORG  Fri Dec  7 13:17:22 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 66E3816A421
	for <freebsd-fs@freebsd.org>; Fri,  7 Dec 2007 13:17:22 +0000 (UTC)
	(envelope-from freebsd-fs@m.gmane.org)
Received: from ciao.gmane.org (main.gmane.org [80.91.229.2])
	by mx1.freebsd.org (Postfix) with ESMTP id 1DE5D13C458
	for <freebsd-fs@freebsd.org>; Fri,  7 Dec 2007 13:17:21 +0000 (UTC)
	(envelope-from freebsd-fs@m.gmane.org)
Received: from list by ciao.gmane.org with local (Exim 4.43)
	id 1J0d4r-00062v-Dn
	for freebsd-fs@freebsd.org; Fri, 07 Dec 2007 13:17:13 +0000
Received: from lara.cc.fer.hr ([161.53.72.113])
	by main.gmane.org with esmtp (Gmexim 0.1 (Debian))
	id 1AlnuQ-0007hv-00
	for <freebsd-fs@freebsd.org>; Fri, 07 Dec 2007 13:17:13 +0000
Received: from ivoras by lara.cc.fer.hr with local (Gmexim 0.1 (Debian))
	id 1AlnuQ-0007hv-00
	for <freebsd-fs@freebsd.org>; Fri, 07 Dec 2007 13:17:13 +0000
X-Injected-Via-Gmane: http://gmane.org/
To: freebsd-fs@freebsd.org
From: Ivan Voras <ivoras@freebsd.org>
Date: Fri, 07 Dec 2007 14:22:22 +0100
Lines: 35
Message-ID: <fjbh4h$f5s$1@ger.gmane.org>
References: <fjbb3v$n60$1@ger.gmane.org>	<ad79ad6b0712070347s4a5d5bb2rc7adfdc54b107dac@mail.gmail.com>
	<868x46u0lz.fsf@ds4.des.no>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature";
	boundary="------------enig09F7468F9E992596CC57A5F2"
X-Complaints-To: usenet@ger.gmane.org
X-Gmane-NNTP-Posting-Host: lara.cc.fer.hr
User-Agent: Thunderbird 2.0.0.6 (X11/20070801)
In-Reply-To: <868x46u0lz.fsf@ds4.des.no>
X-Enigmail-Version: 0.95.3
Sender: news <news@ger.gmane.org>
Subject: Re: readv: parallel or sequential?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 07 Dec 2007 13:17:22 -0000

This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
--------------enig09F7468F9E992596CC57A5F2
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Dag-Erling Smørgrav wrote:

> But to return to what Ivan was asking, I think what the man page is
> trying to say is that you can't use readv() to e.g. read individual
> network packets into separate buffers (unless each packet just happens
> to fit exactly within each buffer).

What about streaming protocols like TCP? If, for example, I know I have
an N-byte header and an N2-byte body, couldn't readv() handle it with
two iovecs?

But that's not why I started the discussion. I'm looking for a way to do
"scattered" async IO on files (the intention: feed an array of offsets,
lengths and buffers into the kernel, let it perform the requests in
parallel, if it can) and started with this man page.


--------------enig09F7468F9E992596CC57A5F2
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.4-svn0 (GNU/Linux)

iD8DBQFHWUkkldnAQVacBcgRAii1AKCZZHwHUcbv0Ofl55O3TTmmJFG4zQCg1fNb
becsoQeragiV7qhZn2M/dfk=
=xFh/
-----END PGP SIGNATURE-----

--------------enig09F7468F9E992596CC57A5F2--


From owner-freebsd-fs@FreeBSD.ORG  Fri Dec  7 13:21:41 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 8BF9416A46E;
	Fri,  7 Dec 2007 13:21:41 +0000 (UTC)
	(envelope-from phk@critter.freebsd.dk)
Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222])
	by mx1.freebsd.org (Postfix) with ESMTP id 4C3D513C45D;
	Fri,  7 Dec 2007 13:21:41 +0000 (UTC)
	(envelope-from phk@critter.freebsd.dk)
Received: from critter.freebsd.dk (unknown [192.168.61.3])
	by phk.freebsd.dk (Postfix) with ESMTP id 8F41517105;
	Fri,  7 Dec 2007 13:21:39 +0000 (UTC)
Received: from critter.freebsd.dk (localhost [127.0.0.1])
	by critter.freebsd.dk (8.14.1/8.14.1) with ESMTP id lB7DLcb4005180;
	Fri, 7 Dec 2007 13:21:39 GMT (envelope-from phk@critter.freebsd.dk)
To: Ivan Voras <ivoras@freebsd.org>
From: "Poul-Henning Kamp" <phk@phk.freebsd.dk>
In-Reply-To: Your message of "Fri, 07 Dec 2007 14:22:22 +0100."
	<fjbh4h$f5s$1@ger.gmane.org> 
Date: Fri, 07 Dec 2007 13:21:38 +0000
Message-ID: <5179.1197033698@critter.freebsd.dk>
Sender: phk@critter.freebsd.dk
Cc: freebsd-fs@freebsd.org
Subject: Re: readv: parallel or sequential? 
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 07 Dec 2007 13:21:41 -0000

In message <fjbh4h$f5s$1@ger.gmane.org>, Ivan Voras writes:
>Dag-Erling Smørgrav wrote:
>
>> But to return to what Ivan was asking, I think what the man page is
>> trying to say is that you can't use readv() to e.g. read individual
>> network packets into separate buffers (unless each packet just happens
>> to fit exactly within each buffer).
>
>What about streaming protocols like TCP? If, for example, I know I have an
>N-byte header and an N2-byte body, couldn't readv handle it with two iovecs?

Yes.
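
A minimal sketch of that header-plus-body case, not taken from the thread:
the hypothetical helper read_scatter below fills two buffers from a stream
descriptor with readv(2), reached here through Python's os.readv. Since a
stream may hand back fewer bytes than requested, the loop retries until
every buffer is full.

```python
import os

def read_scatter(fd, buffers):
    """Fill each writable buffer in order from fd; return total bytes read."""
    views = [memoryview(b) for b in buffers]
    total = sum(len(v) for v in views)
    done = 0
    while done < total:
        n = os.readv(fd, views)          # one scatter read into what is left
        if n == 0:
            raise EOFError("stream ended before all buffers were filled")
        done += n
        # Drop buffers that are now full and trim a partially filled one.
        while n and views:
            if n >= len(views[0]):
                n -= len(views[0])
                views.pop(0)
            else:
                views[0] = views[0][n:]
                n = 0
    return done
```

With a 4-byte header buffer and an 11-byte body buffer, a single call
returns once both are filled, however the bytes were split on the wire.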

>But that's not why I started the discussion. I'm looking for a way to do
>"scattered" async IO on files (the intention: feed an array of offsets,
>lengths and buffers into the kernel, let it perform the requests in
>parallel, if it can) and started with this man page.

You want AIO.
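
POSIX AIO (aio_read(2), lio_listio(2)) is a C interface that Python's
standard library does not expose, so the sketch below only imitates the
request shape described above: a list of (offset, length) pairs handed over
in one call. It swaps in os.pread on a thread pool, which is not AIO itself.
The name scattered_read and the worker count are assumptions for
illustration.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def scattered_read(fd, requests, workers=4):
    """Read every (offset, length) pair from fd; results keep request order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # os.pread takes an explicit offset, so the reads do not share a
        # file position and can be issued concurrently.
        futures = [pool.submit(os.pread, fd, length, offset)
                   for offset, length in requests]
        return [f.result() for f in futures]
```

Whether the kernel actually services these requests in parallel depends on
the filesystem and the disks underneath.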

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.

From owner-freebsd-fs@FreeBSD.ORG  Fri Dec  7 13:34:18 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 1644616A417
	for <freebsd-fs@freebsd.org>; Fri,  7 Dec 2007 13:34:18 +0000 (UTC)
	(envelope-from bg@sics.se)
Received: from letter.sics.se (letter.sics.se [193.10.64.6])
	by mx1.freebsd.org (Postfix) with ESMTP id CF01613C468
	for <freebsd-fs@freebsd.org>; Fri,  7 Dec 2007 13:34:17 +0000 (UTC)
	(envelope-from bg@sics.se)
Received: from sics.se (ibook.sics.se [193.10.66.104])
	by letter.sics.se (Postfix) with ESMTP id A1342400D3;
	Fri,  7 Dec 2007 14:34:15 +0100 (CET)
Date: Fri, 7 Dec 2007 14:33:48 +0100
From: Bjorn Gronvall <bg@sics.se>
To: Dag-Erling =?ISO-8859-1?Q?Sm=F8rgrav?= <des@des.no>
Message-ID: <20071207143348.17470be3@ibook.sics.se>
In-Reply-To: <86hciuu0vn.fsf@ds4.des.no>
References: <1196788555.47558b4bab0ab@imp.free.fr>
	<1196953310.47580ede28676@imp.free.fr>
	<20071206175608.594685d9@ibook.sics.se> <86hciuu0vn.fsf@ds4.des.no>
Organization: SICS.SE
X-Mailer: Claws Mail 2.9.1 (GTK+ 2.10.6; i386-portbld-freebsd6.2)
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
Cc: freebsd-fs@freebsd.org
Subject: Re: FSCK doesn't correct file size when INCORRECT BLOCK COUNT is
 found
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 07 Dec 2007 13:34:18 -0000

On Fri, 07 Dec 2007 13:48:12 +0100
Dag-Erling Smørgrav <des@des.no> wrote:

Hi Dag-Erling,

> Bjorn Gronvall <bg@sics.se> writes:
> > Filesystems in general and UFS with soft updates in particular rely on
> > disks providing accurate response to writes. When write caching is
> > enabled the disk will lie and tell the operating system that the write
> has completed successfully; in reality the data is only cached in disk
> > RAM. When the power disappears the data will be gone forever.
> 
> No.  This used to be the case with some cheaper disks which ignored the
> ATA "flush cache" command to score higher on benchmarks, but I doubt
> you'll find any disks on the market that still do that (at least from
> reputable manufacturers).

Agreed, but the software must also be written to actually make use of
the more recent "flush cache" feature. I know that the GEOM journal
can make use of this feature but does UFS with soft updates use it?

> ZFS makes extensive use of the "flush cache"
> command to ensure file system integrity (and in particular to ensure
> that the intent log is written to disk so it can be replayed in case of
> a crash).

ZFS is a more recent beast than UFS and was probably designed with the
"flush cache" feature in mind right from the very beginning.

Cheers,
/b


--
  _     _                                           ,_______________.
Bjorn Gronvall (Björn Grönvall)                    /_______________/|
Swedish Institute of Computer Science              |               ||
PO Box 1263, S-164 29 Kista, Sweden                | Schroedingers ||
Email: bg@sics.se, Phone +46 -8 633 15 25          |      Cat      |/
Cellular +46 -70 768 06 35, Fax +46 -8 751 72 30   '---------------'

From owner-freebsd-fs@FreeBSD.ORG  Fri Dec  7 13:55:44 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id AD3AC16A418;
	Fri,  7 Dec 2007 13:55:44 +0000 (UTC)
	(envelope-from ticso@cicely12.cicely.de)
Received: from raven.bwct.de (raven.bwct.de [85.159.14.73])
	by mx1.freebsd.org (Postfix) with ESMTP id 1DF8813C467;
	Fri,  7 Dec 2007 13:55:43 +0000 (UTC)
	(envelope-from ticso@cicely12.cicely.de)
Received: from cicely5.cicely.de ([10.1.1.7])
	by raven.bwct.de (8.13.4/8.13.4) with ESMTP id lB7Db8uF068221;
	Fri, 7 Dec 2007 14:37:08 +0100 (CET)
	(envelope-from ticso@cicely12.cicely.de)
Received: from cicely12.cicely.de (cicely12.cicely.de [10.1.1.14])
	by cicely5.cicely.de (8.13.4/8.13.4) with ESMTP id lB7Db20j031404
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Fri, 7 Dec 2007 14:37:02 +0100 (CET)
	(envelope-from ticso@cicely12.cicely.de)
Received: from cicely12.cicely.de (localhost [127.0.0.1])
	by cicely12.cicely.de (8.13.4/8.13.3) with ESMTP id lB7Db1qa020746;
	Fri, 7 Dec 2007 14:37:01 +0100 (CET)
	(envelope-from ticso@cicely12.cicely.de)
Received: (from ticso@localhost)
	by cicely12.cicely.de (8.13.4/8.13.3/Submit) id lB7Db1Dr020745;
	Fri, 7 Dec 2007 14:37:01 +0100 (CET) (envelope-from ticso)
Date: Fri, 7 Dec 2007 14:37:01 +0100
From: Bernd Walter <ticso@cicely12.cicely.de>
To: Ivan Voras <ivoras@freebsd.org>
Message-ID: <20071207133700.GO10459@cicely12.cicely.de>
References: <fjbb3v$n60$1@ger.gmane.org>
	<ad79ad6b0712070347s4a5d5bb2rc7adfdc54b107dac@mail.gmail.com>
	<868x46u0lz.fsf@ds4.des.no> <fjbh4h$f5s$1@ger.gmane.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <fjbh4h$f5s$1@ger.gmane.org>
X-Operating-System: FreeBSD cicely12.cicely.de 5.4-STABLE alpha
User-Agent: Mutt/1.5.9i
X-Spam-Status: No, score=-4.4 required=5.0 tests=ALL_TRUSTED=-1.8, AWL=0.024,
	BAYES_00=-2.599 autolearn=ham version=3.1.7
X-Spam-Checker-Version: SpamAssassin 3.1.7 (2006-10-05) on cicely12.cicely.de
Cc: freebsd-fs@freebsd.org
Subject: Re: readv: parallel or sequential?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: ticso@cicely.de
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 07 Dec 2007 13:55:44 -0000

On Fri, Dec 07, 2007 at 02:22:22PM +0100, Ivan Voras wrote:
> Dag-Erling Smørgrav wrote:
> 
> > But to return to what Ivan was asking, I think what the man page is
> > trying to say is that you can't use readv() to e.g. read individual
> > network packets into separate buffers (unless each packet just happens
> > to fit exactly within each buffer).
> 
> What about streaming protocols like TCP? If, for example, I know I have an
> N-byte header and an N2-byte body, couldn't readv handle it with two iovecs?
> 
> But that's not why I started the discussion. I'm looking for a way to do
> "scattered" async IO on files (the intention: feed an array of offsets,
> lengths and buffers into the kernel, let it perform the requests in
> parallel, if it can) and started with this man page.

I wonder if the kernel can read a single file in parallel, because
disk heads can't be on multiple positions at the same time.
ZFS does fill its read cache in parallel if it knows that there are enough
spindles, but in every other case the FS doesn't know about multiple
spindles.
With ZFS you don't have to care much about it in your application,
because the next sequential file read will use the cache that was
prefilled in parallel.

-- 
B.Walter                http://www.bwct.de      http://www.fizon.de
bernd@bwct.de           info@bwct.de            support@fizon.de

From owner-freebsd-fs@FreeBSD.ORG  Fri Dec  7 14:11:53 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 8F54616A418
	for <freebsd-fs@freebsd.org>; Fri,  7 Dec 2007 14:11:53 +0000 (UTC)
	(envelope-from freebsd-fs@m.gmane.org)
Received: from ciao.gmane.org (main.gmane.org [80.91.229.2])
	by mx1.freebsd.org (Postfix) with ESMTP id 44D9013C4CC
	for <freebsd-fs@freebsd.org>; Fri,  7 Dec 2007 14:11:53 +0000 (UTC)
	(envelope-from freebsd-fs@m.gmane.org)
Received: from list by ciao.gmane.org with local (Exim 4.43)
	id 1J0dvO-00041n-3y
	for freebsd-fs@freebsd.org; Fri, 07 Dec 2007 14:11:30 +0000
Received: from lara.cc.fer.hr ([161.53.72.113])
	by main.gmane.org with esmtp (Gmexim 0.1 (Debian))
	id 1AlnuQ-0007hv-00
	for <freebsd-fs@freebsd.org>; Fri, 07 Dec 2007 14:11:30 +0000
Received: from ivoras by lara.cc.fer.hr with local (Gmexim 0.1 (Debian))
	id 1AlnuQ-0007hv-00
	for <freebsd-fs@freebsd.org>; Fri, 07 Dec 2007 14:11:30 +0000
X-Injected-Via-Gmane: http://gmane.org/
To: freebsd-fs@freebsd.org
From: Ivan Voras <ivoras@freebsd.org>
Date: Fri, 07 Dec 2007 15:16:30 +0100
Lines: 40
Message-ID: <fjbk9g$pua$1@ger.gmane.org>
References: <fjbb3v$n60$1@ger.gmane.org>	<ad79ad6b0712070347s4a5d5bb2rc7adfdc54b107dac@mail.gmail.com>	<868x46u0lz.fsf@ds4.des.no>
	<fjbh4h$f5s$1@ger.gmane.org>
	<20071207133700.GO10459@cicely12.cicely.de>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature";
	boundary="------------enig51AF23534A84605A6ACA3557"
X-Complaints-To: usenet@ger.gmane.org
X-Gmane-NNTP-Posting-Host: lara.cc.fer.hr
User-Agent: Thunderbird 2.0.0.6 (X11/20070801)
In-Reply-To: <20071207133700.GO10459@cicely12.cicely.de>
X-Enigmail-Version: 0.95.3
Sender: news <news@ger.gmane.org>
Subject: Re: readv: parallel or sequential?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 07 Dec 2007 14:11:53 -0000

This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
--------------enig51AF23534A84605A6ACA3557
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Bernd Walter wrote:

> I wonder if the kernel can read a single file in parallel, because
> disk heads can't be on multiple positions at the same time.

They can be in case of RAID0 and similar schemes.

> ZFS does fill its read cache in parallel if it knows that there are enough
> spindles, but in every other case the FS doesn't know about multiple
> spindles.
> With ZFS you don't have to care much about it in your application,
> because the next sequential file read will use the cache that was
> prefilled in parallel.

Yes, ZFS is supposed to be doing marvelous things with IO prediction and
scheduling, but I think even basic "ladder" scheduling done in FreeBSD
could in theory help in tight spots with multiple requests.


--------------enig51AF23534A84605A6ACA3557
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.4-svn0 (GNU/Linux)

iD8DBQFHWVXEldnAQVacBcgRAmfeAKDWpfk14QJvaSWHgOHFZM0L/5k4AwCdFhsX
RTEJji9qv9pHDC07XEGEtpg=
=IAO/
-----END PGP SIGNATURE-----

--------------enig51AF23534A84605A6ACA3557--


From owner-freebsd-fs@FreeBSD.ORG  Fri Dec  7 14:45:33 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id BB6F716A417
	for <freebsd-fs@freebsd.org>; Fri,  7 Dec 2007 14:45:33 +0000 (UTC)
	(envelope-from julien.bellang@free.fr)
Received: from smtp1-g19.free.fr (smtp1-g19.free.fr [212.27.42.27])
	by mx1.freebsd.org (Postfix) with ESMTP id 7E86913C4EA
	for <freebsd-fs@freebsd.org>; Fri,  7 Dec 2007 14:45:33 +0000 (UTC)
	(envelope-from julien.bellang@free.fr)
Received: from smtp1-g19.free.fr (localhost.localdomain [127.0.0.1])
	by smtp1-g19.free.fr (Postfix) with ESMTP id 6E5EE1AB2B1;
	Fri,  7 Dec 2007 15:45:32 +0100 (CET)
Received: from [127.0.0.1] (vil35-2-82-227-204-7.fbx.proxad.net [82.227.204.7])
	by smtp1-g19.free.fr (Postfix) with ESMTP id 0BD351AB2FC;
	Fri,  7 Dec 2007 15:45:31 +0100 (CET)
Message-ID: <47595C8C.6060203@free.fr>
Date: Fri, 07 Dec 2007 15:45:32 +0100
From: julien <julien.bellang@free.fr>
User-Agent: Thunderbird 2.0.0.9 (Windows/20071031)
MIME-Version: 1.0
To: Jim Rees <rees@freebsd.org>
References: <1196788551.47558b47ee317@imp.free.fr> <86d4tiu0ui.fsf@ds4.des.no>
	<20071207125852.GC6665@citi.umich.edu>
In-Reply-To: <20071207125852.GC6665@citi.umich.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit
X-Antivirus: avast! (VPS 071206-0, 06/12/2007), Outbound message
X-Antivirus-Status: Clean
Cc: freebsd-fs@freebsd.org, =?ISO-8859-1?Q?Dag-Erling_Sm=F8rgrav?= <des@des.no>
Subject: Re: FSCK failed does'nt correct file size when INCORRECT BLOCK COUNT
 is found
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 07 Dec 2007 14:45:33 -0000

I can't get a UPS in my environment.

I already tested with the write cache disabled, but the problem is
still the same: I get files with holes and an incorrect size and
block count. The only difference is that there are fewer holes and
performance drops.

The problem is really easy to reproduce: you just have to copy several
big files and cut the power in the middle of the copy.


Jim Rees wrote:
> Dag-Erling Smørgrav wrote:
>
>   Get a UPS.
>
> We should strive to prevent data corruption in the face of unexpected system
> shutdown.  Having a user who loses power several times a week seems useful
> for testing, especially when he is willing to delve into fsck sources and
> figure out what's going on.  My recommendation would be to turn off caching
> in the disk and report back in a couple of weeks.
>
>
>   


From owner-freebsd-fs@FreeBSD.ORG  Fri Dec  7 17:49:26 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 9C26B16A474;
	Fri,  7 Dec 2007 17:49:26 +0000 (UTC)
	(envelope-from ticso@cicely12.cicely.de)
Received: from raven.bwct.de (raven.bwct.de [85.159.14.73])
	by mx1.freebsd.org (Postfix) with ESMTP id 0DDBF13C4E1;
	Fri,  7 Dec 2007 17:49:25 +0000 (UTC)
	(envelope-from ticso@cicely12.cicely.de)
Received: from cicely5.cicely.de ([10.1.1.7])
	by raven.bwct.de (8.13.4/8.13.4) with ESMTP id lB7HnO3j074856;
	Fri, 7 Dec 2007 18:49:24 +0100 (CET)
	(envelope-from ticso@cicely12.cicely.de)
Received: from cicely12.cicely.de (cicely12.cicely.de [10.1.1.14])
	by cicely5.cicely.de (8.13.4/8.13.4) with ESMTP id lB7HnFR8033545
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Fri, 7 Dec 2007 18:49:16 +0100 (CET)
	(envelope-from ticso@cicely12.cicely.de)
Received: from cicely12.cicely.de (localhost [127.0.0.1])
	by cicely12.cicely.de (8.13.4/8.13.3) with ESMTP id lB7HnFME021305;
	Fri, 7 Dec 2007 18:49:15 +0100 (CET)
	(envelope-from ticso@cicely12.cicely.de)
Received: (from ticso@localhost)
	by cicely12.cicely.de (8.13.4/8.13.3/Submit) id lB7HnFYS021304;
	Fri, 7 Dec 2007 18:49:15 +0100 (CET) (envelope-from ticso)
Date: Fri, 7 Dec 2007 18:49:15 +0100
From: Bernd Walter <ticso@cicely12.cicely.de>
To: Ivan Voras <ivoras@freebsd.org>
Message-ID: <20071207174914.GQ10459@cicely12.cicely.de>
References: <fjbb3v$n60$1@ger.gmane.org>
	<ad79ad6b0712070347s4a5d5bb2rc7adfdc54b107dac@mail.gmail.com>
	<868x46u0lz.fsf@ds4.des.no> <fjbh4h$f5s$1@ger.gmane.org>
	<20071207133700.GO10459@cicely12.cicely.de>
	<fjbk9g$pua$1@ger.gmane.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <fjbk9g$pua$1@ger.gmane.org>
X-Operating-System: FreeBSD cicely12.cicely.de 5.4-STABLE alpha
User-Agent: Mutt/1.5.9i
X-Spam-Status: No, score=-4.4 required=5.0 tests=ALL_TRUSTED=-1.8, AWL=0.023,
	BAYES_00=-2.599 autolearn=ham version=3.1.7
X-Spam-Checker-Version: SpamAssassin 3.1.7 (2006-10-05) on cicely12.cicely.de
Cc: freebsd-fs@freebsd.org
Subject: Re: readv: parallel or sequential?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: ticso@cicely.de
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 07 Dec 2007 17:49:26 -0000

On Fri, Dec 07, 2007 at 03:16:30PM +0100, Ivan Voras wrote:
> Bernd Walter wrote:
> 
> > I wonder if the kernel can read a single file in parallel, because
> > disk heads can't be on multiple positions at the same time.
> 
> They can be in case of RAID0 and similar schemes.

Yes, but how can it know that it is on a RAID0, so that it takes advantage
of multiple spindles instead of making things worse?
The FS has to do sensible things for a single spindle as well.
Normally disks are fastest when reading linearly, and with disk read
caches this doesn't even have to be interleaved.
I don't see any potential for parallel access within the same file,
apart from some specially constructed cases maybe.
Granted, if you issue many accesses in parallel you allow the disk queue
to sort them in the most effective way, but most FSes work hard to keep
single files almost linear, so there is no seek time win at all.
I assume the best is for the application to sort the readv entries in
increasing order.
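
As a rough sketch of that last suggestion, assuming the application holds
its requests as (offset, length) pairs: the hypothetical helper below issues
the reads in increasing offset order with os.pread, then hands the data
back in the order the caller asked for. The helper name and the request
format are illustrative and not from this thread.

```python
import os

def read_in_offset_order(fd, requests):
    """Issue reads sorted by offset; return data in the caller's order."""
    # Visit request indexes by ascending file offset so the disk sees a
    # mostly linear access pattern.
    order = sorted(range(len(requests)), key=lambda i: requests[i][0])
    results = [None] * len(requests)
    for i in order:
        offset, length = requests[i]
        results[i] = os.pread(fd, length, offset)
    return results
```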

> > ZFS does fill its read cache in parallel if it knows that there are enough
> > spindles, but in every other case the FS doesn't know about multiple
> > spindles.
> > With ZFS you don't have to care much about it in your application,
> > because the next sequential file read will use the cache that was
> > prefilled in parallel.
> 
> Yes, ZFS is supposed to be doing marvelous things with IO prediction and
> scheduling, but I think even basic "ladder" scheduling done in FreeBSD
> could in theory help in tight spots with multiple requests.

At least there are some workloads with very good results.
A friend recently measured almost twice the speed when reading a big
file on a two-disk ZFS mirror compared to the raw speed of a single disk.

-- 
B.Walter                http://www.bwct.de      http://www.fizon.de
bernd@bwct.de           info@bwct.de            support@fizon.de

From owner-freebsd-fs@FreeBSD.ORG  Sat Dec  8 00:03:11 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 9763E16A420
	for <fs@freebsd.org>; Sat,  8 Dec 2007 00:03:11 +0000 (UTC)
	(envelope-from victorloureirolima@gmail.com)
Received: from py-out-1112.google.com (py-out-1112.google.com [64.233.166.183])
	by mx1.freebsd.org (Postfix) with ESMTP id 4AFC513C447
	for <fs@freebsd.org>; Sat,  8 Dec 2007 00:03:11 +0000 (UTC)
	(envelope-from victorloureirolima@gmail.com)
Received: by py-out-1112.google.com with SMTP id u77so1988203pyb
	for <fs@freebsd.org>; Fri, 07 Dec 2007 16:03:10 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:received:received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references;
	bh=GGNy31w5bJDgoYCXB3SpGqTcd16W8pYF+ISVMxeO4zs=;
	b=FwncVyj0hdtu4E6vTS+oI8pZ/afQDiHz3ksZhCdYFUFqHTmZ8vu0y+7NppBcc+8BllQ7JfezrxnJzjWyID7ID9x1YiWdAF6aaNHTBjaYWSydtNQ3oIe9ThsSBHnLcBaAUkfy8CL+g5tGZEM96GF9aKqnPaTNfuSCVatAytmDfYA=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references;
	b=tAsw0Di2+a13jZLoxv6hmwLRPjzceivTTimIr/dHZaH2K1OK+UxloyRnMtwKZm6IdylRstBpzXqWp5S0Xjyu5mRfaAJmAPhAdN4/sLJ+6nSn7t4Ofd63Gwm4hWW6JucSKf9DQD4DO913lR1fF/8ZJyw4zQzOogYonUOugVpcgc8=
Received: by 10.35.33.15 with SMTP id l15mr4016058pyj.1197070561952;
	Fri, 07 Dec 2007 15:36:01 -0800 (PST)
Received: by 10.35.125.7 with HTTP; Fri, 7 Dec 2007 15:36:01 -0800 (PST)
Message-ID: <ac00e00a0712071536y32e17551i9cb5861dc1653e30@mail.gmail.com>
Date: Fri, 7 Dec 2007 21:36:01 -0200
From: "Victor Loureiro Lima" <victorloureirolima@gmail.com>
To: ticso@cicely.de
In-Reply-To: <20071206220755.GH10459@cicely12.cicely.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <20071206210848.GA63825@transwarp.tao.org.uk>
	<20071206220755.GH10459@cicely12.cicely.de>
Cc: Josef Karthauser <joe@tao.org.uk>, fs@freebsd.org
Subject: Re: -14% available on /tmp
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 08 Dec 2007 00:03:11 -0000

Okay, and what about this?

root@zion# df -H
Filesystem                        Size    Used   Avail Capacity  Mounted on
/dev/ad4s1a                       520M    341M    137M    71%    /
devfs                             1.0k    1.0k      0B   100%    /dev
/dev/ad4s1e                       520M     25M    453M     5%    /tmp
/dev/ad4s1f                       120G     71G     39G    65%    /usr
/dev/ad4s1d                       2.0G    2.0G   -162M   109%    /var

Has anyone seen this on a partition (/var)? Any pointers on how to fix this?

cheers,
victor

From owner-freebsd-fs@FreeBSD.ORG  Sat Dec  8 00:29:04 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 38D4816A417
	for <fs@freebsd.org>; Sat,  8 Dec 2007 00:29:04 +0000 (UTC)
	(envelope-from ticso@cicely12.cicely.de)
Received: from raven.bwct.de (raven.bwct.de [85.159.14.73])
	by mx1.freebsd.org (Postfix) with ESMTP id A14CC13C43E
	for <fs@freebsd.org>; Sat,  8 Dec 2007 00:29:03 +0000 (UTC)
	(envelope-from ticso@cicely12.cicely.de)
Received: from cicely5.cicely.de ([10.1.1.7])
	by raven.bwct.de (8.13.4/8.13.4) with ESMTP id lB80T1Af086815;
	Sat, 8 Dec 2007 01:29:01 +0100 (CET)
	(envelope-from ticso@cicely12.cicely.de)
Received: from cicely12.cicely.de (cicely12.cicely.de [10.1.1.14])
	by cicely5.cicely.de (8.13.4/8.13.4) with ESMTP id lB80StUc036373
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Sat, 8 Dec 2007 01:28:56 +0100 (CET)
	(envelope-from ticso@cicely12.cicely.de)
Received: from cicely12.cicely.de (localhost [127.0.0.1])
	by cicely12.cicely.de (8.13.4/8.13.3) with ESMTP id lB80St1w022183;
	Sat, 8 Dec 2007 01:28:55 +0100 (CET)
	(envelope-from ticso@cicely12.cicely.de)
Received: (from ticso@localhost)
	by cicely12.cicely.de (8.13.4/8.13.3/Submit) id lB80Ss3a022182;
	Sat, 8 Dec 2007 01:28:54 +0100 (CET) (envelope-from ticso)
Date: Sat, 8 Dec 2007 01:28:54 +0100
From: Bernd Walter <ticso@cicely12.cicely.de>
To: Victor Loureiro Lima <victorloureirolima@gmail.com>
Message-ID: <20071208002854.GT10459@cicely12.cicely.de>
References: <20071206210848.GA63825@transwarp.tao.org.uk>
	<20071206220755.GH10459@cicely12.cicely.de>
	<ac00e00a0712071536y32e17551i9cb5861dc1653e30@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <ac00e00a0712071536y32e17551i9cb5861dc1653e30@mail.gmail.com>
X-Operating-System: FreeBSD cicely12.cicely.de 5.4-STABLE alpha
User-Agent: Mutt/1.5.9i
X-Spam-Status: No, score=-4.4 required=5.0 tests=ALL_TRUSTED=-1.8, AWL=0.034,
	BAYES_00=-2.599 autolearn=ham version=3.1.7
X-Spam-Checker-Version: SpamAssassin 3.1.7 (2006-10-05) on cicely12.cicely.de
Cc: Josef Karthauser <joe@tao.org.uk>, ticso@cicely.de, fs@freebsd.org
Subject: Re: -14% available on /tmp
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: ticso@cicely.de
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 08 Dec 2007 00:29:04 -0000

On Fri, Dec 07, 2007 at 09:36:01PM -0200, Victor Loureiro Lima wrote:
> Okay, and what about this?
> 
> root@zion# df -H
> Filesystem                        Size    Used   Avail Capacity  Mounted on
> /dev/ad4s1a                       520M    341M    137M    71%    /
> devfs                             1.0k    1.0k      0B   100%    /dev
> /dev/ad4s1e                       520M     25M    453M     5%    /tmp
> /dev/ad4s1f                       120G     71G     39G    65%    /usr
> /dev/ad4s1d                       2.0G    2.0G   -162M   109%    /var
> 
> Has anyone seen this on a partition (/var)? Any pointers on how to fix this?

This is expected if you overfilled your FS: it keeps some space in
reserve (minfree), which only root processes may use and which they have
now exhausted - see the tunefs(8) -m option for details.
There is nothing wrong with your filesystem and the situation can be
"fixed" just by deleting files.
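
The arithmetic can be sketched as follows, assuming the default 8% reserve
and assuming df derives its columns as used = total - free,
avail = free - reserve and capacity = used / (used + avail). The function
name and the round 2000 MB figure are hypothetical, chosen to land close
to the /var line above.

```python
def df_columns(total_mb, free_mb, minfree_pct=8):
    """Return (used, avail, capacity %) the way the reserve skews them."""
    reserved = total_mb * minfree_pct // 100   # space held back for root
    used = total_mb - free_mb
    avail = free_mb - reserved                 # negative once root eats the reserve
    usable = used + avail                      # what non-root users could ever fill
    capacity = 100.0 if usable == 0 else used / usable * 100
    return used, avail, capacity

# A 2000 MB filesystem filled completely by root processes:
used, avail, capacity = df_columns(2000, 0)
```

Here avail comes out at -160 MB and capacity rounds to 109%, so a negative
"Avail" with a capacity above 100% is just the reserve being consumed.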

The original post, however, had a negative used value, which is not expected
and indicates corruption of the filesystem's summary information.

-- 
B.Walter                http://www.bwct.de      http://www.fizon.de
bernd@bwct.de           info@bwct.de            support@fizon.de