From owner-freebsd-fs@FreeBSD.ORG Sun Dec 2 03:57:13 2007 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BB2A816A417; Sun, 2 Dec 2007 03:57:11 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail13.syd.optusnet.com.au (mail13.syd.optusnet.com.au [211.29.132.194]) by mx1.freebsd.org (Postfix) with ESMTP id 40B3313C4D5; Sun, 2 Dec 2007 03:57:11 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from c211-30-219-213.carlnfd3.nsw.optusnet.com.au (c211-30-219-213.carlnfd3.nsw.optusnet.com.au [211.30.219.213]) by mail13.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id lB23v5b1010753 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sun, 2 Dec 2007 14:57:08 +1100 Date: Sun, 2 Dec 2007 14:57:05 +1100 (EST) From: Bruce Evans X-X-Sender: bde@delplex.bde.org To: Don Lewis In-Reply-To: <200712012214.lB1MEl2Q015881@gw.catspoiler.org> Message-ID: <20071202132955.M18602@delplex.bde.org> References: <200712012214.lB1MEl2Q015881@gw.catspoiler.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-fs@FreeBSD.org Subject: Re: File remove problem X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 02 Dec 2007 03:57:13 -0000 On Sat, 1 Dec 2007, Don Lewis wrote: > On 2 Dec, Bruce Evans wrote: > >> Here is a non-hackish patch which explains why ignoring MNT_RDONLY in >> the above or in ffs_mount() helps. It just fixes the confusion between >> IN_MODIFIED and IN_CHANGE in critical places. >> >> % Index: ffs_softdep.c [All settings here, but not yet in ufs_inactive() and ffs_truncate(), which are critical, and in other places which might not be critical.] 
>> Without this change, soft updates depends on IN_CHANGE being converted >> to IN_MODIFIED by ufs_itimes(), but this conversion doesn't happen >> when MNT_RDONLY is set. With soft updates, changes are often delayed >> until sync time, and when the sync is for mount-update it is done after >> setting MNT_RDONLY so the above doesn't work. > > ufs_itimes() should probably also be looking at fs_ronly instead of > MNT_RDONLY, *but* all the paths leading from userland to ufs_itimes() > would need to be checked to verify that they check MNT_RDONLY to prevent > new file system write operations from happening while the remount is in > progress. Yes, that is probably why MNT_RDONLY is (ab)used now. I found old (Y2002) private mail from mckusick that explains a previous change in this area, a change that mostly avoided the problem but has been lost: % RCS file: /home/ncvs/src/sys/ufs/ufs/ufs_vnops.c,v % Working file: ufs_vnops.c % ---------------------------- % revision 1.182 % date: 2002/01/15 07:17:12; author: mckusick; state: Exp; lines: +4 -5 % When downgrading a filesystem from read-write to read-only, operations % involving file removal or file update were not always being fully % committed to disk. The result was lost files or corrupted file data. % This change ensures that the filesystem is properly synced to disk % before the filesystem is down-graded. % % This delta also fixes a long standing bug in which a file open for % reading has been unlinked. When the last open reference to the file % is closed, the inode is reclaimed by the filesystem. Previously, % if the filesystem had been down-graded to read-only, the inode could % not be reclaimed, and thus was lost and had to be later recovered % by fsck. With this change, such files are found at the time of the % down-grade. Normally they will result in the filesystem down-grade % failing with `device busy'. 
If a forcible down-grade is done, then % the affected files will be revoked causing the inode to be released % and the open file descriptors to begin failing on attempts to read. % % Submitted by: "Sam Leffler" % ---------------------------- % % Index: ufs_vnops.c % =================================================================== % RCS file: /home/ncvs/src/sys/ufs/ufs/ufs_vnops.c,v % retrieving revision 1.181 % retrieving revision 1.182 % diff -u -2 -r1.181 -r1.182 % --- ufs_vnops.c 22 Nov 2001 15:33:12 -0000 1.181 % +++ ufs_vnops.c 15 Jan 2002 07:17:12 -0000 1.182 % @@ -37,5 +37,5 @@ % * % * @(#)ufs_vnops.c 8.27 (Berkeley) 5/27/95 % - * $FreeBSD: src/sys/ufs/ufs/ufs_vnops.c,v 1.181 2001/11/22 15:33:12 guido Exp $ % + * $FreeBSD: src/sys/ufs/ufs/ufs_vnops.c,v 1.182 2002/01/15 07:17:12 mckusick Exp $ % */ % % @@ -159,11 +159,10 @@ % if ((ip->i_flag & (IN_ACCESS | IN_CHANGE | IN_UPDATE)) == 0) % return; % + if ((vp->v_type == VBLK || vp->v_type == VCHR) && !DOINGSOFTDEP(vp)) % + ip->i_flag |= IN_LAZYMOD; % + else % + ip->i_flag |= IN_MODIFIED; % if ((vp->v_mount->mnt_flag & MNT_RDONLY) == 0) { % vfs_timestamp(&ts); % - if ((vp->v_type == VBLK || vp->v_type == VCHR) && % - !DOINGSOFTDEP(vp)) % - ip->i_flag |= IN_LAZYMOD; % - else % - ip->i_flag |= IN_MODIFIED; % if (ip->i_flag & IN_ACCESS) { % ip->i_atime = ts.tv_sec; This is in ufs_itimes(). Note that it moves the setting of the modified flag before the check of MNT_RDONLY, so that when the wrong or incomplete flags are set earlier and the wrong flags aren't converted to the modified flag before MNT_RDONLY is set, then we only lose the timestamps but not critical updates here. 
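A userland sketch of that ordering may make the point easier to see. It is
only a model, not kernel code: the flag values, struct model_inode and
model_itimes() are made up for illustration, and the VBLK/VCHR IN_LAZYMOD
branch of the real function is left out. It shows that with the rev.1.182
ordering, a pending IN_CHANGE still ends up as IN_MODIFIED when the mount
is already flagged read-only; only the timestamp is lost.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for this model; the kernel's values differ. */
enum {
	IN_ACCESS   = 0x01,
	IN_CHANGE   = 0x02,
	IN_UPDATE   = 0x04,
	IN_MODIFIED = 0x08
};

/* Toy stand-in for struct inode: just the flags and three timestamps. */
struct model_inode {
	int	i_flag;
	long	i_atime, i_mtime, i_ctime;
};

/*
 * Model of the rev.1.182 ordering: IN_MODIFIED is set before the
 * read-only test, so only the timestamps depend on MNT_RDONLY.
 */
static void
model_itimes(struct model_inode *ip, bool mnt_rdonly, long now)
{
	if ((ip->i_flag & (IN_ACCESS | IN_CHANGE | IN_UPDATE)) == 0)
		return;
	ip->i_flag |= IN_MODIFIED;	/* unconditional, as in rev.1.182 */
	if (!mnt_rdonly) {
		if (ip->i_flag & IN_ACCESS)
			ip->i_atime = now;
		if (ip->i_flag & IN_UPDATE)
			ip->i_mtime = now;
		if (ip->i_flag & IN_CHANGE)
			ip->i_ctime = now;
	}
	ip->i_flag &= ~(IN_ACCESS | IN_CHANGE | IN_UPDATE);
	/* The update marks are gone, but the inode is still marked dirty. */
	assert((ip->i_flag & IN_MODIFIED) != 0);
}
```

Calling model_itimes() on an inode that has only IN_CHANGE set, with
mnt_rdonly true, leaves i_flag equal to IN_MODIFIED and i_ctime untouched,
which is the "lose the timestamps but not critical updates" behaviour
described above.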
% Index: ffs_inode.c
% ===================================================================
% RCS file: /home/ncvs/src/sys/ufs/ffs/ffs_inode.c,v
% retrieving revision 1.73
% retrieving revision 1.74
% diff -u -2 -r1.73 -r1.74
% --- ffs_inode.c	13 Dec 2001 05:07:48 -0000	1.73
% +++ ffs_inode.c	15 Jan 2002 07:17:12 -0000	1.74
% @@ -32,5 +32,5 @@
%   *
%   *	@(#)ffs_inode.c	8.13 (Berkeley) 4/21/95
% - * $FreeBSD: src/sys/ufs/ffs/ffs_inode.c,v 1.73 2001/12/13 05:07:48 mckusick Exp $
% + * $FreeBSD: src/sys/ufs/ffs/ffs_inode.c,v 1.74 2002/01/15 07:17:12 mckusick Exp $
%   */
%
% @@ -88,7 +88,7 @@
%  		return (0);
%  	ip->i_flag &= ~(IN_LAZYMOD | IN_MODIFIED);
% -	if (vp->v_mount->mnt_flag & MNT_RDONLY)
% -		return (0);
%  	fs = ip->i_fs;
% +	if (fs->fs_ronly)
% +		return (0);
%  	/*
%  	 * Ensure that uid and gid are correct. This is a temporary

This fixes loss of the critical updates a little later in ffs_update().

% @@ -153,4 +153,6 @@
%  	oip = VTOI(ovp);
%  	fs = oip->i_fs;
% +	if (fs->fs_ronly)
% +		panic("ffs_truncate: read-only filesystem");
%  	if (length < 0)
%  		return (EINVAL);

This is a sanity check in ffs_truncate().  I think all callers except the
one in ufs_inactive() automatically pass the check, since they are higher
level so they do a correct check of MNT_RDONLY.  The call in
ufs_inactive() used to be unconditional (except for the (i_nlink == 0)
condition of course).  In -current it is conditional on MNT_RDONLY.  kib's
patch changes it to be conditional on fs_ronly, but I hope it can become
unconditional again -- it should be an error to reach ufs_inactive() with
a partially deleted file after syncing before changing fs_ronly to 1.

ffs_update() should panic instead of returning 0 when (fs->fs_ronly) too,
so that bugs get noticed.  mckusick's explanation says that "[fs_ronly is
the only believable flag].  Thus the killing of IN_MODIFIED has to happen
a few lines later [in] ffs_update()".
It should be safe to blindly ignore all modification flags except
IN_ACCESS in ufs_itimes(), since ffs_update() will kill the completely
invalid ones.  4.4BSD-Lite blindly ignored all modification flags in
ITIMES(), and checked the wrong read-only flag (MNT_RDONLY) in open code
that duplicates ITIMES() plus adds the wrong r/o check.  When I converted
ITIMES() to ufs_itimes(), I centralized this wrong r/o check.  This was
mainly a cleanup, but I think it fixes wrong setting of atimes (IN_ACCESS
is set without checking any r/o flags, and IN_ACCESS was sometimes
converted into a timestamp before ffs_update() killed it, so applications
could see atime changing even on purely r/o mounted file systems).

The fix in ufs_vnops.c was lost relatively recently as part of related
changes to fix IN_ACCESS for non-exclusively-locked reads:

% Index: ufs_vnops.c
% ===================================================================
% RCS file: /home/ncvs/src/sys/ufs/ufs/ufs_vnops.c,v
% retrieving revision 1.279
% retrieving revision 1.280
% diff -u -2 -r1.279 -r1.280
% --- ufs_vnops.c	2 Oct 2006 02:08:31 -0000	1.279
% +++ ufs_vnops.c	10 Oct 2006 09:20:54 -0000	1.280
% @@ -36,5 +36,5 @@
%
%  #include
% -__FBSDID("$FreeBSD: src/sys/ufs/ufs/ufs_vnops.c,v 1.279 2006/10/02 02:08:31 tegge Exp $");
% +__FBSDID("$FreeBSD: src/sys/ufs/ufs/ufs_vnops.c,v 1.280 2006/10/10 09:20:54 kib Exp $");
%
%  #include "opt_mac.h"
% @@ -129,29 +129,47 @@
%  	struct inode *ip;
%  	struct timespec ts;
% +	int mnt_locked;
%
%  	ip = VTOI(vp);
% +	mnt_locked = 0;
% +	if ((vp->v_mount->mnt_flag & MNT_RDONLY) != 0) {
% +		VI_LOCK(vp);
% +		goto out;
% +	}
% +	MNT_ILOCK(vp->v_mount);	/* For reading of mnt_kern_flags.
*/
% +	mnt_locked = 1;
% +	VI_LOCK(vp);
%  	if ((ip->i_flag & (IN_ACCESS | IN_CHANGE | IN_UPDATE)) == 0)
% -		return;
% +		goto out_unl;
% +
%
%  		ip->i_flag |= IN_LAZYMOD;
% -	else
% +	else if (((vp->v_mount->mnt_kern_flag &
% +		 (MNTK_SUSPENDED | MNTK_SUSPEND)) == 0) ||
% +		 (ip->i_flag & (IN_CHANGE | IN_UPDATE)))
%  		ip->i_flag |= IN_MODIFIED;
% -	if ((vp->v_mount->mnt_flag & MNT_RDONLY) == 0) {
% -		vfs_timestamp(&ts);
% -		if (ip->i_flag & IN_ACCESS) {
% -			DIP_SET(ip, i_atime, ts.tv_sec);
% -			DIP_SET(ip, i_atimensec, ts.tv_nsec);
% -		}
% -		if (ip->i_flag & IN_UPDATE) {
% -			DIP_SET(ip, i_mtime, ts.tv_sec);
% -			DIP_SET(ip, i_mtimensec, ts.tv_nsec);
% -			ip->i_modrev++;
% -		}
% -		if (ip->i_flag & IN_CHANGE) {
% -			DIP_SET(ip, i_ctime, ts.tv_sec);
% -			DIP_SET(ip, i_ctimensec, ts.tv_nsec);
% -		}
% +	else if (ip->i_flag & IN_ACCESS)
% +		ip->i_flag |= IN_LAZYACCESS;
% +	vfs_timestamp(&ts);
% +	if (ip->i_flag & IN_ACCESS) {
% +		DIP_SET(ip, i_atime, ts.tv_sec);
% +		DIP_SET(ip, i_atimensec, ts.tv_nsec);
% +	}
% +	if (ip->i_flag & IN_UPDATE) {
% +		DIP_SET(ip, i_mtime, ts.tv_sec);
% +		DIP_SET(ip, i_mtimensec, ts.tv_nsec);
% +		ip->i_modrev++;
% +	}
% +	if (ip->i_flag & IN_CHANGE) {
% +		DIP_SET(ip, i_ctime, ts.tv_sec);
% +		DIP_SET(ip, i_ctimensec, ts.tv_nsec);
%  	}
% +
% + out:
%  	ip->i_flag &= ~(IN_ACCESS | IN_CHANGE | IN_UPDATE);
% + out_unl:
% +	VI_UNLOCK(vp);
% +	if (mnt_locked)
% +		MNT_IUNLOCK(vp->v_mount);
%  }
%

Now we trust MNT_RDONLY again, so that when the wrong or incomplete flags
are set earlier and the wrong flags aren't converted to the modified flag
before MNT_RDONLY is set, then we lose both timestamps and critical
updates here.

I think the best quick fix would be to trust fs_ronly here, except for
IN_ACCESS.
Then wrong IN_CHANGE | IN_UPDATE flags would just cause wrong updates of
i_ctime and i_mtime, and an extra i/o to write these changes, but missing
IN_MODIFIED flags would be fixed up as in rev.1.182 and associated correct
IN_CHANGE | IN_UPDATE flags wouldn't be incorrectly discarded.  IN_ACCESS
needs special handling even in the non-snapshot cases so that read()
doesn't race mount-update (at best, read()s might keep dirtying inodes, so
there would be a problem setting fs_ronly atomically with completing the
sync).

Most other file systems are primitive or broken in this area.  ext2fs is
at the level of ufs_vnops.c 1.182.  msdosfs is at a level before
4.4BSD-Lite (it still uses its clone of ITIMES() and looks more like the
Net/2 ffs than the 4.4BSD one).  But most other file systems are
Giant-locked, so they don't have the IN_ACCESS races.

Bruce

From owner-freebsd-fs@FreeBSD.ORG Sun Dec 2 06:14:39 2007
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8372016A420; Sun, 2 Dec 2007 06:14:39 +0000 (UTC) (envelope-from kostikbel@gmail.com)
Received: from relay02.kiev.sovam.com (relay02.kiev.sovam.com [62.64.120.197]) by mx1.freebsd.org (Postfix) with ESMTP id 2CC8C13C448; Sun, 2 Dec 2007 06:14:39 +0000 (UTC) (envelope-from kostikbel@gmail.com)
Received: from [212.82.216.226] (helo=deviant.kiev.zoral.com.ua) by relay02.kiev.sovam.com with esmtps (TLSv1:AES256-SHA:256) (Exim 4.67) (envelope-from ) id 1IyhrP-0004Dy-2L; Sun, 02 Dec 2007 07:59:23 +0200
Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.1/8.14.1) with ESMTP id lB25xK8m085144; Sun, 2 Dec 2007 07:59:20 +0200 (EET) (envelope-from kostikbel@gmail.com)
Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.2/8.14.2/Submit) id lB25xJfN085143; Sun, 2 Dec 2007 07:59:19 +0200 (EET) (envelope-from kostikbel@gmail.com)
Date: Sun, 2 Dec 2007 07:59:19 +0200
From: Kostik Belousov
To: Don Lewis
Message-ID: <20071202055919.GR83121@deviant.kiev.zoral.com.ua>
References: <20071201215706.B12006@besplex.bde.org> <200712012207.lB1M7oNg015468@gw.catspoiler.org>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="+Yg8W10oK6rlW0RR"
Content-Disposition: inline
In-Reply-To: <200712012207.lB1M7oNg015468@gw.catspoiler.org>
User-Agent: Mutt/1.4.2.3i
Cc: freebsd-fs@freebsd.org
Subject: Re: File remove problem

--+Yg8W10oK6rlW0RR
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Sat, Dec 01, 2007 at 02:07:50PM -0800, Don Lewis wrote:
> On 1 Dec, Bruce Evans wrote:
> > On Sat, 1 Dec 2007, Kostik Belousov wrote:
>
> >> +static int
> >> +ffs_isronly(struct ufsmount *ump)
> >> +{
> >> +	struct fs *fs = ump->um_fs;
> >> +
> >> +	return (fs->fs_ronly);
> >> +}
> >> +
> >
> > Could be ump->um_fs->fs_ronly.
>
> That's the change that I would have made.  A #include for
> would have to be added, which some might argue would be a layering
> violation.
> I'd prefer to avoid the extra indirection.

I would argue that the ufs already knows too much about the ffs. But,
this seems to be the first explicit reference to the ffs from the ufs
code. With your approval, see below.

diff --git a/sys/ufs/ufs/ufs_inode.c b/sys/ufs/ufs/ufs_inode.c
index 448f436..22e29e9 100644
--- a/sys/ufs/ufs/ufs_inode.c
+++ b/sys/ufs/ufs/ufs_inode.c
@@ -60,6 +60,7 @@ __FBSDID("$FreeBSD: src/sys/ufs/ufs/ufs_inode.c,v 1.69 2007/06/22 13:22:37 kib E
 #ifdef UFS_GJOURNAL
 #include
 #endif
+#include
 
 /*
  * Last reference to an inode.  If necessary, write or delete it.
@@ -90,8 +91,7 @@ ufs_inactive(ap)
 	ufs_gjournal_close(vp);
 #endif
 	if ((ip->i_effnlink == 0 && DOINGSOFTDEP(vp)) ||
-	    (ip->i_nlink <= 0 &&
-	     (vp->v_mount->mnt_flag & MNT_RDONLY) == 0)) {
+	    (ip->i_nlink <= 0 && !VFSTOUFS(mp)->um_fs->fs_ronly)) {
 	loop:
 		if (vn_start_secondary_write(vp, &mp, V_NOWAIT) != 0) {
 			/* Cannot delete file while file system is suspended */
@@ -121,7 +121,7 @@ ufs_inactive(ap)
 	}
 	if (ip->i_effnlink == 0 && DOINGSOFTDEP(vp))
 		softdep_releasefile(ip);
-	if (ip->i_nlink <= 0 && (vp->v_mount->mnt_flag & MNT_RDONLY) == 0) {
+	if (ip->i_nlink <= 0 && !VFSTOUFS(mp)->um_fs->fs_ronly) {
 #ifdef QUOTA
 		if (!getinoquota(ip))
 			(void)chkiq(ip, -1, NOCRED, FORCE);

--+Yg8W10oK6rlW0RR
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (FreeBSD)

iD8DBQFHUkm3C3+MBN1Mb4gRAiKBAJ9eJ0tNa94jZv9Aav5edNLWaiQsdwCg3GCV
O1SqQzn7h0lN0eNywv/0qYg=
=L+Fh
-----END PGP SIGNATURE-----

--+Yg8W10oK6rlW0RR--

From owner-freebsd-fs@FreeBSD.ORG Sun Dec 2 08:35:41 2007
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B3B5C16A419; Sun, 2 Dec 2007 08:35:41 +0000 (UTC) (envelope-from brde@optusnet.com.au)
Received: from mail01.syd.optusnet.com.au (mail01.syd.optusnet.com.au [211.29.132.182]) by
mx1.freebsd.org (Postfix) with ESMTP id 3387113C458; Sun, 2 Dec 2007 08:35:41 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from besplex.bde.org (c211-30-219-213.carlnfd3.nsw.optusnet.com.au [211.30.219.213]) by mail01.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id lB28ZaGK023561 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sun, 2 Dec 2007 19:35:38 +1100 Date: Sun, 2 Dec 2007 19:35:36 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Kostik Belousov In-Reply-To: <20071202055919.GR83121@deviant.kiev.zoral.com.ua> Message-ID: <20071202183809.I1560@besplex.bde.org> References: <20071201215706.B12006@besplex.bde.org> <200712012207.lB1M7oNg015468@gw.catspoiler.org> <20071202055919.GR83121@deviant.kiev.zoral.com.ua> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-fs@freebsd.org, Don Lewis Subject: Re: File remove problem X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 02 Dec 2007 08:35:41 -0000 On Sun, 2 Dec 2007, Kostik Belousov wrote: > On Sat, Dec 01, 2007 at 02:07:50PM -0800, Don Lewis wrote: >> On 1 Dec, Bruce Evans wrote: >>> On Sat, 1 Dec 2007, Kostik Belousov wrote: >> >>>> +static int >>>> +ffs_isronly(struct ufsmount *ump) >>>> +{ >>>> + struct fs *fs = ump->um_fs; >>>> + >>>> + return (fs->fs_ronly); >>>> +} >>>> + >>> >>> Could be ump->um_fs->fs_ronly. >> >> That's the change that I would have made. A #include for >> would have to be added, which some might argue would be a layering >> violation. I'd prefer to avoid the extra indirection. > > I would argue that the ufs already knows too much about the ffs. But, > this seems to be the first explicit reference to the ffs from the ufs > code. With your approval, see below. It's more like the fourth: - ufs_itimes() is a layering violation. 
  However, with both ffs and ufs needing to set timestamps (for ffs, only
  in ffs_update()), and with both ffs and ufs needing to set IN_* all over
  the place, it isn't clear which layer timestamps belong in.
- ufs_vnops.c now includes ffs_extern.h for some reason (5.2 didn't).
- ufs_gjournal.c includes both ffs_extern.h and fs.h.  It uses ip->i_fs
  a lot to access the superblock in ufs_gjournal_modref().

> diff --git a/sys/ufs/ufs/ufs_inode.c b/sys/ufs/ufs/ufs_inode.c
> index 448f436..22e29e9 100644
> --- a/sys/ufs/ufs/ufs_inode.c
> +++ b/sys/ufs/ufs/ufs_inode.c
> @@ -60,6 +60,7 @@ __FBSDID("$FreeBSD: src/sys/ufs/ufs/ufs_inode.c,v 1.69 2007/06/22 13:22:37 kib E
>  #ifdef UFS_GJOURNAL
>  #include
>  #endif
> +#include
>
>  /*
>   * Last reference to an inode.  If necessary, write or delete it.

ufs/ffs includes are conventionally separated from ufs/ufs includes by a
blank line.  About 2/3 of the files in ufs/ffs follow this convention.

> @@ -90,8 +91,7 @@ ufs_inactive(ap)
>  	ufs_gjournal_close(vp);
>  #endif
>  	if ((ip->i_effnlink == 0 && DOINGSOFTDEP(vp)) ||
> -	    (ip->i_nlink <= 0 &&
> -	     (vp->v_mount->mnt_flag & MNT_RDONLY) == 0)) {
> +	    (ip->i_nlink <= 0 && !VFSTOUFS(mp)->um_fs->fs_ronly)) {
>  	loop:
>  		if (vn_start_secondary_write(vp, &mp, V_NOWAIT) != 0) {
>  			/* Cannot delete file while file system is suspended */
> @@ -121,7 +121,7 @@ ufs_inactive(ap)
>  	}
>  	if (ip->i_effnlink == 0 && DOINGSOFTDEP(vp))
>  		softdep_releasefile(ip);
> -	if (ip->i_nlink <= 0 && (vp->v_mount->mnt_flag & MNT_RDONLY) == 0) {
> +	if (ip->i_nlink <= 0 && !VFSTOUFS(mp)->um_fs->fs_ronly) {
>  #ifdef QUOTA
>  		if (!getinoquota(ip))
>  			(void)chkiq(ip, -1, NOCRED, FORCE);
>

Should be ip->i_fs->fs_ronly.

The locking for fs_ronly is unclear.  It seems to be locked mainly by
vn_start_write(), and that is enough for everything except probably access
time changes.

I've now tested the following similar change in ufs_itimes() after
removing all other related fixes.

% Index: ufs_vnops.c
% ===================================================================
% RCS file: /home/ncvs/src/sys/ufs/ufs/ufs_vnops.c,v
% retrieving revision 1.293
% diff -u -2 -r1.293 ufs_vnops.c
% --- ufs_vnops.c	8 Nov 2007 17:21:51 -0000	1.293
% +++ ufs_vnops.c	2 Dec 2007 04:56:58 -0000
% @@ -89,4 +89,5 @@
%  #endif
%
% +#include
%  #include
%
% @@ -137,8 +138,38 @@
%
%  	ip = VTOI(vp);
% +	/*
% +	 * MNT_RDONLY can barely be trusted here.  Full r/o mode is indicated
% +	 * by fs_ronly, and the MNT_RDONLY setting [should] differ from the
% +	 * fs_ronly setting only during transition from r/w mode to r/o mode.
% +	 * We set IN_ACCESS even in full r/o mode, so we must discard it
% +	 * unconditionally here.  During the transition, we must convert
% +	 * IN_CHANGE | IN_UPDATE to IN_MODIFIED, since some callers neglect
% +	 * to set IN_MODIFIED.  We also set the timestamps indicated by
% +	 * IN_CHANGE | IN_UPDATE normally during the transition, since the
% +	 * update marks may have been set correctly before the transition and
% +	 * not yet converted into timestamps.  Callers that set IN_CHANGE |
% +	 * IN_UPDATE during the transition are buggy, since userland writes
% +	 * are supposed to be denied (by MNT_RDONLY checks) during the
% +	 * transition, while kernel writes should only be for syncs
% +	 * and syncs should not touch timestamps except to convert old
% +	 * update marks to timestamps.  Callers that set any update mark or
% +	 * modification flag except IN_ACCESS while in full r/o mode are
% +	 * broken; we will panic for them later.
% +	 */
%  	if ((vp->v_mount->mnt_flag & MNT_RDONLY) != 0)
% -		goto out;
% +		ip->i_flag &= ~IN_ACCESS;
%  	if ((ip->i_flag & (IN_ACCESS | IN_CHANGE | IN_UPDATE)) == 0)
%  		return;
% +	if (ip->i_fs->fs_ronly) {		/* XXX locking? */
% +		vprint("ufs_itimes_locked: r/o mod", vp);
% +		/*
% +		 * Should panic here, or return and let ffs_update() panic.
% +		 * The fs_ronly check in ffs_update() is now almost redundant
% +		 * and should not succeed, so it should be replaced by a
% +		 * panic.  It detects more invariant failures than we detect
% +		 * here.
% +		 */
% +		goto out;
% +	}
%
%  	if ((vp->v_type == VBLK || vp->v_type == VCHR) && !DOINGSOFTDEP(vp))

The comments in this are too verbose.

This seems to work for "rm a; mount -u -o ro /f" and "mv a b; mount ...",
but since this is without your change to ufs_inactive(), I'm now surprised
that it works.  I think it works for the simple test cases as follows:
- there are no unlinked open files, so ufs_inactive() should have already
  set up all the needed i/o, and the sync for the mount-update should
  finish that i/o.
- however, ufs_inactive() seems to be called as part of the sync, and in
  ~5.2 where it doesn't do any read-only checks, it seems to call
  ffs_truncate().  This truncate should be null (in the simple test
  cases), but it seems to have the side effect of generating more i/o (I
  hope just to convert bogus settings of IN_CHANGE | IN_UPDATE into dinode
  changes).  Then other bugs cause an inconsistent fs if MNT_RDONLY is
  set.

BTW, the other bugs don't affect plain 5.2, since it still has rev.1.182
of ufs_vnops.c to convert the bogus settings of IN_CHANGE | IN_UPDATE into
IN_MODIFIED.  I was "lucky" to see almost the same bugs in ~5.2 as in
-current because I have debugging code in ffs_update() instead of
rev.1.182 in ufs_vnops.c, but the debugging code showed too many
apparently-harmless problems so it was turned off.

In cases involving unlinked open files, the truncation has to be delayed
until the sync.
Things seem to work reasonably: If a file on the fs is open for read, then mount-update from rw to ro is allowed unless the file is unlinked; if the file is unlinked then there is an EBUSY error unless MNT_FORCE is used, but if MNT_FORCE is used, then the mount-update must be allowed to complete and this involves truncating and otherwise completing the removal of unlinked open files. In -current, your patch should make this work again, and with only my patch above the "update error: blocks 32: files 1" is back because ufs_inactive() doesn't do the truncation. I don't understand how WRITECLOSE inter-operates with this -- mount-update always sets it but there is still an EBUSY error unless MNT_FORCE is used, while MNT_FORCE should kill all opens. Bruce From owner-freebsd-fs@FreeBSD.ORG Sun Dec 2 09:07:30 2007 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4F68816A420; Sun, 2 Dec 2007 09:07:30 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail14.syd.optusnet.com.au (mail14.syd.optusnet.com.au [211.29.132.195]) by mx1.freebsd.org (Postfix) with ESMTP id C482213C44B; Sun, 2 Dec 2007 09:07:29 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from besplex.bde.org (c211-30-219-213.carlnfd3.nsw.optusnet.com.au [211.30.219.213]) by mail14.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id lB297Hbr028642 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sun, 2 Dec 2007 20:07:22 +1100 Date: Sun, 2 Dec 2007 20:07:07 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Bruce Evans In-Reply-To: <20071202183809.I1560@besplex.bde.org> Message-ID: <20071202193924.P1745@besplex.bde.org> References: <20071201215706.B12006@besplex.bde.org> <200712012207.lB1M7oNg015468@gw.catspoiler.org> <20071202055919.GR83121@deviant.kiev.zoral.com.ua> <20071202183809.I1560@besplex.bde.org> MIME-Version: 1.0 Content-Type: 
TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-fs@freebsd.org, Don Lewis Subject: Re: File remove problem X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 02 Dec 2007 09:07:30 -0000 A second reply. Sorry for so many. On Sun, 2 Dec 2007, Bruce Evans wrote: > On Sun, 2 Dec 2007, Kostik Belousov wrote: >> I would argue that the ufs already knows too much about the ffs. But, >> this seems to be the first explicit reference to the ffs from the ufs >> code. With your approval, see below. >> diff --git a/sys/ufs/ufs/ufs_inode.c b/sys/ufs/ufs/ufs_inode.c >> index 448f436..22e29e9 100644 >> --- a/sys/ufs/ufs/ufs_inode.c >> +++ b/sys/ufs/ufs/ufs_inode.c >> @@ -60,6 +60,7 @@ __FBSDID("$FreeBSD: src/sys/ufs/ufs/ufs_inode.c,v 1.69 >> 2007/06/22 13:22:37 kib E >> #ifdef UFS_GJOURNAL >> #include >> #endif >> +#include >> >> /* >> * Last reference to an inode. If necessary, write or delete it. > > ufs/ffs includes are conventionally separated from ufs/ufs includes by a > blank line. About 2/3 of the files in ufs/ffs follow this convention. > >> @@ -90,8 +91,7 @@ ufs_inactive(ap) >> ufs_gjournal_close(vp); >> #endif >> if ((ip->i_effnlink == 0 && DOINGSOFTDEP(vp)) || >> - (ip->i_nlink <= 0 && >> - (vp->v_mount->mnt_flag & MNT_RDONLY) == 0)) { >> + (ip->i_nlink <= 0 && !VFSTOUFS(mp)->um_fs->fs_ronly)) { >> loop: >> if (vn_start_secondary_write(vp, &mp, V_NOWAIT) != 0) { >> /* Cannot delete file while file system is suspended >> */ >> @@ -121,7 +121,7 @@ ufs_inactive(ap) >> } >> if (ip->i_effnlink == 0 && DOINGSOFTDEP(vp)) >> softdep_releasefile(ip); >> - if (ip->i_nlink <= 0 && (vp->v_mount->mnt_flag & MNT_RDONLY) == 0) { >> + if (ip->i_nlink <= 0 && !VFSTOUFS(mp)->um_fs->fs_ronly) { >> #ifdef QUOTA >> if (!getinoquota(ip)) >> (void)chkiq(ip, -1, NOCRED, FORCE); >> > > Should be ip->i_fs->fs_ronly. 
>
> The locking for fs_ronly is unclear.  It seems to be locked mainly by
> vn_start_write(), and that is enough for everything except probably access
> time changes.

Actually, I hope that this MNT_RDONLY check can just go away.  I now see
that it is part of previous attempts to fix the bugs in this area.

% RCS file: /home/ncvs/src/sys/ufs/ufs/ufs_inode.c,v
% Working file: ufs_inode.c
% head: 1.69
% ----------------------------
% revision 1.64
% date: 2005/09/23 20:49:57;  author: delphij;  state: Exp;  lines: +1 -1
% Restore a historical ufs_inactive behavior that has been changed
% in rev. 1.40 of ufs_inode.c, which allows an inode being truncated
% even when the filesystem itself is marked RDONLY.  A subsequent
% call of UFS_TRUNCATE (ffs_truncate) would panic the system as it
% asserts that it can only be called when the filesystem is mounted
% read-write (same changeset, rev. 1.74 of sys/ufs/ffs/ffs_inode.c).
%
% Because ffs_mount() already takes care of sync'ing the filesystem
% to disk before being downgraded to readonly, it appears to be more
% desirable that we should not permit this sort of writes to disk.
%
% This change would fix a panic that occours when read-only mounted
% a corrupted filesystem and doing some file operations.
%
% MT6/5/4 candidate
%
% Reviewed by:	mckusick
% ----------------------------
% ...
% ----------------------------
% revision 1.40
% date: 2002/01/15 07:17:12;  author: mckusick;  state: Exp;  lines: +2 -2
% When downgrading a filesystem from read-write to read-only, operations
% involving file removal or file update were not always being fully
% committed to disk. The result was lost files or corrupted file data.
% This change ensures that the filesystem is properly synced to disk
% before the filesystem is down-graded.
%
% This delta also fixes a long standing bug in which a file open for
% reading has been unlinked. When the last open reference to the file
% is closed, the inode is reclaimed by the filesystem.
Previously, % if the filesystem had been down-graded to read-only, the inode could % not be reclaimed, and thus was lost and had to be later recovered % by fsck. With this change, such files are found at the time of the % down-grade. Normally they will result in the filesystem down-grade % failing with `device busy'. If a forcible down-grade is done, then % the affected files will be revoked causing the inode to be released % and the open file descriptors to begin failing on attempts to read. % % Submitted by: "Sam Leffler" % ---------------------------- % % Index: ufs_inode.c % =================================================================== % RCS file: /home/ncvs/src/sys/ufs/ufs/ufs_inode.c,v % retrieving revision 1.63 % retrieving revision 1.64 % diff -u -2 -r1.63 -r1.64 % --- ufs_inode.c 17 Mar 2005 11:58:43 -0000 1.63 % +++ ufs_inode.c 23 Sep 2005 20:49:57 -0000 1.64 % @@ -36,5 +36,5 @@ % % #include % -__FBSDID("$FreeBSD: src/sys/ufs/ufs/ufs_inode.c,v 1.63 2005/03/17 11:58:43 jeff Exp $"); % +__FBSDID("$FreeBSD: src/sys/ufs/ufs/ufs_inode.c,v 1.64 2005/09/23 20:49:57 delphij Exp $"); % % #include "opt_quota.h" % @@ -84,5 +84,5 @@ % if (ip->i_effnlink == 0 && DOINGSOFTDEP(vp)) % softdep_releasefile(ip); % - if (ip->i_nlink <= 0) { % + if (ip->i_nlink <= 0 && (vp->v_mount->mnt_flag & MNT_RDONLY) == 0) { % (void) vn_write_suspend_wait(vp, NULL, V_WAIT); % #ifdef QUOTA % Index: ufs_inode.c % =================================================================== % RCS file: /home/ncvs/src/sys/ufs/ufs/ufs_inode.c,v % retrieving revision 1.39 % retrieving revision 1.40 % diff -u -2 -r1.39 -r1.40 % --- ufs_inode.c 11 Oct 2001 17:52:20 -0000 1.39 % +++ ufs_inode.c 15 Jan 2002 07:17:12 -0000 1.40 % @@ -37,5 +37,5 @@ % * % * @(#)ufs_inode.c 8.9 (Berkeley) 5/14/95 % - * $FreeBSD: src/sys/ufs/ufs/ufs_inode.c,v 1.39 2001/10/11 17:52:20 jhb Exp $ % + * $FreeBSD: src/sys/ufs/ufs/ufs_inode.c,v 1.40 2002/01/15 07:17:12 mckusick Exp $ % */ % % @@ -85,5 +85,5 @@ % if 
(ip->i_effnlink == 0 && DOINGSOFTDEP(vp))
%      softdep_releasefile(ip);
% -  if (ip->i_nlink <= 0 && (vp->v_mount->mnt_flag & MNT_RDONLY) == 0) {
% +  if (ip->i_nlink <= 0) {
%      (void) vn_write_suspend_wait(vp, NULL, V_WAIT);
%  #ifdef QUOTA

Rev.1.40 of ufs_inode.c goes with rev.1.182 of ufs_vnops.c and rev.1.74 of
ffs_inode.c to fix truncation of unlinked open files in mount-update.
Rev.1.64 breaks this case by backing out 1.40. I think 1.64 is an attempt
to work around the other bugs. It breaks the case of unlinked open files
more deterministically, but this case is relatively uncommon. Again, I
was "lucky" to debug this partly under 5.2, which doesn't have 1.64, so
the extra (null?) truncations for closed files were relatively common.

So it should be safe to remove all the r/o checks in ufs_inactive() after
fixing the other bugs. ffs_truncate() already panics if fs_ronly, but only
in some cases. In particular, it doesn't panic for truncations that don't
change the file size. Such truncations aren't quite null, since standards
require [f]truncate(2) to mark the ctime and mtime for update.
ffs_truncate() sets the marks, which is correct for null truncations from
userland but not for ones from syncer internals. Setting of the marks when
fs_ronly is set should cause panics later (my patch has a vprint() for it).
Bruce From owner-freebsd-fs@FreeBSD.ORG Sun Dec 2 12:02:57 2007 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BC58716A4CF; Sun, 2 Dec 2007 12:02:57 +0000 (UTC) (envelope-from johan@headweb.com) Received: from core.stromnet.se (core.stromnet.se [83.218.84.131]) by mx1.freebsd.org (Postfix) with ESMTP id 6889713C447; Sun, 2 Dec 2007 12:02:57 +0000 (UTC) (envelope-from johan@headweb.com) Received: from localhost (core.stromnet.se [83.218.84.131]) by core.stromnet.se (Postfix) with ESMTP id 48C01D472CE; Sun, 2 Dec 2007 13:03:03 +0100 (CET) X-Virus-Scanned: amavisd-new at stromnet.se Received: from core.stromnet.se ([83.218.84.131]) by localhost (core.stromnet.se [83.218.84.131]) (amavisd-new, port 10024) with ESMTP id k2z1YjT9PCKU; Sun, 2 Dec 2007 13:03:00 +0100 (CET) Received: from [172.28.1.102] (90-224-172-102-no129.tbcn.telia.com [90.224.172.102]) by core.stromnet.se (Postfix) with ESMTP id 58B58D472CD; Sun, 2 Dec 2007 13:03:00 +0100 (CET) In-Reply-To: <20071201113750.GA81186@eos.sc1.parodius.com> References: <66A69F9D-E4C1-4647-AEE7-E6F18010A1A3@headweb.com> <20071201113750.GA81186@eos.sc1.parodius.com> Mime-Version: 1.0 (Apple Message framework v752.3) Content-Type: text/plain; charset=ISO-8859-1; delsp=yes; format=flowed Message-Id: <79720A8B-4435-4D22-90F8-B39B11AED016@headweb.com> Content-Transfer-Encoding: quoted-printable From: =?ISO-8859-1?Q?Johan_Str=F6m?= Date: Sun, 2 Dec 2007 13:02:24 +0100 To: Jeremy Chadwick X-Mailer: Apple Mail (2.752.3) Cc: freebsd-fs@freebsd.org, freebsd-current@freebsd.org, freebsd-stable@freebsd.org Subject: Re: scrambled (gmirror) dmesg output X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 02 Dec 2007 12:02:57 -0000 On Dec 1, 2007, at 12:37 , Jeremy Chadwick 
wrote:

> On Sat, Dec 01, 2007 at 12:16:45PM +0100, Johan Ström wrote:
>> Hello
>> Im playing with a new box running RELENG_7.0 from yesterday. I got two
>> discs with gmirror on ad[6|14]s1a and zfs-mirror on s1d. When o do
>> atacontrol detach ata7 (detach ad14), i get this in dmesG:
>>
>> (first time)
>> subdisk14: detached
>> ad14: detached
>> GEOM_MIRROR: Device gm1b: provider ad14s1bG dEiOsMc_oMnInReRcOtRe:d .De
>> vice gm1: provider ad14s1a disconnected.
>>
>> (second time, detaching again after reattach)
>> subdisk14: detached
>> ad14: detached
>> GEOMG_EMOIMR_RMOIRR:R ORD:e viDceev icgem 1bg:m 1p:r opvriodveird era
>> d1a4ds114bs 1dai sdciosncnoencnteecdt.ed.
>>
>> huh? :) Some print raceing or something?
>
> The problem isn't specific to GEOM or ZFS. It's a known issue with two
> kernel printf()s being called simultaneously. There are older threads
> discussing the issue. I can dig up URLs if you want to read them, but I
> don't have them available quickly...

Just what I thought then. I have just never seen it on 6.x (where I use
gmirror), so I was a bit curious.
Btw, zfs doesn't seem to be very "chatty" in dmesg? I.e. losing discs,
starting to rebuild discs etc... Isn't that something one would want
in logs?

Thanks!

>
> --
> | Jeremy Chadwick jdc at parodius.com |
> | Parodius Networking http://www.parodius.com/ |
> | UNIX Systems Administrator Mountain View, CA, USA |
> | Making life hard for others since 1977.
PGP: 4BD6C0CB |
>
From owner-freebsd-fs@FreeBSD.ORG Sun Dec 2 22:10:34 2007 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 16A7B16A418 for ; Sun, 2 Dec 2007 22:10:34 +0000 (UTC) (envelope-from david.cecil@nokia.com) Received: from mgw-mx03.nokia.com (smtp.nokia.com [192.100.122.230]) by mx1.freebsd.org (Postfix) with ESMTP id 7C6C013C478 for ; Sun, 2 Dec 2007 22:10:33 +0000 (UTC) (envelope-from david.cecil@nokia.com) Received: from esebh106.NOE.Nokia.com (esebh106.ntc.nokia.com [172.21.138.213]) by mgw-mx03.nokia.com (Switch-3.2.6/Switch-3.2.6) with ESMTP id lB2MAEbh009445; Mon, 3 Dec 2007 00:10:31 +0200 Received: from esebh103.NOE.Nokia.com ([172.21.143.33]) by esebh106.NOE.Nokia.com with Microsoft SMTPSVC(6.0.3790.1830); Mon, 3 Dec 2007 00:10:26 +0200 Received: from syebe101.NOE.Nokia.com ([172.30.128.65]) by esebh103.NOE.Nokia.com with Microsoft SMTPSVC(6.0.3790.1830); Mon, 3 Dec 2007 00:10:26 +0200 Received: from [172.30.67.30] ([172.30.67.30]) by syebe101.NOE.Nokia.com with Microsoft SMTPSVC(6.0.3790.1830); Mon, 3 Dec 2007 09:10:21 +1100 Message-ID: <47532D4D.4020300@nokia.com> Date: Mon, 03 Dec 2007 08:10:21 +1000 From: David Cecil User-Agent: Thunderbird 1.5.0.12 (Windows/20070509) MIME-Version: 1.0 To: ext Bruce Evans References: <20071201215706.B12006@besplex.bde.org> <200712012207.lB1M7oNg015468@gw.catspoiler.org> <20071202055919.GR83121@deviant.kiev.zoral.com.ua> <20071202183809.I1560@besplex.bde.org> <20071202193924.P1745@besplex.bde.org> In-Reply-To: <20071202193924.P1745@besplex.bde.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-OriginalArrivalTime: 02 Dec 2007 22:10:21.0712 (UTC) FILETIME=[2184E900:01C83530] X-Nokia-AV: Clean Cc: freebsd-fs@freebsd.org, Don Lewis Subject: Re: File remove problem X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 02 Dec 2007 22:10:34 -0000 What's the plan with these patches guys? Are they likely to be committed to current? I guess it's getting late in the game to commit to the 7.0 branch. Sorry for the resend Bruce. Thanks, Dave ext Bruce Evans wrote: > A second reply. Sorry for so many. > > On Sun, 2 Dec 2007, Bruce Evans wrote: > >> On Sun, 2 Dec 2007, Kostik Belousov wrote: > >>> I would argue that the ufs already knows too much about the ffs. But, >>> this seems to be the first explicit reference to the ffs from the ufs >>> code. With your approval, see below. > >>> diff --git a/sys/ufs/ufs/ufs_inode.c b/sys/ufs/ufs/ufs_inode.c >>> index 448f436..22e29e9 100644 >>> --- a/sys/ufs/ufs/ufs_inode.c >>> +++ b/sys/ufs/ufs/ufs_inode.c >>> @@ -60,6 +60,7 @@ __FBSDID("$FreeBSD: src/sys/ufs/ufs/ufs_inode.c,v >>> 1.69 2007/06/22 13:22:37 kib E >>> #ifdef UFS_GJOURNAL >>> #include >>> #endif >>> +#include >>> >>> /* >>> * Last reference to an inode. If necessary, write or delete it. >> >> ufs/ffs includes are conventionally separated from ufs/ufs includes by a >> blank line. About 2/3 of the files in ufs/ffs follow this convention. 
>> >>> @@ -90,8 +91,7 @@ ufs_inactive(ap) >>> ufs_gjournal_close(vp); >>> #endif >>> if ((ip->i_effnlink == 0 && DOINGSOFTDEP(vp)) || >>> - (ip->i_nlink <= 0 && >>> - (vp->v_mount->mnt_flag & MNT_RDONLY) == 0)) { >>> + (ip->i_nlink <= 0 && !VFSTOUFS(mp)->um_fs->fs_ronly)) { >>> loop: >>> if (vn_start_secondary_write(vp, &mp, V_NOWAIT) != 0) { >>> /* Cannot delete file while file system is suspended */ >>> @@ -121,7 +121,7 @@ ufs_inactive(ap) >>> } >>> if (ip->i_effnlink == 0 && DOINGSOFTDEP(vp)) >>> softdep_releasefile(ip); >>> - if (ip->i_nlink <= 0 && (vp->v_mount->mnt_flag & MNT_RDONLY) == >>> 0) { >>> + if (ip->i_nlink <= 0 && !VFSTOUFS(mp)->um_fs->fs_ronly) { >>> #ifdef QUOTA >>> if (!getinoquota(ip)) >>> (void)chkiq(ip, -1, NOCRED, FORCE); >>> >> >> Should be ip->i_fs->fs_ronly. >> >> The locking for fs_ronly is unclear. It seems to be locked mainly by >> vn_start_write(), and that enough for everything except probably access >> time changes. > > Actually. I hope that this MNT_RDONLY check can just go away. I now see > that it part of previous attempts to fix the bugs in this area. > > % RCS file: /home/ncvs/src/sys/ufs/ufs/ufs_inode.c,v > % Working file: ufs_inode.c > % head: 1.69 > % ---------------------------- > % revision 1.64 > % date: 2005/09/23 20:49:57; author: delphij; state: Exp; lines: +1 -1 > % Restore a historical ufs_inactive behavior that has been changed > % in rev. 1.40 of ufs_inode.c, which allows an inode being truncated > % even when the filesystem itself is marked RDONLY. A subsequent > % call of UFS_TRUNCATE (ffs_truncate) would panic the system as it > % asserts that it can only be called when the filesystem is mounted > % read-write (same changeset, rev. 1.74 of sys/ufs/ffs/ffs_inode.c). > % % Because ffs_mount() already takes care of sync'ing the filesystem > % to disk before being downgraded to readonly, it appears to be more > % desirable that we should not permit this sort of writes to disk. 
> % % This change would fix a panic that occours when read-only mounted > % a corrupted filesystem and doing some file operations. > % % MT6/5/4 candidate > % % Reviewed by: mckusick > % ---------------------------- > % ... > % ---------------------------- > % revision 1.40 > % date: 2002/01/15 07:17:12; author: mckusick; state: Exp; lines: > +2 -2 > % When downgrading a filesystem from read-write to read-only, operations > % involving file removal or file update were not always being fully > % committed to disk. The result was lost files or corrupted file data. > % This change ensures that the filesystem is properly synced to disk > % before the filesystem is down-graded. > % % This delta also fixes a long standing bug in which a file open for > % reading has been unlinked. When the last open reference to the file > % is closed, the inode is reclaimed by the filesystem. Previously, > % if the filesystem had been down-graded to read-only, the inode could > % not be reclaimed, and thus was lost and had to be later recovered > % by fsck. With this change, such files are found at the time of the > % down-grade. Normally they will result in the filesystem down-grade > % failing with `device busy'. If a forcible down-grade is done, then > % the affected files will be revoked causing the inode to be released > % and the open file descriptors to begin failing on attempts to read. 
> % % Submitted by: "Sam Leffler" > % ---------------------------- > % % Index: ufs_inode.c > % =================================================================== > % RCS file: /home/ncvs/src/sys/ufs/ufs/ufs_inode.c,v > % retrieving revision 1.63 > % retrieving revision 1.64 > % diff -u -2 -r1.63 -r1.64 > % --- ufs_inode.c 17 Mar 2005 11:58:43 -0000 1.63 > % +++ ufs_inode.c 23 Sep 2005 20:49:57 -0000 1.64 > % @@ -36,5 +36,5 @@ > % % #include > % -__FBSDID("$FreeBSD: src/sys/ufs/ufs/ufs_inode.c,v 1.63 2005/03/17 > 11:58:43 jeff Exp $"); > % +__FBSDID("$FreeBSD: src/sys/ufs/ufs/ufs_inode.c,v 1.64 2005/09/23 > 20:49:57 delphij Exp $"); > % % #include "opt_quota.h" > % @@ -84,5 +84,5 @@ > % if (ip->i_effnlink == 0 && DOINGSOFTDEP(vp)) > % softdep_releasefile(ip); > % - if (ip->i_nlink <= 0) { > % + if (ip->i_nlink <= 0 && (vp->v_mount->mnt_flag & MNT_RDONLY) == > 0) { > % (void) vn_write_suspend_wait(vp, NULL, V_WAIT); > % #ifdef QUOTA > > % Index: ufs_inode.c > % =================================================================== > % RCS file: /home/ncvs/src/sys/ufs/ufs/ufs_inode.c,v > % retrieving revision 1.39 > % retrieving revision 1.40 > % diff -u -2 -r1.39 -r1.40 > % --- ufs_inode.c 11 Oct 2001 17:52:20 -0000 1.39 > % +++ ufs_inode.c 15 Jan 2002 07:17:12 -0000 1.40 > % @@ -37,5 +37,5 @@ > % * > % * @(#)ufs_inode.c 8.9 (Berkeley) 5/14/95 > % - * $FreeBSD: src/sys/ufs/ufs/ufs_inode.c,v 1.39 2001/10/11 17:52:20 > jhb Exp $ > % + * $FreeBSD: src/sys/ufs/ufs/ufs_inode.c,v 1.40 2002/01/15 07:17:12 > mckusick Exp $ > % */ > % % @@ -85,5 +85,5 @@ > % if (ip->i_effnlink == 0 && DOINGSOFTDEP(vp)) > % softdep_releasefile(ip); > % - if (ip->i_nlink <= 0 && (vp->v_mount->mnt_flag & MNT_RDONLY) == > 0) { > % + if (ip->i_nlink <= 0) { > % (void) vn_write_suspend_wait(vp, NULL, V_WAIT); > % #ifdef QUOTA > > Rev.1.40 of ufs_inode.c goes with rev.1.182 of ufs_vnops.c and > rev.1.74 of > ffs_vnops.c to fix truncation of unlinked open files in mount-update. 
> Rev.1.64 breaks this case by backing out 1.40. I think 1.64 is an > attempt > to work around the other bugs. It breaks the case of unlinked open files > more deterministically, but this case is relatively uncommon. Again, I > was "lucky" to debug this partly under 5.2 which doesn't have 1.64, so > the extra (null?) truncations for closed files were relatively common. > > So it should be safe to remove all the r/o checks in ufs_inactive() after > fixing the other bugs. ffs_truncate alread panics if fs_ronly, but only > in some cases. In particular, it doesn't panic for truncations that > don't > change the file size. Such truncations aren't quite null, since > standards > require [f]truncate(2) to mark the ctime and mtime for update. > ffs_truncate() sets the marks, which is correct for null truncations from > userland but not ones from syncer internals. Setting of the marks when > fs_ronly is set should cause panics later (my patch has a vprint() for > it). > > Bruce > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > From owner-freebsd-fs@FreeBSD.ORG Sun Dec 2 22:53:43 2007 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7FA3616A46E for ; Sun, 2 Dec 2007 22:53:43 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from gw.catspoiler.org (adsl-75-1-14-242.dsl.scrm01.sbcglobal.net [75.1.14.242]) by mx1.freebsd.org (Postfix) with ESMTP id 63E4913C457 for ; Sun, 2 Dec 2007 22:53:43 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.13.3/8.13.3) with ESMTP id lB2MrYMq037092; Sun, 2 Dec 2007 14:53:38 -0800 (PST) (envelope-from truckman@FreeBSD.org) Message-Id: 
<200712022253.lB2MrYMq037092@gw.catspoiler.org>
Date: Sun, 2 Dec 2007 14:53:34 -0800 (PST)
From: Don Lewis
To: brde@optusnet.com.au
In-Reply-To: <20071202193924.P1745@besplex.bde.org>
MIME-Version: 1.0
Content-Type: TEXT/plain; charset=us-ascii
Cc: freebsd-fs@FreeBSD.org
Subject: Re: File remove problem
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 02 Dec 2007 22:53:43 -0000

On 2 Dec, Bruce Evans wrote:

> So it should be safe to remove all the r/o checks in ufs_inactive() after
> fixing the other bugs. ffs_truncate already panics if fs_ronly, but only
> in some cases. In particular, it doesn't panic for truncations that don't
> change the file size. Such truncations aren't quite null, since standards
> require [f]truncate(2) to mark the ctime and mtime for update.
> ffs_truncate() sets the marks, which is correct for null truncations from
> userland but not ones from syncer internals. Setting of the marks when
> fs_ronly is set should cause panics later (my patch has a vprint() for it).

I think the MNT_RDONLY check in ufs_itimes_locked() should also be
changed to look at fs_ronly and panic if any marks are set. This will
require some changes to add some early MNT_RDONLY checks.

In particular, ffs_read() and ffs_extread() need to check MNT_RDONLY in
addition to MNT_NOATIME (as is already done in vfs_mark_atime()). This
also looks like it should be a reasonable optimization for read-only
file systems that should eliminate unnecessary work at the lower levels
of the code.

The early IN_ACCESS flag setting in ufs_setattr(), before the MNT_RDONLY
check, appears to be protected by the MNT_RDONLY check in
vfs_mark_atime().
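The early check suggested above can be sketched in user space roughly as follows. This is only an illustration, not kernel code: the `MODEL_` flag values, `struct model_inode`, and `model_read_mark_atime()` are hypothetical names made up for the example. The one thing taken from the thread is the idea of testing MNT_RDONLY together with MNT_NOATIME before IN_ACCESS is set on a read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the mount and inode flags named in the thread. */
#define MODEL_MNT_RDONLY   0x1u
#define MODEL_MNT_NOATIME  0x2u
#define MODEL_IN_ACCESS    0x1u

/* Hypothetical, minimal inode: only the flag word matters here. */
struct model_inode {
	unsigned int i_flag;
};

/*
 * Mark the inode for an access-time update after a read, unless the
 * mount has atime updates disabled or is (becoming) read-only.
 * Returns true if the mark was set.
 */
static bool
model_read_mark_atime(struct model_inode *ip, unsigned int mnt_flag)
{
	assert(ip != NULL);
	if ((mnt_flag & (MODEL_MNT_NOATIME | MODEL_MNT_RDONLY)) != 0)
		return (false);		/* nothing to set, nothing to discard later */
	ip->i_flag |= MODEL_IN_ACCESS;
	return (true);
}
```

With a check of this shape, a read on a mount that already has the read-only flag set leaves the flag word untouched, so no later code has to discard the mark.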
From owner-freebsd-fs@FreeBSD.ORG Sun Dec 2 23:49:52 2007 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6BD9216A417 for ; Sun, 2 Dec 2007 23:49:52 +0000 (UTC) (envelope-from julian@elischer.org) Received: from outL.internet-mail-service.net (outL.internet-mail-service.net [216.240.47.235]) by mx1.freebsd.org (Postfix) with ESMTP id 57E0113C45A for ; Sun, 2 Dec 2007 23:49:52 +0000 (UTC) (envelope-from julian@elischer.org) Received: from mx0.idiom.com (HELO idiom.com) (216.240.32.160) by out.internet-mail-service.net (qpsmtpd/0.40) with ESMTP; Sun, 02 Dec 2007 15:49:51 -0800 X-Client-Authorized: MaGic Cook1e X-Client-Authorized: MaGic Cook1e X-Client-Authorized: MaGic Cook1e Received: from julian-mac.elischer.org (home.elischer.org [216.240.48.38]) by idiom.com (Postfix) with ESMTP id 10F85126B7D; Sun, 2 Dec 2007 15:49:51 -0800 (PST) Message-ID: <475344A0.7080801@elischer.org> Date: Sun, 02 Dec 2007 15:49:52 -0800 From: Julian Elischer User-Agent: Thunderbird 2.0.0.9 (Macintosh/20071031) MIME-Version: 1.0 To: Bruce Evans References: <200712012214.lB1MEl2Q015881@gw.catspoiler.org> <20071202132955.M18602@delplex.bde.org> In-Reply-To: <20071202132955.M18602@delplex.bde.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org, Don Lewis Subject: Re: File remove problem X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 02 Dec 2007 23:49:52 -0000 Bruce, since you are following this and most of us have long since dropped off the thread, can you make sure that whatever the answer is gets the follow-up needed to get into the tree? 
Bruce Evans wrote: > On Sat, 1 Dec 2007, Don Lewis wrote: > >> On 2 Dec, Bruce Evans wrote: >> >>> Here is a non-hackish patch which explains why ignoring MNT_RDONLY in >>> the above or in ffs_mount() helps. It just fixes the confusion between >>> IN_MODIFIED and IN_CHANGE in critical places. >>> [...] From owner-freebsd-fs@FreeBSD.ORG Mon Dec 3 04:03:43 2007 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CDF4E16A41A; Mon, 3 Dec 2007 04:03:43 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail34.syd.optusnet.com.au (mail34.syd.optusnet.com.au [211.29.133.218]) by mx1.freebsd.org (Postfix) with ESMTP id 629A113C442; Mon, 3 Dec 2007 04:03:43 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from c211-30-219-213.carlnfd3.nsw.optusnet.com.au (c211-30-219-213.carlnfd3.nsw.optusnet.com.au [211.30.219.213]) by mail34.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id lB343dma001270 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 3 Dec 2007 15:03:40 +1100 Date: Mon, 3 Dec 2007 15:03:39 +1100 (EST) From: Bruce Evans X-X-Sender: bde@delplex.bde.org To: Don Lewis In-Reply-To: <200712022253.lB2MrYMq037092@gw.catspoiler.org> Message-ID: <20071203141557.P22038@delplex.bde.org> References: <200712022253.lB2MrYMq037092@gw.catspoiler.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-fs@freebsd.org Subject: Re: File remove problem X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 03 Dec 2007 04:03:43 -0000 On Sun, 2 Dec 2007, Don Lewis wrote: > On 2 Dec, Bruce Evans wrote: > >> So it should be safe to remove all the r/o checks in ufs_inactive() after >> fixing the other bugs. 
ffs_truncate already panics if fs_ronly, but only
>> in some cases. In particular, it doesn't panic for truncations that don't
>> change the file size. Such truncations aren't quite null, since standards
>> require [f]truncate(2) to mark the ctime and mtime for update.
>> ffs_truncate() sets the marks, which is correct for null truncations from
>> userland but not ones from syncer internals. Setting of the marks when
>> fs_ronly is set should cause panics later (my patch has a vprint() for it).
>
> I think the MNT_RDONLY check in ufs_itimes_locked() should also be
> changed to look at fs_ronly and panic if any marks are set. This will
> require some changes to add some early MNT_RDONLY checks.

Yes, already done (except vprint() instead of panic).

> In particular, ffs_read() and ffs_extread() need to check MNT_RDONLY in
> addition to MNT_NOATIME (as is already done in vfs_mark_atime()). This
> also looks like it should be a reasonable optimization for read-only
> file systems that should eliminate unnecessary work at the lower levels
> of the code.

But I let these happen and discard IN_ACCESS marks if fs_ronly. I
thought that the optimization went the other way -- unconditionally
setting the marks was very efficient, and discarding them in ufs_itimes()
was efficient too. I think this is still true now with larger locking
overheads, and the marks should be discarded later in the MNT_NOATIME
case too. It is expected that the marks are set much more often than
they are looked at by ufs_itimes(), since most calls to ufs_itimes()
are in close() and read() is much more common than close(). ufs_itimes()
is also called in stat() but I think that is less common than close()
(except for some tree walks). With non-delayed marking, ufs_itimes()
would still have to check fs_ronly, and the only gain would be that
it could then skip checking the marks except as an invariants check.
But it can gain like that even with delayed setting -- just ignore any old marks while fs_ronly (except as an invariants check), but clear them at mount or unmount time so that there shouldn't be any. > The early IN_ACCESS flag setting in ufs_setattr(), before the MNT_RDONLY > check, appears to be protected by the MNT_RDONLY check in > vfs_mark_atime(). Thanks, I had forgotten about that. In vfs_mark_atime(), there is much more efficiency to be gained by not setting marks that will be discarded, since it takes a VOP to set them and many file systems don't support this setting. However, it is hard for vfs_mark_atime() to know when the mark will be discarded without calling the fs: - it already doesn't know which fs's support it - it should be checking fs_ronly for ffs - it seems to be missing locking for MNT_NOATIME and MNT_RDONLY fs-level locking for MNT_NOATIME and MNT_RDONLY and fs_ronly is dubious too. Upper layers set the MNT flags before giving VOP_MOUNT() a chance to adjust the marks. This is automatically safe in one direction only (e.g., setting MNT_NOATIME or MNT_RDONLY is fail-safe since it stops changes), and always bad for strict invariants. I now use the following fixes: % Index: ufs_inode.c % =================================================================== % RCS file: /home/ncvs/src/sys/ufs/ufs/ufs_inode.c,v % retrieving revision 1.69 % diff -u -2 -r1.69 ufs_inode.c % --- ufs_inode.c 22 Jun 2007 13:22:37 -0000 1.69 % +++ ufs_inode.c 2 Dec 2007 13:51:12 -0000 % @@ -90,7 +90,5 @@ % ufs_gjournal_close(vp); % #endif % - if ((ip->i_effnlink == 0 && DOINGSOFTDEP(vp)) || % - (ip->i_nlink <= 0 && % - (vp->v_mount->mnt_flag & MNT_RDONLY) == 0)) { % + if (ip->i_effnlink <= 0) { % loop: % if (vn_start_secondary_write(vp, &mp, V_NOWAIT) != 0) { Back out 1.64 == restore 1.40. Always check i_effnlink so that there is no difference for the soft updates case. Use consistent style `<= 0' for things that should be >= 0. 
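To make the difference between the old and new tests in this hunk concrete, here is a small stand-alone model of the two predicates. It is a sketch under stated assumptions: `struct model_inode`, `old_starts_write()` and `new_starts_write()` are hypothetical names, the two booleans stand in for DOINGSOFTDEP(vp) and the MNT_RDONLY test, and it assumes that both link counts are 0 for an unlinked file when soft updates are not in use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, minimal inode: only the two link counts from the diff. */
struct model_inode {
	int i_nlink;	/* on-disk link count */
	int i_effnlink;	/* effective link count */
};

/* Condition removed by the hunk ("-" lines). */
static bool
old_starts_write(const struct model_inode *ip, bool softdep, bool mnt_rdonly)
{
	assert(ip != NULL);
	return ((ip->i_effnlink == 0 && softdep) ||
	    (ip->i_nlink <= 0 && !mnt_rdonly));
}

/* Condition added by the hunk ("+" line). */
static bool
new_starts_write(const struct model_inode *ip)
{
	assert(ip != NULL);
	return (ip->i_effnlink <= 0);
}
```

For an unlinked file without soft updates on a mount that already has MNT_RDONLY set, the old test is false and the new one is true, which is the unlinked-open-file case this thread is about. For a file that still has links, both tests are false.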
% @@ -120,7 +118,7 @@ % } % } % - if (ip->i_effnlink == 0 && DOINGSOFTDEP(vp)) % + if (ip->i_effnlink <= 0 && DOINGSOFTDEP(vp)) % softdep_releasefile(ip); % - if (ip->i_nlink <= 0 && (vp->v_mount->mnt_flag & MNT_RDONLY) == 0) { % + if (ip->i_nlink <= 0) { % #ifdef QUOTA % if (!getinoquota(ip)) Back out 1.64 == restore 1.40 (now it's duplicated). Use consistent style `<= 0' for things that should be >= 0. % @@ -147,17 +145,7 @@ % UFS_VFREE(vp, ip->i_number, mode); % } % - if (ip->i_flag & (IN_ACCESS | IN_CHANGE | IN_MODIFIED | IN_UPDATE)) { % - if ((ip->i_flag & (IN_CHANGE | IN_UPDATE | IN_MODIFIED)) == 0 && % - mp == NULL && % - vn_start_secondary_write(vp, &mp, V_NOWAIT)) { % - mp = NULL; % - ip->i_flag &= ~IN_ACCESS; % - } else { % - if (mp == NULL) % - (void) vn_start_secondary_write(vp, &mp, % - V_WAIT); % - UFS_UPDATE(vp, 0); % - } % - } % + if (mp == NULL) % + (void) vn_start_secondary_write(vp, &mp, V_WAIT); % + UFS_UPDATE(vp, 0); % out: % /* Unrelated change: don't do extra work to break IN_ACCESS while busy snapshotting. This is now handled better by transferring IN_ACCESS to IN_LAZYACCESS elsewhere. Discarding IN_ACCESS here just breaks this transfer. Here I think it was only a hack to handle one call to UFS_UPDATE(), but there are calls to UFS_UPDATE() all over and the others caused bugs which were eventually fixed in a better way. % Index: ufs_vnops.c % =================================================================== % RCS file: /home/ncvs/src/sys/ufs/ufs/ufs_vnops.c,v % retrieving revision 1.293 % diff -u -2 -r1.293 ufs_vnops.c % --- ufs_vnops.c 8 Nov 2007 17:21:51 -0000 1.293 % +++ ufs_vnops.c 2 Dec 2007 04:56:58 -0000 % @@ -89,4 +89,5 @@ % #endif % % +#include % #include % % @@ -137,8 +138,38 @@ % % ip = VTOI(vp); % + /* % + * MNT_RDONLY can barely be trusted here. Full r/o mode is indicated % + * by fs_ronly, and the MNT_RDONLY setting [should] differ from the % + * fs_ronly setting only during transition from r/w mode to r/o mode. 
% +   * We set IN_ACCESS even in full r/o mode, so we must discard it
% +   * unconditionally here. During the transition, we must convert
% +   * IN_CHANGE | IN_UPDATE to IN_MODIFIED, since some callers neglect
% +   * to set IN_MODIFIED. We also set the timestamps indicated by
% +   * IN_CHANGE | IN_UPDATE normally during the transition, since the
% +   * update marks may have been set correctly before the transition and
% +   * not yet converted into timestamps. Callers that set IN_CHANGE |
% +   * IN_UPDATE during the transition are buggy, since userland writes
% +   * are supposed to be denied (by MNT_RDONLY checks) during the
% +   * transition, while kernel writes should only be for syncs
% +   * and syncs should not touch timestamps except to convert old
% +   * update marks to timestamps. Callers that set any update mark or
% +   * modification flag except IN_ACCESS while in full r/o mode are
% +   * broken; we will panic for them later.
% +   */
%    if ((vp->v_mount->mnt_flag & MNT_RDONLY) != 0)
% -      goto out;
% +      ip->i_flag &= ~IN_ACCESS;
%    if ((ip->i_flag & (IN_ACCESS | IN_CHANGE | IN_UPDATE)) == 0)
%        return;
% +  if (ip->i_fs->fs_ronly) {  /* XXX locking? */
% +      vprint("ufs_itimes_locked: r/o mod", vp);
% +      /*
% +       * Should panic here, or return and let ffs_update() panic.
% +       * The fs_ronly check in ffs_update() is now almost redundant
% +       * and should not succeed, so it should be replaced by a
% +       * panic. It detects more invariants failures than we detect
% +       * here.
% +       */
% +      goto out;
% +  }
%
%    if ((vp->v_type == VBLK || vp->v_type == VCHR) && !DOINGSOFTDEP(vp))

This essentially backs out 1.280 to restore 1.182. This shouldn't be
needed (except for invariants checking), but is currently needed to work
around soft updates and other things setting IN_CHANGE when they should
set IN_MODIFIED (and IN_CHANGE only sometimes).
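The decision order in the patched ufs_itimes_locked() can be modelled in user space roughly like this. It is a simplified sketch, not the kernel function: the `MODEL_` names, `struct model_inode` and `model_itimes()` are hypothetical, the timestamp writes and the IN_LAZYMOD and suspension cases are left out, and it assumes that the existing `out:` label (not shown in the hunk) clears the three update marks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values for this sketch only. */
#define MODEL_IN_ACCESS    0x1u
#define MODEL_IN_CHANGE    0x2u
#define MODEL_IN_UPDATE    0x4u
#define MODEL_IN_MODIFIED  0x8u
#define MODEL_MARKS (MODEL_IN_ACCESS | MODEL_IN_CHANGE | MODEL_IN_UPDATE)

enum model_result {
	MODEL_NOTHING_TO_DO,	/* no update marks left to process */
	MODEL_RO_VIOLATION,	/* marks found in full r/o mode */
	MODEL_CONVERTED		/* marks turned into IN_MODIFIED */
};

struct model_inode {
	unsigned int i_flag;
};

static enum model_result
model_itimes(struct model_inode *ip, bool mnt_rdonly, bool fs_ronly)
{
	assert(ip != NULL);

	/* Access marks are discarded as soon as MNT_RDONLY is set. */
	if (mnt_rdonly)
		ip->i_flag &= ~MODEL_IN_ACCESS;
	if ((ip->i_flag & MODEL_MARKS) == 0)
		return (MODEL_NOTHING_TO_DO);
	if (fs_ronly) {
		/* The patch vprint()s here; assumed: out: drops the marks. */
		ip->i_flag &= ~MODEL_MARKS;
		return (MODEL_RO_VIOLATION);
	}
	/* r/w mode or the r/w -> r/o transition: convert the marks. */
	ip->i_flag |= MODEL_IN_MODIFIED;
	ip->i_flag &= ~MODEL_MARKS;
	return (MODEL_CONVERTED);
}
```

In this model, an IN_CHANGE mark seen while MNT_RDONLY is set but fs_ronly is still clear (the downgrade in progress) is converted to IN_MODIFIED instead of being thrown away, which is the behaviour the paragraph above says soft updates currently depends on.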
With these patches, everything seems to work perfectly in -current except for a bug in kqueue with the unlinked open file case which is the main thing handled by the fix in ufs_inode.c: cp /etc/passwd /f/a tail -f /f/a & rm /f/a umount -f /f/a # or mount -u -o ro /f/a after fixing the bugs This leaves tail -f waiting in kqueue. Everything seems to work perfectly in 5.2 without these patches. kqueue returns when its open file is forcibly closed in preparation for forcibly removing it for the forced umount. Bruce From owner-freebsd-fs@FreeBSD.ORG Mon Dec 3 05:17:51 2007 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8222016A418 for ; Mon, 3 Dec 2007 05:17:51 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from gw.catspoiler.org (adsl-75-1-14-242.dsl.scrm01.sbcglobal.net [75.1.14.242]) by mx1.freebsd.org (Postfix) with ESMTP id 67EB913C46B for ; Mon, 3 Dec 2007 05:17:51 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.13.3/8.13.3) with ESMTP id lB35HgtK039158; Sun, 2 Dec 2007 21:17:46 -0800 (PST) (envelope-from truckman@FreeBSD.org) Message-Id: <200712030517.lB35HgtK039158@gw.catspoiler.org> Date: Sun, 2 Dec 2007 21:17:42 -0800 (PST) From: Don Lewis To: brde@optusnet.com.au In-Reply-To: <20071203141557.P22038@delplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/plain; charset=us-ascii Cc: freebsd-fs@FreeBSD.org Subject: Re: File remove problem X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 03 Dec 2007 05:17:51 -0000 On 3 Dec, Bruce Evans wrote: > On Sun, 2 Dec 2007, Don Lewis wrote: > >> On 2 Dec, Bruce Evans wrote: >> >>> So it should be safe to remove all the r/o checks in ufs_inactive() after >>> 
>>> fixing the other bugs. ffs_truncate already panics if fs_ronly, but only
>>> in some cases. In particular, it doesn't panic for truncations that don't
>>> change the file size. Such truncations aren't quite null, since standards
>>> require [f]truncate(2) to mark the ctime and mtime for update.
>>> ffs_truncate() sets the marks, which is correct for null truncations from
>>> userland but not ones from syncer internals. Setting of the marks when
>>> fs_ronly is set should cause panics later (my patch has a vprint() for it).
>>
>> I think the MNT_RDONLY check in ufs_itimes_locked() should also be
>> changed to look at fs_ronly and panic if any marks are set. This will
>> require some changes to add some early MNT_RDONLY checks.
>
> Yes, already done (except vprint() instead of panic).
>
>> In particular, ffs_read() and ffs_extread() need to check MNT_RDONLY in
>> addition to MNT_NOATIME (as is already done in vfs_mark_atime()). This
>> also looks like it should be a reasonable optimization for read-only
>> file systems that should eliminate unnecessary work at the lower levels
>> of the code.
>
> But I let these happen and discard IN_ATIME marks if fs_ronly. I
> thought that the optimization went the other way -- unconditionally
> setting the marks was very efficient, and discarding them in ufs_itimes()
> was efficient too. I think this is still true now with larger locking
> overheads, and the marks should be discarded later in the MNT_NOATIME
> case too. It is expected that the marks are set much more often than
> they are looked at by ufs_itimes(), since most calls to ufs_itimes()
> are in close() and read() is much more common than close().

ffs_read() and ffs_extread() already check MNT_NOATIME, so also checking
MNT_RDONLY there as well is free. Setting and clearing the mark will
consume a few instruction cycles, dirty a cache line, and increase main
memory write-back traffic, though the expense is likely to be small.
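As a rough illustration of why the extra test costs nothing, here is a
standalone sketch (not the kernel code). The flag values, struct x_inode and
x_mark_atime_on_read() are hypothetical stand-ins for mnt_flag, MNT_NOATIME,
MNT_RDONLY and IN_ACCESS, assumed only for this example:

```c
#include <stdint.h>

/* Hypothetical stand-ins; the values are illustrative, not from sys/mount.h. */
#define X_MNT_RDONLY	0x0001u
#define X_MNT_NOATIME	0x0002u
#define X_IN_ACCESS	0x0001u

struct x_inode {
	uint32_t mnt_flag;	/* copy of the mount flags, for the sketch only */
	uint32_t i_flag;	/* per-inode update marks */
};

/*
 * Called at the end of a successful read.  Both mount flags are tested
 * with a single mask, so adding the read-only test to the existing
 * noatime test adds no extra branch.
 * Returns 1 if the access mark was set, 0 if it was skipped.
 */
static int
x_mark_atime_on_read(struct x_inode *ip)
{
	if ((ip->mnt_flag & (X_MNT_NOATIME | X_MNT_RDONLY)) != 0)
		return (0);
	ip->i_flag |= X_IN_ACCESS;
	return (1);
}
```

With this shape, a mark that was set before the downgrade started is left
alone; only new marks are suppressed once the read-only flag is visible.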
Preventing user reads from setting IN_ATIME as soon as MNT_RDONLY is set
on a downgrade to read-only seems to be the right thing to do.

> ufs_itimes()
> is also called in stat() but I think that is less common than close()
> (except for some tree walks). With non-delayed marking, ufs_itimes()
> would still have to check fs_ronly, and the only gain would be that
> it could then skip checking the marks except as an invariants check.
> But it can gain like that even with delayed setting -- just ignore any
> old marks while fs_ronly (except as an invariants check), but clear them
> at mount or unmount time so that there shouldn't be any.

I think that setting the marks when the file system is read-only causes
the syncer to do extra work. I think that ffs_sync() still gets called
if the file system is read-only, and if it encounters any inodes with
marks set, it calls ffs_syncvnode() on them.

>> The early IN_ACCESS flag setting in ufs_setattr(), before the MNT_RDONLY
>> check, appears to be protected by the MNT_RDONLY check in
>> vfs_mark_atime().
>
> Thanks, I had forgotten about that. In vfs_mark_atime(), there is much
> more efficiency to be gained by not setting marks that will be discarded,
> since it takes a VOP to set them and many file systems don't support
> this setting. However, it is hard for vfs_mark_atime() to know when the
> mark will be discarded without calling the fs:
>
> - it already doesn't know which fs's support it
> - it should be checking fs_ronly for ffs

I think that MNT_RDONLY is correct here. We want to stop new atime
updates as soon as the downgrade starts, just like we stop new
user-initiated writes.

> - it seems to be missing locking for MNT_NOATIME and MNT_RDONLY
>
> fs-level locking for MNT_NOATIME and MNT_RDONLY and fs_ronly is dubious
> too. Upper layers set the MNT flags before giving VOP_MOUNT() a chance
> to adjust the marks.
> This is automatically safe in one direction only
> (e.g., setting MNT_NOATIME or MNT_RDONLY is fail-safe since it stops
> changes), and always bad for strict invariants.

Maybe a reasonable way to handle this would be to set the
flags before calling VOP_MOUNT() when they are being changed from 0 to
1, and clear them after calling VOP_MOUNT() when changing them from 1 to
0. Adding explicit locking sounds painful ...

> I now use the following fixes:
> % Index: ufs_vnops.c
> % ===================================================================
> % RCS file: /home/ncvs/src/sys/ufs/ufs/ufs_vnops.c,v
> % retrieving revision 1.293
> % diff -u -2 -r1.293 ufs_vnops.c
> % --- ufs_vnops.c 8 Nov 2007 17:21:51 -0000 1.293
> % +++ ufs_vnops.c 2 Dec 2007 04:56:58 -0000
> % @@ -89,4 +89,5 @@
> %  #endif
> %
> % +#include
> %  #include
> %
> % @@ -137,8 +138,38 @@
> %
> %  	ip = VTOI(vp);
> % +	/*
> % +	 * MNT_RDONLY can barely be trusted here. Full r/o mode is indicated
> % +	 * by fs_ronly, and the MNT_RDONLY setting [should] differ from the
> % +	 * fs_ronly setting only during transition from r/w mode to r/o mode.
> % +	 * We set IN_ACCESS even in full r/o mode, so we must discard it
> % +	 * unconditionally here. During the transition, we must convert
> % +	 * IN_CHANGE | IN_UPDATE to IN_MODIFIED, since some callers neglect
> % +	 * to set IN_MODIFIED. We also set the timestamps indicated by
> % +	 * IN_CHANGE | IN_UPDATE normally during the transition, since the
> % +	 * update marks may have been set correctly before the transition and
> % +	 * not yet converted into timestamps. Callers that set IN_CHANGE |
> % +	 * IN_UPDATE during the transition are buggy, since userland writes
> % +	 * are supposed to be denied (by MNT_RDONLY checks) during the
> % +	 * transition, while kernel writes should only be for syncs
> % +	 * and syncs should not touch timestamps except to convert old
> % +	 * update marks to timestamps.
> % +	 * Callers that set any update mark or
> % +	 * modification flag except IN_ACCESS while in full r/o mode are
> % +	 * broken; we will panic for them later.
> % +	 */
> %  	if ((vp->v_mount->mnt_flag & MNT_RDONLY) != 0)
> % -		goto out;
> % +		ip->i_flag &= ~IN_ACCESS;

IN_ACCESS might have been set before the downgrade request. As written,
this change will toss out the timestamp update. I think it would be
better to use fs_ronly here, but it would be more efficient to check
MNT_RDONLY in ffs_*read() and eliminate these two lines of code.

From owner-freebsd-fs@FreeBSD.ORG Mon Dec 3 10:11:49 2007
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 50FFE16A417; Mon, 3 Dec 2007 10:11:49 +0000 (UTC) (envelope-from brde@optusnet.com.au)
Received: from mail01.syd.optusnet.com.au (mail01.syd.optusnet.com.au [211.29.132.182]) by mx1.freebsd.org (Postfix) with ESMTP id F1F6013C474; Mon, 3 Dec 2007 10:11:48 +0000 (UTC) (envelope-from brde@optusnet.com.au)
Received: from besplex.bde.org (c211-30-219-213.carlnfd3.nsw.optusnet.com.au [211.30.219.213]) by mail01.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id lB3ABaa5006347 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 3 Dec 2007 21:11:46 +1100
Date: Mon, 3 Dec 2007 21:11:36 +1100 (EST)
From: Bruce Evans
X-X-Sender: bde@besplex.bde.org
To: Don Lewis
In-Reply-To: <200712030517.lB35HgtK039158@gw.catspoiler.org>
Message-ID: <20071203202947.N1698@besplex.bde.org>
References: <200712030517.lB35HgtK039158@gw.catspoiler.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-fs@freebsd.org
Subject: Re: File remove problem
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 03 Dec 2007 10:11:49 -0000

On Sun, 2 Dec 2007, Don
Lewis wrote:

> On 3 Dec, Bruce Evans wrote:
>> On Sun, 2 Dec 2007, Don Lewis wrote:
>>
>>> In particular, ffs_read() and ffs_extread() need to check MNT_RDONLY in
>>> addition to MNT_NOATIME (as is already done in vfs_mark_atime()). This
>>> also looks like it should be a reasonable optimization for read-only
>>> file systems that should eliminate unnecessary work at the lower levels
>>> of the code.
>>
>> But I let these happen and discard IN_ATIME marks if fs_ronly. I
>> thought that the optimization went the other way -- unconditionally
>> setting the marks was very efficient, and discarding them in ufs_itimes()
>> was efficient too. I think this is still true now with larger locking
>> overheads, and the marks should be discarded later in the MNT_NOATIME
>> case too. It is expected that the marks are set much more often than
>> they are looked at by ufs_itimes(), since most calls to ufs_itimes()
>> are in close() and read() is much more common than close().
>
> ffs_read() and ffs_extread() already check MNT_NOATIME, so also checking
> MNT_RDONLY there as well is free. Setting and clearing the mark will
> consume a few instruction cycles, dirty a cache line, and increase main
> memory write-back traffic, though the expense is likely to be small.

The check can also avoid the new vnode locking for useless settings of
IN_ATIME. But what locks the MNT flags? Nothing directly I think. Here
we must not care if we read a stale value, and ufs_itimes() must not care
if the value changed just after we read it.

> Preventing user reads from setting IN_ATIME as soon as MNT_RDONLY is set
> on a downgrade to read-only seems to be the right thing to do.

Either reads or ufs_itimes() must use MNT_RDONLY to prevent changes to
the inode after MNT_RDONLY is set early in the r/w to r/o transition.
Checking MNT_RDONLY gives the more correct behaviour of not having to
discard even IN_ATIME settings that were made before the transition began.
I don't understand how unmount (apparently) works so well without setting
MNT_RDONLY to prevent further writes like the transition does.

>> ufs_itimes()
>> is also called in stat() but I think that is less common than close()
>> (except for some tree walks). With non-delayed marking, ufs_itimes()
>> would still have to check fs_ronly, and the only gain would be that
>> it could then skip checking the marks except as an invariants check.
>> But it can gain like that even with delayed setting -- just ignore any
>> old marks while fs_ronly (except as an invariants check), but clear them
>> at mount or unmount time so that there shouldn't be any.
>
> I think that setting the marks when the file system is read-only causes
> the syncer to do extra work. I think that ffs_sync() still gets called
> if the file system is read-only, and if it encounters any inodes with
> marks set, it calls ffs_syncvnode() on them.

I think VOP_SYNC() actually isn't called on r/o file systems. Callers
check MNT_RDONLY or possibly dirty block list pointers. msdosfs had a
bug that would have caused panics if msdosfs_sync() were called on an
fs that had ever been mounted r/w but is currently r/o.

>>> The early IN_ACCESS flag setting in ufs_setattr(), before the MNT_RDONLY
>>> check, appears to be protected by the MNT_RDONLY check in
>>> vfs_mark_atime().
>>
>> Thanks, I had forgotten about that. In vfs_mark_atime(), there is much
>> more efficiency to be gained by not setting marks that will be discarded,
>> since it takes a VOP to set them and many file systems don't support
>> this setting. However, it is hard for vfs_mark_atime() to know when the
>> mark will be discarded without calling the fs:
>>
>> - it already doesn't know which fs's support it
>> - it should be checking fs_ronly for ffs
>
> I think that MNT_RDONLY is correct here. We want to stop new atime
> updates as soon as the downgrade starts, just like we stop new
> user-initiated writes.

Right.
Same as for normal read accesses or delayed killing of IN_ACCESS in
ufs_itimes().

>> - it seems to be missing locking for MNT_NOATIME and MNT_RDONLY
>>
>> fs-level locking for MNT_NOATIME and MNT_RDONLY and fs_ronly is dubious
>> too. Upper layers set the MNT flags before giving VOP_MOUNT() a chance
>> to adjust the marks. This is automatically safe in one direction only
>> (e.g., setting MNT_NOATIME or MNT_RDONLY is fail-safe since it stops
>> changes), and always bad for strict invariants.
>
> Maybe a reasonable way to handle this would be to set the
> flags before calling VOP_MOUNT() when they are being changed from 0 to
> 1, and clear them after calling VOP_MOUNT() when changing them from 1 to
> 0. Adding explicit locking sounds painful ...

This already happens for MNT_RDONLY. ffs_mount() has dead code which
obfuscates this and other things by setting all the generic flags again.
It gets the timing of the setting of MNT_RDONLY backwards by delaying the
setting until the end of the transition from r/w to r/o, but this has no
effect since MNT_RDONLY is set throughout the transition. It only gets
the timing of the clearing of MNT_RDONLY right by delaying it until the
end of the transition from r/o to r/w. Some flags have the wrong sense
for this to be right. E.g., early clearing of MNT_ASYNC is safe, but
early setting of it is not. tegge fixed some races from this.
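The "set early, clear late" ordering discussed above can be sketched in
isolation like this. It is only an illustration under stated assumptions, not
the mount code: x_update_flags(), x_fs_update_fn and x_record_hook() are
hypothetical names, and the callback merely stands in for the file system's
mount-update hook.

```c
#include <stdint.h>

/* Hypothetical callback standing in for the file system's update hook. */
typedef int (*x_fs_update_fn)(uint32_t visible_flags, void *arg);

/*
 * Update the flags selected by 'mask' so that flags going 0 -> 1 are
 * visible before the hook runs, and flags going 1 -> 0 are cleared only
 * after the hook has succeeded.  Flags outside 'mask' are not touched.
 * On failure the old flags are restored and the hook's error is returned.
 */
static int
x_update_flags(uint32_t *flagsp, uint32_t newflags, uint32_t mask,
    x_fs_update_fn hook, void *arg)
{
	uint32_t old = *flagsp;
	uint32_t setting = ~old & newflags & mask;	/* 0 -> 1 */
	uint32_t clearing = old & ~newflags & mask;	/* 1 -> 0 */
	int error;

	*flagsp = old | setting;	/* set early */
	error = hook(*flagsp, arg);
	if (error != 0) {
		*flagsp = old;		/* roll back */
		return (error);
	}
	*flagsp &= ~clearing;		/* clear late */
	return (0);
}

/* Example hook: records the flags that were visible while it ran. */
static int
x_record_hook(uint32_t visible_flags, void *arg)
{
	*(uint32_t *)arg = visible_flags;
	return (0);
}
```

This ordering only makes sense for flags whose set state is the restrictive
one, so 'mask' would have to exclude flags with the opposite sense such as
MNT_ASYNC.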
Bruce

From owner-freebsd-fs@FreeBSD.ORG Mon Dec 3 11:07:00 2007
Return-Path:
Delivered-To: freebsd-fs@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 749E016A4EB for ; Mon, 3 Dec 2007 11:07:00 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 6485013C46A for ; Mon, 3 Dec 2007 11:07:00 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.2/8.14.2) with ESMTP id lB3B70PT005570 for ; Mon, 3 Dec 2007 11:07:00 GMT (envelope-from owner-bugmaster@FreeBSD.org)
Received: (from gnats@localhost) by freefall.freebsd.org (8.14.2/8.14.1/Submit) id lB3B6xNS005566 for freebsd-fs@FreeBSD.org; Mon, 3 Dec 2007 11:06:59 GMT (envelope-from owner-bugmaster@FreeBSD.org)
Date: Mon, 3 Dec 2007 11:06:59 GMT
Message-Id: <200712031106.lB3B6xNS005566@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f
From: FreeBSD bugmaster
To: freebsd-fs@FreeBSD.org
Cc:
Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 03 Dec 2007 11:07:00 -0000

Current FreeBSD problem reports

Critical problems

Serious problems

S Tracker      Resp.  Description
--------------------------------------------------------------------------------
o kern/112658  fs     [smbfs] [patch] smbfs and caching problems (resolves b
o kern/114676  fs     [ufs] snapshot creation panics: snapacct_ufs2: bad blo
o kern/114856  fs     [ntfs] [patch] Bug in NTFS allows bogus file modes.
o kern/116170  fs     Kernel panic when mounting /tmp
o kern/118322  fs     [panic] Sometimes (seldom), "panic:page fault" happens

5 problems total.

Non-critical problems

S Tracker      Resp.  Description
--------------------------------------------------------------------------------
o kern/114847  fs     [ntfs] [patch] dirmask support for NTFS ala MSDOSFS
o bin/118249   fs     mv(1): moving a directory changes its mtime

2 problems total.

From owner-freebsd-fs@FreeBSD.ORG Tue Dec 4 16:59:12 2007
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5375016A419 for ; Tue, 4 Dec 2007 16:59:12 +0000 (UTC) (envelope-from julien.bellang@free.fr)
Received: from postfix2-g20.free.fr (postfix2-g20.free.fr [212.27.60.43]) by mx1.freebsd.org (Postfix) with ESMTP id E672813C46E for ; Tue, 4 Dec 2007 16:59:11 +0000 (UTC) (envelope-from julien.bellang@free.fr)
Received: from smtp8-g19.free.fr (smtp8-g19.free.fr [212.27.42.65]) by postfix2-g20.free.fr (Postfix) with ESMTP id 05C302000FE1 for ; Tue, 4 Dec 2007 15:21:23 +0100 (CET)
Received: from smtp8-g19.free.fr (localhost [127.0.0.1]) by smtp8-g19.free.fr (Postfix) with ESMTP id EA33E17F598 for ; Tue, 4 Dec 2007 17:22:12 +0100 (CET)
Received: from imp7-g19.free.fr (imp7-g19.free.fr [212.27.42.38]) by smtp8-g19.free.fr (Postfix) with ESMTP id DC6B717F576 for ; Tue, 4 Dec 2007 17:22:12 +0100 (CET)
Received: by imp7-g19.free.fr (Postfix, from userid 33) id C60C73F5B; Tue, 4 Dec 2007 17:14:41 +0100 (CET)
Received: from 194.3.231.254 ([194.3.231.254]) by imp.free.fr (IMP) with HTTP for ; Tue, 04 Dec 2007 17:14:41 +0100
Message-ID: <1196784881.47557cf18ae9f@imp.free.fr>
Date: Tue, 04 Dec 2007 17:14:41 +0100
From: julien.bellang@free.fr
To: freebsd-fs@freebsd.org
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
User-Agent: Internet Messaging Program (IMP) 3.2.5
X-Originating-IP: 194.3.231.254
Subject:
 (no subject)
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 04 Dec 2007 16:59:12 -0000

Hi,

I'm working on a system installed in an environment where power is cut
off many times a week. This system is based on i386 FreeBSD 6.2.

I'm using UFS2 with SoftUpdate activated.

After such a power cut, when I restart I have some corrupted files that
FSCK_UFS doesn't entirely repair.

For these files FSCK corrects the following error:
/dev/ad0s1f: INCORRECT BLOCK COUNT I=3132417 (512992 should be 459392) (CORRECTED)

But in my view these files are still inconsistent, as the file size
field doesn't reflect the number of blocks referenced in the inode.

Looking at the fsck_ffs sources, it seems that FSCK checks the validity of
block pointers (!= 0) in the inode block list only for directory inodes, not
for regular files.
In my case, as the number of block addresses to check in the inode is deduced
from the file size, and the file size is greater than the number of blocks
actually allocated, I get many NULL block pointers.

Does anyone have an idea why NULL pointers are accepted by FSCK for
regular files, and why it doesn't try to adjust the file size?

file fsck_ffs/inode.c, iblock(), line 208

line 153: static int
     154: iblock(struct inodesc *idesc, long ilevel, off_t isize)
          {
              .
              .
              .
     197:     if (IBLK(bp, i)) {
                  idesc->id_blkno = IBLK(bp, i);
                  if (ilevel == 0)
                      n = (*func)(idesc);
                  else
                      n = iblock(idesc, ilevel, isize);
                  if (n & STOP) {
                      bp->b_flags &= ~B_INUSE;
                      return (n);
                  }
              } else {
     208:         if (idesc->id_type == DATA && isize > 0) {
                      /* An empty block in a directory XXX */
                      getpathname(pathbuf, idesc->id_number,
                          idesc->id_number);
                      pfatal("DIRECTORY %s: CONTAINS EMPTY BLOCKS",
                          pathbuf);
                      if (reply("ADJUST LENGTH") == 1) {
                          dp = ginode(idesc->id_number);
                          DIP_SET(dp, di_size,
                              DIP(dp, di_size) - isize);
                          isize = 0;
                          printf(
                              "YOU MUST RERUN FSCK AFTERWARDS\n");
                          rerun = 1;
                          inodirty();
                          bp->b_flags &= ~B_INUSE;
                          return(STOP);
                      }
                  }
              }
              .
              .
              .
          }

Thanks,

Julien

From owner-freebsd-fs@FreeBSD.ORG Thu Dec 6 15:09:47 2007
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E39BF16A41A for ; Thu, 6 Dec 2007 15:09:47 +0000 (UTC) (envelope-from julien.bellang@free.fr)
Received: from smtp8-g19.free.fr (smtp8-g19.free.fr [212.27.42.65]) by mx1.freebsd.org (Postfix) with ESMTP id 7F6A113C469 for ; Thu, 6 Dec 2007 15:09:45 +0000 (UTC) (envelope-from julien.bellang@free.fr)
Received: from smtp8-g19.free.fr (localhost [127.0.0.1]) by smtp8-g19.free.fr (Postfix) with ESMTP id 7646A17F6C2 for ; Thu, 6 Dec 2007 16:09:44 +0100 (CET)
Received: from imp7-g19.free.fr (imp7-g19.free.fr [212.27.42.38]) by smtp8-g19.free.fr (Postfix) with ESMTP id 689F717F6C1 for ; Thu, 6 Dec 2007 16:09:44 +0100 (CET)
Received: by imp7-g19.free.fr (Postfix, from userid 33) id 780353FD0; Thu, 6 Dec 2007 16:01:50 +0100 (CET)
Received: from 194.3.231.254 ([194.3.231.254]) by imp.free.fr (IMP) with HTTP for ; Thu, 06 Dec 2007 16:01:50 +0100
Message-ID: <1196953310.47580ede28676@imp.free.fr>
Date: Thu, 06 Dec 2007 16:01:50 +0100
From: julien.bellang@free.fr
To: freebsd-fs@freebsd.org
References: <1196788555.47558b4bab0ab@imp.free.fr>
In-Reply-To: <1196788555.47558b4bab0ab@imp.free.fr>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
User-Agent: Internet Messaging Program (IMP) 3.2.5
X-Originating-IP: 194.3.231.254
Subject: Re: FSCK doesn't correct file size when INCORRECT BLOCK COUNT is found
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 06 Dec 2007 15:09:48 -0000

Hi,

I finally understand why FSCK_UFS fails to correct corrupted files that
were being written when the power was cut.

1) First, some information about the corrupted files:
In my case the file system has the following characteristics:
- the write cache is enabled on the hard drive
- the SoftUpdate option is enabled
- the FS is mounted with the default option noasync

Under these conditions, when the power is cut while a file is being
written, the file is corrupted in the following way after restart:
- in the inode metadata, the FILESIZE and BLOCKCOUNT fields hold the
  final values expected for the file
- in the inode, the list of BLOCKS is not up to date (which seems pretty
  normal as the write did not complete), and the list contains holes
  (EMPTY BLOCK, null block reference) that are not necessarily at the end
  of the list.

2) How FSCK treats these files:
When FSCK processes such a file and finds a hole in a regular file's inode
block list, it skips the hole and doesn't increment the block count. BUT if
the inode doesn't belong to a directory, FSCK DOESN'T consider this hole
an error. Indeed, I think an application may deliberately create such a
file.
However, after checking the inode block list, FSCK's function checkinode()
finds that the calculated block count doesn't match the inode's BLOCKCOUNT
field. It proposes to correct only this field and doesn't correct the SIZE
field.
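The behaviour described in 2) can be sketched roughly as below. This is a
standalone illustration, not the fsck_ffs code: struct x_scan and
x_scan_blocks() are hypothetical, and the real code walks direct and indirect
blocks rather than a flat array of pointers.

```c
#include <stddef.h>
#include <stdint.h>

struct x_scan {
	size_t allocated;	/* non-zero block pointers seen */
	size_t first_hole;	/* index of the first zero pointer, or
				 * nblocks if there is none */
};

/*
 * Walk the block pointers that the file size says should exist.
 * Holes (zero pointers) are skipped without being counted, which is
 * why the computed count can end up lower than the count stored in
 * the inode while the size is left as it was.
 */
static struct x_scan
x_scan_blocks(const uint64_t *blks, size_t nblocks)
{
	struct x_scan r = { 0, nblocks };

	for (size_t i = 0; i < nblocks; i++) {
		if (blks[i] != 0)
			r.allocated++;
		else if (r.first_hole == nblocks)
			r.first_hole = i;
	}
	return (r);
}
```

A sparse file created by seeking past the end of the file before writing
gives the same picture, so a hole on its own does not say whether the file
was damaged by a crash or written that way on purpose.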
The problem for the end user is that, as the file seems to have the right
size, he cannot tell that the write did not actually complete normally
(I'm in exactly this situation), and so he will use a corrupted file.

3) A proposed solution:
I'm working on a workaround in FSCK (it seems to work fine in my case)
that truncates a file with holes at the first hole discovered, and
adjusts the inode SIZE and BLOCKCOUNT fields accordingly.
However, I'm aware that such a patch may be a problem for files that were
deliberately created with holes. Maybe the solution is to pass a new option
to FSCK, or to truncate the file only if the BLOCKCOUNT is inconsistent.

Is anyone interested in this work, and could anyone comment on this
analysis? In particular, I'd like to know in which known cases applications
or the system may generate regular files containing holes.

Thanks,

Julien Bellanger
______________________________________________________

Quoting julien.bellang@free.fr:

>
> Hi,
>
> I'm working on a system installed in an environnement where power is cut off
> many time a week. This system is based on i386 FreeBSD 6.2 OS.
>
> I'm using FS UFS2 with SoftUpdate Activated.
>
> After such power shutdown, when I restart I've got some corrupted files that
> FSCK_UFS doesn't entirely resolve.
>
> For these files FSCK resolves the following error :
> /dev/ad0s1f: INCORRECT BLOCK COUNT I=3132417 (512992 should be 459392)
> (CORRECTED)
>
> But actually these file still inconsistency in my point of view as the file
> size field doesn't reflect the number of block reference in its inode.
>
> Regards to fsck_ffs sources, It seems that FSCK checks the validity of block
> pointer (!= 0) in the inode block list only for directory inode but not for
> regular file.
> In my case, as the number of block adress to check in the inode is deduced
> from the file size, and the file size is greater than the number of really
> allocated blocks I obtain many NULL block pointer.
>
> Does anyone have an idea why the NULL pointer are accepted by FSCK for
> regular file and it doesn't try to adjust the file size ?
>
> file fsck_ffs/inode.c, iblocks(), line 208
>
> line 153: static int
>      154: iblock(struct inodesc *idesc, long ilevel, off_t isize)
>           {
>               .
>               .
>               .
>      197:     if (IBLK(bp, i)) {
>                   idesc->id_blkno = IBLK(bp, i);
>                   if (ilevel == 0)
>                       n = (*func)(idesc);
>                   else
>                       n = iblock(idesc, ilevel, isize);
>                   if (n & STOP) {
>                       bp->b_flags &= ~B_INUSE;
>                       return (n);
>                   }
>               } else {
>      208:         if (idesc->id_type == DATA && isize > 0) {
>                       /* An empty block in a directory XXX */
>                       getpathname(pathbuf, idesc->id_number,
>                           idesc->id_number);
>                       pfatal("DIRECTORY %s: CONTAINS EMPTY BLOCKS",
>                           pathbuf);
>                       if (reply("ADJUST LENGTH") == 1) {
>                           dp = ginode(idesc->id_number);
>                           DIP_SET(dp, di_size,
>                               DIP(dp, di_size) - isize);
>                           isize = 0;
>                           printf(
>                               "YOU MUST RERUN FSCK AFTERWARDS\n");
>                           rerun = 1;
>                           inodirty();
>                           bp->b_flags &= ~B_INUSE;
>                           return(STOP);
>                       }
>                   }
>               }
>               .
>               .
>               .
> } > > Thanks, > > Julien > > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > From owner-freebsd-fs@FreeBSD.ORG Thu Dec 6 17:29:55 2007 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DB9A016A417 for ; Thu, 6 Dec 2007 17:29:55 +0000 (UTC) (envelope-from bg@sics.se) Received: from letter.sics.se (letter.sics.se [193.10.64.6]) by mx1.freebsd.org (Postfix) with ESMTP id A338513C447 for ; Thu, 6 Dec 2007 17:29:55 +0000 (UTC) (envelope-from bg@sics.se) Received: from sics.se (ibook.sics.se [193.10.66.104]) by letter.sics.se (Postfix) with ESMTP id 0040A4022B; Thu, 6 Dec 2007 17:56:34 +0100 (CET) Date: Thu, 6 Dec 2007 17:56:08 +0100 From: Bjorn Gronvall To: julien.bellang@free.fr Message-ID: <20071206175608.594685d9@ibook.sics.se> In-Reply-To: <1196953310.47580ede28676@imp.free.fr> References: <1196788555.47558b4bab0ab@imp.free.fr> <1196953310.47580ede28676@imp.free.fr> Organization: SICS.SE X-Mailer: Claws Mail 2.9.1 (GTK+ 2.10.6; i386-portbld-freebsd6.2) Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit Cc: freebsd-fs@freebsd.org Subject: Re: FSCK doesn't correct file size when INCORRECT BLOCK COUNT is found X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 06 Dec 2007 17:29:55 -0000 On Thu, 06 Dec 2007 16:01:50 +0100 julien.bellang@free.fr wrote: Hi Julien, > 1) First some information about the file corrupted : > In my case the Files System has the following characteristics > - the write cache is activated on the hard drive > - the SoftUpdate option is activated > - the FS is mount with the 
default option noasync Filesystems in general and UFS with soft updates in particular rely on disks providing accurate responses to writes. When write caching is enabled the disk will lie and tell the operating system that the write has completed successfully, while in reality the data is only cached in disk RAM. When the power disappears the data will be gone forever. In order to avoid this problem you can turn off write caching; this way the software knows if the write completed successfully or not. Alternatively you may power your disks from batteries, multiple power supplies with UPSes, or come up with some other hardware solution. Cheers, /b -- _ _ ,_______________. Bjorn Gronvall (Björn Grönvall) /_______________/| Swedish Institute of Computer Science | || PO Box 1263, S-164 29 Kista, Sweden | Schroedingers || Email: bg@sics.se, Phone +46 -8 633 15 25 | Cat |/ Cellular +46 -70 768 06 35, Fax +46 -8 751 72 30 '---------------' From owner-freebsd-fs@FreeBSD.ORG Thu Dec 6 21:40:44 2007 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1D64216A418 for ; Thu, 6 Dec 2007 21:40:44 +0000 (UTC) (envelope-from joe@tao.org.uk) Received: from mailhost.tao.org.uk (tao.uscs.susx.ac.uk [139.184.131.101]) by mx1.freebsd.org (Postfix) with ESMTP id DDFA313C461 for ; Thu, 6 Dec 2007 21:40:43 +0000 (UTC) (envelope-from joe@tao.org.uk) Received: by mailhost.tao.org.uk (Postfix, from userid 1000) id 85DD575A7; Thu, 6 Dec 2007 21:08:48 +0000 (GMT) Date: Thu, 6 Dec 2007 21:08:48 +0000 From: Josef Karthauser To: fs@freebsd.org Message-ID: <20071206210848.GA63825@transwarp.tao.org.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.13 (2006-08-11) X-taoresearch-MailScanner-Information: Please contact Tao Research for more information X-taoresearch-MailScanner: Found to be clean X-MailScanner-From: joe@tao.org.uk Cc:
Subject: -14% available on /tmp X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 06 Dec 2007 21:40:44 -0000 One of my servers is reporting: # df | grep tmp /dev/mirror/boot0e 507630 -64328 531348 -14% /tmp How weird is that? I wonder what is going on. The kernel is dated: 6.2-PRERELEASE FreeBSD 6.2-PRERELEASE #68: Mon Oct 2 14:36:13 BST 2006 Joe From owner-freebsd-fs@FreeBSD.ORG Thu Dec 6 21:53:54 2007 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4594316A417 for ; Thu, 6 Dec 2007 21:53:54 +0000 (UTC) (envelope-from gpalmer@freebsd.org) Received: from noop.in-addr.com (unknown [IPv6:2001:5c0:8fff:fffe::214d]) by mx1.freebsd.org (Postfix) with ESMTP id 0EF2F13C461 for ; Thu, 6 Dec 2007 21:53:54 +0000 (UTC) (envelope-from gpalmer@freebsd.org) Received: from gjp by noop.in-addr.com with local (Exim 4.54 (FreeBSD)) id 1J0OfI-0004Ja-H7; Thu, 06 Dec 2007 16:53:52 -0500 Date: Thu, 6 Dec 2007 16:53:52 -0500 From: Gary Palmer To: Josef Karthauser Message-ID: <20071206215352.GA986@in-addr.com> References: <20071206210848.GA63825@transwarp.tao.org.uk> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071206210848.GA63825@transwarp.tao.org.uk> Cc: fs@freebsd.org Subject: Re: -14% available on /tmp X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 06 Dec 2007 21:53:54 -0000 On Thu, Dec 06, 2007 at 09:08:48PM +0000, Josef Karthauser wrote: > One of my servers is reporting: > > # df | grep tmp > /dev/mirror/boot0e 507630 -64328 531348 -14% /tmp > > How weird is that? I wonder what is going on. 
> The kernel is dated: > > 6.2-PRERELEASE FreeBSD 6.2-PRERELEASE #68: Mon Oct 2 14:36:13 BST 2006 http://www.freebsd.org/doc/en_US.ISO8859-1/books/faq/disks.html#DISK-MORE-THAN-FULL Not sure why it's -14% rather than the more normal -8%, but I suspect that's what's happened. Gary From owner-freebsd-fs@FreeBSD.ORG Thu Dec 6 22:10:40 2007 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 797D716A417; Thu, 6 Dec 2007 22:10:40 +0000 (UTC) (envelope-from brooks@lor.one-eyed-alien.net) Received: from lor.one-eyed-alien.net (cl-162.ewr-01.us.sixxs.net [IPv6:2001:4830:1200:a1::2]) by mx1.freebsd.org (Postfix) with ESMTP id D3AD813C43E; Thu, 6 Dec 2007 22:10:39 +0000 (UTC) (envelope-from brooks@lor.one-eyed-alien.net) Received: from lor.one-eyed-alien.net (localhost [127.0.0.1]) by lor.one-eyed-alien.net (8.14.1/8.13.8) with ESMTP id lB6MAcjw070704; Thu, 6 Dec 2007 16:10:38 -0600 (CST) (envelope-from brooks@lor.one-eyed-alien.net) Received: (from brooks@localhost) by lor.one-eyed-alien.net (8.14.1/8.13.8/Submit) id lB6MAcEK070703; Thu, 6 Dec 2007 16:10:38 -0600 (CST) (envelope-from brooks) Date: Thu, 6 Dec 2007 16:10:38 -0600 From: Brooks Davis To: Gary Palmer Message-ID: <20071206221038.GA70675@lor.one-eyed-alien.net> References: <20071206210848.GA63825@transwarp.tao.org.uk> <20071206215352.GA986@in-addr.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="bg08WKrSYDhXBjb5" Content-Disposition: inline In-Reply-To: <20071206215352.GA986@in-addr.com> User-Agent: Mutt/1.5.16 (2007-06-09) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-3.0 (lor.one-eyed-alien.net [127.0.0.1]); Thu, 06 Dec 2007 16:10:38 -0600 (CST) Cc: Josef Karthauser , fs@freebsd.org Subject: Re: -14% available on /tmp X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 06 Dec 2007 22:10:40 -0000 --bg08WKrSYDhXBjb5 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Dec 06, 2007 at 04:53:52PM -0500, Gary Palmer wrote: > On Thu, Dec 06, 2007 at 09:08:48PM +0000, Josef Karthauser wrote: > > One of my servers is reporting: > > > > # df | grep tmp > > /dev/mirror/boot0e 507630 -64328 531348 -14% /tmp > > > > How weird is that? I wonder what is going on. > > The kernel is dated: > > > > 6.2-PRERELEASE FreeBSD 6.2-PRERELEASE #68: Mon Oct 2 14:36:13 BST 2006 > > http://www.freebsd.org/doc/en_US.ISO8859-1/books/faq/disks.html#DISK-MORE-THAN-FULL > > Not sure why its -14% rather than the more normal -8%, but I suspect thats > whats happened. I've also seen the occasional corrupted fs where the counts were seriously out of whack. An fsck (or, since it's just /tmp, a newfs) might be in order.
-- Brooks --bg08WKrSYDhXBjb5 Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (FreeBSD) iD8DBQFHWHNdXY6L6fI4GtQRAi67AJ0ZKpil+01oxGJScEujXafmUNRmswCfexDE 2EIgi5Imo6xxpNpyyWdNyp8= =fKBU -----END PGP SIGNATURE----- --bg08WKrSYDhXBjb5-- From owner-freebsd-fs@FreeBSD.ORG Thu Dec 6 22:25:41 2007 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D252A16A41A for ; Thu, 6 Dec 2007 22:25:41 +0000 (UTC) (envelope-from ticso@cicely12.cicely.de) Received: from raven.bwct.de (raven.bwct.de [85.159.14.73]) by mx1.freebsd.org (Postfix) with ESMTP id 6456713C46A for ; Thu, 6 Dec 2007 22:25:41 +0000 (UTC) (envelope-from ticso@cicely12.cicely.de) Received: from cicely5.cicely.de ([10.1.1.7]) by raven.bwct.de (8.13.4/8.13.4) with ESMTP id lB6M84Xm046882; Thu, 6 Dec 2007 23:08:04 +0100 (CET) (envelope-from ticso@cicely12.cicely.de) Received: from cicely12.cicely.de (cicely12.cicely.de [10.1.1.14]) by cicely5.cicely.de (8.13.4/8.13.4) with ESMTP id lB6M7utG024781 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 6 Dec 2007 23:07:57 +0100 (CET) (envelope-from ticso@cicely12.cicely.de) Received: from cicely12.cicely.de (localhost [127.0.0.1]) by cicely12.cicely.de (8.13.4/8.13.3) with ESMTP id lB6M7ua7018406; Thu, 6 Dec 2007 23:07:56 +0100 (CET) (envelope-from ticso@cicely12.cicely.de) Received: (from ticso@localhost) by cicely12.cicely.de (8.13.4/8.13.3/Submit) id lB6M7uu4018405; Thu, 6 Dec 2007 23:07:56 +0100 (CET) (envelope-from ticso) Date: Thu, 6 Dec 2007 23:07:56 +0100 From: Bernd Walter To: Josef Karthauser Message-ID: <20071206220755.GH10459@cicely12.cicely.de> References: <20071206210848.GA63825@transwarp.tao.org.uk> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071206210848.GA63825@transwarp.tao.org.uk> 
X-Operating-System: FreeBSD cicely12.cicely.de 5.4-STABLE alpha User-Agent: Mutt/1.5.9i X-Spam-Status: No, score=-4.4 required=5.0 tests=ALL_TRUSTED=-1.8, AWL=0.026, BAYES_00=-2.599 autolearn=ham version=3.1.7 X-Spam-Checker-Version: SpamAssassin 3.1.7 (2006-10-05) on cicely12.cicely.de Cc: fs@freebsd.org Subject: Re: -14% available on /tmp X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: ticso@cicely.de List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 06 Dec 2007 22:25:41 -0000 On Thu, Dec 06, 2007 at 09:08:48PM +0000, Josef Karthauser wrote: > One of my servers is reporting: > > # df | grep tmp > /dev/mirror/boot0e 507630 -64328 531348 -14% /tmp > > How weird is that? I wonder what is going on. I have seen this a few times as well. It wasn't corrected automatically for whatever reason, but a manual fsck always fixed it. I'm not sure whether it happened because of a crash or under normal load, because I always noticed it after some kind of unintended reboot. IIRC it was always a power failure, but I'm not 100% sure.
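A quick sanity check on the df line quoted above, offered as a sketch rather than a diagnosis. It assumes df's usual column order (1K-blocks, Used, Avail, Capacity), that Capacity is computed as used / (used + avail), and the default 8% minfree reserve; the function name df_capacity is made up for illustration. Under those assumptions the negative figure is the Used column, which would fit corrupted summary counts better than an overfull file system:

```python
def df_capacity(used_kb, avail_kb):
    """Capacity percentage, assuming df computes used / (used + avail)."""
    return 100.0 * used_kb / (used_kb + avail_kb)


# Figures taken from the quoted df line, read as: 1K-blocks, Used, Avail.
total_kb, used_kb, avail_kb = 507630, -64328, 531348

print(round(df_capacity(used_kb, avail_kb)))  # -14
print(used_kb + avail_kb)                     # 467020
print(round(total_kb * 0.92))                 # 467020 (total minus an assumed 8% reserve)
```

If that reading is right, used + avail still matches 92% of the total, so only the Used count itself looks wrong.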
-- B.Walter http://www.bwct.de http://www.fizon.de bernd@bwct.de info@bwct.de support@fizon.de From owner-freebsd-fs@FreeBSD.ORG Fri Dec 7 06:43:45 2007 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3F0F116A41A for ; Fri, 7 Dec 2007 06:43:45 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from gw.catspoiler.org (adsl-75-1-14-242.dsl.scrm01.sbcglobal.net [75.1.14.242]) by mx1.freebsd.org (Postfix) with ESMTP id 01F9613C455 for ; Fri, 7 Dec 2007 06:43:44 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.13.3/8.13.3) with ESMTP id lB76hZYm063956; Thu, 6 Dec 2007 22:43:39 -0800 (PST) (envelope-from truckman@FreeBSD.org) Message-Id: <200712070643.lB76hZYm063956@gw.catspoiler.org> Date: Thu, 6 Dec 2007 22:43:35 -0800 (PST) From: Don Lewis To: julien.bellang@free.fr In-Reply-To: <1196788551.47558b47ee317@imp.free.fr> MIME-Version: 1.0 Content-Type: TEXT/plain; charset=us-ascii Cc: freebsd-fs@FreeBSD.org Subject: Re: FSCK failed does'nt correct file size when INCORRECT BLOCK COUNT is found X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 07 Dec 2007 06:43:45 -0000 On 4 Dec, julien.bellang@free.fr wrote: > > Hi, > > I'm working on a system installed in an environnement where power is cut off > many time a week. This system is based on i386 FreeBSD 6.2 OS. > > I'm using FS UFS2 with SoftUpdate Activated. If your disks have write caching enabled, you are likely to encounter file system corruption caused by the loss of power that can't be fixed automatically by fsck and will require manual intervention.
The reason is that soft updates attempts to write data to the disk in an order that guarantees that the file system is always in a consistent state so that it can always be properly cleaned up after a crash. This strategy is defeated by the write caching by the disk, which causes the disk to immediately tell soft updates that data has been written, even if the data is only saved to the disk's write cache. This may allow soft updates to write another set of data to disk that should not actually be written before the previous set of data. If the disk then writes the second set of data to the media before the first set of data, and a power failure occurs before the disk has written the first set of data, the file system is then corrupted. You can turn off write caching by putting the following into /boot/loader.conf: hw.ata.wc=0 though it will greatly decrease your system's disk write performance. Powering the system using a UPS that can initiate a clean system shutdown on power failure may be a better option. > After such power shutdown, when I restart I've got some corrupted files that > FSCK_UFS doesn't entirely resolve. > > For these files FSCK resolves the following error : > /dev/ad0s1f: INCORRECT BLOCK COUNT I=3132417 (512992 should be 459392) > (CORRECTED) > > But actually these file still inconsistency in my point of view as the file size > field doesn't reflect the number of block reference in its inode. > > Regards to fsck_ffs sources, It seems that FSCK checks the validity of block > pointer (!= 0) in the inode block list only for directory inode but not for > regular file. > In my case, as the number of block adress to check in the inode is deduced from > the file size, and the file size is greater than the number of really allocated > blocks I obtain many NULL block pointer. > > Does anyone have an idea why the NULL pointer are accepted by FSCK for regular > file and it doesn't try to adjust the file size ?
Regular files are allowed to be sparse (have holes where no data is stored and no blocks are allocated). This is indicated by NULL block pointers for the file offsets that correspond to the holes. Sparse files are easy to create: % dd if=/dev/zero of=/tmp/sparsefile bs=512 oseek=1000000 count=1 1+0 records in 1+0 records out 512 bytes transferred in 0.000132 secs (3876324 bytes/sec) % ls -ls /tmp/sparsefile 64 -rw-r--r-- 1 dl wheel 512000512 Dec 6 22:26 /tmp/sparsefile From owner-freebsd-fs@FreeBSD.ORG Fri Dec 7 09:49:51 2007 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6EA2016A417 for ; Fri, 7 Dec 2007 09:49:51 +0000 (UTC) (envelope-from antik@bsd.ee) Received: from zzz.ee (kalah.zzz.ee [194.204.30.253]) by mx1.freebsd.org (Postfix) with ESMTP id 2ACEE13C465 for ; Fri, 7 Dec 2007 09:49:51 +0000 (UTC) (envelope-from antik@bsd.ee) Received: by zzz.ee (Postfix, from userid 3019) id EB31219867F; Fri, 7 Dec 2007 11:31:40 +0200 (EET) X-Spam-Checker-Version: SpamAssassin on spamassassin.zzz.ee X-Spam-Level: X-Spam-Guessed-Language: X-Spam-Status: No, score=-4.0 required=5.0 tests=ALL_TRUSTED,BAYES_50 X-Spam-Checker-URL: http://info.zzz.ee Received: from andrei.demo (adsl215.uninet.ee [194.204.62.215]) by zzz.ee (Postfix) with ESMTP id D0139198685 for ; Fri, 7 Dec 2007 11:31:32 +0200 (EET) From: Andrei Kolu To: freebsd-fs@freebsd.org Date: Fri, 7 Dec 2007 11:31:31 +0200 User-Agent: KMail/1.9.7 MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200712071131.31773.antik@bsd.ee> Subject: raidtest: Cannot open 'raidtest.data' device: Operation not permitted X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 07 Dec 2007 
09:49:51 -0000 # uname -a FreeBSD test.demo 7.0-BETA4 FreeBSD 7.0-BETA4 #0: Sun Dec 2 16:34:41 UTC 2007 root@myers.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64 3ware device driver for 9000 series storage controllers, version: 3.70.05.001 twa0: <3ware 9000 series Storage Controller> port 0x3000-0x30ff mem 0xd8000000-0xd9ffffff,0xda300000-0xda300fff irq 16 at device 0.0 on pci7 twa0: [ITHREAD] twa0: INFO: (0x04: 0x0053): Battery capacity test is overdue: twa0: INFO: (0x15: 0x1300): Controller details:: Model 9650SE-8LPML, 8 ports, Firmware FE9X 3.08.02.007, BIOS BE9X 3.08.00.002 raidtest-1.1 = up-to-date with port # set mediasize=`diskinfo /dev/da0 | awk '{print $3}'` # set sectorsize=`diskinfo /dev/da0 | awk '{print $2}'` # raidtest genfile -s $mediasize -S $sectorsize -n 50000 # raidtest test -d /dev/da0 -n 10 raidtest: Cannot open 'raidtest.data' device: Operation not permitted # echo $mediasize 1919932170240 # echo $sectorsize 512 Or can anyone recommend another RAID performance testing utility?
Andrei From owner-freebsd-fs@FreeBSD.ORG Fri Dec 7 11:34:41 2007 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1635B16A417 for ; Fri, 7 Dec 2007 11:34:41 +0000 (UTC) (envelope-from freebsd-fs@m.gmane.org) Received: from ciao.gmane.org (main.gmane.org [80.91.229.2]) by mx1.freebsd.org (Postfix) with ESMTP id C3B9013C43E for ; Fri, 7 Dec 2007 11:34:40 +0000 (UTC) (envelope-from freebsd-fs@m.gmane.org) Received: from list by ciao.gmane.org with local (Exim 4.43) id 1J0bTV-0005eL-4g for freebsd-fs@freebsd.org; Fri, 07 Dec 2007 11:34:33 +0000 Received: from lara.cc.fer.hr ([161.53.72.113]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Fri, 07 Dec 2007 11:34:33 +0000 Received: from ivoras by lara.cc.fer.hr with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Fri, 07 Dec 2007 11:34:33 +0000 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-fs@freebsd.org From: Ivan Voras Date: Fri, 07 Dec 2007 12:39:56 +0100 Lines: 40 Message-ID: Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="------------enig417C87F845D917AA452969FD" X-Complaints-To: usenet@ger.gmane.org X-Gmane-NNTP-Posting-Host: lara.cc.fer.hr User-Agent: Thunderbird 2.0.0.6 (X11/20070801) X-Enigmail-Version: 0.95.3 Sender: news Subject: readv: parallel or sequential? 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 07 Dec 2007 11:34:41 -0000 This is an OpenPGP/MIME signed message (RFC 2440 and 3156) --------------enig417C87F845D917AA452969FD Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Hi, I found this in readv(2): For readv() and preadv(), the iovec structure is defined as: struct iovec { void *iov_base; /* Base address. */ size_t iov_len; /* Length. */ }; Each iovec entry specifies the base address and length of an area in memory where data should be placed. The readv() system call will always fill an area completely before proceeding to the next. Does this mean that, in effect, readv() is just a loop of read() calls (minus syscall overhead)? --------------enig417C87F845D917AA452969FD Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.4-svn0 (GNU/Linux) iD8DBQFHWTETldnAQVacBcgRAhuZAKDyooMshocd2zxSBR9RKgz8kMrQcgCdFAon EGLoGWs7t9y9eUKAoojvAjw= =vs5U -----END PGP SIGNATURE----- --------------enig417C87F845D917AA452969FD-- From owner-freebsd-fs@FreeBSD.ORG Fri Dec 7 12:02:12 2007 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id ADCD116A417 for ; Fri, 7 Dec 2007 12:02:12 +0000 (UTC) (envelope-from phk@critter.freebsd.dk) Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222]) by mx1.freebsd.org (Postfix) with ESMTP id 6ED4F13C458 for ; Fri, 7 Dec 2007 12:02:12 +0000 (UTC) (envelope-from phk@critter.freebsd.dk) Received: from critter.freebsd.dk (unknown [192.168.61.3]) by phk.freebsd.dk (Postfix) with ESMTP id 3DDAB17105; Fri, 7 Dec 2007 11:37:32 +0000
(UTC) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.14.1/8.14.1) with ESMTP id lB7BbVHt004110; Fri, 7 Dec 2007 11:37:31 GMT (envelope-from phk@critter.freebsd.dk) To: Ivan Voras From: "Poul-Henning Kamp" In-Reply-To: Your message of "Fri, 07 Dec 2007 12:39:56 +0100." Date: Fri, 07 Dec 2007 11:37:31 +0000 Message-ID: <4109.1197027451@critter.freebsd.dk> Sender: phk@critter.freebsd.dk Cc: freebsd-fs@freebsd.org Subject: Re: readv: parallel or sequential? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 07 Dec 2007 12:02:12 -0000 In message , Ivan Voras writes: >Does this mean that, in effect, readv() is just a loop of read() calls >(minus syscall overhead)? It's more correct to say that read() is just a readv() with a single iovec. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. 
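To make the man page wording concrete, here is a small sketch of the fill order, written in Python for brevity since os.readv() wraps the same system call on Unix-like systems. The helper name scatter_read and the buffer sizes are assumptions chosen for illustration. Ten bytes are pushed through a pipe and read into three 4-byte buffers:

```python
import os


def scatter_read(fd, sizes):
    """Read from fd into a list of buffers with the given sizes using readv()."""
    bufs = [bytearray(n) for n in sizes]
    nread = os.readv(fd, bufs)  # fills bufs[0] completely, then bufs[1], and so on
    return nread, bufs


r, w = os.pipe()
os.write(w, b"abcdefghij")  # 10 bytes, fewer than the 12 bytes of buffer space
os.close(w)

nread, bufs = scatter_read(r, [4, 4, 4])
os.close(r)

print(nread, [bytes(b) for b in bufs])
# 10 [b'abcd', b'efgh', b'ij\x00\x00']
```

The third buffer is only partly filled because the data ran out, which is the sequential behaviour the man page describes; nothing in this sketch says how the kernel implements it internally.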
From owner-freebsd-fs@FreeBSD.ORG Fri Dec 7 12:12:39 2007 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 10EE416A420 for ; Fri, 7 Dec 2007 12:12:39 +0000 (UTC) (envelope-from dudu@dudu.ro) Received: from nz-out-0506.google.com (nz-out-0506.google.com [64.233.162.235]) by mx1.freebsd.org (Postfix) with ESMTP id CE2C413C4CC for ; Fri, 7 Dec 2007 12:12:38 +0000 (UTC) (envelope-from dudu@dudu.ro) Received: by nz-out-0506.google.com with SMTP id l8so190977nzf for ; Fri, 07 Dec 2007 04:12:38 -0800 (PST) Received: by 10.142.214.5 with SMTP id m5mr2078467wfg.1197028045435; Fri, 07 Dec 2007 03:47:25 -0800 (PST) Received: by 10.143.12.4 with HTTP; Fri, 7 Dec 2007 03:47:25 -0800 (PST) Message-ID: Date: Fri, 7 Dec 2007 13:47:25 +0200 From: "Vlad GALU" To: "Ivan Voras" In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: Cc: freebsd-fs@freebsd.org Subject: Re: readv: parallel or sequential? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 07 Dec 2007 12:12:39 -0000 On 12/7/07, Ivan Voras wrote: > Hi, > > I found this in readv(2): > > For readv() and preadv(), the iovec structure is defined as: > > struct iovec { > void *iov_base; /* Base address. */ > size_t iov_len; /* Length. */ > }; > > Each iovec entry specifies the base address and length of an area > in mem- > ory where data should be placed. The readv() system call will always > fill an area completely before proceeding to the next. > > Does this mean that, in effect, readv() is just a loop of read() calls > (minus syscall overhead)? 
> read() is just a particular case of readv() (with only one iovec struct, plus the full buffer size), they both call kern_readv(), so the effect is the same. I assume the manpage means that the iovec structures are filled sequentially rather than in parallel. > > -- Mahnahmahnah! From owner-freebsd-fs@FreeBSD.ORG Fri Dec 7 12:48:22 2007 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 968BA16A420 for ; Fri, 7 Dec 2007 12:48:22 +0000 (UTC) (envelope-from des@des.no) Received: from tim.des.no (tim.des.no [194.63.250.121]) by mx1.freebsd.org (Postfix) with ESMTP id 435D513C442 for ; Fri, 7 Dec 2007 12:48:22 +0000 (UTC) (envelope-from des@des.no) Received: from tim.des.no (localhost [127.0.0.1]) by spam.des.no (Postfix) with ESMTP id D406A2085; Fri, 7 Dec 2007 13:48:12 +0100 (CET) X-Spam-Tests: AWL X-Spam-Learn: disabled X-Spam-Score: -0.1/3.0 X-Spam-Checker-Version: SpamAssassin 3.2.3 (2007-08-08) on tim.des.no Received: from ds4.des.no (des.no [80.203.243.180]) by smtp.des.no (Postfix) with ESMTP id 585322082; Fri, 7 Dec 2007 13:48:12 +0100 (CET) Received: by ds4.des.no (Postfix, from userid 1001) id 4114584499; Fri, 7 Dec 2007 13:48:12 +0100 (CET) From: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= To: Bjorn Gronvall References: <1196788555.47558b4bab0ab@imp.free.fr> <1196953310.47580ede28676@imp.free.fr> <20071206175608.594685d9@ibook.sics.se> Date: Fri, 07 Dec 2007 13:48:12 +0100 In-Reply-To: <20071206175608.594685d9@ibook.sics.se> (Bjorn Gronvall's message of "Thu\, 6 Dec 2007 17\:56\:08 +0100") Message-ID: <86hciuu0vn.fsf@ds4.des.no> User-Agent: Gnus/5.110006 (No Gnus v0.6) Emacs/22.1 (berkeley-unix) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org Subject: Re: FSCK doesn't correct file size when INCORRECT BLOCK COUNT is found X-BeenThere: freebsd-fs@freebsd.org 
X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 07 Dec 2007 12:48:22 -0000 Bjorn Gronvall writes: > Filesystems in general and UFS with soft updates in particular rely on > disks providing accurate response to writes. When write caching is > enabled the disk will lie and tell the operating system that the write > has completed successfully, in reality the data is only cached in disk > RAM. When the power disappears the data will be gone forever. No. This used to be the case with some cheaper disks which ignored the ATA "flush cache" command to score higher on benchmarks, but I doubt you'll find any disks on the market that still do that (at least from reputable manufacturers). ZFS makes extensive use of the "flush cache" command to ensure file system integrity (and in particular to ensure that the intent log is written to disk so it can be replayed in case of a crash). DES -- Dag-Erling Smørgrav - des@des.no From owner-freebsd-fs@FreeBSD.ORG Fri Dec 7 12:49:02 2007 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5ABD416A46B for ; Fri, 7 Dec 2007 12:49:02 +0000 (UTC) (envelope-from des@des.no) Received: from tim.des.no (tim.des.no [194.63.250.121]) by mx1.freebsd.org (Postfix) with ESMTP id 1988713C4FB for ; Fri, 7 Dec 2007 12:49:01 +0000 (UTC) (envelope-from des@des.no) Received: from tim.des.no (localhost [127.0.0.1]) by spam.des.no (Postfix) with ESMTP id D34F22084; Fri, 7 Dec 2007 13:48:53 +0100 (CET) X-Spam-Tests: AWL X-Spam-Learn: disabled X-Spam-Score: -0.1/3.0 X-Spam-Checker-Version: SpamAssassin 3.2.3 (2007-08-08) on tim.des.no Received: from ds4.des.no (des.no [80.203.243.180]) by smtp.des.no (Postfix) with ESMTP id C430E2082; Fri, 7 Dec 2007 13:48:53 +0100 (CET) Received: by ds4.des.no (Postfix, from userid 1001) id
AE22384499; Fri, 7 Dec 2007 13:48:53 +0100 (CET) From: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= To: julien.bellang@free.fr References: <1196788551.47558b47ee317@imp.free.fr> Date: Fri, 07 Dec 2007 13:48:53 +0100 In-Reply-To: <1196788551.47558b47ee317@imp.free.fr> (julien bellang's message of "Tue\, 04 Dec 2007 18\:15\:51 +0100") Message-ID: <86d4tiu0ui.fsf@ds4.des.no> User-Agent: Gnus/5.110006 (No Gnus v0.6) Emacs/22.1 (berkeley-unix) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org Subject: Re: FSCK failed does'nt correct file size when INCORRECT BLOCK COUNT is found X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 07 Dec 2007 12:49:02 -0000 julien.bellang@free.fr writes: > I'm working on a system installed in an environnement where power is cut off > many time a week. This system is based on i386 FreeBSD 6.2 OS. Get a UPS.
DES -- Dag-Erling Smørgrav - des@des.no From owner-freebsd-fs@FreeBSD.ORG Fri Dec 7 12:54:09 2007 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 713FA16A591 for ; Fri, 7 Dec 2007 12:54:09 +0000 (UTC) (envelope-from des@des.no) Received: from tim.des.no (tim.des.no [194.63.250.121]) by mx1.freebsd.org (Postfix) with ESMTP id 26F1D13C4D1 for ; Fri, 7 Dec 2007 12:54:08 +0000 (UTC) (envelope-from des@des.no) Received: from tim.des.no (localhost [127.0.0.1]) by spam.des.no (Postfix) with ESMTP id 66A52207F; Fri, 7 Dec 2007 13:54:00 +0100 (CET) X-Spam-Tests: AWL X-Spam-Learn: disabled X-Spam-Score: -0.1/3.0 X-Spam-Checker-Version: SpamAssassin 3.2.3 (2007-08-08) on tim.des.no Received: from ds4.des.no (des.no [80.203.243.180]) by smtp.des.no (Postfix) with ESMTP id 58755207E; Fri, 7 Dec 2007 13:54:00 +0100 (CET) Received: by ds4.des.no (Postfix, from userid 1001) id 40AEF844A7; Fri, 7 Dec 2007 13:54:00 +0100 (CET) From: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= To: "Vlad GALU" References: Date: Fri, 07 Dec 2007 13:54:00 +0100 In-Reply-To: (Vlad GALU's message of "Fri\, 7 Dec 2007 13\:47\:25 +0200") Message-ID: <868x46u0lz.fsf@ds4.des.no> User-Agent: Gnus/5.110006 (No Gnus v0.6) Emacs/22.1 (berkeley-unix) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org, Ivan Voras Subject: Re: readv: parallel or sequential? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 07 Dec 2007 12:54:09 -0000 "Vlad GALU" writes: > Ivan Voras writes: > > Does this mean that, in effect, readv() is just a loop of read() > > calls (minus syscall overhead)?
> read() is just a particular case of readv() (with only one iovec > struct, plus the full buffer size), they both call kern_readv(), so > the effect is the same. I assume the manpage means that the iovec > structures are filled sequentially rather than in parallel. Interestingly, Linux does it the other way around - a device driver can implement readv() and writev(), but if it doesn't, the kernel will fall back to a default implementation which calls the driver's read() or write() method once for each iov. But to return to what Ivan was asking, I think what the man page is trying to say is that you can't use readv() to e.g. read individual network packets into separate buffers (unless each packet just happens to fit exactly within each buffer). DES -- Dag-Erling Smørgrav - des@des.no From owner-freebsd-fs@FreeBSD.ORG Fri Dec 7 12:58:53 2007 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B1F9C16A418 for ; Fri, 7 Dec 2007 12:58:53 +0000 (UTC) (envelope-from rees@citi.umich.edu) Received: from citi.umich.edu (unknown [IPv6:2001:468:e9c:3060::4]) by mx1.freebsd.org (Postfix) with ESMTP id 7B6ED13C44B for ; Fri, 7 Dec 2007 12:58:53 +0000 (UTC) (envelope-from rees@citi.umich.edu) Received: from citi.umich.edu (dsl093-001-248.det1.dsl.speakeasy.net [66.93.1.248]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "Jim Rees", Issuer "CITI Production KCA" (verified OK)) by citi.umich.edu (Postfix) with ESMTP id A1FBD44E2; Fri, 7 Dec 2007 07:58:52 -0500 (EST) Date: Fri, 7 Dec 2007 07:58:53 -0500 From: Jim Rees To: Dag-Erling =?iso-8859-1?Q?Sm=F8rgrav?= Message-ID: <20071207125852.GC6665@citi.umich.edu> References: <1196788551.47558b47ee317@imp.free.fr> <86d4tiu0ui.fsf@ds4.des.no> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: 
<86d4tiu0ui.fsf@ds4.des.no> Cc: freebsd-fs@freebsd.org Subject: Re: FSCK failed does'nt correct file size when INCORRECT BLOCK COUNT is found X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 07 Dec 2007 12:58:53 -0000 Dag-Erling Smørgrav wrote: Get a UPS. We should strive to prevent data corruption in the face of unexpected system shutdown. Having a user who loses power several times a week seems useful for testing, especially when he is willing to delve into fsck sources and figure out what's going on. My recommendation would be to turn off caching in the disk and report back in a couple of weeks. From owner-freebsd-fs@FreeBSD.ORG Fri Dec 7 13:17:22 2007 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 66E3816A421 for ; Fri, 7 Dec 2007 13:17:22 +0000 (UTC) (envelope-from freebsd-fs@m.gmane.org) Received: from ciao.gmane.org (main.gmane.org [80.91.229.2]) by mx1.freebsd.org (Postfix) with ESMTP id 1DE5D13C458 for ; Fri, 7 Dec 2007 13:17:21 +0000 (UTC) (envelope-from freebsd-fs@m.gmane.org) Received: from list by ciao.gmane.org with local (Exim 4.43) id 1J0d4r-00062v-Dn for freebsd-fs@freebsd.org; Fri, 07 Dec 2007 13:17:13 +0000 Received: from lara.cc.fer.hr ([161.53.72.113]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Fri, 07 Dec 2007 13:17:13 +0000 Received: from ivoras by lara.cc.fer.hr with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Fri, 07 Dec 2007 13:17:13 +0000 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-fs@freebsd.org From: Ivan Voras Date: Fri, 07 Dec 2007 14:22:22 +0100 Lines: 35 Message-ID: References: <868x46u0lz.fsf@ds4.des.no> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; 
boundary="------------enig09F7468F9E992596CC57A5F2" X-Complaints-To: usenet@ger.gmane.org X-Gmane-NNTP-Posting-Host: lara.cc.fer.hr User-Agent: Thunderbird 2.0.0.6 (X11/20070801) In-Reply-To: <868x46u0lz.fsf@ds4.des.no> X-Enigmail-Version: 0.95.3 Sender: news Subject: Re: readv: parallel or sequential? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 07 Dec 2007 13:17:22 -0000 This is an OpenPGP/MIME signed message (RFC 2440 and 3156) --------------enig09F7468F9E992596CC57A5F2 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Dag-Erling Smørgrav wrote: > But to return to what Ivan was asking, I think what the man page is > trying to say is that you can't use readv() to e.g. read individual > network packets into separate buffers (unless each packet just happens > to fit exactly within each buffer). What about streaming protocols like TCP? If, for example, I know I have an N-byte header and an N2-byte body, couldn't readv handle it with two iovecs? But that's not why I started the discussion. I'm looking for a way to do "scattered" async IO on files (the intention: feed an array of offsets, lengths and buffers into the kernel, let it perform the requests in parallel, if it can) and started with this man page. 
--------------enig09F7468F9E992596CC57A5F2 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.4-svn0 (GNU/Linux) iD8DBQFHWUkkldnAQVacBcgRAii1AKCZZHwHUcbv0Ofl55O3TTmmJFG4zQCg1fNb becsoQeragiV7qhZn2M/dfk= =xFh/ -----END PGP SIGNATURE----- --------------enig09F7468F9E992596CC57A5F2-- From owner-freebsd-fs@FreeBSD.ORG Fri Dec 7 13:21:41 2007 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8BF9416A46E; Fri, 7 Dec 2007 13:21:41 +0000 (UTC) (envelope-from phk@critter.freebsd.dk) Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222]) by mx1.freebsd.org (Postfix) with ESMTP id 4C3D513C45D; Fri, 7 Dec 2007 13:21:41 +0000 (UTC) (envelope-from phk@critter.freebsd.dk) Received: from critter.freebsd.dk (unknown [192.168.61.3]) by phk.freebsd.dk (Postfix) with ESMTP id 8F41517105; Fri, 7 Dec 2007 13:21:39 +0000 (UTC) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.14.1/8.14.1) with ESMTP id lB7DLcb4005180; Fri, 7 Dec 2007 13:21:39 GMT (envelope-from phk@critter.freebsd.dk) To: Ivan Voras From: "Poul-Henning Kamp" In-Reply-To: Your message of "Fri, 07 Dec 2007 14:22:22 +0100." Date: Fri, 07 Dec 2007 13:21:38 +0000 Message-ID: <5179.1197033698@critter.freebsd.dk> Sender: phk@critter.freebsd.dk Cc: freebsd-fs@freebsd.org Subject: Re: readv: parallel or sequential? 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 07 Dec 2007 13:21:41 -0000 In message , Ivan Voras writes: >This is an OpenPGP/MIME signed message (RFC 2440 and 3156) >--------------enig09F7468F9E992596CC57A5F2 >Content-Type: text/plain; charset=UTF-8 >Content-Transfer-Encoding: quoted-printable > >Dag-Erling Sm=C3=B8rgrav wrote: > >> But to return to what Ivan was asking, I think what the man page is >> trying to say is that you can't use readv() to e.g. read individual >> network packets into separate buffers (unless each packet just happens >> to fit exactly within each buffer). > >What about streaming protocols like TCP? If, for example, I know I have >a N-byte header, N2-byte body, couldn't readv handle it with two iovecs? yes. >But that's not why I started the discussion. I'm looking for a way to do >"scattered" async IO on files (the intention: feed an array of offsets, >lengths and buffers into the kernel, let it perform the requests in >parallel, if it can) and started with this man page. You want AIO -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. 
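The two-iovec case confirmed above (an N-byte header followed by an N2-byte body on a stream socket) can be sketched roughly as follows. This is only an illustration in Python, not code from the thread: `readv_exact`, `HEADER_LEN` and `BODY_LEN` are hypothetical names, and the sketch assumes a platform where Python's `os.readv` is available. On a byte stream a single readv may return fewer bytes than requested, so the helper loops until both buffers are full.

```python
import os
import socket


def readv_exact(fd, buffers):
    """Fill every buffer in order from a byte stream, looping on short reads."""
    # Zero-length buffers need no data; skipping them keeps a 0 return
    # from readv unambiguous (it then always means end of stream).
    views = [memoryview(b) for b in buffers if len(b) > 0]
    while views:
        n = os.readv(fd, views)  # fills views[0] first, then views[1], ...
        if n == 0:
            raise EOFError("stream closed before all buffers were filled")
        # Drop buffers that are now full and trim a partially filled one.
        while n > 0:
            if n >= len(views[0]):
                n -= len(views[0])
                views.pop(0)
            else:
                views[0] = views[0][n:]
                n = 0


if __name__ == "__main__":
    HEADER_LEN, BODY_LEN = 4, 9  # hypothetical sizes standing in for N and N2
    tx, rx = socket.socketpair()
    tx.sendall(b"HEADbody-data")
    header, body = bytearray(HEADER_LEN), bytearray(BODY_LEN)
    readv_exact(rx.fileno(), [header, body])
    print(header, body)
    tx.close()
    rx.close()
```

This only works because a stream has no message boundaries; on a datagram socket each readv consumes one packet, which is the limitation DES describes.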
From owner-freebsd-fs@FreeBSD.ORG Fri Dec 7 13:34:18 2007 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1644616A417 for ; Fri, 7 Dec 2007 13:34:18 +0000 (UTC) (envelope-from bg@sics.se) Received: from letter.sics.se (letter.sics.se [193.10.64.6]) by mx1.freebsd.org (Postfix) with ESMTP id CF01613C468 for ; Fri, 7 Dec 2007 13:34:17 +0000 (UTC) (envelope-from bg@sics.se) Received: from sics.se (ibook.sics.se [193.10.66.104]) by letter.sics.se (Postfix) with ESMTP id A1342400D3; Fri, 7 Dec 2007 14:34:15 +0100 (CET) Date: Fri, 7 Dec 2007 14:33:48 +0100 From: Bjorn Gronvall To: Dag-Erling =?ISO-8859-1?Q?Sm=F8rgrav?= Message-ID: <20071207143348.17470be3@ibook.sics.se> In-Reply-To: <86hciuu0vn.fsf@ds4.des.no> References: <1196788555.47558b4bab0ab@imp.free.fr> <1196953310.47580ede28676@imp.free.fr> <20071206175608.594685d9@ibook.sics.se> <86hciuu0vn.fsf@ds4.des.no> Organization: SICS.SE X-Mailer: Claws Mail 2.9.1 (GTK+ 2.10.6; i386-portbld-freebsd6.2) Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit Cc: freebsd-fs@freebsd.org Subject: Re: FSCK doesn't correct file size when INCORRECT BLOCK COUNT is found X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 07 Dec 2007 13:34:18 -0000 On Fri, 07 Dec 2007 13:48:12 +0100 Dag-Erling Smørgrav wrote: Hi Dag-Erling, > Bjorn Gronvall writes: > > Filesystems in general and UFS with soft updates in particular rely on > > disks providing accurate response to writes. When write caching is > > enabled the disk will lie and tell the operating system that the write > > has completed successfully, in reality the data is only cached in disk > > RAM. When the power disappears the data will be gone forever. > > No. 
This used to be the case with some cheaper disks which ignored the > ATA "flush cache" command to score higher on benchmarks, but I doubt > you'll find any disks on the market that still do that (at least from > reputable manufacturers). Agreed, but the software must also be written to actually make use of the more recent "flush cache" feature. I know that the GEOM journal can make use of this feature but does UFS with soft updates use it? > ZFS makes extensive use of the "flush cache" > command to ensure file system integrity (and in particular to ensure > that the intent log is written to disk so it can be replayed in case of > a crash). ZFS is a more recent beast than UFS and was probably designed with the "flush cache" feature in mind right from the very beginning. Cheers, /b -- _ _ ,_______________. Bjorn Gronvall (Björn Grönvall) /_______________/| Swedish Institute of Computer Science | || PO Box 1263, S-164 29 Kista, Sweden | Schroedingers || Email: bg@sics.se, Phone +46 -8 633 15 25 | Cat |/ Cellular +46 -70 768 06 35, Fax +46 -8 751 72 30 '---------------' From owner-freebsd-fs@FreeBSD.ORG Fri Dec 7 13:55:44 2007 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AD3AC16A418; Fri, 7 Dec 2007 13:55:44 +0000 (UTC) (envelope-from ticso@cicely12.cicely.de) Received: from raven.bwct.de (raven.bwct.de [85.159.14.73]) by mx1.freebsd.org (Postfix) with ESMTP id 1DF8813C467; Fri, 7 Dec 2007 13:55:43 +0000 (UTC) (envelope-from ticso@cicely12.cicely.de) Received: from cicely5.cicely.de ([10.1.1.7]) by raven.bwct.de (8.13.4/8.13.4) with ESMTP id lB7Db8uF068221; Fri, 7 Dec 2007 14:37:08 +0100 (CET) (envelope-from ticso@cicely12.cicely.de) Received: from cicely12.cicely.de (cicely12.cicely.de [10.1.1.14]) by cicely5.cicely.de (8.13.4/8.13.4) with ESMTP id lB7Db20j031404 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 7 Dec 2007 
14:37:02 +0100 (CET) (envelope-from ticso@cicely12.cicely.de) Received: from cicely12.cicely.de (localhost [127.0.0.1]) by cicely12.cicely.de (8.13.4/8.13.3) with ESMTP id lB7Db1qa020746; Fri, 7 Dec 2007 14:37:01 +0100 (CET) (envelope-from ticso@cicely12.cicely.de) Received: (from ticso@localhost) by cicely12.cicely.de (8.13.4/8.13.3/Submit) id lB7Db1Dr020745; Fri, 7 Dec 2007 14:37:01 +0100 (CET) (envelope-from ticso) Date: Fri, 7 Dec 2007 14:37:01 +0100 From: Bernd Walter To: Ivan Voras Message-ID: <20071207133700.GO10459@cicely12.cicely.de> References: <868x46u0lz.fsf@ds4.des.no> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: X-Operating-System: FreeBSD cicely12.cicely.de 5.4-STABLE alpha User-Agent: Mutt/1.5.9i X-Spam-Status: No, score=-4.4 required=5.0 tests=ALL_TRUSTED=-1.8, AWL=0.024, BAYES_00=-2.599 autolearn=ham version=3.1.7 X-Spam-Checker-Version: SpamAssassin 3.1.7 (2006-10-05) on cicely12.cicely.de Cc: freebsd-fs@freebsd.org Subject: Re: readv: parallel or sequential? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: ticso@cicely.de List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 07 Dec 2007 13:55:44 -0000 On Fri, Dec 07, 2007 at 02:22:22PM +0100, Ivan Voras wrote: > Dag-Erling Smørgrav wrote: > > > But to return to what Ivan was asking, I think what the man page is > > trying to say is that you can't use readv() to e.g. read individual > > network packets into separate buffers (unless each packet just happens > > to fit exactly within each buffer). > > What about streaming protocols like TCP? If, for example, I know I have > a N-byte header, N2-byte body, couldn't readv handle it with two iovecs? > > But that's not why I started the discussion. 
I'm looking for a way to do > "scattered" async IO on files (the intention: feed an array of offsets, > lengths and buffers into the kernel, let it perform the requests in > parallel, if it can) and started with this man page. I wonder if the kernel can read a single file in parallel, because disk heads can't be in multiple positions at the same time. ZFS does fill the read cache in parallel if it knows that there are enough spindles, but in every other case the FS doesn't know about multiple spindles. In the case of ZFS you don't have to care much about it in your application, because the next sequential file read will use the cache previously prefilled in parallel. -- B.Walter http://www.bwct.de http://www.fizon.de bernd@bwct.de info@bwct.de support@fizon.de From owner-freebsd-fs@FreeBSD.ORG Fri Dec 7 14:11:53 2007 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8F54616A418 for ; Fri, 7 Dec 2007 14:11:53 +0000 (UTC) (envelope-from freebsd-fs@m.gmane.org) Received: from ciao.gmane.org (main.gmane.org [80.91.229.2]) by mx1.freebsd.org (Postfix) with ESMTP id 44D9013C4CC for ; Fri, 7 Dec 2007 14:11:53 +0000 (UTC) (envelope-from freebsd-fs@m.gmane.org) Received: from list by ciao.gmane.org with local (Exim 4.43) id 1J0dvO-00041n-3y for freebsd-fs@freebsd.org; Fri, 07 Dec 2007 14:11:30 +0000 Received: from lara.cc.fer.hr ([161.53.72.113]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Fri, 07 Dec 2007 14:11:30 +0000 Received: from ivoras by lara.cc.fer.hr with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Fri, 07 Dec 2007 14:11:30 +0000 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-fs@freebsd.org From: Ivan Voras Date: Fri, 07 Dec 2007 15:16:30 +0100 Lines: 40 Message-ID: References: <868x46u0lz.fsf@ds4.des.no> <20071207133700.GO10459@cicely12.cicely.de> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; 
protocol="application/pgp-signature"; boundary="------------enig51AF23534A84605A6ACA3557" X-Complaints-To: usenet@ger.gmane.org X-Gmane-NNTP-Posting-Host: lara.cc.fer.hr User-Agent: Thunderbird 2.0.0.6 (X11/20070801) In-Reply-To: <20071207133700.GO10459@cicely12.cicely.de> X-Enigmail-Version: 0.95.3 Sender: news Subject: Re: readv: parallel or sequential? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 07 Dec 2007 14:11:53 -0000 This is an OpenPGP/MIME signed message (RFC 2440 and 3156) --------------enig51AF23534A84605A6ACA3557 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Bernd Walter wrote: > I wonder if the kernel can read a single file in parallel, because > disk heads can't be in multiple positions at the same time. They can be in case of RAID0 and similar schemes. > ZFS does fill the read cache in parallel if it knows that there are enough > spindles, but in every other case the FS doesn't know about multiple > spindles. > In the case of ZFS you don't have to care much about it in your application, > because the next sequential file read will use the cache previously > prefilled in parallel. Yes, ZFS is supposed to be doing marvelous things with IO prediction and scheduling, but I think even basic "ladder" scheduling done in FreeBSD could in theory help in tight spots with multiple requests. 
--------------enig51AF23534A84605A6ACA3557 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.4-svn0 (GNU/Linux) iD8DBQFHWVXEldnAQVacBcgRAmfeAKDWpfk14QJvaSWHgOHFZM0L/5k4AwCdFhsX RTEJji9qv9pHDC07XEGEtpg= =IAO/ -----END PGP SIGNATURE----- --------------enig51AF23534A84605A6ACA3557-- From owner-freebsd-fs@FreeBSD.ORG Fri Dec 7 14:45:33 2007 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BB6F716A417 for ; Fri, 7 Dec 2007 14:45:33 +0000 (UTC) (envelope-from julien.bellang@free.fr) Received: from smtp1-g19.free.fr (smtp1-g19.free.fr [212.27.42.27]) by mx1.freebsd.org (Postfix) with ESMTP id 7E86913C4EA for ; Fri, 7 Dec 2007 14:45:33 +0000 (UTC) (envelope-from julien.bellang@free.fr) Received: from smtp1-g19.free.fr (localhost.localdomain [127.0.0.1]) by smtp1-g19.free.fr (Postfix) with ESMTP id 6E5EE1AB2B1; Fri, 7 Dec 2007 15:45:32 +0100 (CET) Received: from [127.0.0.1] (vil35-2-82-227-204-7.fbx.proxad.net [82.227.204.7]) by smtp1-g19.free.fr (Postfix) with ESMTP id 0BD351AB2FC; Fri, 7 Dec 2007 15:45:31 +0100 (CET) Message-ID: <47595C8C.6060203@free.fr> Date: Fri, 07 Dec 2007 15:45:32 +0100 From: julien User-Agent: Thunderbird 2.0.0.9 (Windows/20071031) MIME-Version: 1.0 To: Jim Rees References: <1196788551.47558b47ee317@imp.free.fr> <86d4tiu0ui.fsf@ds4.des.no> <20071207125852.GC6665@citi.umich.edu> In-Reply-To: <20071207125852.GC6665@citi.umich.edu> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit X-Antivirus: avast! 
(VPS 071206-0, 06/12/2007), Outbound message X-Antivirus-Status: Clean Cc: freebsd-fs@freebsd.org, =?ISO-8859-1?Q?Dag-Erling_Sm=F8rgrav?= Subject: Re: FSCK failed does'nt correct file size when INCORRECT BLOCK COUNT is found X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 07 Dec 2007 14:45:33 -0000 I can't get a UPS in my environment. I already tested with the write cache deactivated, but the problem is still the same: I get files with holes and an incorrect size and block count. The only differences are that there are fewer holes and that performance drops. The problem is really easy to reproduce: you just have to copy several big files and cut the power in the middle of the copy. Jim Rees wrote: > Dag-Erling Smørgrav wrote: > > Get a UPS. > > We should strive to prevent data corruption in the face of unexpected system > shutdown. Having a user who loses power several times a week seems useful > for testing, especially when he is willing to delve into fsck sources and > figure out what's going on. My recommendation would be to turn off caching > in the disk and report back in a couple of weeks. 
> > > From owner-freebsd-fs@FreeBSD.ORG Fri Dec 7 17:49:26 2007 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9C26B16A474; Fri, 7 Dec 2007 17:49:26 +0000 (UTC) (envelope-from ticso@cicely12.cicely.de) Received: from raven.bwct.de (raven.bwct.de [85.159.14.73]) by mx1.freebsd.org (Postfix) with ESMTP id 0DDBF13C4E1; Fri, 7 Dec 2007 17:49:25 +0000 (UTC) (envelope-from ticso@cicely12.cicely.de) Received: from cicely5.cicely.de ([10.1.1.7]) by raven.bwct.de (8.13.4/8.13.4) with ESMTP id lB7HnO3j074856; Fri, 7 Dec 2007 18:49:24 +0100 (CET) (envelope-from ticso@cicely12.cicely.de) Received: from cicely12.cicely.de (cicely12.cicely.de [10.1.1.14]) by cicely5.cicely.de (8.13.4/8.13.4) with ESMTP id lB7HnFR8033545 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 7 Dec 2007 18:49:16 +0100 (CET) (envelope-from ticso@cicely12.cicely.de) Received: from cicely12.cicely.de (localhost [127.0.0.1]) by cicely12.cicely.de (8.13.4/8.13.3) with ESMTP id lB7HnFME021305; Fri, 7 Dec 2007 18:49:15 +0100 (CET) (envelope-from ticso@cicely12.cicely.de) Received: (from ticso@localhost) by cicely12.cicely.de (8.13.4/8.13.3/Submit) id lB7HnFYS021304; Fri, 7 Dec 2007 18:49:15 +0100 (CET) (envelope-from ticso) Date: Fri, 7 Dec 2007 18:49:15 +0100 From: Bernd Walter To: Ivan Voras Message-ID: <20071207174914.GQ10459@cicely12.cicely.de> References: <868x46u0lz.fsf@ds4.des.no> <20071207133700.GO10459@cicely12.cicely.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Operating-System: FreeBSD cicely12.cicely.de 5.4-STABLE alpha User-Agent: Mutt/1.5.9i X-Spam-Status: No, score=-4.4 required=5.0 tests=ALL_TRUSTED=-1.8, AWL=0.023, BAYES_00=-2.599 autolearn=ham version=3.1.7 X-Spam-Checker-Version: SpamAssassin 3.1.7 (2006-10-05) on cicely12.cicely.de Cc: freebsd-fs@freebsd.org Subject: Re: readv: parallel 
or sequential? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: ticso@cicely.de List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 07 Dec 2007 17:49:26 -0000 On Fri, Dec 07, 2007 at 03:16:30PM +0100, Ivan Voras wrote: > Bernd Walter wrote: > > > I wonder if the kernel can read a single file in parallel, because > > disk heads can't be in multiple positions at the same time. > > They can be in case of RAID0 and similar schemes. Yes, but how can it know that it is on a RAID0, and take advantage of multiple spindles instead of making things worse? The FS has to do sensible things for a single spindle as well. And normally disks are fastest when reading linearly, and with disk read caches this doesn't even have to be interleaved. I don't see any potential for parallel access within the same file, besides maybe some specially constructed cases. Granted, if you issue many accesses in parallel you allow the disk queue to sort them in the most effective way, but most filesystems work hard to keep single files almost linear, so there is no seek-time win at all. I assume the best approach is for the application to sort the readv entries in increasing order. > > ZFS does fill the read cache in parallel if it knows that there are enough > > spindles, but in every other case the FS doesn't know about multiple > > spindles. > > In the case of ZFS you don't have to care much about it in your application, > > because the next sequential file read will use the cache previously > > prefilled in parallel. > > Yes, ZFS is supposed to be doing marvelous things with IO prediction and > scheduling, but I think even basic "ladder" scheduling done in FreeBSD > could in theory help in tight spots with multiple requests. At least there are some workloads with very good results. A friend recently measured almost twice the speed when reading a big file on a two-disk ZFS mirror compared to single-disk raw speed. 
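The suggestion above, issuing the requests in increasing offset order, can be sketched roughly as follows. This is a hedged illustration in Python, not the POSIX AIO interface mentioned earlier in the thread: it swaps in a thread pool around `os.pread` as a stand-in, and `scattered_read` and `max_workers` are hypothetical names. Whether the requests actually proceed in parallel is up to the filesystem and the device.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def scattered_read(fd, requests, max_workers=4):
    """Read (offset, length) ranges from fd; results keep the caller's order."""
    # Submit requests in increasing offset order, remembering where each
    # result belongs in the caller's original ordering.
    order = sorted(range(len(requests)), key=lambda i: requests[i][0])
    results = [None] * len(requests)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            i: pool.submit(os.pread, fd, requests[i][1], requests[i][0])
            for i in order
        }
        for i, future in futures.items():
            # os.pread may return fewer bytes than requested near end of file.
            results[i] = future.result()
    return results
```

Because `os.pread` takes an explicit offset, the workers never share a file position, so the order in which the reads complete does not affect the result.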
-- B.Walter http://www.bwct.de http://www.fizon.de bernd@bwct.de info@bwct.de support@fizon.de From owner-freebsd-fs@FreeBSD.ORG Sat Dec 8 00:03:11 2007 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9763E16A420 for ; Sat, 8 Dec 2007 00:03:11 +0000 (UTC) (envelope-from victorloureirolima@gmail.com) Received: from py-out-1112.google.com (py-out-1112.google.com [64.233.166.183]) by mx1.freebsd.org (Postfix) with ESMTP id 4AFC513C447 for ; Sat, 8 Dec 2007 00:03:11 +0000 (UTC) (envelope-from victorloureirolima@gmail.com) Received: by py-out-1112.google.com with SMTP id u77so1988203pyb for ; Fri, 07 Dec 2007 16:03:10 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; bh=GGNy31w5bJDgoYCXB3SpGqTcd16W8pYF+ISVMxeO4zs=; b=FwncVyj0hdtu4E6vTS+oI8pZ/afQDiHz3ksZhCdYFUFqHTmZ8vu0y+7NppBcc+8BllQ7JfezrxnJzjWyID7ID9x1YiWdAF6aaNHTBjaYWSydtNQ3oIe9ThsSBHnLcBaAUkfy8CL+g5tGZEM96GF9aKqnPaTNfuSCVatAytmDfYA= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=tAsw0Di2+a13jZLoxv6hmwLRPjzceivTTimIr/dHZaH2K1OK+UxloyRnMtwKZm6IdylRstBpzXqWp5S0Xjyu5mRfaAJmAPhAdN4/sLJ+6nSn7t4Ofd63Gwm4hWW6JucSKf9DQD4DO913lR1fF/8ZJyw4zQzOogYonUOugVpcgc8= Received: by 10.35.33.15 with SMTP id l15mr4016058pyj.1197070561952; Fri, 07 Dec 2007 15:36:01 -0800 (PST) Received: by 10.35.125.7 with HTTP; Fri, 7 Dec 2007 15:36:01 -0800 (PST) Message-ID: Date: Fri, 7 Dec 2007 21:36:01 -0200 From: "Victor Loureiro Lima" To: ticso@cicely.de In-Reply-To: <20071206220755.GH10459@cicely12.cicely.de> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 
Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <20071206210848.GA63825@transwarp.tao.org.uk> <20071206220755.GH10459@cicely12.cicely.de> Cc: Josef Karthauser , fs@freebsd.org Subject: Re: -14% available on /tmp X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 08 Dec 2007 00:03:11 -0000 Okay, and what about this?

root@zion# df -H
Filesystem     Size    Used   Avail Capacity  Mounted on
/dev/ad4s1a    520M    341M    137M    71%    /
devfs          1.0k    1.0k      0B   100%    /dev
/dev/ad4s1e    520M     25M    453M     5%    /tmp
/dev/ad4s1f    120G     71G     39G    65%    /usr
/dev/ad4s1d    2.0G    2.0G   -162M   109%    /var

Has anyone seen this on a partition (/var)? Any pointers on how to fix this? cheers, victor
From owner-freebsd-fs@FreeBSD.ORG Sat Dec 8 00:29:04 2007 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 38D4816A417 for ; Sat, 8 Dec 2007 00:29:04 +0000 (UTC) (envelope-from ticso@cicely12.cicely.de) Received: from raven.bwct.de (raven.bwct.de [85.159.14.73]) by mx1.freebsd.org (Postfix) with ESMTP id A14CC13C43E for ; Sat, 8 Dec 2007 00:29:03 +0000 (UTC) (envelope-from ticso@cicely12.cicely.de) Received: from cicely5.cicely.de ([10.1.1.7]) by raven.bwct.de (8.13.4/8.13.4) with ESMTP id lB80T1Af086815; Sat, 8 Dec 2007 01:29:01 +0100 (CET) (envelope-from ticso@cicely12.cicely.de) Received: from cicely12.cicely.de (cicely12.cicely.de [10.1.1.14]) by cicely5.cicely.de (8.13.4/8.13.4) with ESMTP id lB80StUc036373 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 8 Dec 2007 01:28:56 +0100 (CET) (envelope-from ticso@cicely12.cicely.de) Received: from cicely12.cicely.de (localhost [127.0.0.1]) by cicely12.cicely.de (8.13.4/8.13.3) with ESMTP id lB80St1w022183; Sat, 8 Dec 2007 01:28:55 +0100 (CET) (envelope-from ticso@cicely12.cicely.de) 
Received: (from ticso@localhost) by cicely12.cicely.de (8.13.4/8.13.3/Submit) id lB80Ss3a022182; Sat, 8 Dec 2007 01:28:54 +0100 (CET) (envelope-from ticso) Date: Sat, 8 Dec 2007 01:28:54 +0100 From: Bernd Walter To: Victor Loureiro Lima Message-ID: <20071208002854.GT10459@cicely12.cicely.de> References: <20071206210848.GA63825@transwarp.tao.org.uk> <20071206220755.GH10459@cicely12.cicely.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Operating-System: FreeBSD cicely12.cicely.de 5.4-STABLE alpha User-Agent: Mutt/1.5.9i X-Spam-Status: No, score=-4.4 required=5.0 tests=ALL_TRUSTED=-1.8, AWL=0.034, BAYES_00=-2.599 autolearn=ham version=3.1.7 X-Spam-Checker-Version: SpamAssassin 3.1.7 (2006-10-05) on cicely12.cicely.de Cc: Josef Karthauser , ticso@cicely.de, fs@freebsd.org Subject: Re: -14% available on /tmp X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: ticso@cicely.de List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 08 Dec 2007 00:29:04 -0000 On Fri, Dec 07, 2007 at 09:36:01PM -0200, Victor Loureiro Lima wrote:
> Okay, and what about this?
>
> root@zion# df -H
> Filesystem     Size    Used   Avail Capacity  Mounted on
> /dev/ad4s1a    520M    341M    137M    71%    /
> devfs          1.0k    1.0k      0B   100%    /dev
> /dev/ad4s1e    520M     25M    453M     5%    /tmp
> /dev/ad4s1f    120G     71G     39G    65%    /usr
> /dev/ad4s1d    2.0G    2.0G   -162M   109%    /var
>
> Has anyone seen this a partition (/var)? Any pointers on how to fix this!?

This is expected if you have overfilled your FS: it has some space held in reserve, and that reserve was exhausted by root processes - see the tunefs(8) -m option for details. There is nothing wrong with your filesystem, and the situation can be "fixed" just by deleting files. The original post, however, had a negative used value, which is not expected and indicates corruption of the filesystem's summary information. 
-- B.Walter http://www.bwct.de http://www.fizon.de bernd@bwct.de info@bwct.de support@fizon.de
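The negative "Avail" and the capacity above 100% in the /var line can be reproduced with a little arithmetic. The following is a simplified sketch in Python, assuming the reserve is a flat percentage of the filesystem size (8% is taken here as the default minfree; check tunefs(8) for the real value on a given filesystem) and that df computes capacity as used / (used + avail). The function name and the sample numbers are hypothetical.

```python
def df_avail_and_capacity(size, used, minfree_percent=8):
    """Model df's Avail and Capacity columns for a filesystem with a reserve.

    size and used are in the same unit (for example megabytes).
    """
    reserve = size * minfree_percent // 100  # space held back from normal users
    avail = (size - used) - reserve          # goes negative once root eats the reserve
    capacity = 100.0 * used / (used + avail)  # used + avail == size - reserve
    return avail, capacity


if __name__ == "__main__":
    # Hypothetical 2000M filesystem with 1998M in use.
    print(df_avail_and_capacity(2000, 1998))
```

With these sample numbers the reserve is 160M, so "Avail" comes out at about -158M and the capacity at about 109%, close to the /var line quoted above.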