From owner-freebsd-stable@FreeBSD.ORG Tue Dec 12 13:54:00 2006 Return-Path: X-Original-To: freebsd-stable@freebsd.org Delivered-To: freebsd-stable@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id DD3AA16A51C; Tue, 12 Dec 2006 13:54:00 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from fw.zoral.com.ua (fw.zoral.com.ua [213.186.206.134]) by mx1.FreeBSD.org (Postfix) with ESMTP id DDDE743DDA; Tue, 12 Dec 2006 13:51:50 +0000 (GMT) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by fw.zoral.com.ua (8.13.4/8.13.4) with ESMTP id kBCDqsoY000344 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 12 Dec 2006 15:52:54 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.13.8/8.13.8) with ESMTP id kBCDqrKb019225; Tue, 12 Dec 2006 15:52:53 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.13.8/8.13.8/Submit) id kBCDqplF019224; Tue, 12 Dec 2006 15:52:51 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Tue, 12 Dec 2006 15:52:51 +0200 From: Kostik Belousov To: Bruce Evans Message-ID: <20061212135251.GJ311@deviant.kiev.zoral.com.ua> References: <456950AF.3090308@sh.cvut.cz> <20061127092146.GA69556@deviant.kiev.zoral.com.ua> <457E6C06.1020405@FreeBSD.org> <20061212101903.GF311@deviant.kiev.zoral.com.ua> <20061212220705.F57430@delplex.bde.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="v2Uk6McLiE8OV1El" Content-Disposition: inline In-Reply-To: <20061212220705.F57430@delplex.bde.org> User-Agent: Mutt/1.4.2.2i X-Virus-Scanned: ClamAV version 0.88.4, clamav-milter version 0.88.4 on fw.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=1.4 required=5.0 tests=SPF_NEUTRAL, UNPARSEABLE_RELAY autolearn=no version=3.1.4 X-Spam-Level: * X-Spam-Checker-Version: SpamAssassin 3.1.4 (2006-07-25) on fw.zoral.com.ua Cc: freebsd-stable@freebsd.org, freebsd-current@freebsd.org, Suleiman Souhlal , V??clav Haisman , bde@freebsd.org, tegge@freebsd.org Subject: Re: kqueue LOR X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 12 Dec 2006 13:54:01 -0000 --v2Uk6McLiE8OV1El Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Dec 12, 2006 at 11:49:42PM +1100, Bruce Evans wrote: > On Tue, 12 Dec 2006, Kostik Belousov wrote: >=20 > >On Tue, Dec 12, 2006 at 12:44:54AM -0800, Suleiman Souhlal wrote: >=20 > >>Is the mount lock really required, if all we're doing is a single read = of=20 > >>a > >>single word (mnt_kern_flags) (v_mount should be read-only for the whole > >>lifetime of the vnode, I believe)? After all, reads of a single word are > >>atomic on all our supported architectures. > >>The only situation I see where there MIGHT be problems are forced=20 > >>unmounts, > >>but I think there are bigger issues with those. > >>Sorry for noticing this email only now. > > > >The problem is real with snapshotting. Ignoring > >MNTK_SUSPEND/MNTK_SUSPENDED flags (in particular, reading stale value of > >mnt_kern_flag) while setting IN_MODIFIED caused deadlock at ufs vnode > >inactivation time. This was the big trouble with nfsd and snapshots. As > >such, I think that precise value of mmnt_kern_flag is critical there, > >and mount interlock is needed. >=20 > Locking for just read is almost always bogus, but here (as in most > cases) there is also a write based on the contents of the flag, and > the lock is held across the write. >=20 > >Practically speaking, I agree with claim that reading of m_k_f is > >surrounded by enough locked operations that would make sure that > >the read value is not stale. But there is no such guarantee on > >future/non-i386 arches, isn't it ? >=20 > I think not-very-staleness is implied by acquire/release semantics > which are part of the API for most atomic operations. This behaviour > doesn't seem to be documented for mutexes, but I don't see how mutexes > could work without it (they have to synchronize all memory accesses, > not just the memory accessed by the lock). >=20 > >As a side note, mount interlock scope could be reduced there. > > > >Index: ufs/ufs/ufs_vnops.c > >=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > >RCS file: /usr/local/arch/ncvs/src/sys/ufs/ufs/ufs_vnops.c,v > >retrieving revision 1.283 > >diff -u -r1.283 ufs_vnops.c > >--- ufs/ufs/ufs_vnops.c 6 Nov 2006 13:42:09 -0000 1.283 > >+++ ufs/ufs/ufs_vnops.c 12 Dec 2006 10:18:04 -0000 > >@@ -133,19 +134,19 @@ > >{ > > struct inode *ip; > > struct timespec ts; > >- int mnt_locked; > > > > ip =3D VTOI(vp); > >- mnt_locked =3D 0; > > if ((vp->v_mount->mnt_flag & MNT_RDONLY) !=3D 0) { > > VI_LOCK(vp); > > goto out; > > } > > MNT_ILOCK(vp->v_mount); /* For reading of mnt_kern_flags. */ > >- mnt_locked =3D 1; > > VI_LOCK(vp); > >- if ((ip->i_flag & (IN_ACCESS | IN_CHANGE | IN_UPDATE)) =3D=3D 0) > >- goto out_unl; > >+ if ((ip->i_flag & (IN_ACCESS | IN_CHANGE | IN_UPDATE)) =3D=3D 0) { > >+ MNT_IUNLOCK(vp->v_mount); > >+ VI_UNLOCK(vp); > >+ return; > >+ } >=20 > The version that depends on not-very-staleness would test the flags > without acquiring the lock(s) and return immediately in the usual case > where none of the flags are set. It would have to acquire the locks > and repeat the test to make changes (and the test is already repeated > one flag at a time). I think this would be correct enough, but still > inefficient and/or even messier. The current organization is usually: >=20 > acquire vnode interlock in caller > release vnode interlock in caller to avoid messes here (inefficient) > call > acquire mount interlock > acquire vnode interlock > test the flags; goto cleanup code if none set (usual case) > do the work > release vnode interlock > release mount interlock > return > acquire vnode interlock (if needed) > release vnode interlock (if needed) >=20 > and it might become: >=20 > acquire vnode interlock in caller > call > test the flags; return if none set (usual case) > release vnode interlock // check that callers are aware of=20 > this > acquire mount interlock > acquire vnode interlock > do the work > // Assume no LOR problem for release, as below. > // Otherwise need another relese+acquire of vnode interlock. > release mount interlock > return > release vnode interlock >=20 > > > > if ((vp->v_type =3D=3D VBLK || vp->v_type =3D=3D VCHR) && !DOINGSOFTDEP= (vp)) > > ip->i_flag |=3D IN_LAZYMOD; > >@@ -155,6 +156,7 @@ > > ip->i_flag |=3D IN_MODIFIED; > > else if (ip->i_flag & IN_ACCESS) > > ip->i_flag |=3D IN_LAZYACCESS; > >+ MNT_IUNLOCK(vp->v_mount); > > vfs_timestamp(&ts); > > if (ip->i_flag & IN_ACCESS) { > > DIP_SET(ip, i_atime, ts.tv_sec); >=20 > Is there no LOR problem for release? AFAIK, no. Release is guaranteed to success without blocking. >=20 > As I understand it, MNT_ILOCK() is only protecting IN_ACCESS being > converted to IN_MODIFED, so after this conversion is done the lock > is not needed. Is this correct? The code shall set IN_MODIFIED flag only if MNTK_SUSPEND and MNTK_SUSPENDED are clean. The vfs_write_suspend() that set MNTK_SUSPEND, does VFS_SYNC() after setting the flag. Since VFS_SYNC() tries to interlock the vnode (to test the flags), and then locks the vnode for ffs_syncvnode() call, the loop in ffs_sync() cannot finish before the ufs_itimes released vnode interlock. The MNTK_SUSPENDED is set after the loop. Hmm, may be, since vnode must be interlocked by ffs_sync() after MNTK_SUSPENDED set, and before MNTK_SUSPEND, mount interlock is not needed in ufs_itimes. Tor ? >=20 > >@@ -172,10 +174,7 @@ > > > > out: > > ip->i_flag &=3D ~(IN_ACCESS | IN_CHANGE | IN_UPDATE); > >- out_unl: > > VI_UNLOCK(vp); > >- if (mnt_locked) > >- MNT_IUNLOCK(vp->v_mount); > >} > > > >/* > > >=20 > BTW, vfs.lookup_shared defaults to 0 and decides shared access for all > operations including read, so I wonder if there are [m]any bugs > preventing shared accesses being the default for the most important > cases where they can be (lookup and reads). (As you know, not acquiring > the vnode interlock in old versions of the above was one, and since > vfs.lookup_shared is global and not all file systems have the fix, it > still is one.) Kris Kennaway said that lookup_shared caused deadlocks ? --v2Uk6McLiE8OV1El Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (FreeBSD) iD8DBQFFfrQyC3+MBN1Mb4gRAlPCAJsHQNE3GIdPbTSmxyO75sjugSRqRQCeLkPr TwT9hV93Txg6r3IQD4gMlb8= =NzDM -----END PGP SIGNATURE----- --v2Uk6McLiE8OV1El--