From owner-freebsd-net@FreeBSD.ORG Sat Dec 22 20:16:26 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7A45D16A417; Sat, 22 Dec 2007 20:16:26 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from relay02.kiev.sovam.com (relay02.kiev.sovam.com [62.64.120.197]) by mx1.freebsd.org (Postfix) with ESMTP id 2441113C455; Sat, 22 Dec 2007 20:16:25 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from [212.82.216.226] (helo=deviant.kiev.zoral.com.ua) by relay02.kiev.sovam.com with esmtps (TLSv1:AES256-SHA:256) (Exim 4.67) (envelope-from ) id 1J6Ald-000LAK-OO; Sat, 22 Dec 2007 22:16:24 +0200 Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.2/8.14.2) with ESMTP id lBMKGE83034471; Sat, 22 Dec 2007 22:16:14 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.2/8.14.2/Submit) id lBMKGEiI034470; Sat, 22 Dec 2007 22:16:14 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Sat, 22 Dec 2007 22:16:13 +0200 From: Kostik Belousov To: Bruce Evans Message-ID: <20071222201613.GX57756@deviant.kiev.zoral.com.ua> References: <20071221234347.GS25053@tnn.dglawrence.com> <20071222050743.GP57756@deviant.kiev.zoral.com.ua> <20071223032944.G48303@delplex.bde.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="0k4Rxg87Lb8yV0u3" Content-Disposition: inline In-Reply-To: <20071223032944.G48303@delplex.bde.org> User-Agent: Mutt/1.4.2.3i X-Scanner-Signature: 69ad68b945e79e087f1af101bf3fefec X-DrWeb-checked: yes X-SpamTest-Envelope-From: kostikbel@gmail.com X-SpamTest-Group-ID: 00000000 X-SpamTest-Info: Profiles 1946 [Dec 22 2007] X-SpamTest-Info: helo_type=3 X-SpamTest-Info: {received from trusted relay: not dialup} X-SpamTest-Method: none X-SpamTest-Method: Local Lists X-SpamTest-Rate: 0 X-SpamTest-Status: Not detected X-SpamTest-Status-Extended: not_detected X-SpamTest-Version: SMTP-Filter Version 3.0.0 [0255], KAS30/Release Cc: "Freebsd-Net@Freebsd. Org" , David Schwartz , freebsd-stable@freebsd.org Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Dec 2007 20:16:26 -0000 --0k4Rxg87Lb8yV0u3 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Sun, Dec 23, 2007 at 04:08:09AM +1100, Bruce Evans wrote: > On Sat, 22 Dec 2007, Kostik Belousov wrote: > >Yes, rewriting the syncer is the right solution. It probably cannot be d= one > >quickly enough. If the yield workaround provide mitigation for now, it > >shall go in. >=20 > I don't think rewriting the syncer just for this is the right solution. > Rewriting the syncer so that it schedules actual i/o more efficiently > might involve a solution. Better scheduling would probably take more > CPU and increase the problem. I think that we can easily predict what vnode(s) become dirty at the places where we do vn_start_write(). >=20 > Note that MNT_VNODE_FOREACH() is used 17 times, so the yielding fix is > needed in 17 places if it isn't done internally in MNT_VNODE_FOREACH(). > There are 4 places in vfs and 13 places in 6 file systems: >=20 > % ./ufs/ffs/ffs_snapshot.c: MNT_VNODE_FOREACH(xvp, mp, mvp) { > % ./ufs/ffs/ffs_snapshot.c: MNT_VNODE_FOREACH(vp, mp, mvp) { > % ./ufs/ffs/ffs_vfsops.c: MNT_VNODE_FOREACH(vp, mp, mvp) { > % ./ufs/ffs/ffs_vfsops.c: MNT_VNODE_FOREACH(vp, mp, mvp) { > % ./ufs/ufs/ufs_quota.c: MNT_VNODE_FOREACH(vp, mp, mvp) { > % ./ufs/ufs/ufs_quota.c: MNT_VNODE_FOREACH(vp, mp, mvp) { > % ./ufs/ufs/ufs_quota.c: MNT_VNODE_FOREACH(vp, mp, mvp) { > % ./fs/msdosfs/msdosfs_vfsops.c: MNT_VNODE_FOREACH(vp, mp, nvp) { > % ./fs/coda/coda_subr.c: MNT_VNODE_FOREACH(vp, mp, nvp) { > % ./gnu/fs/ext2fs/ext2_vfsops.c: MNT_VNODE_FOREACH(vp, mp, mvp) { > % ./gnu/fs/ext2fs/ext2_vfsops.c: MNT_VNODE_FOREACH(vp, mp, mvp) { > % ./kern/vfs_default.c: MNT_VNODE_FOREACH(vp, mp, mvp) { > % ./kern/vfs_subr.c: MNT_VNODE_FOREACH(vp, mp, mvp) { > % ./kern/vfs_subr.c: MNT_VNODE_FOREACH(vp, mp, mvp) { > % ./nfs4client/nfs4_vfsops.c: MNT_VNODE_FOREACH(vp, mp, mvp) { > % ./nfsclient/nfs_subs.c: MNT_VNODE_FOREACH(vp, mp, nvp) { > % ./nfsclient/nfs_vfsops.c: MNT_VNODE_FOREACH(vp, mp, mvp) { >=20 > Only file systems that support writing need it (for VOP_SYNC() and for > MNT_RELOAD), else there would be many more places. There would also > be more places if MNT_RELOAD support were not missing for some file > systems. Ok, since you talked about this first :). I already made the following patch, but did not published it since I still did not inspected all callers of MNT_VNODE_FOREACH() for safety of dropping mount interlock. It shall be safe, but better to check. Also, I postponed the check until it was reported that yielding does solve the original problem. diff --git a/sys/kern/vfs_mount.c b/sys/kern/vfs_mount.c index 14acc5b..046af82 100644 --- a/sys/kern/vfs_mount.c +++ b/sys/kern/vfs_mount.c @@ -1994,6 +1994,12 @@ __mnt_vnode_next(struct vnode **mvp, struct mount *m= p) mtx_assert(MNT_MTX(mp), MA_OWNED); =20 KASSERT((*mvp)->v_mount =3D=3D mp, ("marker vnode mount list mismatch")); + if ((*mvp)->v_yield++ =3D=3D 500) { + MNT_IUNLOCK(mp); + (*mvp)->v_yield =3D 0; + uio_yield(); + MNT_ILOCK(mp); + } vp =3D TAILQ_NEXT(*mvp, v_nmntvnodes); while (vp !=3D NULL && vp->v_type =3D=3D VMARKER) vp =3D TAILQ_NEXT(vp, v_nmntvnodes); diff --git a/sys/sys/vnode.h b/sys/sys/vnode.h index dc70417..6e3119b 100644 --- a/sys/sys/vnode.h +++ b/sys/sys/vnode.h @@ -131,6 +131,7 @@ struct vnode { struct socket *vu_socket; /* v unix domain net (VSOCK) */ struct cdev *vu_cdev; /* v device (VCHR, VBLK) */ struct fifoinfo *vu_fifoinfo; /* v fifo (VFIFO) */ + int vu_yield; /* yield count (VMARKER) */ } v_un; =20 /* @@ -185,6 +186,7 @@ struct vnode { #define v_socket v_un.vu_socket #define v_rdev v_un.vu_cdev #define v_fifoinfo v_un.vu_fifoinfo +#define v_yield v_un.vu_yield =20 /* XXX: These are temporary to avoid a source sweep at this time */ #define v_object v_bufobj.bo_object --0k4Rxg87Lb8yV0u3 Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (FreeBSD) iD8DBQFHbXCNC3+MBN1Mb4gRAl8eAJ9DvGYrFBcvBUeaesQfI8K8NZa8CwCfabpZ P1ojIQjyRhEbd8gCeutenLM= =t5ni -----END PGP SIGNATURE----- --0k4Rxg87Lb8yV0u3--