From owner-freebsd-current@FreeBSD.ORG Mon Nov 10 08:35:05 2014 Return-Path: Delivered-To: current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 0D778554; Mon, 10 Nov 2014 08:35:05 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 8AFA3F64; Mon, 10 Nov 2014 08:35:04 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id sAA8Ywaa067894 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 10 Nov 2014 10:34:58 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua sAA8Ywaa067894 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id sAA8YvGd067893; Mon, 10 Nov 2014 10:34:57 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Mon, 10 Nov 2014 10:34:57 +0200 From: Konstantin Belousov To: Luigi Rizzo Subject: Re: dev_lock() contention for fdesc syscalls -- possible fix Message-ID: <20141110083457.GD53947@kib.kiev.ua> References: <20141110014939.GA21626@onelab2.iet.unipi.it> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20141110014939.GA21626@onelab2.iet.unipi.it> User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.0 X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home Cc: jmg@freebsd.org, gnn@freebsd.org, adrian@freebsd.org, current@freebsd.org X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 10 Nov 2014 08:35:05 -0000 On Mon, Nov 10, 2014 at 02:49:39AM +0100, Luigi Rizzo wrote: > It was noticed that there is huge dev_lock() contention when multiple > processes do a poll() even on independent file descriptors. > > Turns out that not just poll but most syscalls on file descriptors > (as opposed to sockets) in sys/fs/devfs/devfs_vnops.c including > devfs_poll_f(), devfs_ioctl_f() and read/write share the problem > as they use the following pattern > > devfs_poll_f() { > ... > devfs_fp_check(fp, ...) --> > kern/kern_conf.c :: devvn_refthread(fp->f_vnode, ...) --> > dev_lock(); > dev = vp->v_rdev; // lock on vp ? > ... check that dev != NULL > atomic_add_long(&dev->si_threadcount, 1); > dev_unlock(); > dsw->d_poll() > dev_relthread() --> > atomic_subtract_rel_long(&dev->si_threadcount, 1); > } > > > I believe that dev_lock() in devvn_refthread() is protecting > dev = vp->v_rdev > (the cdev entry 'dev' cannot go away for the reasons stated below). > > However looking at places where vp->v_rdev is modified, turns out > that it only happens when holding VI_LOCK(vp) -- the places are > devfs_allocv() and devfs_reclaim(). > There is one place in zfs where the vnode is created and v_rdev > is set without holding a lock, so nobody can dereference it. > > As a consequence i believe that if in devfs_fp_check() we replace > dev_lock() / dev_unlock() with VI_LOCK(vp) / VI_UNLOCK(vp), > we make sure that we can safely dereference vp->v_rdev, and the > cdev cannot go away because the vnode has a reference to it. > The counter uses an atomic instruction (so the release is lockless) Vnode reference, as well as cdev reference, which is obtained by dev_ref(), do not provide any protection there. v_rdev is only coincidentally protected by the vnode interlock. If you look at larger part of the code, you would note that dev mutex is held even after v_rdev is dereferenced. The real protection it provides is against race with destroy_dev(l)(), which could invalidate dev->so_devsw at any moment when either device thread reference is not held, or dev mutex is not held. So your patch breaks the device destruction. > > This should be enough to remove the contention. If you never calls destroy_dev() for the devfs node, you could use MAKEDEV_ETERNAL flag for make_dev_credf(), which indicates that there is no risk of race with destroy_dev() for the created node. Usually, code which could be compiled in kernel statically but also loaded as module, use MAKEDEV_ETERNAL_KLD flag, which takes care of module needed to call destroy_dev(). > > Anyone care to review/test the patch below ? Yes, there are people who care. > (dev_refthread() still has the single dev_lock() contention, > i don't know how to address that yet) > > cheers > luigi > > Index: /home/luigi/FreeBSD/head/sys/kern/kern_conf.c > =================================================================== > --- /home/luigi/FreeBSD/head/sys/kern/kern_conf.c (revision 273452) > +++ /home/luigi/FreeBSD/head/sys/kern/kern_conf.c (working copy) > @@ -224,10 +224,11 @@ > } > > csw = NULL; > - dev_lock(); > + ASSERT_VI_UNLOCKED(vp, __func__); > + VI_LOCK(vp); // dev_lock(); > dev = vp->v_rdev; > if (dev == NULL) { > - dev_unlock(); > + VI_UNLOCK(vp); // dev_unlock(); > return (NULL); > } > cdp = cdev2priv(dev); > @@ -236,7 +237,7 @@ > if (csw != NULL) > atomic_add_long(&dev->si_threadcount, 1); > } > - dev_unlock(); > + VI_UNLOCK(vp); // dev_unlock(); > if (csw != NULL) { > *devp = dev; > *ref = 1; > _______________________________________________ > freebsd-current@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-current > To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"