From owner-freebsd-arch@FreeBSD.ORG Sat Jan 4 00:55:50 2014
Date: Fri, 3 Jan 2014 16:55:48 -0800
Sender: adrian.chadd@gmail.com
Subject: Acquiring a lock on the same CPU that holds it - what can be done?
From: Adrian Chadd
To: "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
List-Id: Discussion related to FreeBSD architecture

Hi,

So here's a fun one.

When doing TCP traffic + socket affinity + thread pinning experiments, I
seem to hit this very annoying scenario that caps my performance and
scalability.

Assume I've lined up everything relating to a socket to run on the same
CPU (ie, TX, RX, TCP timers, userland thread):

* userland code calls something, let's say "kqueue"
* the kqueue lock gets grabbed
* an interrupt comes in for the NIC
* the NIC code runs some RX code, and eventually hits something that
  wants to push a knote up
* .. and the knote is for the same kqueue above
* .. so it grabs the lock..
* .. and contests..
* then the scheduler flips us back to the original userland thread doing TX
* the userland thread finishes its kqueue manipulation and releases the
  kqueue lock
* .. the scheduler then immediately flips back to the NIC thread waiting
  for the lock, grabs the lock, does a bit of work, then releases the lock

I see this on kqueue locks, sendfile locks (for sendfile notification)
and vm locks (for VM page referencing/dereferencing). This happens very
frequently.

It's very noticeable with large numbers of sockets: the chance that a
lock taken in the NIC RX path overlaps with something in the userland TX
path that you're currently fiddling with (eg kqueue manipulation) or
sending data through (eg vm_page locks, or sendfile locks for things
you're currently transmitting) is very high. As I increase traffic and
the number of sockets, the number of context switches goes way up (to
300,000+) and the lock contention / time spent doing locking is
non-trivial.
Linux doesn't "have" this problem - its lock primitives let you disable
driver bottom halves. So, in this instance, I'd just grab the lock with
spin_lock_bh() and no driver bottom halves would run while I held it. I'd
thus not have this scheduler ping-ponging and lock contention, as it would
never get a chance to happen.

So, does anyone have any ideas? Has anyone seen this?

Shall we just implement a way of doing selective thread disabling, a la
spin_lock_bh() mixed with spl${foo}() style stuff?

Thanks,

-adrian