From: Jeff Roberson
To: arch@freebsd.org
Date: Mon, 11 May 2009 17:32:17 -1000 (HST)
Subject: lockless file descriptor lookup

http://people.freebsd.org/~jeff/locklessfd.diff

This patch implements a lockless lookup path for file descriptors. The meat of the algorithm is in fget_unlocked(), which, unlike fget_locked(), returns a referenced struct file. In the common case this reduces the number of atomic operations fget() requires, lets lookups proceed concurrently with modifications to the table, and, because no lock is held during the lookup, keeps preemption from causing context switches.
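The lookup loop in fget_unlocked() can be illustrated with a small user-space sketch. It is not the patch itself: it uses C11 <stdatomic.h> instead of FreeBSD's atomic_cmpset_int(), and the names struct xfile, struct xfdtable, xfget_unlocked() and xfdrop() are hypothetical. It assumes, as the message describes for UMA_ZONE_NOFREE, that file objects are type-stable (never returned to the allocator), so reading the count of a stale pointer is always a valid memory access. The loop only ever increments a nonzero count and then re-reads the table slot to confirm the reference is still current.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct file; assumed never freed (type-stable). */
struct xfile {
	atomic_uint f_count;		/* 0 means free or being torn down */
};

/* Hypothetical descriptor table; slots may change under a concurrent lookup. */
struct xfdtable {
	int nfiles;
	struct xfile *_Atomic *ofiles;
};

/* Drop one reference; returns 1 if it was the last one. */
static int
xfdrop(struct xfile *fp)
{
	unsigned old = atomic_fetch_sub(&fp->f_count, 1);

	assert(old > 0);
	return (old == 1);
}

/* Return a referenced file for fd, or NULL, without taking any lock. */
static struct xfile *
xfget_unlocked(struct xfdtable *fdt, int fd)
{
	struct xfile *fp;
	unsigned count;

	if (fd < 0 || fd >= fdt->nfiles)
		return (NULL);
	for (;;) {
		fp = atomic_load(&fdt->ofiles[fd]);
		if (fp == NULL)
			return (NULL);
		count = atomic_load(&fp->f_count);
		if (count == 0)
			continue;	/* being freed: re-read the slot */
		/* Never raise a count from zero; retry if it changed under us. */
		if (!atomic_compare_exchange_weak(&fp->f_count, &count,
		    count + 1))
			continue;
		/* Re-check the slot: the fd may have been closed or reused. */
		if (fp == atomic_load(&fdt->ofiles[fd]))
			return (fp);
		/*
		 * Stale reference: undo it and retry.  A real kernel would
		 * run the close path if this dropped the last reference;
		 * that is omitted in this sketch.
		 */
		(void)xfdrop(fp);
	}
}
```

In the uncontended case the only read-modify-write is the single compare-and-swap, which matches the "one atomic instruction" cost claimed for the common case.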
Using the libMicro 4.0 benchmarking suite with 16 threads on an 8-core machine, descriptor-heavy tests improve by as much as 428%. No test in this benchmark regressed.

The code works by allowing lookup threads to follow two pointers that were previously unsafe to dereference without a lock.

First, the file descriptor table is not freed when it is expanded; old tables are kept until the process exits. This ensures that no page fault or access to freed memory can occur if the table is expanded after a lookup has fetched the table pointer. Since the vast majority of processes never expand their descriptor table, keeping the old copies adds no significant memory overhead. I shamelessly stole this idea from NetBSD.

Second, struct file objects are allocated from a zone marked UMA_ZONE_NOFREE and are never reclaimed, which lets us safely attempt to reference them without any locks. To prevent races with fdrop(), fget_unlocked() uses a cmpset loop that never raises a reference count from zero. In this way it can never reference a freed or recently allocated file. Once the file is resolved, we re-read the descriptor table slot to verify that it still points to the same file. At that point we either hold a valid reference, or we drop the stale reference and retry.

This costs only one atomic instruction for common-case file access. In the worst case there can be some spinning in the loop in fget_unlocked(), but on every iteration of the loop some thread makes forward progress.

I'm going to ask the usual suspects to stress test this, and I'd like to get it into 8.0. This is your chance to raise any counterarguments. I'd also appreciate it if someone could look at my volatile cast and make sure I'm actually forcing the compiler to refresh the fd_ofiles array here:

    + if (fp == ((struct file *volatile*)fdp->fd_ofiles)[fd])

Thanks,
Jeff
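The first of the two pointers described above, a descriptor table that is never freed on expansion, can be sketched as follows. This is a user-space illustration under stated assumptions, not the patch: the names struct ytable, struct yfdesc and the ytable_*() functions are hypothetical, growth is assumed to be serialized by a lock that is not modeled, and keeping nfiles in the same structure as the slot array (so a reader always sees a matching bound and array) is a choice made for this sketch. Retired tables are chained on a list and freed only at "process exit".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical: one generation of the table plus a link to the one it replaced. */
struct ytable {
	int nfiles;			/* size of slots[], kept next to the array */
	void *_Atomic *slots;		/* descriptor slots */
	struct ytable *prev;		/* retired table, freed only at exit */
};

struct yfdesc {
	struct ytable *_Atomic cur;	/* table that new lookups should use */
};

/* Allocate nfiles slots, copying existing slots from old (may be NULL). */
static struct ytable *
ytable_alloc(int nfiles, struct ytable *old)
{
	struct ytable *t;
	int i;

	assert(nfiles > 0);
	if ((t = malloc(sizeof(*t))) == NULL)
		return (NULL);
	if ((t->slots = malloc((size_t)nfiles * sizeof(t->slots[0]))) == NULL) {
		free(t);
		return (NULL);
	}
	for (i = 0; i < nfiles; i++)
		atomic_init(&t->slots[i], (old != NULL && i < old->nfiles) ?
		    atomic_load(&old->slots[i]) : NULL);
	t->nfiles = nfiles;
	t->prev = old;
	return (t);
}

static int
ytable_init(struct yfdesc *fdp, int nfiles)
{
	struct ytable *t = ytable_alloc(nfiles, NULL);

	if (t == NULL)
		return (-1);
	atomic_init(&fdp->cur, t);
	return (0);
}

/* Writer side; assumed to run under a lock that is not modeled here. */
static int
ytable_grow(struct yfdesc *fdp, int nfiles)
{
	struct ytable *old = atomic_load(&fdp->cur);
	struct ytable *t;

	if (nfiles <= old->nfiles)
		return (0);
	if ((t = ytable_alloc(nfiles, old)) == NULL)
		return (-1);
	/* Publish the new table; the old one is kept for lockless readers. */
	atomic_store(&fdp->cur, t);
	return (0);
}

/* Reader side: no lock; a stale table pointer still refers to valid memory. */
static void *
ytable_lookup(struct yfdesc *fdp, int fd)
{
	struct ytable *t = atomic_load(&fdp->cur);

	if (fd < 0 || fd >= t->nfiles)
		return (NULL);
	return (atomic_load(&t->slots[fd]));
}

/* Process exit: no readers remain, so every generation can be freed. */
static void
ytable_destroy(struct yfdesc *fdp)
{
	struct ytable *t = atomic_load(&fdp->cur), *prev;

	while (t != NULL) {
		prev = t->prev;
		free(t->slots);
		free(t);
		t = prev;
	}
}
```

A reader still holding a retired table can see slot values that are out of date, because later updates go only to the current table. That is presumably one reason the re-check in fget_unlocked() reads fdp->fd_ofiles again instead of reusing the pointer fetched at the start. Note also that in the cast quoted above only the element read is volatile-qualified; the read of fdp->fd_ofiles itself is an ordinary load.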