From: Robert Watson
Date: Wed, 4 Jul 2007 12:58:44 +0100 (BST)
To: Alfred Perlstein
Cc: arch@freebsd.org
Subject: Re: Fine grain select locking.
Message-ID: <20070704124833.W37059@fledge.watson.org>
In-Reply-To: <20070704105525.GU45894@elvis.mu.org>

On Wed, 4 Jul 2007, Alfred Perlstein wrote:

> * Jeff Roberson [070703 18:16] wrote:
>
>> Here is an update that avoids the malloc per fd when there are no
>> collisions. This unfortunately adds 64 bytes to every socket in the system.
>> This is less than 10% of the size of the socket. Vnodes only allocate
>> their selinfo structures on demand so this does not cause a per-file
>> overhead. This was suggested by Peter. This patch also uses a vm zone for
>> the selfd structures. I can shrink them slightly by using a SLIST in one
>> case vs TAILQ as well.
>>
>> http://people.freebsd.org/~jeff/select2.diff
>
> Jeff, I understand you're trying to speed up mysql micro benchmarks, but
> have you done any benchmarking on large select operations?

I also worry about the narrowness of the benchmarking we're doing -- however,
it's hardly new. We do best at optimizing where we have clearly defined
targets and measures of performance. The four-times increase in MySQL select
performance is a direct result of Kris taking on scalability measurement and
helping developers with optimization ideas try them out, profile them, etc.

A point I've made at a number of devsummits and elsewhere is that what we
really need now is more people to "take ownership" of the performance of
workloads they care about. They don't need to be the people to do the
optimizations, but if they could help manage outstanding patchsets, measure
the change in performance over time, get involved in profiling, etc, then
that will have a big effect on performance for the workload, as has happened
with MySQL.

Here are some workloads I'd really like to see people take responsibility
for:

- Flat file Apache performance, perhaps with Apachebench or another HTTP
  throughput measurement tool.

- Dynamic Apache performance, perhaps using some combination of
  Apache/php/MySQL.

- BIND query performance with a few realistic-looking workloads.

- PostgreSQL performance along the same lines as current MySQL performance.
  Kris has waved his hands a bit in this direction already and much of the
  MySQL measurement work can be reused.

- Some sort of compiler/build/etc test -- buildworld of HEAD tends to be
  highly variable over time as components change, compilers change, etc, but
  optimizing build performance still has a big benefit for developers.
  Perhaps how long it takes to do the post-buildtools bit of buildworld for
  a fixed FreeBSD version.

- Network micro-benchmarks, including loopback TCP and UDP, multi-machine
  TCP and UDP, both single stream and multi-stream.

- UI interactivity testing -- how long it takes to go from a simulated
  keypress from the keyboard device to an input program running in an xterm,
  and other related latency tests that will be affected by scheduling, IPC,
  and so on.

There seem to be two parts of owning a benchmark:

- Establishing baselines over time -- how do FreeBSD 4.8, 5.5, 6.0, 6.1,
  6.2, 6-STABLE weekly, 7-CURRENT weekly, and maybe a Linux or NetBSD
  version perform for the workload using an otherwise identical
  configuration.

- Measurement and feedback -- identifying bottlenecks, working with
  developers to measure the results of specific optimizations, etc, across
  the life cycle of the patch.

If Kris can motivate such a dramatic improvement in MySQL performance, it
seems likely that people doing similar things with other workloads could
have similar effects.

And, as you say, breadth is really important -- tuning the system for MySQL
is very important, but has it generally hurt or helped other workloads? In
most cases, I'd expect work to date to have helped, because it involved
lowering overhead, etc. However, when we get into schedulers, space/time
trade-offs, and so on, then that balance will become harder to strike.

Robert N M Watson
Computer Laboratory
University of Cambridge

>
> You seemed very dismissive when I brought up caching of the selfd objects
> and malloc'd bitmap space per-thread on IRC, so I'd like to know if that was
> based on anything.
>
> What are the numbers before and after for selecting on 1000 or maybe 10000
> descriptors before and after your patch?
>
> This is especially important if you'd like it in the door for 7.0, right?
>
> -Alfred
>
>>
>> Thanks,
>> Jeff
>>
>> On Mon, 2 Jul 2007, Jeff Roberson wrote:
>>
>>> I have a diff which makes the following improvements to select:
>>>
>>> 1) Per-thread wait channel rather than global select wait channel.
>>> 2) Per-thread select lock.
>>> 3) Rescan after sleep scans only descriptors which have come active.
>>> 4) No exposed select internals.
>>> 5) selwakeuppri() works again.
>>> 6) No thread_lock()ing in select, no TDF_SELECT required.
>>> 7) No more collisions.
>>>
>>> This is based on an approach from Alfred with some locking and rescan
>>> improvements by me. It only required changing select users in cases where
>>> they assumed only one thread could select at a time.
>>>
>>> The unfortunate cost of this patch is that a descriptor per select fd must
>>> be allocated to track individual threads. This is what allows us to know
>>> which descriptor has fired an event and allows us to use per-thread
>>> locking etc.
>>>
>>> The one thing I haven't fixed is netsmb and netncp which both have some
>>> wonky select implementation that could be replaced with kern_select().
>>> That could be done separately from this patch but is required for this to
>>> go in.
>>>
>>> http://people.freebsd.org/~jeff/select.diff
>>>
>>> Comments and suggestions welcome.
>>>
>>> Thanks,
>>> Jeff
>>> _______________________________________________
>>> freebsd-arch@freebsd.org mailing list
>>> http://lists.freebsd.org/mailman/listinfo/freebsd-arch
>>> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"
>>>
>
> --
> - Alfred Perlstein