From owner-freebsd-arch@FreeBSD.ORG  Wed Jul  4 11:58:45 2007
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
X-Original-To: arch@freebsd.org
Delivered-To: freebsd-arch@FreeBSD.ORG
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id E4A1F16A421;
	Wed,  4 Jul 2007 11:58:45 +0000 (UTC)
	(envelope-from rwatson@FreeBSD.org)
Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42])
	by mx1.freebsd.org (Postfix) with ESMTP id 59F3113C483;
	Wed,  4 Jul 2007 11:58:45 +0000 (UTC)
	(envelope-from rwatson@FreeBSD.org)
Received: from fledge.watson.org (fledge.watson.org [209.31.154.41])
	by cyrus.watson.org (Postfix) with ESMTP id C3B6D48B11;
	Wed,  4 Jul 2007 07:58:44 -0400 (EDT)
Date: Wed, 4 Jul 2007 12:58:44 +0100 (BST)
From: Robert Watson <rwatson@FreeBSD.org>
X-X-Sender: robert@fledge.watson.org
To: Alfred Perlstein <alfred@freebsd.org>
In-Reply-To: <20070704105525.GU45894@elvis.mu.org>
Message-ID: <20070704124833.W37059@fledge.watson.org>
References: <20070702230728.E552@10.0.0.1> <20070703181242.T552@10.0.0.1>
	<20070704105525.GU45894@elvis.mu.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: arch@freebsd.org
Subject: Re: Fine grain select locking.
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 04 Jul 2007 11:58:46 -0000


On Wed, 4 Jul 2007, Alfred Perlstein wrote:

> * Jeff Roberson <jroberson@chesapeake.net> [070703 18:16] wrote:
>
>> Here is an update that avoids the malloc per fd when there are no 
>> collisions.  This unfortunately adds 64 bytes to every socket in the system. 
>> This is less than 10% of the size of the socket.  Vnodes only allocate 
>> their selinfo structures on demand so this does not cause a per-file 
>> overhead.  This was suggested by Peter.  This patch also uses a vm zone for 
>> the selfd structures.  I can shrink them slightly by using a SLIST in one 
>> case vs TAILQ as well.
>>
>> http://people.freebsd.org/~jeff/select2.diff
>
> Jeff, I understand you're trying to speed up mysql micro benchmarks, but 
> have you done any benchmarking on large select operations?

I also worry about the narrowness of the benchmarking we're doing -- however, 
it's hardly new.  We do best at optimizing where we have clearly defined 
targets and measures of performance.  The four-times increase in MySQL select 
performance is a direct result of Kris taking on scalability measurement and 
helping developers with optimization ideas try them out, profile them, etc.

A point I've made at a number of devsummits and elsewhere is that what we 
really need now is more people to "take ownership" of the performance of 
workloads they care about.  They don't need to be the people to do the 
optimizations, but if they could help manage outstanding patchsets, measure 
the change in performance over time, get involved in profiling, etc, then that 
will have a big effect on performance for the workload, as has happened with 
MySQL.

Here are some workloads I'd really like to see people take responsibility for:

- Flat file Apache performance, perhaps with Apachebench or another HTTP
   throughput measurement tool.

- Dynamic Apache performance, perhaps using some combination of
   Apache/php/MySQL.

- BIND query performance with a few realistic-looking workloads.

- PostgreSQL performance along the same lines as current MySQL performance.
   Kris has waved his hands a bit in this direction already and much of the
   MySQL measurement work can be reused.

- Some sort of compiler/build/etc test -- buildworld of HEAD tends to be
   highly variable over time as components change, compilers change, etc, but
   optimizing build performance still has a big benefit for developers.
   Perhaps how long it takes to do the post-buildtools bit of buildworld for a
   fixed FreeBSD version.

- Network micro-benchmarks, including loopback TCP and UDP, multi-machine TCP
   and UDP, both single stream and multi-stream.

- UI interactivity testing -- how long it takes to go from a simulated
   keypress from the keyboard device to an input program running in an xterm
   and other related latency tests that will be affected by scheduling, IPC,
   and so on.

There seem to be two parts of owning a benchmark:

- Establishing baselines over time -- how do FreeBSD 4.8, 5.5, 6.0, 6.1, 6.2,
   6-STABLE weekly, 7-CURRENT weekly, and maybe a Linux or NetBSD version
   perform for the workload using an otherwise identical configuration?

- Measurement and feedback -- identifying bottlenecks, working with developers
   to measure the results of specific optimizations, etc, across the life cycle
   of the patch.

If Kris can motivate such a dramatic improvement in MySQL performance, it 
seems likely that people doing similar things with other workloads could have 
similar effects.  And, as you say, breadth is really important -- tuning the 
system for MySQL is very important, but has it generally hurt or helped other 
workloads?  In most cases, I'd expect work to date to have helped, because it 
involved lowering overhead, etc.  However, when we get into schedulers, 
space/time trade-offs, and so on, then that balance will become harder to 
strike.

Robert N M Watson
Computer Laboratory
University of Cambridge

>
> You seemed very dismissive when I brought up caching of the selfd objects 
> and malloc'd bitmap space per-thread on IRC, so I'd like to know if that was 
> based on anything.
>
> What are the numbers for selecting on 1000 or maybe 10000 descriptors 
> before and after your patch?
>
> This is especially important if you'd like it in the door for 7.0, right?
>
> -Alfred
>
>
>
>>
>> Thanks,
>> Jeff
>>
>> On Mon, 2 Jul 2007, Jeff Roberson wrote:
>>
>>> I have a diff which makes the following improvements to select:
>>>
>>> 1) Per-thread wait channel rather than global select wait channel.
>>> 2) Per-thread select lock.
>>> 3) Rescan after sleep scans only descriptors which have come active.
>>> 4) No exposed select internals.
>>> 5) selwakeuppri() works again.
>>> 6) No thread_lock()ing in select, no TDF_SELECT required.
>>> 7) No more collisions.
>>>
>>> This is based on an approach from Alfred with some locking and rescan
>>> improvements by me.  It only required changing select users in cases where
>>> they assumed only one thread could select at a time.
>>>
>>> The unfortunate cost of this patch is that a descriptor per select fd must
>>> be allocated to track individual threads.  This is what allows us to know
>>> which descriptor has fired an event and allows us to use per-thread
>>> locking etc.
>>>
>>> The one thing I haven't fixed is netsmb and netncp which both have some
>>> wonky select implementation that could be replaced with kern_select().
>>> That could be done separately from this patch but is required for this to
>>> go in.
>>>
>>> http://people.freebsd.org/~jeff/select.diff
>>>
>>> Comments and suggestions welcome.
>>>
>>> Thanks,
>>> Jeff
>>> _______________________________________________
>>> freebsd-arch@freebsd.org mailing list
>>> http://lists.freebsd.org/mailman/listinfo/freebsd-arch
>>> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"
>>>
>
> -- 
> - Alfred Perlstein
>