From: TM4525@aol.com
Date: Thu, 7 Oct 2004 17:35:18 EDT
To: drosih@rpi.edu
cc: questions@freebsd.org
Subject: Re: What version of FBSD does Yahoo run?

In a message dated 10/7/04 4:06:34 PM Eastern Daylight Time, drosih@rpi.edu
writes:

Here's one benchmark, showing UDP packet/second generation rate from
userland on a dual Xeon machine under various target loads:

Desired   Optimal    5.x-UP   5.x-SMP    4.x-UP   4.x-SMP
  50000     50000     50000     50000     50000     50000
  75000     75000     75001     75001     75001     75001
 100000    100000    100000    100000    100000    100000
 125000    125000    125000    125000    125000    125000
 150000    150000    150015    150014    150015    150015
 175000    175000    175008    175008    175008    169097
 200000    200000    200000    179621    181445    169451
 225000    225000    225022    179729    181367    169831
 250000    250000    242742    179979    181138    169212
 275000    275000    242102    180171    181134    169283
 300000    300000    242213    179157    181098    169355

That does show results for both single-processor (5.x-UP, 4.x-UP) and
multi-processor (5.x-SMP, 4.x-SMP) benchmarks. It may be that he ignored
the table as soon as he read "dual Xeon".

--------------------------------------------

I haven't seen this before. If I had, I would immediately ask:

- What is the control here? What does your "benchmark" actually test?

- Is this on a gigabit link? What are the packet sizes? Was network
  availability a factor in limiting the test results?

- What does "target load" mean? Does it mean "don't try to send more than
  that"? If so, what does it show when you reach it? If you don't measure
  the utilization it takes to saturate your "target", I don't see the point
  of having one. (A rough sketch of the kind of sender I picture is below,
  after this list.)

- It seems the only thing you could learn from this test is the maximum pps
  you can push unidirectionally out of a system. Why is that useful? It's
  hardly ever the requirement unless you're building a traffic generator.

- A relatively slow machine (a 1.7 GHz Celeron with a 32-bit/33 MHz fxp NIC
  running 4.9) pushes over 250 Kpps, so why is your machine, with seemingly
  superior hardware, so slow?

- The test seems backwards. What you are doing here is not something that
  any real device does. If you want to measure user-space performance, it
  has to include receive as well as transmit response, not just transmit.
  Perhaps it indirectly shows process-switching performance, but it doesn't
  tell you much about network performance, since transmit is far more
  trivial than receive in terms of processing requirements: when you
  transmit you know exactly what you have, but when you receive you have to
  do a lot of checking and testing to see what needs to be done.
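To be concrete about what I assume "target load" means, here is roughly the
kind of userland UDP sender I picture behind those numbers: pace yourself
toward a desired packets-per-second rate for a fixed window, then report
what you actually achieved. This is my own sketch for illustration only,
not the original benchmark code; the pacing scheme, 5-second window, 18-byte
payload, and the destination address/port are all my guesses.

/*
 * udpgen.c -- minimal sketch of a rate-targeted userland UDP source.
 * Build with: cc -o udpgen udpgen.c
 * Usage: ./udpgen [desired_pps]
 */
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main(int argc, char **argv)
{
	long target = (argc > 1) ? atol(argv[1]) : 100000; /* desired pps */
	int duration = 5;			/* length of the run, seconds */
	char payload[18];			/* small, fixed-size payload */
	struct sockaddr_in sin;
	struct timeval start, now;
	double elapsed = 0.0;
	long sent = 0;
	int s;

	memset(payload, 0, sizeof(payload));
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_port = htons(9);			/* discard port; my choice */
	sin.sin_addr.s_addr = inet_addr("10.0.0.2");	/* hypothetical sink */

	if ((s = socket(AF_INET, SOCK_DGRAM, 0)) < 0) {
		perror("socket");
		return (1);
	}

	gettimeofday(&start, NULL);
	for (;;) {
		gettimeofday(&now, NULL);
		elapsed = (now.tv_sec - start.tv_sec) +
		    (now.tv_usec - start.tv_usec) / 1e6;
		if (elapsed >= duration)
			break;
		/* Send only if we are behind the desired schedule. */
		if (sent < (long)(target * elapsed)) {
			if (sendto(s, payload, sizeof(payload), 0,
			    (struct sockaddr *)&sin, sizeof(sin)) > 0)
				sent++;
		}
	}
	printf("desired %ld pps, achieved %.0f pps\n", target, sent / elapsed);
	return (0);
}

If that is more or less what was run, the "Desired" column is just the
schedule and the per-kernel columns are the achieved rate, which is exactly
why I want to see the CPU cost of hitting each rate, not just the ceiling.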
When I test network performance, I want to isolate kernel performance if
possible. If you're evaluating the system for use as a network device (a
router, a bridge, a firewall, etc.), you have to eliminate userland from the
formula. The interaction between user space and the kernel is a key factor
in your "benchmark" that is absent in a pure network device, so it's not
useful for testing pure stack performance.

Also, there is a significant problem with "maximum packets/second" tests. As
you reach high levels of saturation, you often get abnormal processing
requirements that skew the results. For example, as bus saturation climbs,
the processing requirements change: I/Os take longer waiting for access to
the bus, transmit queues may fill, and so on. Testing under such unusual
conditions may exercise abnormal recovery code that would never run on a
machine under "normal" loads. A better way to test is to measure utilization
under realistically normal conditions. Machines can get very inefficient if
their recovery code is poor, but that may not matter, since no one
realistically runs a machine at 98% utilization.

Assuming that your benchmark does test something, your "results" seem to
show that a uniprocessor machine is substantially more efficient than an SMP
box. It also seems that the gap between UP and SMP performance has widened
in 5.x. Wasn't one of the goals of 5.x to substantially improve SMP
performance? This seems to show the opposite.

TM
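P.S. Since "measure utilization under realistic load" is easy to say and
vague to picture, here is the sort of thing I have in mind: sample FreeBSD's
kern.cp_time counters before and after a fixed-length run at the offered
load you care about, and report how much CPU the run actually cost. This is
a throwaway sketch of mine, not part of the benchmark above; the ten-second
window and the idea of wrapping it around a traffic run are assumptions,
only the sysctl name and its counter layout are standard.

/*
 * cputime.c -- report non-idle CPU percentage over a measurement window.
 * Build with: cc -o cputime cputime.c
 */
#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>
#include <unistd.h>

#define	CPUSTATES	5	/* user, nice, sys, intr, idle: kern.cp_time layout */
#define	CP_IDLE		4

static void
snap(long cp[CPUSTATES])
{
	size_t len = sizeof(long) * CPUSTATES;

	if (sysctlbyname("kern.cp_time", cp, &len, NULL, 0) < 0)
		perror("sysctl kern.cp_time");
}

int
main(void)
{
	long before[CPUSTATES], after[CPUSTATES];
	long busy = 0, total = 0;
	int i;

	snap(before);
	sleep(10);	/* run the fixed-rate traffic during this window */
	snap(after);

	for (i = 0; i < CPUSTATES; i++) {
		long d = after[i] - before[i];

		total += d;
		if (i != CP_IDLE)
			busy += d;
	}
	printf("CPU utilization over the run: %.1f%%\n",
	    total ? 100.0 * busy / total : 0.0);
	return (0);
}

Run the traffic during the sleep window. Comparing the percentage it takes
each kernel to hold, say, 100 Kpps would tell you a lot more than where each
one finally tops out.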