From owner-freebsd-smp@FreeBSD.ORG Thu Nov 13 11:55:02 2008
From: "Archimedes Gaviola" <archimedes.gaviola@gmail.com>
To: "John Baldwin" <jhb@freebsd.org>
Cc: freebsd-smp@freebsd.org
Date: Thu, 13 Nov 2008 19:55:01 +0800
Subject: Re: CPU affinity with ULE scheduler
Message-ID: <42e3d810811130355x3857bceap447e134b18eee04b@mail.gmail.com>
In-Reply-To: <200811111216.37462.jhb@freebsd.org>

On Wed, Nov 12, 2008 at 1:16 AM, John Baldwin wrote:
> On Monday 10 November 2008 11:32:55 pm Archimedes Gaviola wrote:
>> On Tue, Nov 11, 2008 at 6:33 AM, John Baldwin wrote:
>> > On Monday 10 November 2008 03:33:23 am Archimedes Gaviola wrote:
>> >> To Whom It May Concern:
>> >>
>> >> Can someone explain or share details about the ULE scheduler (the
>> >> latest, version 2, if I'm not mistaken) and how it deals with CPU
>> >> affinity? Are there any existing benchmarks of this on FreeBSD? I am
>> >> currently using the 4BSD scheduler, and what I have observed,
>> >> especially when processing high network load on multiple CPU cores,
>> >> is that only one CPU is being stressed with network interrupts while
>> >> the rest are mostly idle. This is an AMD64 4x dual-core IBM system
>> >> with GigE Broadcom network interface cards (bce0 and bce1). Below is
>> >> a snapshot of the case.
>> >
>> > Interrupts are routed to a single CPU. Since bce0 and bce1 are both on
>> > the same interrupt (irq 23), the CPU that interrupt is routed to is
>> > going to end up handling all the interrupts for bce0 and bce1. This is
>> > not something ULE or 4BSD have any control over.
>> >
>> > --
>> > John Baldwin
>> >
>>
>> Hi John,
>>
>> I'm sorry for the wrong snapshot. Here's the right one with my concern.
>>
>>   PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
>>    17 root        1 171   52     0K    16K CPU0   0  54:28 95.17% idle: cpu0
>>    15 root        1 171   52     0K    16K CPU2   2  55:55 93.65% idle: cpu2
>>    14 root        1 171   52     0K    16K CPU3   3  58:53 93.55% idle: cpu3
>>    13 root        1 171   52     0K    16K RUN    4  59:14 82.47% idle: cpu4
>>    12 root        1 171   52     0K    16K RUN    5  55:42 82.23% idle: cpu5
>>    16 root        1 171   52     0K    16K CPU1   1  58:13 77.78% idle: cpu1
>>    11 root        1 171   52     0K    16K CPU6   6  54:08 76.17% idle: cpu6
>>    36 root        1 -68 -187     0K    16K WAIT   7   8:50 65.53% irq23: bce0 bce1
>>    10 root        1 171   52     0K    16K CPU7   7  48:19 29.79% idle: cpu7
>>    43 root        1 171   52     0K    16K pgzero 2   0:35  1.51% pagezero
>>  1372 root       10  20    0 16716K  5764K kserel 6  58:42  0.00% kmd
>>  4488 root        1  96    0 30676K  4236K select 2   1:51  0.00% sshd
>>    18 root        1 -32 -151     0K    16K WAIT   0   1:14  0.00% swi4: clock s
>>    20 root        1 -44 -163     0K    16K WAIT   0   0:30  0.00% swi1: net
>>   218 root        1  96    0  3852K  1376K select 0   0:23  0.00% syslogd
>>  2171 root        1  96    0 30676K  4224K select 6   0:19  0.00% sshd
>>
>> I was doing network performance testing on this system with FreeBSD 6.2
>> RELEASE using its default 4BSD scheduler. I used a tool to generate a
>> large amount of traffic, around 600-700 Mbps, traversing the FreeBSD
>> system in both directions, meaning both network interfaces were
>> receiving traffic. What happened was that the CPU handling irq 23 for
>> both interfaces (cpu7) consumed a large share of its time, around
>> 65.53%, which affected other running applications and services like
>> sshd and httpd; the system was no longer accessible while being
>> bombarded with traffic. With only one CPU being stressed in this
>> situation, I was thinking of moving to FreeBSD 7.0 RELEASE with the ULE
>> scheduler, because I thought my problem had to do with how the
>> scheduler distributes load across multiple CPU cores, especially
>> network load.
>> So, if this is a matter of interrupt handling and not of the scheduler,
>> is there a way we can optimize it? If interrupts are still routed to
>> only one CPU, then to me it is still inefficient. What handles
>> interrupt scheduling and CPU binding, so that shared IRQs can be
>> prevented? Are there any improvements in FreeBSD 7.0 with regard to
>> interrupt handling?
>
> It depends. In all likelihood, the interrupts from bce0 and bce1 are both
> hardwired to the same interrupt pin and so they will always share the same
> ithread when using the legacy INTx interrupts. However, bce(4) parts do
> support MSI, and if you try a newer OS snap (6.3 or later) these devices
> should use MSI, in which case each NIC would be assigned to a separate
> CPU. I would suggest trying 7.0 or a 7.1 release candidate and seeing if
> it does better.
>
> --
> John Baldwin

Hi John,

I tried the 7.0 release, and each network interface's interrupt is now
allocated to a separate CPU. MSI is already working:

  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME    WCPU COMMAND
   12 root        1 171 ki31     0K    16K CPU6   6 123:55 100.00% idle: cpu6
   15 root        1 171 ki31     0K    16K CPU3   3 123:54 100.00% idle: cpu3
   14 root        1 171 ki31     0K    16K CPU4   4 123:26 100.00% idle: cpu4
   16 root        1 171 ki31     0K    16K CPU2   2 123:15 100.00% idle: cpu2
   17 root        1 171 ki31     0K    16K CPU1   1 123:15 100.00% idle: cpu1
   37 root        1 -68    -     0K    16K CPU7   7   9:09 100.00% irq256: bce0
   13 root        1 171 ki31     0K    16K CPU5   5 123:49  99.07% idle: cpu5
   40 root        1 -68    -     0K    16K WAIT   0   4:40  51.17% irq257: bce1
   18 root        1 171 ki31     0K    16K RUN    0 117:48  49.37% idle: cpu0
   11 root        1 171 ki31     0K    16K RUN    7 115:25   0.00% idle: cpu7
   19 root        1 -32    -     0K    16K WAIT   0   0:39   0.00% swi4: clock s
14367 root        1  44    0  5176K  3104K select 2   0:01   0.00% dhcpd
   22 root        1 -16    -     0K    16K -      3   0:01   0.00% yarrow
   25 root        1 -24    -     0K    16K WAIT   0   0:00   0.00% swi6: Giant t
11658 root        1  44    0 32936K  4540K select 1   0:00   0.00% sshd
14224 root        1  44    0 32936K  4540K select 5   0:00   0.00% sshd
   41 root        1 -60    -     0K    16K WAIT   0   0:00   0.00% irq1: atkbd0
    4 root        1  -8    -     0K    16K -      2   0:00   0.00% g_down

However, the bce0 interrupt (irq256) is now saturating CPU7 at 100%,
while irq257 (bce1) puts CPU0 at around 51.17%. Any more
recommendations? Is there anything more we can do to optimize with MSI?

Thanks,
Archimedes
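For anyone reading this thread later: on 7.1 and newer, interrupt-to-CPU
binding can also be adjusted by hand with cpuset(1). The sketch below is
illustrative only; the IRQ numbers (256, 257) are taken from the top(1)
output above, the CPU choices are arbitrary, and the bce(4) loader
tunable name is an assumption that should be checked against the man
page for your release.

```shell
# Sanity check first: see which IRQs are firing and at what rate.
vmstat -i

# FreeBSD 7.1+: pin the bce0 ithread (irq256 above) to CPU 2 and the
# bce1 ithread (irq257) to CPU 3, keeping both off the busy CPUs.
cpuset -l 2 -x 256
cpuset -l 3 -x 257

# /boot/loader.conf: keep MSI enabled for bce(4). Tunable name assumed;
# verify against bce(4) on your release before relying on it.
# hw.bce.msi_enable="1"
```

Pinning the two ithreads to different otherwise-idle CPUs spreads the
load that MSI has already split per-NIC, at the cost of losing whatever
placement the kernel would have chosen itself.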