From: Eugene Grosbein <egrosbein@rdtc.ru>
To: Barney Cordoba
Cc: freebsd-net@freebsd.org, "Clément Hermann (nodens)"
Date: Sun, 12 May 2013 15:00:06 +0700
Subject: Re: High CPU interrupt load on intel I350T4 with igb on 8.3
Message-ID: <518F4C06.3060500@rdtc.ru>
In-Reply-To: <1368287797.70288.YahooMailClassic@web121603.mail.ne1.yahoo.com>
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>

On 11.05.2013 22:56, Barney Cordoba wrote:
>> In practice, the problem is easily solved without any change
>> in the igb code. The same problem would occur with other NIC
>> drivers too, if several NICs were combined within one lagg(4).
>> So the driver is not at fault, and the solution would be the
>> same: eliminate the bottleneck and you will be fine, able to
>> spread the load across several CPU cores.
>>
>> Therefore, I don't care about CS theory for this particular
>> case.
>
> Clearly you don't understand the problem. Your logic is that because
> other drivers are also defective, it's not a driver problem?

The problem is not in the drivers. A driver author should make no
assumptions about how packets will be processed after the NIC, because
there are many possible patterns. A NIC driver must get packets from the
NIC with minimal overhead, loading as many CPU cores as system settings
(loader.conf, sysctl.conf, etc.) instruct it to, and it must never drop
packets voluntarily.

> The problem is caused by a multi-threaded driver that haphazardly launches
> tasks and that doesn't manage the case where the rest of the system can't
> handle the load.

It is not the driver author who decides how many tasks to launch and what
happens to the packets later. I, as the system administrator, tune the
driver and the system for my particular task, and I make that decision
using system settings.

> It's no different than a driver that barfs when mbuf
> clusters are exhausted. The answer isn't to increase memory or mbufs, even
> though that may alleviate the problem. The answer is to fix the driver,
> so that it doesn't crash the system for an event that is wholly predictable.

No crashes have been observed due to igb(4), and it is easy to make the
system predictable with some tuning.

> igb has 1) too many locks and 2) exacerbates the problem by binding to
> cpus, which causes it to not only have to wait for the lock to free, but
> also for a specific cpu to become free.

I can't speak for the locks, but CPU binding is not a problem.
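How many queues (and thus how many bound interrupt handlers) igb(4) creates
in the first place is also an administrator decision made in loader.conf.
A minimal sketch, assuming the stock igb(4) loader tunables of the 8.x
driver; the values are only examples and should be adjusted for the actual
hardware and workload:

  # /boot/loader.conf
  hw.igb.num_queues=2              # cap each port at two RX/TX queue pairs (0 = one per CPU core)
  hw.igb.rxd=4096                  # deeper receive descriptor rings
  hw.igb.txd=4096                  # deeper transmit descriptor rings
  hw.igb.enable_aim=1              # adaptive interrupt moderation
  hw.igb.max_interrupt_rate=8000   # upper bound on interrupts per second per queue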
As for the bindings themselves, one can always use cpuset(1) to make them
suitable for a particular task. For example, this is the rc.d script I use
on my igb-based BRASes to change the default igb bindings:

#!/bin/sh

# PROVIDE: cpuset-igb
# REQUIRE: FILESYSTEMS
# BEFORE: netif
# KEYWORD: nojail

case "$1" in
*start)
	echo "Binding igb(4) IRQs to CPUs"
	cpus=`sysctl -n kern.smp.cpus`
	# extract "irq interface queue" triples for igb queue interrupts
	# and spread them evenly over the available CPU cores
	vmstat -ai | sed -E '/^irq.*que/!d; s/^irq([0-9]+): igb([0-9]+):que ([0-9]+).*/\1 \2 \3/' |\
	while read irq igb que
	do
		cpuset -l $(( ($igb + $que) % $cpus )) -x $irq
	done
	;;
esac

There is no rocket science here. An even simpler script can be used to
disable the CPU bindings altogether with "cpuset -l all".

> So it chugs along happily until
> it encounters a bottleneck, at which point it quickly blows up the entire
> system in a domino effect. It needs to manage locks more efficiently, and
> also to detect when the backup is unmanageable.
> Ever since FreeBSD 5 the answer has been "it's fixed in 7, or it's fixed in
> 9, or it's fixed in 10". There will always be bottlenecks, and no driver
> should blow up the system no matter what intermediate code may present a
> problem. It's the driver's responsibility to behave and to drop packets
> if necessary.

Voluntary packet drops at the driver level should not be permitted; that is
not the driver's responsibility. I, as the system administrator, expect a
driver to emit packets as fast as possible. I can employ a packet buffering
system if I need one, or use ng_car or dummynet or something similar to deal
with bottlenecks. Or choose not to deal with them on purpose. The NIC driver
cannot know my needs.

Eugene Grosbein
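P.S. For illustration only: "use dummynet to deal with bottlenecks" can be
as simple as the sketch below. The interface name, rule number, bandwidth
and queue size are made-up examples, not a complete ruleset:

  # dummynet plugs into ipfw
  kldload dummynet

  # model the bottleneck: 800 Mbit/s with a 100-packet queue
  ipfw pipe 1 config bw 800Mbit/s queue 100

  # push everything leaving igb0 through that pipe
  ipfw add 100 pipe 1 ip from any to any out via igb0

Excess traffic is then buffered, and if need be dropped, at the pipe
according to my policy, not inside the NIC driver.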