Date:      Sat, 11 May 2013 08:56:37 -0700 (PDT)
From:      Barney Cordoba <barney_cordoba@yahoo.com>
To:        Eugene Grosbein <egrosbein@rdtc.ru>
Cc:        freebsd-net@freebsd.org, Clément Hermann (nodens) <nodens2099@gmail.com>
Subject:   Re: High CPU interrupt load on intel I350T4 with igb on 8.3
Message-ID:  <1368287797.70288.YahooMailClassic@web121603.mail.ne1.yahoo.com>
In-Reply-To: <518CEE95.7020702@rdtc.ru>



--- On Fri, 5/10/13, Eugene Grosbein <egrosbein@rdtc.ru> wrote:

> From: Eugene Grosbein <egrosbein@rdtc.ru>
> Subject: Re: High CPU interrupt load on intel I350T4 with igb on 8.3
> To: "Barney Cordoba" <barney_cordoba@yahoo.com>
> Cc: freebsd-net@freebsd.org, "Clément Hermann (nodens)" <nodens2099@gmail.com>
> Date: Friday, May 10, 2013, 8:56 AM
> On 10.05.2013 05:16, Barney Cordoba
> wrote:
>
> >>>> Network device driver is not guilty here,
> that's
> >> just pf's
> >>>> contention
> >>>> running in igb's context.
> >>>
> >>> They're both at play. Single threadedness
> aggravates
> >> subsystems that
> >>> have too many lock points.
> >>>
> >>> It can also be "solved" with using 1 queue,
> because
> >> then you don't
> >>> have 4 queues going into a single thread.
> >>
> >> Again, the problem is within pf(4)'s global lock,
> not in the
> >> igb(4).
> >>
> >
> > Again, you're wrong. It's not the bottleneck's fault;
> it's the fault of
> > the multi-threaded code for only working properly when
> there are no
> > bottlenecks.
>
> In practice, the problem is easily solved without any change
> in the igb code.
> The same problem will occur for other NIC drivers too -
> if several NICs were combined within one lagg(4). So, driver
> is not guilty and
> solution would be same - eliminate bottleneck and you will
> be fine and capable
> to spread the load on several CPU cores.
>
> Therefore, I don't care of CS theory for this particular
> case.

Clearly you don't understand the problem. Your logic is that because
other drivers are also defective, it's not a driver problem?

The problem is caused by a multi-threaded driver that haphazardly launches
tasks and doesn't handle the case where the rest of the system can't
keep up with the load. It's no different from a driver that barfs when mbuf
clusters are exhausted. The answer isn't to increase memory or mbufs, even
though that may alleviate the problem. The answer is to fix the driver
so that it doesn't crash the system on an event that is wholly predictable.

igb 1) has too many locks and 2) exacerbates the problem by binding to
CPUs, which means it not only has to wait for the lock to free, but
also for a specific CPU to become free. So it chugs along happily until
it encounters a bottleneck, at which point it quickly blows up the entire
system in a domino effect. It needs to manage locks more efficiently, and
to detect when the backup is unmanageable.

Ever since FreeBSD 5 the answer has been "it's fixed in 7", or "it's fixed in
9", or "it's fixed in 10". There will always be bottlenecks, and no driver
should blow up the system no matter what intermediate code presents a
problem. It's the driver's responsibility to behave and to drop packets
if necessary.

BC


