Date:      Wed, 7 Feb 2007 00:34:40 +1100 (EST)
From:      Bruce Evans <bde@zeta.org.au>
To:        MQ <antinvidia@gmail.com>
Cc:        Oleg Bulyzhin <oleg@FreeBSD.org>, net@FreeBSD.org
Subject:   Re: [antinvidia@gmail.com: some questions about bge(4)]
Message-ID:  <20070206234100.S31484@besplex.bde.org>
In-Reply-To: <be0088ce0702051952w927d0bs4a20b7e34be5801e@mail.gmail.com>
References:  <20061206085401.GH32700@cell.sick.ru> <20061212224351.GE91560@lath.rinet.ru> <be0088ce0612131655j5829ca7cg3066b8855904c2e7@mail.gmail.com> <20061214092248.GA21394@lath.rinet.ru> <be0088ce0702051952w927d0bs4a20b7e34be5801e@mail.gmail.com>

On Tue, 6 Feb 2007, MQ wrote:

> 2006/12/14, Oleg Bulyzhin <oleg@freebsd.org>:
>> 
>> On Thu, Dec 14, 2006 at 12:55:51AM +0000, MQ wrote:
>> > 2006/12/12, Oleg Bulyzhin <oleg@freebsd.org>:
>> > >
>> > >On Wed, Dec 06, 2006 at 11:54:01AM +0300, Gleb Smirnoff wrote:
>> > >>   Forwarding to net@ list and to Oleg, who has made polling
>> > >> support for bge(4).
>> > >>
>> > >> ----- Forwarded message from MQ < antinvidia@gmail.com> -----
>> > >>
>> > >> From: MQ <antinvidia@gmail.com>
>> > >> To: glebius@freebsd.org , davidch@broadcom.com
>> > >> Subject: some questions about bge(4)
>> > >> Date: Sat, 2 Dec 2006 09:32:27 +0000
>> > >> Delivered-To: glebius@freebsd.org
>> > >
>> > >>
>> > >> Hi David and Gleb,
>> > >>    I'm using several chips whose driver is bge(4). And now I have some
>> > >> questions about the driver; would you please give me an answer?
>> > >>    My confusion is related to some code in /sys/dev/mii/brgphy.c.  The
>> > >> bge(4) driver uses a callout to drive the watchdog, and brgphy_service()
>> > >> is called once per second.  It calls brgphy_mii_phy_auto() every 5
>> > >> seconds to autonegotiate the media.  Normally it costs about 0.5 ms in
>> > >> the first function, brgphy_service(), and about 5 ms when autonegotiation
>> > >> proceeds.
>> > >
>> > >brgphy_mii_phy_auto() is called only if there is no link.
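
For context, the 1-second / 5-second pattern being described can be sketched
like this (a toy userland model with made-up names; as quoted above, the real
driver drives it from a callout that fires once per second and ends up in
brgphy_service(), not from a loop):

#include <stdio.h>

#define ANEG_TICKS      5               /* retry autonegotiation every 5 ticks */

static int link_up;                     /* pretend the link stays down */
static int ticks;

static void
phy_tick(void)                          /* stands in for the once-a-second PHY tick */
{
        if (link_up) {
                ticks = 0;              /* link ok: nothing expensive to do */
                return;
        }
        if (++ticks >= ANEG_TICKS) {    /* ~5 seconds without link */
                ticks = 0;
                printf("restart autonegotiation (the slow, ~5 ms case)\n");
        }
}

int
main(void)
{
        for (int second = 0; second < 12; second++)
                phy_tick();             /* the real tick also reads stats registers */
        return (0);
}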

I haven't seen more than 50-100 uS for an average brgphy_service(), but
even 500 uS shouldn't be a problem except near the maximum theoretical
packet rate of ~1500 kpps, since the device has buffering for 512-1024
packets (512 @ 1500 kpps = 341 uS = not quite 500 uS).  Half of the
available rx buffering for non-jumbo packets is not being used, so the
worst case is actually 170 uS of buffering @ 1500 kpps.
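
As a back-of-the-envelope check of those figures (the ring sizes are the ones
assumed above; the exact sizes depend on the chip and on how the driver
configures it):

#include <stdio.h>

int
main(void)
{
        double pps = 1500000.0;         /* ~max packet rate at 1 Gbps, minimal frames */
        int rx_full = 512;              /* full standard rx ring */
        int rx_half = 256;              /* what is actually configured now */

        printf("%d slots / %.0f pps = %.0f uS of buffering\n",
            rx_full, pps, rx_full / pps * 1e6);         /* ~341 uS */
        printf("%d slots / %.0f pps = %.0f uS of buffering\n",
            rx_half, pps, rx_half / pps * 1e6);         /* ~171 uS */
        return (0);
}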

However, known bugs cause brgphy_service() to often lose a packet.

>> > >>    I haven't done a stress test on it, so I don't know if this delay
>> > >> will cause packets to be dropped.  But I've enabled device polling with
>> > >> bge(4) on FreeBSD 6.1-RELEASE.  If HZ is set to a high value (e.g. 4000),
>> > >> this delay causes kern.polling.lost_polls to increase by one or two every
>> > >> second, and about every five seconds the lost polls increase by at least
>> > >> 16 regularly.  So I think this behavior has some impact on systems that
>> > >> enable device polling.

Lost polls shouldn't be much more of a problem.  Polls are only lost
if the system can't actually poll at 1/HZ.  Increasing HZ won't make
the worst case any better.  If polls are lost at HZ = 1000, then the
worst case extra delay is >= 1 mS and the 512-1024 packets of buffering
starts becoming a problem.  It is already a problem at the maximum
packet rate, since at least 1500 packets of buffering would be needed
for polling at 1000 Hz to have any chance of keeping up.  The practical
limits for polling at 1000 Hz with bge are now close to 256 kpps for rx
(since half the buffering is not configured) and 512 kpps for tx.
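
In the same spirit: between two polls the hardware can buffer at most one
ring's worth of packets, so the ceiling is roughly ring size times poll
rate (again using the ring sizes assumed above):

#include <stdio.h>

int
main(void)
{
        int hz = 1000;                  /* polls per second */
        int rx_slots = 256;             /* half of the 512-entry rx ring in use */
        int tx_slots = 512;

        printf("rx ceiling %d kpps, tx ceiling %d kpps\n",
            rx_slots * hz / 1000, tx_slots * hz / 1000);
        printf("slots needed to keep up with 1500 kpps: %d\n",
            1500 * 1000 / hz);          /* >= 1500, as noted above */
        return (0);
}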

>> > >> Could we get something to make bge(4) a bit more friendly to device
>> > >> polling?  I don't know if autonegotiation really needs to be called so
>> > >> frequently when we are connected to a good network environment.  Can I
>> > >> modify the interval between two autonegotiations
> ...
>> > >If you have a lost poll, it does not guarantee packet loss.

It should never result in packet loss, since the buffering is adequate.

>> > >Packets can be retrieved by the next poll or even by the idle_poll thread.
>> > >bge_tick() does a couple of pci register reads (it polls phy status and
>> > >updates some statistics counters); this is why it takes some time.

I don't believe in polling, but occasionally test it to check that
interrupt handling doesn't lose to it :-).  I mostly use HZ = 100 and
get up to 640 kpps (tx) where polling at 1000 Hz is limited to 512
kpps.  Polling in idle can work better if the system is actually idle.
Interrupt handling still loses to it for latency -- I have a ping
latency of 50-60 uS with interrupt handling and 40-50 uS with polling
in idle.  However, something (perhaps excessive PCI reads to check the
link on every poll) limits packet rates for polling -- large
values of HZ work as expected, and polling in idle should work better
provided the system is actually idle, but in practice polling in idle
with low HZ doesn't work as well for throughput as not polling in
idle with a large HZ.  (I guess this is because the PCI reads take
several cycles per poll and each poll delivers an average of <= 1 packet.
For a ping latency of 40 uS, the few extra uS are not dominant, but
at 100's of kpps they become dominant.)
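
To put rough numbers on that guess (the ~2 uS fixed cost per poll iteration
is an assumption here, not a measurement), compare it with the per-packet
time budget at various rates:

#include <stdio.h>

int
main(void)
{
        double overhead_us = 2.0;       /* assumed fixed cost per poll iteration */
        double kpps[] = { 25, 100, 300, 640 };

        for (int i = 0; i < 4; i++) {
                double budget_us = 1000.0 / kpps[i];    /* time per packet */
                printf("%4.0f kpps: %5.2f uS/packet, fixed cost = %3.0f%% of it\n",
                    kpps[i], budget_us, 100.0 * overhead_us / budget_us);
        }
        return (0);
}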

>> > By the way, bge_tick() takes about 0.5 ms to finish its work; this results
>> > in a lost poll every second when HZ is higher.  A lower HZ will limit
>> > performance under heavy traffic, and may result in packet loss in that
>> > situation, while a higher HZ makes it hard to tell whether we have
>> > encountered packet loss.  It's really hard to make a decision between
>> > these two kinds of situation.
>> 
>> IMO, a high HZ would not give a performance gain if you have idle polling
>> on (sysctl kern.polling.idle_poll=1).
>> So it's better to have HZ=1000 & idle polling than HZ=10000 and idle
>> polling disabled.

A higher HZ can work better than idle polling.  If the system is rarely
idle, then idle polling is useless.  At any reasonable value of HZ,
latency is very bad unless idle polling is used and the system is often
idle.  Unreasonably large values of HZ (10000 - 100000 are probably
possible) can be used to reduce the latency.

Bruce


