From owner-freebsd-stable@FreeBSD.ORG  Sun Sep 10 16:19:12 2006
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
X-Original-To: freebsd-stable@freebsd.org
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 95DBE16A412
	for <freebsd-stable@freebsd.org>; Sun, 10 Sep 2006 16:19:12 +0000 (UTC)
	(envelope-from sam@errno.com)
Received: from ebb.errno.com (ebb.errno.com [69.12.149.25])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 956D343D55
	for <freebsd-stable@freebsd.org>; Sun, 10 Sep 2006 16:19:11 +0000 (GMT)
	(envelope-from sam@errno.com)
Received: from [10.0.0.2] ([10.0.0.2]) (authenticated bits=0)
	by ebb.errno.com (8.13.6/8.12.6) with ESMTP id k8AGJA7K093014
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Sun, 10 Sep 2006 09:19:11 -0700 (PDT) (envelope-from sam@errno.com)
Message-ID: <45043AFA.4080001@errno.com>
Date: Sun, 10 Sep 2006 09:19:06 -0700
From: Sam Leffler <sam@errno.com>
Organization: Errno Consulting
User-Agent: Thunderbird 1.5.0.5 (Macintosh/20060719)
MIME-Version: 1.0
To: dandee@volny.cz
References: <000001c6d340$004fdf40$6508280a@tocnet28.jspoj.czf>
In-Reply-To: <000001c6d340$004fdf40$6508280a@tocnet28.jspoj.czf>
X-Enigmail-Version: 0.94.0.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Cc: freebsd-stable@freebsd.org
Subject: Re: Where is the maximum of hw.ath.txbuf and rxbuf ? (former:
 atheros driver under high load, panics and even more freezes)
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 10 Sep 2006 16:19:12 -0000

Daniel Dvořák wrote:
> Hi Sam,
> 
> Thank you for your answer. I think it is connected to this problem
> somehow, but not fully.
> 
> I doubled txbuf and rxbuf to 200 and 80. I saw some improvement, with
> fewer "no buffers space ..." errors, but latency went up to 2000 ms.
> 
> Now I ended at txbuf=800 and rxbuf=320 on both sides R1 and R2.
> 
> But still, there is the same problem:
> 
> This was tested almost immediately after rebooting R2.
> 
> --- R1 ping statistics ---
> 10000 packets transmitted, 8752 packets received, 12% packet loss
> round-trip min/avg/max/stddev = 1.324/920.480/6323.454/766.399 ms           up to 6k ms
> 
> R2# athstats -i ath0
> 11309 data frames received
> 11384 data frames transmit
> 12508 long on-chip tx retries
> 769 tx failed 'cuz too many retries
> 24M current transmit rate
> 2 tx management frames
> 6 tx frames discarded prior to association
> 31 tx stopped 'cuz no xmit buffer
> 38 tx frames with no ack marked
> 3 rx failed 'cuz of bad CRC
> 4464 rx failed 'cuz of PHY err
>     4464 OFDM timing
> 24 periodic calibrations
> 27 rssi of last ack
> 27 avg recv rssi
> -96 rx noise floor
> 1 switched default/rx antenna
> Antenna profile:
> [1] tx    10614 rx    11449
> 
> 
> Where is the maximum of txbuf and rxbuf ?

The max is derived from available memory.  The h/w has no upper bounds.
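
A rough sketch of why a deeper tx queue tends to trade drops for latency:
a frame that joins the back of a full queue waits for everything ahead of
it to drain. The drain rate below is an assumption, taken from the roughly
0.85 MB/s of 1500-byte packets reported earlier in this thread; retries and
rx-side effects are ignored, so treat the results as order-of-magnitude
only, not as a model of the driver.

```python
def queue_delay_ms(n_buffers: int, drain_pps: float) -> float:
    """Wait for a frame that joins the back of a full tx queue, in ms."""
    if drain_pps <= 0:
        raise ValueError("drain rate must be positive")
    return 1000.0 * n_buffers / drain_pps

# Assumption: the link drains about 0.85 MB/s of 1500-byte packets.
DRAIN_PPS = 0.85e6 / 1500

for txbuf in (100, 200, 800):
    delay = queue_delay_ms(txbuf, DRAIN_PPS)
    print(f"txbuf={txbuf:4d}: about {delay:.0f} ms of queueing when full")
```

Under these assumptions the added delay grows linearly with the buffer
count, which is consistent in direction with latency rising after txbuf
was increased.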

> 
> I would like to test it.
> 
> Thank you for your attention.
> 
> Daniel
>  
> 
>> -----Original Message-----
>> From: Sam Leffler [mailto:sam@errno.com] 
>> Sent: Friday, September 08, 2006 6:30 AM
>> To: dandee@volny.cz
>> Cc: freebsd-stable@freebsd.org
>> Subject: Re: atheros driver under high load, panics and even 
>> more freezes
>>
>> Daniel Dvořák wrote:
>>> Hi Sam and all,
>>>
>>> I am not sure if I understand your answer, but I will try.
>>>
>>> When I start my test, athstats shows this:
>>>
>>> athstats -i ath0
>>>
>>> 19308912 data frames received
>>> 15723536 data frames transmit
>>> 6536 tx frames with an alternate rate
>>> 2188280 long on-chip tx retries
>>> 62583 tx failed 'cuz too many retries
>>> 348 tx linearized to cluster
>>> 24M current transmit rate
>>> 6 tx management frames
>>> 6 tx frames discarded prior to association
>>> 27129 tx stopped 'cuz no xmit buffer
>>> 23057 tx frames with no ack marked
>>> 1182 rx failed 'cuz of bad CRC
>>> 761604 rx failed 'cuz of PHY err
>>>     761604 OFDM timing
>>> 4829 periodic calibrations
>>> 28 rssi of last ack
>>> 27 avg recv rssi
>>> -96 rx noise floor
>>> 1 switched default/rx antenna
>>> Antenna profile:
>>> [1] tx 15660942 rx 19451935
>>> [2] tx        2 rx        0
>>>
>>> ...
>>>
>>>
>>> I use this ping command from R2:
>>> ping -i .002 -c 10000 -s 1472 opposite side R1
>>>
>>> --- R1 ping statistics ---
>>> 10000 packets transmitted, 10000 packets received, 0% packet loss 
>>> round-trip min/avg/max/stddev = 1.316/1.442/49.391/1.757 ms
>>>
>>> You can see a nice average latency of about 1.4 ms, and not a single
>>> packet was lost.
>>> The athstats output is almost unchanged.
>>>
>>> 19309465 data frames received
>>> 15724079 data frames transmit
>>> 6536 tx frames with an alternate rate
>>> 2188281 long on-chip tx retries
>>> 62583 tx failed 'cuz too many retries
>>> 348 tx linearized to cluster
>>> 24M current transmit rate
>>> 6 tx management frames
>>> 6 tx frames discarded prior to association
>>> 27129 tx stopped 'cuz no xmit buffer
>>> 23075 tx frames with no ack marked
>>> 1182 rx failed 'cuz of bad CRC
>>> 761605 rx failed 'cuz of PHY err
>>>     761605 OFDM timing
>>> 4834 periodic calibrations
>>> 29 rssi of last ack
>>> 27 avg recv rssi
>>> -96 rx noise floor
>>> 1 switched default/rx antenna
>>> Antenna profile:
>>> [1] tx 15661485 rx 19452488
>>> [2] tx        2 rx        0
>>>
>>> For comparison, a flood ping run right afterwards:
>>>
>>> --- R1 ping statistics ---
>>> 10000 packets transmitted, 10000 packets received, 0% packet loss 
>>> round-trip min/avg/max/stddev = 1.319/1.516/5.594/0.120 ms
>>>
>>> Almost the same; the max is even better.
>>>
>>>
>>> ----------------------------------------------------------------------
>>>
>>> If I use an interval of 1/1000 s between echo requests, the situation
>>> changes drastically.
>>> ping -i .001 -c 10000 -s 1472 opposite side R1
>>>
>>> --- R1 ping statistics ---
>>> 10000 packets transmitted, 9681 packets received, 3% packet loss 
>>> round-trip min/avg/max/stddev = 1.319/335.806/564.946/170.691 ms
>>>
>>> R2# ifconfig -v ath0
>>> ath0: flags=8c43<UP,BROADCAST,RUNNING,OACTIVE,SIMPLEX,MULTICAST> mtu 1500
>>> ------ ??????????? OACTIVE FLAG ????????? ----
>>>         inet6 fe80::20b:6bff:fe2a:c78e%ath0 prefixlen 64 scopeid 0x2
>>>         inet 10.XX.YY.ZZ netmask 0xfffffffc broadcast 10.40.64.19
>>>         ether xxxxxxxxxxxxxxxx
>>>         media: IEEE 802.11 Wireless Ethernet OFDM/24Mbps mode 11a <flag0,adhoc> (OFDM/24Mbps)
>>>         status: associated
>>>
>>> 19350739 data frames received
>>> 15765446 data frames transmit
>>> 6536 tx frames with an alternate rate
>>> 2194842 long on-chip tx retries
>>> 62590 tx failed 'cuz too many retries
>>> 348 tx linearized to cluster
>>> 24M current transmit rate
>>> 6 tx management frames
>>> 6 tx frames discarded prior to association
>>> 29242 tx stopped 'cuz no xmit buffer
>>> 23155 tx frames with no ack marked
>>> 1182 rx failed 'cuz of bad CRC
>>> 764641 rx failed 'cuz of PHY err
>>>     764641 OFDM timing
>>> 4856 periodic calibrations
>>> 28 rssi of last ack
>>> 27 avg recv rssi
>>> -96 rx noise floor
>>> 1 switched default/rx antenna
>>> Antenna profile:
>>> [1] tx 15702845 rx 19493774
>>> [2] tx        2 rx        0
>>>
>>> I watched the ath flags, and as the latency climbs higher and higher,
>>> a new flag appears that I've never seen before: OACTIVE?
>>> R2# man ifconfig | grep "OACTIVE"
>>>
>>> When the ping ends, the OACTIVE flag disappears.
>>>
>>> When the same ping test is run from a Linux box to FreeBSD, latency is
>>> a nice 1.2 ms and there is no "no buffer".
>>>
>>> With -i 0.002 the throughput is about 0.5 MB/s, in and out of course.
>>>
>>> With -i 0.001, until "no buffer" appears, it is about 0.85 MB/s in and out.
>>>
>>> When "no buffer" and OACTIVE appear, the throughput is about 0.1 MB/s,
>>> or 128 KB/s if you like, or 1 Mbit/s.
>>>
>>> I attached the progress of pinging ip address.
>> You ask why you're seeing OACTIVE when you lower the 
>> inter-packet wait time to ping.  This is because you're 
>> flooding the tx queue of the ath driver and using up all the 
>> tx buffers/descriptors.  When ath is handed a frame to send 
>> and it has no resources available it will mark the interface
>> "active" (OACTIVE) and drop the packet.  You can also see
>> this in the athstats output ("tx stopped 'cuz no xmit
>> buffer").  Linux behaves differently because it blocks the
>> user process when this happens until such time as there are
>> resources to do the send.  This behaviour likely also 
>> explains the variability in the ping times; I think the rx 
>> processing may be deferred while the driver deals with the tx flood.
>>
>> You can up the number of tx buffers available with the 
>> ATH_TXBUF config option or by setting the hw.ath.txbuf 
>> tunable from the loader.  The default is 100 buffers which is 
>> usually plenty for sta-mode operation--which is what the 
>> driver is optimized for (it appears linux defaults to 200 tx 
>> buffers which would also explain different behaviour).  
>> Likewise there is a rx-side tunable for the number of rx buffers.
>>
>> 	Sam
>>
> 
> 
>
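
For reference, the two ping intervals in the quoted test can be turned into
offered load with a few lines of arithmetic. This is only a sketch: the
header sizes assume IPv4 without options, and 802.11 framing, ACKs and
retries are ignored, so the real airtime demand is higher than shown.

```python
IP_HEADER = 20    # bytes, IPv4 header without options (assumption)
ICMP_HEADER = 8   # bytes, ICMP echo header

def offered_mbit_per_s(payload_bytes: int, interval_s: float) -> float:
    """One-way IP-level load generated by `ping -s payload -i interval`."""
    packet_bytes = payload_bytes + ICMP_HEADER + IP_HEADER
    return packet_bytes * 8 / interval_s / 1e6

for interval in (0.002, 0.001):
    one_way = offered_mbit_per_s(1472, interval)
    # Echo requests and replies share the same half-duplex channel.
    print(f"-i {interval}: {one_way:.1f} Mbit/s each way, "
          f"{2 * one_way:.1f} Mbit/s on air")
```

With -s 1472 each packet is 1500 bytes at the IP level, so -i .001 asks for
about 12 Mbit/s in each direction. Requests plus replies then add up to
roughly the 24M transmit rate athstats reports, before any 802.11 overhead,
which fits the tx queue filling up at that interval but not at -i .002.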