From: Daniel Dvořák
To: Sam Leffler
Cc: freebsd-stable@freebsd.org
Date: Fri, 8 Sep 2006 14:12:04 +0200
Message-ID: <000001c6d340$004fdf40$6508280a@tocnet28.jspoj.czf>
In-Reply-To: <4500F1C8.6000702@errno.com>
Subject: Where is the maximum of hw.ath.txbuf and rxbuf?
 (former: atheros driver under high load, panics and even more freezes)
Reply-To: dandee@volny.cz

Hi Sam,

thank you for your answer. I think it is connected to this problem
somehow, but not fully.

I doubled txbuf and rxbuf, to 200 and 80, and saw some improvement, with
fewer "no buffers space ..." errors, but latency went up to 2000 ms.

I have now ended up at txbuf=800 and rxbuf=320 on both sides, R1 and R2.

But the same problem is still there. This was tested almost immediately
after rebooting R2:

--- R1 ping statistics ---
10000 packets transmitted, 8752 packets received, 12% packet loss
round-trip min/avg/max/stddev = 1.324/920.480/6323.454/766.399 ms

up to 6k ms

R2# athstats -i ath0
11309 data frames received
11384 data frames transmit
12508 long on-chip tx retries
769 tx failed 'cuz too many retries
24M current transmit rate
2 tx management frames
6 tx frames discarded prior to association
31 tx stopped 'cuz no xmit buffer
38 tx frames with no ack marked
3 rx failed 'cuz of bad CRC
4464 rx failed 'cuz of PHY err
    4464 OFDM timing
24 periodic calibrations
27 rssi of last ack
27 avg recv rssi
-96 rx noise floor
1 switched default/rx antenna
Antenna profile:
[1] tx 10614 rx 11449

Where is the maximum of txbuf and rxbuf? I would like to test it.

Thank you for your attention.

Daniel

> -----Original Message-----
> From: Sam Leffler [mailto:sam@errno.com]
> Sent: Friday, September 08, 2006 6:30 AM
> To: dandee@volny.cz
> Cc: freebsd-stable@freebsd.org
> Subject: Re: atheros driver under high load, panics and even
> more freezes
>
> Daniel Dvořák wrote:
> > Hi Sam and all,
> >
> > I am not sure if I understand your answer, but I will try.
> >
> > When I start my test, athstats shows this:
> >
> > athstats -i ath0
> >
> > 19308912 data frames received
> > 15723536 data frames transmit
> > 6536 tx frames with an alternate rate
> > 2188280 long on-chip tx retries
> > 62583 tx failed 'cuz too many retries
> > 348 tx linearized to cluster
> > 24M current transmit rate
> > 6 tx management frames
> > 6 tx frames discarded prior to association
> > 27129 tx stopped 'cuz no xmit buffer
> > 23057 tx frames with no ack marked
> > 1182 rx failed 'cuz of bad CRC
> > 761604 rx failed 'cuz of PHY err
> >     761604 OFDM timing
> > 4829 periodic calibrations
> > 28 rssi of last ack
> > 27 avg recv rssi
> > -96 rx noise floor
> > 1 switched default/rx antenna
> > Antenna profile:
> > [1] tx 15660942 rx 19451935
> > [2] tx 2 rx 0
> >
> > ...
> >
> > I use this ping command from R2:
> > ping -i .002 -c 10000 -s 1472 opposite side R1
> >
> > --- R1 ping statistics ---
> > 10000 packets transmitted, 10000 packets received, 0% packet loss
> > round-trip min/avg/max/stddev = 1.316/1.442/49.391/1.757 ms
> >
> > You can see a nice average latency of about 1.4 ms, and not a single
> > packet was lost.
> >
> > The athstats output has hardly changed:
> >
> > 19309465 data frames received
> > 15724079 data frames transmit
> > 6536 tx frames with an alternate rate
> > 2188281 long on-chip tx retries
> > 62583 tx failed 'cuz too many retries
> > 348 tx linearized to cluster
> > 24M current transmit rate
> > 6 tx management frames
> > 6 tx frames discarded prior to association
> > 27129 tx stopped 'cuz no xmit buffer
> > 23075 tx frames with no ack marked
> > 1182 rx failed 'cuz of bad CRC
> > 761605 rx failed 'cuz of PHY err
> >     761605 OFDM timing
> > 4834 periodic calibrations
> > 29 rssi of last ack
> > 27 avg recv rssi
> > -96 rx noise floor
> > 1 switched default/rx antenna
> > Antenna profile:
> > [1] tx 15661485 rx 19452488
> > [2] tx 2 rx 0
> >
> > For comparison, a flood ping run straight afterwards:
> >
> > --- R1 ping statistics ---
> > 10000 packets transmitted, 10000 packets received, 0% packet loss
> > round-trip min/avg/max/stddev = 1.319/1.516/5.594/0.120 ms
> >
> > Almost the same; the max is even better.
> >
> > ----------------------------------------------------------------------
> >
> > If I send the echo requests at an interval of 1/1000 s, the situation
> > changes dramatically.
> > ping -i .001 -c 10000 -s 1472 opposite side R1
> >
> > --- R1 ping statistics ---
> > 10000 packets transmitted, 9681 packets received, 3% packet loss
> > round-trip min/avg/max/stddev = 1.319/335.806/564.946/170.691 ms
> >
> > R2# ifconfig -v ath0
> > ath0: flags=8c43 mtu 1500
> > ------ ??????????? OACTIVE FLAG ????????? ----
> >         inet6 fe80::20b:6bff:fe2a:c78e%ath0 prefixlen 64 scopeid 0x2
> >         inet 10.XX.YY.ZZ netmask 0xfffffffc broadcast 10.40.64.19
> >         ether xxxxxxxxxxxxxxxx
> >         media: IEEE 802.11 Wireless Ethernet OFDM/24Mbps mode 11a (OFDM/24Mbps)
> >         status: associated
> >
> > 19350739 data frames received
> > 15765446 data frames transmit
> > 6536 tx frames with an alternate rate
> > 2194842 long on-chip tx retries
> > 62590 tx failed 'cuz too many retries
> > 348 tx linearized to cluster
> > 24M current transmit rate
> > 6 tx management frames
> > 6 tx frames discarded prior to association
> > 29242 tx stopped 'cuz no xmit buffer
> > 23155 tx frames with no ack marked
> > 1182 rx failed 'cuz of bad CRC
> > 764641 rx failed 'cuz of PHY err
> >     764641 OFDM timing
> > 4856 periodic calibrations
> > 28 rssi of last ack
> > 27 avg recv rssi
> > -96 rx noise floor
> > 1 switched default/rx antenna
> > Antenna profile:
> > [1] tx 15702845 rx 19493774
> > [2] tx 2 rx 0
> >
> > I watched the flags on ath0, and as the latency climbs higher and
> > higher a new flag appears that I've never seen before: the OACTIVE
> > flag?
> >
> > R2# man ifconfig | grep "OACTIVE"
> >
> > When the ping ends, the OACTIVE flag disappears.
> >
> > When the same ping test is run from a Linux box to the FreeBSD box,
> > latency is a nice 1.2 ms and there is no "no buffer" error.
> >
> > With -i 0.002 the throughput is about 0.5 MB/s, in and out of course.
> >
> > With -i 0.001, until "no buffer" appears, it is about 0.85 MB/s in
> > and out.
> >
> > Once "no buffer" and OACTIVE appear, the throughput is about
> > 0.1 MB/s, or 128 KB/s if you like, or 1 Mbit/s.
> >
> > I have attached the progress of pinging the IP address.
>
> You ask why you're seeing OACTIVE when you lower the inter-packet wait
> time to ping. This is because you're flooding the tx queue of the ath
> driver and using up all the tx buffers/descriptors.
> When ath is handed a frame to send and it has no resources available
> it will mark the interface "active" (OACTIVE) and drop the packet. You
> can also see this in the athstats output ("tx stopped 'cuz no xmit
> buffer"). Linux behaves differently because it blocks the user process
> when this happens until such time as there are resources to do the
> send. This behaviour likely also explains the variability in the ping
> times; I think the rx processing may be deferred while the driver
> deals with the tx flood.
>
> You can up the number of tx buffers available with the ATH_TXBUF
> config option or by setting the hw.ath.txbuf tunable from the loader.
> The default is 100 buffers which is usually plenty for sta-mode
> operation--which is what the driver is optimized for (it appears linux
> defaults to 200 tx buffers which would also explain different
> behaviour). Likewise there is a rx-side tunable for the number of rx
> buffers.
>
> Sam
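
For reference, the loader tunables discussed in this thread would normally be set in /boot/loader.conf. The fragment below is only a sketch: it assumes the rx-side tunable is named hw.ath.rxbuf (as the subject line suggests), and the values are simply the ones tried in the test above, not recommended settings or known limits.

```
# /boot/loader.conf
# Number of tx and rx buffers for the ath driver.
# Values are the ones tried in the test above, not recommendations.
# hw.ath.rxbuf is assumed to be the rx-side counterpart of hw.ath.txbuf.
hw.ath.txbuf="800"
hw.ath.rxbuf="320"
```

Loader tunables are read at boot, so a change here only takes effect after a reboot.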