From: Sam Leffler
Organization: Errno Consulting
Date: Thu, 07 Sep 2006 21:30:00 -0700
To: dandee@volny.cz
Cc: freebsd-stable@freebsd.org
Subject: Re: atheros driver under high load, panics and even more freezes

Daniel Dvořák wrote:
> Hi Sam and all,
>
> I am not sure I understand your answer, but I'll try it.
>
> When I start my test, athstats shows this:
>
> athstats -i ath0
>
> 19308912 data frames received
> 15723536 data frames transmit
> 6536 tx frames with an alternate rate
> 2188280 long on-chip tx retries
> 62583 tx failed 'cuz too many retries
> 348 tx linearized to cluster
> 24M current transmit rate
> 6 tx management frames
> 6 tx frames discarded prior to association
> 27129 tx stopped 'cuz no xmit buffer
> 23057 tx frames with no ack marked
> 1182 rx failed 'cuz of bad CRC
> 761604 rx failed 'cuz of PHY err
> 761604 OFDM timing
> 4829 periodic calibrations
> 28 rssi of last ack
> 27 avg recv rssi
> -96 rx noise floor
> 1 switched default/rx antenna
> Antenna profile:
> [1] tx 15660942 rx 19451935
> [2] tx 2 rx 0
>
> ...
>
>
> I use this ping command from R2:
> ping -i .002 -c 10000 -s 1472 opposite side R1
>
> --- R1 ping statistics ---
> 10000 packets transmitted, 10000 packets received, 0% packet loss
> round-trip min/avg/max/stddev = 1.316/1.442/49.391/1.757 ms
>
> You can see a nice average latency of about 1.4 ms, and not a single
> packet was lost.
>
> athstats barely changed:
>
> 19309465 data frames received
> 15724079 data frames transmit
> 6536 tx frames with an alternate rate
> 2188281 long on-chip tx retries
> 62583 tx failed 'cuz too many retries
> 348 tx linearized to cluster
> 24M current transmit rate
> 6 tx management frames
> 6 tx frames discarded prior to association
> 27129 tx stopped 'cuz no xmit buffer
> 23075 tx frames with no ack marked
> 1182 rx failed 'cuz of bad CRC
> 761605 rx failed 'cuz of PHY err
> 761605 OFDM timing
> 4834 periodic calibrations
> 29 rssi of last ack
> 27 avg recv rssi
> -96 rx noise floor
> 1 switched default/rx antenna
> Antenna profile:
> [1] tx 15661485 rx 19452488
> [2] tx 2 rx 0
>
> For comparison, with a flood ping:
>
> --- R1 ping statistics ---
> 10000 packets transmitted, 10000 packets received, 0% packet loss
> round-trip min/avg/max/stddev = 1.319/1.516/5.594/0.120 ms
>
> Almost the same; the max is even better.
>
> ------------------------------------------------------------------------------------------
>
> If I use an interval of 1/1000 s to send the echo requests, the situation
> changes dramatically.
> ping -i .001 -c 10000 -s 1472 opposite side R1
>
> --- R1 ping statistics ---
> 10000 packets transmitted, 9681 packets received, 3% packet loss
> round-trip min/avg/max/stddev = 1.319/335.806/564.946/170.691 ms
>
> R2# ifconfig -v ath0
> ath0: flags=8c43<UP,BROADCAST,RUNNING,OACTIVE,SIMPLEX,MULTICAST> mtu 1500
> ------ ??????????? OACTIVE FLAG ????????? ----
> inet6 fe80::20b:6bff:fe2a:c78e%ath0 prefixlen 64 scopeid 0x2
> inet 10.XX.YY.ZZ netmask 0xfffffffc broadcast 10.40.64.19
> ether xxxxxxxxxxxxxxxx
> media: IEEE 802.11 Wireless Ethernet OFDM/24Mbps mode 11a (OFDM/24Mbps)
> status: associated
>
> 19350739 data frames received
> 15765446 data frames transmit
> 6536 tx frames with an alternate rate
> 2194842 long on-chip tx retries
> 62590 tx failed 'cuz too many retries
> 348 tx linearized to cluster
> 24M current transmit rate
> 6 tx management frames
> 6 tx frames discarded prior to association
> 29242 tx stopped 'cuz no xmit buffer
> 23155 tx frames with no ack marked
> 1182 rx failed 'cuz of bad CRC
> 764641 rx failed 'cuz of PHY err
> 764641 OFDM timing
> 4856 periodic calibrations
> 28 rssi of last ack
> 27 avg recv rssi
> -96 rx noise floor
> 1 switched default/rx antenna
> Antenna profile:
> [1] tx 15702845 rx 19493774
> [2] tx 2 rx 0
>
> I watched the ath0 flags, and as the latency climbs higher and higher, a
> new flag appears that I've never seen before: OACTIVE. What does it mean?
>
> R2# man ifconfig | grep "OACTIVE"
>
> When ping ends, the OACTIVE flag disappears.
>
> When the same ping test is run from a Linux box to the FreeBSD box, the
> latency stays at a nice 1.2 ms and there is no "no buffer".
>
> With -i 0.002 the throughput is about 0.5 MB/s, in and out of course.
>
> With -i 0.001 it is about 0.85 MB/s in and out, until "no buffer" appears.
>
> When "no buffer" and OACTIVE appear, the throughput drops to about 0.1 MB/s
> (128 KB/s if you like, or 1 Mbit/s).
>
> I attached the progress of pinging the IP address.

You ask why you're seeing OACTIVE when you lower ping's inter-packet wait
time. This is because you're flooding the tx queue of the ath driver and
using up all the tx buffers/descriptors. When ath is handed a frame to send
and has no resources available, it marks the interface "active" (OACTIVE)
and drops the packet. You can also see this in the athstats output ("tx
stopped 'cuz no xmit buffer").

Linux behaves differently because, when this happens, it blocks the user
process until resources are available to do the send. This behaviour likely
also explains the variability in the ping times; I think the rx processing
may be deferred while the driver deals with the tx flood.

You can increase the number of tx buffers with the ATH_TXBUF kernel config
option or by setting the hw.ath.txbuf tunable from the loader (for example,
hw.ath.txbuf="200" in /boot/loader.conf). The default is 100 buffers. That is
usually plenty for sta-mode operation, which is what the driver is optimized
for. (It appears Linux defaults to 200 tx buffers, which would also explain
the different behaviour.) Likewise, there is an rx-side tunable for the
number of rx buffers.

Sam