From: Attila Nagy
Date: Tue, 30 Mar 2010 12:17:53 +0200
To: pyunyh@gmail.com
Message-ID: <4BB1CFD1.9040602@fsn.hu>
In-Reply-To: <20100329194131.GG1473@michelle.cdnetworks.com>
Cc: Michael Loftis
Subject: Re: 8-STABLE freezes on UDP traffic (DNS), 7.x doesn't
List-Id: Production branch of FreeBSD source code

Pyun YongHyeon wrote:
> On Mon, Mar 29, 2010 at 09:21:42PM +0200, Attila Nagy wrote:
>
>> Pyun YongHyeon wrote:
>>
>>> On Mon, Mar 29, 2010 at 12:57:59PM +0200, Attila Nagy wrote:
>>>
>>>> Hi,
>>>>
>>>> Michael Loftis wrote:
>>>>
>>>>> --On Thursday, March 25, 2010 3:22 PM +0100 Attila Nagy
>>>>> wrote:
>>>>>
>>>>> <...>
>>>>>
>>>>>> Both unbound and python accept DNS requests, and it seems when 25%
>>>>>> interrupt happens, only unbound is in *udp state; when it is 50%, both
>>>>>> programs are in that state.
>>>>>>
>>>>> Try turning off hardware TSO/checksum offload if it's available on your
>>>>> chipset? ifconfig -rxcsum -txcsum -tso -- I'm only using
>>>>> nfe chips right now, but w/ the TSO/CSUM on they lock up constantly
>>>>> under high load. We're pretty sure it's mostly the nfe driver, or the
>>>>> chips themselves, but have never ruled out some generic 8.x hardware
>>>>> offload issues.
>>>>>
>>>> Bingo, this solved the problem. The current uptime nears four days.
>>>> Previously I couldn't go further than a day.
>>>>
>>>> The machine gets very light TCP load (and other machines which get work
>>>> well), so I guess it's UDP RX or TX checksum related.
>>>>
>>> Hmm, this is an unexpected result. Since you're using UDP, TSO is not
>>> involved in this issue.
>>> Because you disabled RX/TX checksum offloading, could you check
>>> how many 'bad checksum' and 'no checksum' you have from netstat(1)?
>>> To narrow down which side of checksum offloading causes the issue,
>>> would you disable just one side at a time? For instance, disable TX
>>> checksum offloading with RX checksum offloading enabled and see how
>>> bce(4) works.
>>> #ifconfig bce0 -txcsum rxcsum
>>> If that shows the same issue, try disabling RX checksum offloading
>>> but enabling TX checksum offloading.
>>> #ifconfig bce0 txcsum -rxcsum
>>>
>> It's interesting. During the day, I've disabled only HW checksumming and
>> left TSO enabled. It couldn't run more than a few hours.
>> I have disabled tso again to see what happens.
>>
>> BTW, of course there is TCP traffic on that interface (DNS is also
>> available on TCP), maybe this causes the problem.
>>
>
> The only guess I can think of at this moment is incorrect use of
> bus_dma(9) in TX path. But I'm not sure this is related to the
> issue you're seeing. Would you try the experimental patch at the
> following URL?
> http://people.freebsd.org/~yongari/bce/bce.20100305.diff
> Please make sure to back up your old bce(4) driver before applying
> the patch. I didn't see any abnormal things in testing but it
> wasn't much stressed.
>
With the default settings (rx, tx csum, tso) it froze in about an hour:

CPU: 0.0% user, 0.0% nice, 0.0% system, 25.0% interrupt, 75.0% idle
714 bind 4 102 0 1200M 1182M *lle 3 17:24 0.00% unbound