From owner-freebsd-stable@FreeBSD.ORG Mon Mar 29 19:42:44 2010
From: Pyun YongHyeon
Date: Mon, 29 Mar 2010 12:41:31 -0700
To: Attila Nagy
Message-ID: <20100329194131.GG1473@michelle.cdnetworks.com>
References:
 <4BAB718C.3090001@fsn.hu> <886B21E1787F0003B89E34B6@[192.168.1.44]>
 <4BB087B7.3030602@fsn.hu> <20100329183848.GE1473@michelle.cdnetworks.com>
 <4BB0FDC6.7050105@fsn.hu>
In-Reply-To: <4BB0FDC6.7050105@fsn.hu>
Reply-To: pyunyh@gmail.com
Cc: Mailing List FreeBSD Stable, Michael Loftis
Subject: Re: 8-STABLE freezes on UDP traffic (DNS), 7.x doesn't
List-Id: Production branch of FreeBSD source code

On Mon, Mar 29, 2010 at 09:21:42PM +0200, Attila Nagy wrote:
> Pyun YongHyeon wrote:
> > On Mon, Mar 29, 2010 at 12:57:59PM +0200, Attila Nagy wrote:
> >
> >> Hi,
> >>
> >> Michael Loftis wrote:
> >>
> >>> --On Thursday, March 25, 2010 3:22 PM +0100 Attila Nagy
> >>> wrote:
> >>>
> >>> <...>
> >>>
> >>>> Both unbound and python accepts DNS requests, and it seems when 25%
> >>>> interrupt happens, only unbound is in *udp state, where it is 50%, both
> >>>> programs are in that state.
> >>>>
> >>> Try turning off hardware TSO/checksum offload if it's available on your
> >>> chipset? ifconfig -rxcsum -txcsum -tso -- I'm only using
> >>> nfe chips right now, but w/ the TSO/CSUM on they lock up constantly
> >>> under high load. We're pretty sure it's mostly the nfe driver, or the
> >>> chips themselves, but have never ruled out some generic 8.x hardware
> >>> offload issues.
> >>>
> >> Bingo, this solved the problem. The current uptime nears four days.
> >> Previously I couldn't go further than a day.
> >>
> >> The machine gets very light TCP load (and other machines which get work
> >> well), so I guess it's UDP RX or TX checksum related.
> >>
> >>
> >
> > Hmm, this is unexpected result. Since you're using UDP, TSO is not
> > involved in this issue.
> > Because you disabled RX/TX checksum
> > offloading could you check how many number of 'bad checksum'
> > and 'no checksum' you have from netstat(1)?
> > To narrow down which side of checksum offloading causes the issue,
> > would you just disable one side at a time? For instance, disable TX
> > checksum offloading with RX checksum offloading enabled and see how
> > bce(4) works.
> > #ifconfig bce0 -txcsum rxcsum
> > If that shows the same issue, try disabling RX checksum offloading
> > but enabling TX checksum offloading.
> > #ifconfig bce0 txcsum -rxcsum
> >
> It's interesting. During the day, I've disabled only HW checksumming and
> left TSO enabled. It couldn't run more than a few hours.
> I have disabled tso again to see what happens.
>
> BTW, of course there is TCP traffic on that interface (DNS is also
> available on TCP), maybe this causes the problem.

The only guess I can think of at this moment is incorrect use of
bus_dma(9) in the TX path, but I'm not sure it is related to the
issue you're seeing. Would you try the experimental patch at the
following URL?
http://people.freebsd.org/~yongari/bce/bce.20100305.diff

Please make sure to back up your old bce(4) driver before applying
the patch. I didn't see anything abnormal in testing, but it wasn't
stressed much.
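
For the netstat(1) check requested in the quoted text above, a small helper along these lines might make it easier to compare the counters before and after each ifconfig change. This is only a sketch: the function name udp_csum_counters is made up for illustration, and it assumes the UDP statistics contain lines ending in "with bad checksum" and "with no checksum", which should be verified against the actual output on the affected machine.

```shell
#!/bin/sh
# Hypothetical helper (not part of the base system): read the output of
# `netstat -s -p udp` on stdin and print the two checksum counters.
# Assumes lines of the form "<count> with bad checksum" and
# "<count> with no checksum"; a missing line is reported as 0.
udp_csum_counters() {
    awk '
        /with bad checksum/ { bad = $1 }
        /with no checksum/  { none = $1 }
        END { printf "bad=%d none=%d\n", bad, none }
    '
}

# Possible use on the affected host (bce0 is the interface named in this
# thread), left commented out so nothing touches the NIC by accident:
#   ifconfig bce0 -txcsum rxcsum
#   netstat -s -p udp | udp_csum_counters
```

Running it once before and once some hours after each change would show whether either counter is growing with only one side of checksum offloading disabled.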