From: YongHyeon PYUN
Date: Wed, 30 Mar 2011 13:28:58 -0700
To: Yamagi Burmeister
Message-ID: <20110330202858.GC8601@michelle.cdnetworks.com>
References: <20110330173145.GB8601@michelle.cdnetworks.com>
Cc: freebsd-net@freebsd.org, yongari@freebsd.org
Subject: Re: Kernel memory corruption(?) with age(4)
Reply-To: pyunyh@gmail.com

On Wed, Mar 30, 2011 at 09:50:12PM +0200, Yamagi Burmeister wrote:
> On Wed, 30 Mar 2011, YongHyeon PYUN wrote:
>
> >On Wed, Mar 30, 2011 at 04:22:23PM +0200, Yamagi Burmeister wrote:
> >
> >>All four boxes are unstable if the Attansic NIC is in use; none of
> >>them survived more than 60 minutes of ~20mb/s network traffic. I
> >>managed to get some coredumps and extracted the backtraces. Since
> >>every time one of the boxes panicked I got a different panic
> >>message and a different backtrace with a different subsystem
> >>involved, I suspected broken hardware. I plugged an em(4) NIC into
> >>the PCI slot and wasn't able to reproduce the problem; in fact the
> >>boxes ran rock solid for several days. Next I set up Windows 7,
> >>installed the Attansic vendor driver and did another run. All went
> >>smoothly, no crash for nearly 24 hours.
> >>
> >>My guess is kernel memory corruption by age(4), which would explain
> >>all the different backtraces and the different panic messages. This
> >>problem is reproducible in at least FreeBSD 7.4 and 8.2, with TSO4
> >>both enabled and disabled. I'm willing to debug this, but I really
> >>don't know how. So any help or a pointer in the right direction
> >>would be appreciated.
> >>
> >
> >AFAIK this is the first report of possible memory corruption
> >triggered by age(4). I'm still not sure whether it's caused by
> >age(4) but you can disable RX checksum offloading and see whether
> >that makes any difference.
> >Since I no longer have access to the hardware it would be even
> >better if you can tell me which traffic pattern triggered the
> >issue.
>
> Okay, I did a test run with RX checksum, TX checksum and both
> disabled. In all three cases the crash occurs within about 20
> minutes. I'm also not sure that age(4) is the problem, but it
> definitely has something to do with the problem, since with another
> NIC driver the same scenario is rock solid...
>

OK.

> The workload: It's an NFSv3 server (FreeBSD's non-experimental
> implementation), serving and receiving files of about 250 to 500
> megabytes at about 20mb/s. The clients are FreeBSD 7 and 8 systems
> and are mounting the shares via TCP. The connection is 1000mbit/s
> via a "dumb" gigabit switch.
>

That's too broad to narrow down the issue. :-(
I'm not sure, but your box seems to have more than 4GB of memory.
Could you limit the available memory to 3GB via loader.conf and test
it again?
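
For example, a minimal sketch assuming the stock hw.physmem loader
tunable (the 3G value is just the limit suggested above, adjust as
needed), added to /boot/loader.conf:

```
# /boot/loader.conf
# Assumption: cap the physical memory the kernel will use at 3GB so
# that no buffers can end up above the 4GB boundary.
hw.physmem="3G"
```

After a reboot, "sysctl hw.physmem" should report a value around 3GB
if the limit took effect; then rerun the same NFS workload.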