From: Bob Healey
Date: Fri, 06 Jul 2012 14:56:13 -0400
To: freebsd-stable@freebsd.org
Subject: Problems with crashing IBM X3630 M3/ZFS
Message-ID: <4FF734CD.9070401@rpi.edu>

Hello.

I've got a quartet of IBM x3630 M3 servers, one of which frequently hard locks under heavy NFS load. I am running 9.0-RELEASE with all the patches from freebsd-update. The problem machine serves eight 16-core clients, each doing I/O-intensive tasks, connected to it via a ProCurve switch and the onboard igb0 interface. The traffic is mostly network reads, typically 10 MB read per MB written.

When the machine locks under load, none of the consoles respond, and I cannot reach the machine over Ethernet. I can break into DDB via the serial-over-LAN interface, and I am running a debug/WITNESS kernel at the moment (I was running GENERIC previously). During the boot sequence, WITNESS drops me into DDB about 10 times before I get a login prompt. Before this machine started acting up, it had multiple 802.1Q VLANs and ran 9K packets on its private network to the compute clients.

A dmesg can be found at
http://boyle.che.rpi.edu/~healer/boomer/dmesg

/etc/rc.conf can be found at
http://boyle.che.rpi.edu/~healer/boomer/rc.conf

A listing of installed ports can be found at
http://boyle.che.rpi.edu/~healer/boomer/pkg_info

The output of ps auxwwo wchan against my two crash dumps can be found at
http://boyle.che.rpi.edu/~healer/boomer/crash1-psaux-wchan
and
http://boyle.che.rpi.edu/~healer/boomer/crash2-psaux-wchan

I'm not entirely convinced this is software, but I've run out of local experts to ask, and I can't prove it's hardware.

-- 
Bob Healey
Systems Administrator
Biocomputation and Bioinformatics Constellation
and Molecularium
healer@rpi.edu
(518) 276-4407