From owner-freebsd-stable@FreeBSD.ORG Mon Apr 27 17:21:17 2009
From: Jack Vogel
To: Greg Byshenk
Cc: freebsd-stable@freebsd.org
Date: Mon, 27 Apr 2009 09:51:03 -0700
Subject: Re: em0 watchdog timeout (and 3ware problems) 7-stable
Message-ID: <2a41acea0904270951i20a7d65fja677e3e7865802b@mail.gmail.com>
In-Reply-To: <20090426125008.GK1550@core.byshenk.net>
References: <20090426125008.GK1550@core.byshenk.net>

Greg,

I have another report of this problem, and I have a patch for you to
try out; I will be sending it out a bit later today.

Jack

On Sun, Apr 26, 2009 at 5:50 AM, Greg Byshenk wrote:

> I have one machine that is seeing watchdog timeouts on em0, running
> 7-STABLE amd64 as of 2009.04.19, and also some other more perverse
> errors.
>
> Twice now in the last 48 hours, this machine has become unreachable
> via the network, and connecting to the console shows an endless
> string of
>
> [...]
> em0: watchdog timeout -- resetting
> em0: watchdog timeout -- resetting
> em0: watchdog timeout -- resetting
>
> messages. The machine is almost locked up. That is, I can get a login
> prompt, but can go no further than typing in a username; after the
> username, no password prompt, and nothing further. The only option is
> to hard reset the machine or to drop to debugger and reboot.
>
> Now the "perverse" part. After restarting, the system partition is no
> more.
>
> Background detail: the machine is a fileserver, with a 3Ware 9650SE-16ML
> SATA controller, connected to 16 1TB SATA drives, this configured as
> a 14-drive RAID10 array (+ 2 hot spares), with a 50GB system partition
> and 6.5TB data partition. The system partition is configured as da0,
> with one slice and more or less standard partitions for / /var /tmp, etc.
> (the data partition of the array is sliced with gpt).
>
> The issue here is that, upon restart, all partition information on da0
> seems to have disappeared, and restarting results in a "no operating
> system found" message, and a failure to boot (obviously).
>
> But all of the data is still present.
> If I boot into rescue mode, recreate da0s1, mark it bootable, and
> restore the bsdlabel, then everything works again. I can restart the
> machine, and it comes back up normally (it requires an fsck of
> everything on da0, but after that everything is back to normal).
>
> I don't know if this is two unrelated problems, or one problem with
> two symptoms, or something else. I think that I can safely say that
> it is not a problem with the 3Ware controller itself, as I replaced
> the controller with a spare (identical model), and the problem
> recurred. Additionally, I have an almost-identical configuration on
> four other machines, none of which are experiencing any problems.
> One thing that is different is that the other machines use
> Intel PRO/1000 PF (pci-e) NICs.
>
> Is there some known problem with the Intel 2572 fibre NIC? Or some
> potential interaction of it with the 3ware RAID controller?
>
> For the moment, I've set hw.pci.enable_msi=0 (as discussed in the
> threads on 7.2/bge), and am building a new kernel/world from sources
> csup'd one hour ago, but I'd really like to hear any ideas about this
> -- particularly the wiping of the label.
>
> Some information about the system:
>
> # /dev/da0s1:
> 8 partitions:
> #         size    offset    fstype   [fsize bsize bps/cpg]
>   a:   2097152         0    4.2BSD        0     0     0
>   b:   8388608   2097152      swap
>   c: 104856192         0    unused        0     0         # "raw" part, don't edit
>   d:   8388608  10485760    4.2BSD        0     0     0
>   e:   2097152  18874368    4.2BSD        0     0     0
>   f:  41943040  20971520    4.2BSD        0     0     0
>   g:  41941632  62914560    4.2BSD        0     0     0
>
> em0@pci0:4:1:0: class=0x020000 card=0x10038086 chip=0x10018086 rev=0x02 hdr=0x00
>     vendor     = 'Intel Corporation'
>     device     = '2572 10/100/1000 Ethernet Controller (Fiber)'
>     class      = network
>     subclass   = ethernet
>     bar   [10] = type Memory, range 32, base 0xda000000, size 131072, enabled
>     bar   [14] = type Memory, range 32, base 0xda020000, size 65536, enabled
>
> twa0@pci0:9:0:0: class=0x010400 card=0x100413c1 chip=0x100413c1 rev=0x01 hdr=0x00
>     device     = '9650SE Series PCI-Express SATA2 Raid Controller'
>     class      = mass storage
>     subclass   = RAID
>     bar   [10] = type Prefetchable Memory, range 64, base 0xd8000000, size 33554432, enabled
>     bar   [18] = type Memory, range 64, base 0xda300000, size 4096, enabled
>     bar   [20] = type I/O Port, range 32, base 0x3000, size 256, enabled
>     cap 01[40] = powerspec 2  supports D0 D1 D2 D3  current D0
>     cap 05[50] = MSI supports 32 messages, 64 bit
>     cap 10[70] = PCI-Express 1 legacy endpoint
>
> --
> greg byshenk  -  gbyshenk@byshenk.net  -  Leiden, NL
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
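
Since the recovery described in the quoted message depends on restoring the bsdlabel by hand, it may help to sanity-check a saved label before writing it back. The following is only a minimal sketch, not part of the original thread: the sizes and offsets are copied from the da0s1 label quoted above, while the helper name check_label and the 512-byte sector size are assumptions made for illustration.

```python
# Hypothetical helper (not from the thread): check that a bsdlabel listing
# is internally consistent before restoring it.
# Values are (size, offset) in sectors, copied from the quoted da0s1 label.
LABEL = {
    "a": (2097152, 0),
    "b": (8388608, 2097152),
    "d": (8388608, 10485760),
    "e": (2097152, 18874368),
    "f": (41943040, 20971520),
    "g": (41941632, 62914560),
}
RAW_SIZE = 104856192  # size of the "c" (raw) partition, in sectors
SECTOR_BYTES = 512    # assumption: 512-byte sectors


def check_label(label, raw_size):
    """Return a list of problems found in the label.

    An empty list means the partitions are contiguous, do not overlap,
    and end exactly at the size of the raw partition.
    """
    problems = []
    expected_offset = 0
    # Walk the partitions in order of their starting offset.
    for name, (size, offset) in sorted(label.items(), key=lambda kv: kv[1][1]):
        if offset != expected_offset:
            problems.append(
                f"{name}: offset {offset}, expected {expected_offset}"
            )
        expected_offset = offset + size
    if expected_offset != raw_size:
        problems.append(f"end {expected_offset} != raw size {raw_size}")
    return problems


if __name__ == "__main__":
    print(check_label(LABEL, RAW_SIZE) or "label is consistent")
    print(f"slice size: {RAW_SIZE * SECTOR_BYTES / 2**30:.1f} GiB")
```

For the label in this message the check reports no problems, and the computed slice size agrees with the 50GB system partition mentioned in the report.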