Date: Wed, 9 Feb 2011 01:28:58 -0800
From: Jeremy Chadwick
To: Greg Bonett
Cc: freebsd-stable@freebsd.org
Subject: Re: 8.1 amd64 lockup (maybe zfs or disk related)
Message-ID: <20110209092858.GA35033@icarus.home.lan>
In-Reply-To: <1297235241.4729.35.camel@ubuntu>

On Tue, Feb 08,
2011 at 11:07:21PM -0800, Greg Bonett wrote:
> rebuilt my kernel with debug options, but thankfully I think I've
> learned how to avoid lockup for the time being. I think I am asking too
> much of my 650 watt power supply. I unplugged one hard drive and
> disabled another CPU core (now running 4 of 6). I'm sad to lose the
> horsepower, but I was able to complete an entire zpool scrub and other
> high load tasks without a lockup.

Too much to reply to with regards to your disk setup, so I'll summarise
my recommendations at this point:

1a) Re-enable both CPU cores; I can't see this being responsible for the
problem. I do understand the concern over added power draw, but see
recommendation (4a) below.

1b) Disable the JMicron SATA controller entirely.

2) Disable the ATI IXP700/800 SATA controller entirely.

3a) Purchase a Silicon Image controller (one of the models I referenced
in my previous mail). Many places sell them, but lots of online vendors
hide or do not disclose what ASIC they're using for the controller. You
might have to look at their Driver Downloads section to find out what
actual chip is used.

3b) You've stated you're using one of your drives on an eSATA cable. If
you are using a SATA-to-eSATA adapter bracket[1][2], please stop
immediately and use a native eSATA port instead. Adapter brackets are
known to cause all sorts of problems that appear as bizarre/strange
failures (xxx_DMAxx errors are quite common in this situation). On top
of that, with all the internal and external cabling, people often exceed
the maximum SATA cable length without even realising it -- it's the
entire length from the SATA port on your motherboard, to and through the
adapter (good luck figuring out how much wire is used there), to the end
of the eSATA cable. Native eSATA removes use of the shoddy adapters and
also extends the maximum cable length (from 1 metre to 2 metres), plus
provides the proper amount of power for eSATA devices (yes this
matters!).
Wikipedia has details[3]. Silicon Image and others do make chips that
offer both internal SATA and an eSATA port on the same controller. Given
your number of disks, you might have to invest in multiple controllers.

4a) Purchase a Kill-a-Watt meter and measure exactly how much power your
entire PC draws, including on power-on (it will be a lot higher during
power-on than during idle/use, as drives spinning up draw lots of amps).
I strongly recommend the Kill-a-Watt P4600 model[4] over the P4400
model. Based on the wattage and amperage results, you should be able to
determine if you're nearing the maximum draw of your PSU.

4b) However, even if you're way under-draw (say, 400W), the total draw
may not be the problem; the limit may instead be the maximum current a
single physical power cable can carry. I imagine to some degree it
depends on the gauge of wire being used; excessive use of Y-splitters to
provide more power connectors than the physical cable provides means
that you might be drawing too much across the existing gauge of cable
that runs to the PSU. I have seen setups where people have 6 hard disks
coming off of a single power cable (with Y-splitters and molex-to-SATA
power adapters) and have their drives randomly drop off the bus. Please
don't do this.

A better solution might be to invest in a server-grade chassis, such as
one from Supermicro, that offers a hot-swap SATA backplane. The
backplane provides all the correct amounts of power to the maximum
number of disks that can be connected to it. Here are some cases you can
look at[5][6][7].

Also be aware that if you're already using a hot-swap backplane, most
consumer-grade ones are complete junk and have been known to cause
strange anomalies; it's always best in those situations to go straight
from motherboard-to-drive or card-to-drive.
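The arithmetic behind (4a) and (4b) can be sketched roughly as below.
This is only an illustration in Python, not a measurement of this
system: the function names, the 80% PSU efficiency figure, and the
sample readings are all assumptions. A Kill-a-Watt reads power at the
wall, so the DC load on the PSU is somewhat lower than the number on the
display.

```python
def dc_load_watts(wall_watts, efficiency=0.80):
    """Estimate the DC load on the PSU from a wall (AC) reading.

    efficiency is an assumed figure; check the PSU's own spec sheet.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return wall_watts * efficiency


def psu_load_fraction(wall_watts, psu_rated_watts, efficiency=0.80):
    """Fraction of the PSU's rated output in use (1.0 means at the limit)."""
    return dc_load_watts(wall_watts, efficiency) / psu_rated_watts


def cable_amps(drives_on_cable, amps_per_drive):
    """Total current pulled through one PSU cable run feeding several drives."""
    return drives_on_cable * amps_per_drive


if __name__ == "__main__":
    # Hypothetical reading: 500 W at the wall during power-on, 650 W PSU.
    print(round(psu_load_fraction(500, 650), 2))
    # Hypothetical: 6 drives on one cable, 2.0 A each on 12 V at spin-up.
    print(cable_amps(6, 2.0))
```

The second function is the (4a) check, the third is the (4b) check. A
system can look fine on the first and still be overloading one cable
run, which is why both numbers are worth working out separately.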
[1]: http://www.cooldrives.com/newesiidebrf.html
[2]: http://www.cooldrives.com/essaii3gbexp.html
[3]: http://en.wikipedia.org/wiki/Serial_ATA#eSATA
[4]: http://www.amazon.com/dp/B000RGF29Q
[5]: http://www.supermicro.com/products/chassis/4U/?chs=742
[6]: http://www.supermicro.com/products/chassis/4U/?chs=743
[7]: http://www.supermicro.com/products/chassis/4U/?chs=745

> I've attached the output of smartctl -a /dev/ad1. I don't think this
> error is being caused by the disk though.

After reviewing your SMART stats on the drive, I agree -- it looks
perfectly healthy (for a Seagate disk). Nothing wrong there.

> > > calcru: runtime went backwards from 82 usec to 70 usec for pid 20 (flowcleaner)
> > > calcru: runtime went backwards from 363 usec to 317 usec for pid 8 (pagedaemon)
> > > calcru: runtime went backwards from 111 usec to 95 usec for pid 7 (xpt_thrd)
> > > calcru: runtime went backwards from 1892 usec to 1629 usec for pid 1 (init)
> > > calcru: runtime went backwards from 6786 usec to 6591 usec for pid 0 (kernel)
> >
> > This is a problem that has plagued FreeBSD for some time. It's usually
> > caused by EIST (est) being used, but that's on Intel platforms. AMD has
> > something similar called Cool'n'Quiet (see cpufreq(4) man page). Are
> > you running powerd(8) on this system? If so, try disabling that and see
> > if these go away.
>
> sadly, I don't know if I'm running powerd.
> ps aux | grep power gives nothing, so no I guess...
> as far as I can tell, this error is the least of my problems right now,
> but i would like to fix it.

Yes that's an accurate ps/grep to use; powerd_enable="yes" in
/etc/rc.conf is how you make use of it.

Could you provide output from "sysctl -a | grep freq"? That might help
shed some light on the above errors as well, but as I said, I'm not
familiar with AMD systems.
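For the rc.conf side of the powerd question, a minimal sketch in Python
follows. The function name powerd_enabled is made up for illustration.
It only inspects the text of an rc.conf-style file; it does not look at
/etc/rc.conf.local or at whether the daemon is actually running. The
accepted "yes" values follow what rc.subr's checkyesno treats as
enabled, and the last assignment wins, as it would when the file is
sourced by the shell.

```python
import re

# Values that FreeBSD's rc.subr checkyesno treats as "enabled".
_YES = {"yes", "true", "on", "1"}

# Matches an uncommented powerd_enable assignment, quoted or not.
_ASSIGN = re.compile(r'^\s*powerd_enable=["\']?([^"\'\s#]*)')


def powerd_enabled(rc_conf_text):
    """Return True if the last powerd_enable assignment is a "yes" value."""
    enabled = False  # powerd is off unless rc.conf turns it on
    for line in rc_conf_text.splitlines():
        m = _ASSIGN.match(line)
        if m:
            enabled = m.group(1).lower() in _YES
    return enabled
```

On the machine itself this could be called as
powerd_enabled(open("/etc/rc.conf").read()).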
-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP 4BD6C0CB |