From: Jeremy Chadwick
To: Tom Vijlbrief
Cc: freebsd-stable@freebsd.org
Date: Sun, 9 Jan 2011 22:51:09 -0800
Subject: Re: Panic 8.2 PRERELEASE WRITE_DMA48
Message-ID: <20110110065109.GA61075@icarus.home.lan>
References: <20110109122243.GA37530@icarus.home.lan> <20110109163027.GA42562@icarus.home.lan>

On Mon, Jan 10, 2011 at 07:13:57AM +0100, Tom Vijlbrief wrote:
> 2011/1/9 Jeremy Chadwick :
> >
> > Not to get off topic, but what is causing this?  It looks like you have
> > a cron job or something very aggressive doing a "smartctl -t short
> > /dev/ad4" or equivalent.
> > If you have such, please disable this
> > immediately.  You shouldn't be doing SMART tests with such regularity;
> > it accomplishes absolutely nothing, especially the "short" tests.  Let
> > the drive operate normally, otherwise run smartd and watch logs instead.
> >
>
> I have this default entry (from the author of that file) in
> smartd.conf and enabled it on many machines over the years.
> Is it a bad practice?
>
> # First (primary) ATA/IDE hard disk. Monitor all attributes, enable
> # automatic online data collection, automatic Attribute autosave, and
> # start a short self-test every day between 2-3am, and a long self test
> # Saturdays between 3-4am.
> /dev/hda -a -o on -S on -s (S/../.././02|L/../../6/03)

I'll have to talk to Bruce Allen about that.  Those entries in
smartd.conf are pretty old (meaning they've existed for a very long
time, and chances are Bruce hasn't gone back to revamp them or
reconsider the logic/justification behind them).

I'm an opponent of running SMART tests automatically, given what some
do to drives.  It's important to remember that most SMART tests can be
done while the drive is in operation, and some of these tests stress
the drive, which could potentially cause timeouts or other I/O
anomalies (data loss is unlikely, but odd errors may occur; it all
depends on the firmware).  This is especially important WRT "long"
tests.

For example, on newer 2TB Western Digital Caviar Black drives, a long
test does something that I haven't heard (yes, heard) any other drive
do -- it emits a noise that's almost identical to that of a head crash.
It could be scanning a very specific region of LBAs (possibly
out-of-range sectors, e.g. spares) repetitively, but it sounds nothing
like a selective LBA scan.  Honestly it does sound like a head crash.
Is this something you'd really want to be running every 7 days?
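
For what it's worth, the -s argument in the quoted entry is a regular
expression that smartd matches against a 12-character string of the
form T/MM/DD/d/HH (test type, month, day of month, day of week with
1 = Monday, hour).  Below is a rough Python sketch of that matching, to
show when the quoted schedule would fire.  The names SCHEDULE and
due_tests are made up for illustration, and Python's re module stands
in for the POSIX extended regex engine smartd itself uses, so treat it
as an approximation rather than what smartd does internally.

```python
import re
from datetime import datetime

# The -s expression from the quoted smartd.conf entry.
SCHEDULE = r"(S/../.././02|L/../../6/03)"


def due_tests(when, schedule=SCHEDULE):
    """Return the self-test types ("L" long, "S" short) due at `when`.

    Builds the T/MM/DD/d/HH string for each test type and requires the
    whole string to match the schedule expression.
    """
    # MM/DD/d/HH, where d is the ISO weekday (1 = Monday ... 7 = Sunday).
    stamp = "{}/{}/{}".format(
        when.strftime("%m/%d"), when.isoweekday(), when.strftime("%H")
    )
    return [
        test for test in ("L", "S")
        if re.fullmatch(schedule, "{}/{}".format(test, stamp))
    ]


if __name__ == "__main__":
    # Saturday 8 Jan 2011, 03:xx -> long test is due.
    print(due_tests(datetime(2011, 1, 8, 3, 15)))
    # Monday 10 Jan 2011, 02:xx -> short test is due.
    print(due_tests(datetime(2011, 1, 10, 2, 30)))
    # Monday 10 Jan 2011, noon -> nothing is due.
    print(due_tests(datetime(2011, 1, 10, 12, 0)))
```

So with that entry a short test starts every single day and a long test
every Saturday, on every machine carrying the default line.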

I've always advocated that people run smartd only if they want to
monitor attributes -- which ultimately are the most important things to
keep an eye on anyway.  It's even more important to know how to read
them.  :-)  90% of drives out there update their attributes at set
intervals or when the SMART READ DATA command is encountered.

And honestly I've never seen a SMART short test do anything useful, on
any drive I've used (SATA or SCSI; WD, Seagate, Maxtor, Hitachi,
Fujitsu).  Long tests are different in this regard.

I'm fully aware that the terms "short" and "long" are vague in nature
and don't really tell a person what the drive is doing behind the
scenes.  Sadly that's the nature of SMART; they're just tests that are
defined on a per-vendor (or per-disk-model!) basis.  But as my 2nd
paragraph above implies, the behaviour is not consistent.

So when people ask me "how do I monitor my disks reliably with SMART
then?", I tell them to either do it by hand (which is what I do), or
run smartd(8) and keep an eye on their logs.  This requires some
tuning, and familiarity with what attribute means what, and again on a
per-drive or per-vendor basis.  It's great that there's no actual
standard for these, isn't it?  :-)

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP 4BD6C0CB |