Date: Tue, 14 Feb 2012 12:31:23 -0800
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Oscar Prieto <oscarmpp@googlemail.com>
Cc: Harald Schmalzbauer <h.schmalzbauer@omnilan.de>, freebsd-stable@freebsd.org, Martin Sugioarto <martin@sugioarto.com>, Claudius Herder <claudius@ambtec.de>
Subject: Re: problems with AHCI on FreeBSD 8.2
Message-ID: <20120214203123.GA5959@icarus.home.lan>
In-Reply-To: <CAK9wqRqR3KMUDchFs9L5bVV_CZUF_DEAx_i_Rp5StAa_%2BdGbGw@mail.gmail.com>
References: <20120214100513.GA94501@icarus.home.lan> <20120214135435.GQ2010@equilibrium.bsdes.net> <20120214141601.GA98986@icarus.home.lan> <4F3A83DE.3000200@ambtec.de> <20120214165029.GA1852@icarus.home.lan> <4F3A971F.9040407@omnilan.de> <20120214192319.44ff7aff@zelda.sugioarto.com> <4F3AB4F0.9010002@omnilan.de> <20120214205143.2a6b9c87@zelda.sugioarto.com> <CAK9wqRqR3KMUDchFs9L5bVV_CZUF_DEAx_i_Rp5StAa_%2BdGbGw@mail.gmail.com>
On Tue, Feb 14, 2012 at 09:19:02PM +0100, Oscar Prieto wrote:
> Thank you Jeremy, I'm already checking your links.
>
> When I installed smartd I configured a daily short test and a weekly
> long one for all the drives while the machine remains mostly unused;
> from reading the documentation and info around, I never thought it
> could be a problem.
>
> # /usr/local/etc/smartd.conf
> /dev/ada0 -a -o on -S on -s (S/../.././03|L/../../2/07)
> /dev/ada1 -a -o on -S on -s (S/../.././04|L/../../3/07)
> /dev/ada2 -a -o on -S on -s (S/../.././05|L/../../4/07)
> /dev/ada3 -a -o on -S on -s (S/../.././06|L/../../5/07)

The problem is that, quite honestly, these do you zero good. All they do
is make a mess (per se) of the SMART self-test log.

Take for example your situation with ada3: smartd(8) told you that the
number of pending sectors increased to 5, and uncorrected increased to 1.
That's really all you need to know at that point. If you want to know the
LBA numbers which are problematic, you can manually intervene.

The point is: the drive itself is going to notice problematic or bad
sectors quicker than periodic short or long or surface scan tests will.
Let the drive do its thing normally and only use SMART tests when there's
indication something is wrong.

> I'll remove the checks; do you advise removing the daemon altogether?

smartd(8) is useful because it keeps track of attributes which change in
value and logs data to syslog (if I remember right), thus you have an
exact time/date when an attribute changed. This is especially useful for
things pertaining to sector/physical media problems.

As such, I tend to recommend that folks using smartd(8) properly tune
their smartd.conf to only monitor specific attributes. This varies from
drive to drive, but the key ones are things like attributes 5, 10, 11,
192, 193, 194 (if you want temperature logging), 196, 197, 198, 199, and
200. I'm speaking strictly for Western Digital disks here.
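As a rough illustration of "only monitor specific attributes", here is a minimal Python sketch that filters an attribute table in the layout printed by `smartctl -A` down to the IDs listed above. The function name `watched_attributes` and the sample table are hypothetical; the sample values are made up, apart from echoing the pending/uncorrected counts and the ECC figure mentioned in this message. Raw values are kept as strings on purpose, since their encoding is vendor-specific. This is not how smartd(8) itself is configured, just a way to see the filtering idea.

```python
# Attribute IDs named in the message (stated there as applying to Western Digital disks).
WATCHED_IDS = {5, 10, 11, 192, 193, 194, 196, 197, 198, 199, 200}

# Hypothetical sample in the column layout of `smartctl -A`; values are illustrative only.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       838938239
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       5
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       1
"""


def watched_attributes(text, watched=WATCHED_IDS):
    """Return {id: (name, raw_value)} for table rows whose ID is in `watched`."""
    result = {}
    for line in text.splitlines():
        # Ten whitespace-separated columns; the last (RAW_VALUE) may contain spaces.
        fields = line.split(None, 9)
        if len(fields) != 10 or not fields[0].isdigit():
            continue  # skip the header and anything that is not an attribute row
        attr_id = int(fields[0])
        if attr_id in watched:
            result[attr_id] = (fields[1], fields[9].strip())
    return result


if __name__ == "__main__":
    for attr_id, (name, raw) in sorted(watched_attributes(SAMPLE).items()):
        print(attr_id, name, raw)
```

Attributes outside the watched set, such as the ECC counter in the sample, are simply dropped, so a jump in a vendor-encoded raw value never reaches whoever reads the report.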
The stock defaults, if I remember right, are to "monitor everything",
which really doesn't work well given that so many vendors encode their
RAW_VALUE fields in proprietary/vendor-specific formats. People will
often monitor things like the Hardware_ECC_Recovered attribute and start
"freaking out" one day when the value goes from 0 to 838938239 or
something larger. Attribute data formats are not part of the ATA
standard, so vendors choose how to encode them. Plus, not many admins
that I've run into (honest) know what that attribute actually means
disk-wise (hint: it's 100% normal for sector ECC to happen at all times;
magnetic media is not perfect, that's what the per-sector ECC section is
for!)

However: people don't understand what SMART attribute acquisition
actually does behind the scenes -- it results in the disk having to read
from the HPA area (not user accessible or within LBA regions), which
means seeking + moving the arms to an area, reading, then reporting all
of this back. Thus, it impacts I/O performance.

This is why I don't use smartd(8) on any of our systems. But if I were
to use it? I would have it poll maybe every 120 minutes, rather than
every 30. It all depends on the system/load/etc. I've seen people poll
every 5 minutes (I think they're absolutely crazy/paranoid). Their
systems, their problem. :-)

Hope this helps.

-- 
| Jeremy Chadwick                                 jdc@parodius.com |
| Parodius Networking                     http://www.parodius.com/ |
| UNIX Systems Administrator                 Mountain View, CA, US |
| Making life hard for others since 1977.             PGP 4BD6C0CB |
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20120214203123.GA5959>