FreeBSD Mail Archives

Date:      Tue, 14 Feb 2012 12:31:23 -0800
From:      Jeremy Chadwick <freebsd@jdc.parodius.com>
To:        Oscar Prieto <oscarmpp@googlemail.com>
Cc:        Harald Schmalzbauer <h.schmalzbauer@omnilan.de>, freebsd-stable@freebsd.org, Martin Sugioarto <martin@sugioarto.com>, Claudius Herder <claudius@ambtec.de>
Subject:   Re: problems with AHCI on FreeBSD 8.2
Message-ID:  <20120214203123.GA5959@icarus.home.lan>
In-Reply-To: <CAK9wqRqR3KMUDchFs9L5bVV_CZUF_DEAx_i_Rp5StAa_%2BdGbGw@mail.gmail.com>
References:  <20120214100513.GA94501@icarus.home.lan> <20120214135435.GQ2010@equilibrium.bsdes.net> <20120214141601.GA98986@icarus.home.lan> <4F3A83DE.3000200@ambtec.de> <20120214165029.GA1852@icarus.home.lan> <4F3A971F.9040407@omnilan.de> <20120214192319.44ff7aff@zelda.sugioarto.com> <4F3AB4F0.9010002@omnilan.de> <20120214205143.2a6b9c87@zelda.sugioarto.com> <CAK9wqRqR3KMUDchFs9L5bVV_CZUF_DEAx_i_Rp5StAa_%2BdGbGw@mail.gmail.com>

On Tue, Feb 14, 2012 at 09:19:02PM +0100, Oscar Prieto wrote:
> Thank you Jeremy, I'm already checking your links.
> 
> When I installed smartd I configured a daily short test and a weekly
> long one for all the drives, to run while the machine is mostly unused.
> From the documentation and other info around, I never thought it could
> be a problem.
> 
> # /usr/local/etc/smartd.conf
> /dev/ada0 -a -o on -S on -s (S/../.././03|L/../../2/07)
> /dev/ada1 -a -o on -S on -s (S/../.././04|L/../../3/07)
> /dev/ada2 -a -o on -S on -s (S/../.././05|L/../../4/07)
> /dev/ada3 -a -o on -S on -s (S/../.././06|L/../../5/07)

The problem is that, quite honestly, these scheduled tests do you zero
good.  All they do is clutter the SMART self-test log.

Take for example your situation with ada3: smartd(8) told you that the
number of pending sectors increased to 5, and uncorrected increased to
1.  That's really all you need to know at that point.  If you want to
know the LBA numbers which are problematic, you can manually intervene.
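
As a rough sketch of what that manual intervention could look like with
smartctl(8): the self-test log has an LBA_of_first_error column, and a
long test can be started by hand.  The wrapper function names below are
made up for illustration; only the smartctl flags are real.

```shell
# Hypothetical wrappers; nothing runs until you call them (as root).
DISK=/dev/ada3

# Show the self-test log, including LBA_of_first_error for failed tests.
show_selftest_log() { smartctl -l selftest "$1"; }

# Start an extended (long) self-test; check the log again once it is done.
start_long_test() { smartctl -t long "$1"; }

# Show the ATA error log the drive keeps.
show_error_log() { smartctl -l error "$1"; }

# Example:
#   start_long_test "$DISK"
#   show_selftest_log "$DISK"
```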

The point is: the drive itself is going to notice problematic or bad
sectors quicker than periodic short or long or surface scan tests will.
Let the drive do its thing normally and only use SMART tests when
there's indication something is wrong.

> I'll remove the checks.  Do you advise removing the daemon altogether?

smartd(8) is useful because it keeps track of attributes which change in
value and logs data to syslog (if I remember right), thus you have an
exact time/date when an attribute changed.  This is especially useful
for things pertaining to sector/physical media problems.

As such, I tend to recommend that folks using smartd(8) tune their
smartd.conf to monitor only specific attributes.  This varies from drive
to drive, but the key ones are things like attributes 5, 10, 11, 192,
193, 194 (if you want temperature logging), 196, 197, 198, 199, and 200.
I'm speaking strictly for Western Digital disks here.
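
To eyeball just those attributes by hand, something like the following
could work.  It is only a sketch: it filters the attribute table printed
by "smartctl -A" rather than configuring smartd itself, and watch_attrs
and WATCH_IDS are hypothetical names, not part of smartmontools.

```shell
# IDs taken from the list above (Western Digital disks); adjust per drive.
WATCH_IDS="5 10 11 192 193 194 196 197 198 199 200"

# Read `smartctl -A` output on stdin and print only the attribute rows
# whose ID (first column) is in WATCH_IDS.  Header lines are skipped
# because their first field is not purely numeric.
watch_attrs() {
    awk -v ids="$WATCH_IDS" '
        BEGIN { n = split(ids, a, " "); for (i = 1; i <= n; i++) want[a[i]] = 1 }
        $1 ~ /^[0-9]+$/ && ($1 in want) { print }
    '
}

# Usage (needs root and a real disk):
#   smartctl -A /dev/ada3 | watch_attrs
```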

The stock defaults, if I remember right, are to "monitor everything",
which really doesn't work well given that so many vendors encode their
RAW_VALUE fields in proprietary/vendor-specific formats.  People will
often monitor things like the Hardware_ECC_Recovered attribute and start
"freaking out" one day when the value goes from 0 to 838938239 or
something larger.  Attribute data formats are not part of the ATA
standard, so vendors encode them however they choose.  Plus, not many
admins that I've run into (honestly) know what that attribute actually
means disk-wise.  Hint: it's 100% normal for sector ECC to happen at all
times; magnetic media is not perfect, and that's what the per-sector ECC
section is for!

However: people don't understand what SMART attribute acquisition
actually does behind the scenes -- it results in the disk having to read
from the HPA area (not user accessible or within LBA regions), which
means seeking + moving the arms to an area, reading, then reporting all
of this back.  Thus, it impacts I/O performance.  This is why I don't
use smartd(8) on any of our systems.  But if I were to use it?  I would
have it poll maybe every 120 minutes, rather than every 30.  It all
depends on the system/load/etc.  I've seen people poll every 5 minutes
(I think they're absolutely crazy/paranoid).  Their systems, their
problem.  :-)
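
For reference, smartd's polling interval is set with its -i option, in
seconds (the default of 1800 is the 30 minutes mentioned above).  A
minimal sketch of the 120-minute figure; how the flag gets passed at
boot is left out, and the variable names are only illustrative.

```shell
# Convert the desired polling period to the seconds that `smartd -i` expects.
POLL_MINUTES=120
POLL_SECONDS=$((POLL_MINUTES * 60))

# Print the resulting command line instead of starting the daemon here.
echo "smartd -i ${POLL_SECONDS}"
# prints: smartd -i 7200
```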

Hope this helps.

-- 
| Jeremy Chadwick                                 jdc@parodius.com |
| Parodius Networking                     http://www.parodius.com/ |
| UNIX Systems Administrator                 Mountain View, CA, US |
| Making life hard for others since 1977.             PGP 4BD6C0CB |

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20120214203123.GA5959>
