Date: Mon, 24 Nov 2008 12:48:22 -0800
From: Jo Rhett <jrhett@svcolo.com>
To: freebsd-stable Stable <freebsd-stable@freebsd.org>
Cc: Jeremy Chadwick <koitsu@freebsd.org>
Subject: smartd long self-test causes drives to hang
Message-ID: <EBDD87D8-401B-4812-9121-C3301C06276B@svcolo.com>
I've spent about 3 months tracking down what was causing my personal colo box to start getting "sluggish" right around dawn every Saturday morning. It took so long because some mornings I simply couldn't pull my head out of my tail enough to do proper debugging.

The cause was *really slow* filesystem response time. No cron jobs ran in that period. No specific process ran any slower than another, although I eventually learned that ones which did no file i/o were fine. And finally I realized that just "ls -la" was very slow (~1 minute) even after I had killed off every disk-using process in the system. SMTP and HTTP in particular were basically fubar. No data loss, just *real slow*.

Nothing other than a soft reboot ever solved the problem. Even leaving it running only minimal processes for 24 hours didn't bring it back to normal.

Finally I was browsing through Jeremy Chadwick's list of known ATA problems and spotted his comments about smartd self-tests causing problems. Sure enough, my long self-test was scheduled for 5am on Saturday mornings. Rechecking the observed slow-down periods confirmed that the problem never became visible before 5am. (Sometimes it took up to 45 minutes before things slowed down enough to set off monitoring alarms.)

So, long story short, if you're seeing weirdness in system response time, check the smartd configuration and try disabling the self-tests. The short self-test I was running daily didn't appear to affect anything, but the long test left the system shuddering and limping at best.
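For anyone who wants to check their own setup, here is a rough sketch (not my actual config) of a Python helper that scans smartd.conf text for long self-test schedules. It assumes the documented `-s` schedule form `T/MM/DD/d/HH`, where `L` means a long test and weekday 6 is Saturday. The function name `find_long_tests`, the sample devices, and the sample schedule are all hypothetical, and the sketch does not handle continuation lines or alternations nested inside a single field.

```python
import re

# One long-test term of the form L/MM/DD/d/HH inside a smartd "-s" regex.
# Each field is read up to the next "/", whitespace, "|" or parenthesis.
LONG_TEST = re.compile(
    r"L/([^/\s|()]+)/([^/\s|()]+)/([^/\s|()]+)/([^/\s|()]+)"
)


def find_long_tests(conf_text):
    """Return (device, month, day, weekday, hour) for each long-test schedule."""
    found = []
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        tokens = line.split()
        if not tokens or "-s" not in tokens:
            continue  # no self-test schedule on this line
        device = tokens[0]
        for match in LONG_TEST.finditer(line):
            found.append((device,) + match.groups())
    return found


# Hypothetical sample: short test daily at 02:00, long test Saturdays at 05:00.
SAMPLE = """\
# illustrative entries only
/dev/ad4 -a -s (S/../.././02|L/../../6/05)
/dev/ad6 -a -s S/../.././03
"""

for entry in find_long_tests(SAMPLE):
    print(entry)
```

To use it on a real system, read your own smartd.conf into a string and pass it in. Any device it reports has a long test scheduled at the listed weekday and hour, which you can compare against the times the slowdowns start.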