Date: Mon, 24 Nov 2008 13:50:38 -0800
From: Jo Rhett <jrhett@netconsonance.com>
To: freebsd-stable Stable <freebsd-stable@freebsd.org>
Cc: Jeremy Chadwick <koitsu@freebsd.org>
Subject: Re: smartd long self-test causes drives to hang
Message-ID: <1766C532-64AB-400F-8383-2DBE6BF51D9B@netconsonance.com>
In-Reply-To: <EBDD87D8-401B-4812-9121-C3301C06276B@svcolo.com>
References: <EBDD87D8-401B-4812-9121-C3301C06276B@svcolo.com>
On re-reading the message I realized that my message was in danger of
being content-free.

gmirror whole-disk mirror of seagate 300gb drives

$ atacontrol list
ATA channel 0:
    Master:  ad0 <ST3300622A/3.AAH> ATA/ATAPI revision 7
    Slave:   ad1 <ST3300622A/3.AAH> ATA/ATAPI revision 7

$ gmirror list
Geom name: gm0
State: COMPLETE
Components: 2
Balance: round-robin
Slice: 4096
Flags: NONE
GenID: 0
SyncID: 1
ID: 575427344
Providers:
1. Name: mirror/gm0
   Mediasize: 300069051904 (279G)
   Sectorsize: 512
   Mode: r5w5e6
Consumers:
1. Name: ad0
   Mediasize: 300069052416 (279G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: DIRTY
   GenID: 0
   SyncID: 1
   ID: 3917165570
2. Name: ad1
   Mediasize: 300069052416 (279G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: DIRTY
   GenID: 0
   SyncID: 1
   ID: 3874187635

On Nov 24, 2008, at 12:48 PM, Jo Rhett wrote:
> I've spent about 3 months tracing down what was causing my personal
> colo box to start getting "sluggish" right around dawn every
> Saturday morning. It took so long because some mornings I simply
> couldn't pull my head out of my tail enough to do proper debugging.
>
> The cause was *really slow* filesystem response time. No cron jobs
> in that period. No specific process ran any slower than another,
> although I eventually learned that ones which did no file i/o were
> fine. And finally I realized that just "ls -la" was very slow (~1
> minute) even after I had killed off every disk-using process in the
> system. SMTP and HTTP in particular were basically fubar.
>
> No data loss, just *real slow*. Nothing other than a soft reboot
> ever solved the problem. Even leaving it running only minimal
> processes for 24 hours didn't bring it back to normal.
>
> Finally I was browsing through Jeremy Chadwick's list of known ATA
> problems and spotted his comments about smartd self-tests causing
> problems. Sure enough, my long self test was scheduled for 5am on
> Saturday mornings. Rechecking the observed slow-down periods
> confirmed that the problem never became visible before 5am.
> (sometimes it took up to 45 minutes before things slowed down enough
> to set off monitoring alarms)
>
> So, long story short, if you're having weirdness in system time
> response - check the smartd configuration, and try disabling the
> self tests. The short self test I was running daily didn't appear
> to affect anything, but the long test was just bringing the system
> to just shuddering and limping at best.
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
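
For anyone checking their own setup against the symptoms described above: smartd schedules self-tests through the -s directive in smartd.conf, whose argument is a regular expression matched against a string of the form T/MM/DD/d/HH (test type, month, day of month, day of week with 1 = Monday, hour). The following is only a rough Python sketch for seeing when a given schedule would fire. The schedule string is a hypothetical example modelled on the one in the smartd.conf man page, not the actual configuration from this thread; the helper name is made up for illustration; and it assumes the whole string must match the expression.

```python
import re
from datetime import datetime

# Self-test type letters used in the -s schedule string:
# L = long, S = short, C = conveyance, O = offline.
TEST_TYPES = "LSCO"

def scheduled_tests(schedule_regex, when):
    """Return the test type letters the schedule regex would select at `when`.

    Hypothetical helper: builds the T/MM/DD/d/HH string for each test type
    and checks it against the regex. Assumes a full match is required.
    """
    pattern = re.compile(schedule_regex)
    # isoweekday() gives 1 = Monday ... 7 = Sunday, the same numbering
    # the smartd.conf man page describes for the day-of-week field.
    stamp = "%02d/%02d/%d/%02d" % (when.month, when.day,
                                   when.isoweekday(), when.hour)
    return [t for t in TEST_TYPES if pattern.fullmatch(t + "/" + stamp)]

# Hypothetical schedule: short test daily at 02:00, long test Saturdays at 05:00.
schedule = "(S/../.././02|L/../../6/05)"

print(scheduled_tests(schedule, datetime(2008, 11, 22, 5)))  # a Saturday, 05:00
print(scheduled_tests(schedule, datetime(2008, 11, 24, 2)))  # a Monday, 02:00
```

With a schedule like this one, dropping the L/../../6/05 alternative from the expression would leave only the daily short test, which is the change the quoted message suggests trying.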