Date: Sat, 10 Feb 2001 20:44:54 +0000
From: David Goddard <goddard@acm.org>
To: questions@freebsd.org
Subject: Dying disk?
Message-ID: <3A85A846.A066D7F8@acm.org>
Hi,
I have a colocated machine that has recently become slow at operations
that read from one of its disks (quotacheck, for example).
As the machine is colocated, I'd really like to have as good an idea as
possible as to what the problem is before going in and taking it
offline - I don't want to go in and slap a new disk in only to find
that the controller is duff. Is it possible to get a handle on what
the problem is by purely remote means?
The machine has two basically identical disks (except that one works OK
and one doesn't ;) and is running 4.2-STABLE.
The disks themselves are:
ad0: 19541MB <Maxtor 52049H4> [39703/16/63] at ata0-master UDMA66
ad2: 19541MB <Maxtor 52049H4> [39703/16/63] at ata1-master UDMA66
They are partitioned as follows:
Filesystem   1K-blocks     Used     Avail  Capacity  Mounted on
/dev/ad0s1a     396895    90173    274971       25%  /
/dev/ad0s1f    6450348     8835   5925486        0%  /home
/dev/ad0s1e    4961725   879537   3685250       19%  /usr
/dev/ad0s1h    3105622   294395   2562778       10%  /var
/dev/ad2s1f   13804609   789687  11910554        6%  /data
procfs               4        4         0      100%  /proc
/dev/ad2s1e    5081581  1341716   3333339       29%  /source
/dev/ad0s1g    3969982   640670   3011714       18%  /tmp
/dev/ad0s1a on / (ufs, local)
/dev/ad0s1f on /home (ufs, local, with quotas, soft-updates)
/dev/ad0s1e on /usr (ufs, local)
/dev/ad0s1h on /var (ufs, local)
/dev/ad2s1f on /data (ufs, local, with quotas, soft-updates)
procfs on /proc (procfs, local)
/dev/ad2s1e on /source (ufs, local)
/dev/ad0s1g on /tmp (ufs, local, soft-updates)
/dev/ad0s1b swap
/dev/ad2s1b swap
There is nothing else hanging off the two controllers (the CD is
SCSI) and neither disk has heavy activity.
A simple test that I think illustrates the problem is:
% dd if=/dev/ad0s1f of=/dev/null bs=64k
^C10+0 records in
10+0 records out
655360 bytes transferred in 12.117085 secs (54086 bytes/sec)
% dd if=/dev/ad2s1e of=/dev/null bs=64k
^C3910+0 records in
3910+0 records out
256245760 bytes transferred in 8.631285 secs (29688020 bytes/sec)
The performance shown above is typical of the tests I tried on various
partitions - ad0 is consistently poor, at roughly 54,000 bytes/sec
against nearly 30 million bytes/sec for ad2.
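For scale, the arithmetic behind the two dd summary lines can be reproduced with a small sketch. The function names (rate, slowdown) are made up for illustration and are not part of dd; the byte counts and timings are the ones quoted above.

```shell
#!/bin/sh
# Throughput arithmetic for the two dd runs quoted above.
# "rate" and "slowdown" are illustrative helper names, not dd features.

# rate BYTES SECONDS -> bytes per second, rounded to the nearest integer
rate() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.0f\n", b / s }'
}

# slowdown FAST_BYTES FAST_SECS SLOW_BYTES SLOW_SECS
#   -> how many times faster the first run was than the second
slowdown() {
    awk -v fb="$1" -v ft="$2" -v sb="$3" -v st="$4" \
        'BEGIN { printf "%.0f\n", (fb / ft) / (sb / st) }'
}

rate 655360 12.117085                          # ad0s1f run
rate 256245760 8.631285                        # ad2s1e run
slowdown 256245760 8.631285 655360 12.117085   # ad2 relative to ad0
```

The ratio comes out at several hundred, so normal run-to-run noise cannot account for the gap between the two disks.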
When I ran iozone -a on the affected disk, the system ground to a halt,
and generated kernel log messages like the following:
... swap_pager: indefinite wait buffer: device: #ad /0x20001,
blkno: 744, size: 8192
... swap_pager: indefinite wait buffer: device: #ad /0x20001,
blkno: 888, size: 4096
However, no other kernel messages are generated in normal operation
and the only symptom I notice is poor read performance.
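To see how often such messages recur without logging in to watch the console, something like the following sketch could summarise them from a saved log. It assumes each message sits on a single line in the log, in the same form as quoted above; swap_waits is a made-up name, and the log location in the usage comment is an assumption about the syslog setup.

```shell
#!/bin/sh
# Summarise swap_pager "indefinite wait buffer" messages read from stdin.
# Assumes one message per log line, in the form quoted in this mail.
# swap_waits is an illustrative name, not an existing tool.
swap_waits() {
    awk '
        /swap_pager: indefinite wait buffer/ {
            n++
            # Find the field after "blkno:" and strip its trailing comma.
            for (i = 1; i < NF; i++)
                if ($i == "blkno:") {
                    b = $(i + 1)
                    sub(/,$/, "", b)
                    blk[n] = b
                }
        }
        END {
            printf "%d indefinite-wait messages\n", n
            for (i = 1; i <= n; i++)
                printf "  blkno %s\n", blk[i]
        }'
}

# Usage (log path is an assumption):
#   swap_waits < /var/log/messages
```

A count that keeps growing only while the suspect disk is being exercised would be one more hint pointing at that disk or its channel, though not proof of either.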
Here's some edited iozone output data - it seems slightly
ambiguous to me though:
                                                             random   random     bkwd   record   stride
disk       KB   reclen    write  rewrite     read   reread     read    write     read  rewrite     read   fwrite frewrite    fread  freread
ad0       256      256    70447    99764   268085   247108   240360    99455   260951    75161   260413    75537   104027   102356   115995
ad2       256      256    78840   103982   267484   250480   232929    99262   255487    73037   263351    74440    97116   107066   118023
ad0      4096     4096    24229    31252   166761   168282   167355    49435   167752    23819   166524    63642    28024    82164    89767
ad2      4096     4096    58110    32543   168054    34598    52357    62948   168199    57459   166999    15515    42101    56220    38839
ad0     65536    16384    10754    11769   125784    82355    93980    14951   143286    13446   156247    10897    13897    80613    58369
ad2     65536    16384    27298    26043    99463   100081    84366    25778   131916    27387   139401    27694    26630    53390    56263
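One way to make the table less ambiguous is to divide each ad0 figure by the matching ad2 figure. The sketch below does that for rows labelled ad0 and ad2 in the layout used above (label, KB, reclen, then the result columns), with the ad0 row coming before its ad2 counterpart; iozone_ratio is a made-up name.

```shell
#!/bin/sh
# Print ad0/ad2 ratios per iozone column, reading labelled rows on stdin.
# Expected row layout: label KB reclen value value ...
# Each ad0 row must appear before the ad2 row with the same KB/reclen.
# iozone_ratio is an illustrative name, not part of iozone.
iozone_ratio() {
    awk '
        $1 == "ad0" {
            key = $2 "/" $3
            for (i = 4; i <= NF; i++) a[key, i] = $i
        }
        $1 == "ad2" {
            key = $2 "/" $3
            printf "%s", key
            for (i = 4; i <= NF; i++)
                if ($i > 0) printf " %.2f", a[key, i] / $i
            printf "\n"
        }'
}

# Usage: paste the data rows of the table into a file, then
#   iozone_ratio < rows.txt
```

On the 65536 KB rows the write and rewrite ratios come out around 0.4, while several of the read ratios are above 1, which is part of why the iozone data alone does not settle the question.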
Any comments would be greatly appreciated...
Dave
