From owner-freebsd-questions Sat Feb 10 12:45:38 2001
Delivered-To: freebsd-questions@freebsd.org
Received: from anchor-post-31.mail.demon.net (anchor-post-31.mail.demon.net [194.217.242.89])
    by hub.freebsd.org (Postfix) with ESMTP id BEF0337B65D
    for ; Sat, 10 Feb 2001 12:45:16 -0800 (PST)
Received: from shootthemlater.demon.co.uk ([194.222.93.84] helo=cerebus.parse.net)
    by anchor-post-31.mail.demon.net with esmtp (Exim 2.12 #1)
    id 14RgtO-000901-0V; Sat, 10 Feb 2001 20:45:15 +0000
Received: from wbra0013.cognos.com ([10.0.0.3] helo=acm.org)
    by cerebus.parse.net with esmtp (Exim 3.16 #1)
    id 14RgtD-0001IR-00; Sat, 10 Feb 2001 20:45:03 +0000
Message-ID: <3A85A846.A066D7F8@acm.org>
Date: Sat, 10 Feb 2001 20:44:54 +0000
From: David Goddard
X-Mailer: Mozilla 4.75 [en] (Windows NT 5.0; U)
X-Accept-Language: en
MIME-Version: 1.0
To: questions@freebsd.org
Subject: Dying disk?
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-questions@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Hi,

I have a colocated machine that has recently shown some performance
issues with operations relating to reading from one disk (quotachecking
for example).

As the machine is colocated, I'd really like to have as good an idea as
possible as to what the problem is before going in and taking it
offline - I don't want to go in and slap a new disk in only to find that
the controller is duff. Is it possible to get a handle on what the
problem is by purely remote means?

The machine has two basically identical disks (except that one works OK
and one doesn't ;) and is running 4.2-STABLE.

The disks themselves are:

ad0: 19541MB [39703/16/63] at ata0-master UDMA66
ad2: 19541MB [39703/16/63] at ata1-master UDMA66

They are partitioned as follows:

Filesystem   1K-blocks     Used     Avail Capacity  Mounted on
/dev/ad0s1a     396895    90173    274971      25%  /
/dev/ad0s1f    6450348     8835   5925486       0%  /home
/dev/ad0s1e    4961725   879537   3685250      19%  /usr
/dev/ad0s1h    3105622   294395   2562778      10%  /var
/dev/ad2s1f   13804609   789687  11910554       6%  /data
procfs               4        4         0     100%  /proc
/dev/ad2s1e    5081581  1341716   3333339      29%  /source
/dev/ad0s1g    3969982   640670   3011714      18%  /tmp

/dev/ad0s1a on / (ufs, local)
/dev/ad0s1f on /home (ufs, local, with quotas, soft-updates)
/dev/ad0s1e on /usr (ufs, local)
/dev/ad0s1h on /var (ufs, local)
/dev/ad2s1f on /data (ufs, local, with quotas, soft-updates)
procfs on /proc (procfs, local)
/dev/ad2s1e on /source (ufs, local)
/dev/ad0s1g on /tmp (ufs, local, soft-updates)

/dev/ad0s1b swap
/dev/ad2s1b swap

There is nothing else hanging off the two controllers (the CD is SCSI)
and neither disk has heavy activity.

A simple test that I think illustrates the problem is:

% dd if=/dev/ad0s1f of=/dev/null bs=64k
^C10+0 records in
10+0 records out
655360 bytes transferred in 12.117085 secs (54086 bytes/sec)

% dd if=/dev/ad2s1e of=/dev/null bs=64k
^C3910+0 records in
3910+0 records out
256245760 bytes transferred in 8.631285 secs (29688020 bytes/sec)

The performance shown above is typical of the tests I tried on various
different partitions - ad0 is consistently poor.

When I ran iozone -a on the affected disk, the system ground to a halt,
and generated kernel log messages like the following:

... swap_pager: indefinite wait buffer: device: #ad /0x20001, blkno: 744, size: 8192
... swap_pager: indefinite wait buffer: device: #ad /0x20001, blkno: 888, size: 4096

However, no other kernel messages are generated in normal operation and
the only symptom I notice is poor read performance.
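
To put a number on the gap between the two dd runs, the rates can be recomputed from the summary lines quoted earlier. This is only a minimal Python sketch, assuming the summary format dd printed in those runs; the names DD_SUMMARY and dd_rate are made up for illustration and are not part of any tool.

```python
import re

# Assumption: matches the summary line format dd printed in the runs quoted above.
DD_SUMMARY = re.compile(r"(\d+) bytes transferred in ([\d.]+) secs")


def dd_rate(summary: str) -> float:
    """Return bytes per second computed from one dd summary line."""
    match = DD_SUMMARY.search(summary)
    if match is None:
        raise ValueError("not a dd summary line: %r" % summary)
    return int(match.group(1)) / float(match.group(2))


# Summary lines copied from the two dd runs in this message.
ad0 = dd_rate("655360 bytes transferred in 12.117085 secs (54086 bytes/sec)")
ad2 = dd_rate("256245760 bytes transferred in 8.631285 secs (29688020 bytes/sec)")

print("ad0: %.0f bytes/sec" % ad0)
print("ad2: %.0f bytes/sec" % ad2)
print("ad2/ad0 ratio: %.0f" % (ad2 / ad0))
```

On these figures the raw sequential read from ad2 comes out several hundred times faster than from ad0, which is far more than the difference between two partitions on otherwise identical disks would explain.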

Here's some edited iozone output data - it seems slightly ambiguous to
me though:

                                                        random   random     bkwd   record   stride
disk    KB reclen    write  rewrite     read   reread     read    write     read  rewrite     read   fwrite frewrite    fread  freread
ad0    256    256    70447    99764   268085   247108   240360    99455   260951    75161   260413    75537   104027   102356   115995
ad2    256    256    78840   103982   267484   250480   232929    99262   255487    73037   263351    74440    97116   107066   118023
ad0   4096   4096    24229    31252   166761   168282   167355    49435   167752    23819   166524    63642    28024    82164    89767
ad2   4096   4096    58110    32543   168054    34598    52357    62948   168199    57459   166999    15515    42101    56220    38839
ad0  65536  16384    10754    11769   125784    82355    93980    14951   143286    13446   156247    10897    13897    80613    58369
ad2  65536  16384    27298    26043    99463   100081    84366    25778   131916    27387   139401    27694    26630    53390    56263

Any comments would be greatly appreciated...

Dave