From owner-freebsd-questions  Sat Feb 10 12:45:38 2001
Delivered-To: freebsd-questions@freebsd.org
Received: from anchor-post-31.mail.demon.net (anchor-post-31.mail.demon.net [194.217.242.89])
	by hub.freebsd.org (Postfix) with ESMTP id BEF0337B65D
	for <questions@freebsd.org>; Sat, 10 Feb 2001 12:45:16 -0800 (PST)
Received: from shootthemlater.demon.co.uk ([194.222.93.84] helo=cerebus.parse.net)
	by anchor-post-31.mail.demon.net with esmtp (Exim 2.12 #1)
	id 14RgtO-000901-0V; Sat, 10 Feb 2001 20:45:15 +0000
Received: from wbra0013.cognos.com ([10.0.0.3] helo=acm.org)
	by cerebus.parse.net with esmtp (Exim 3.16 #1)
	id 14RgtD-0001IR-00; Sat, 10 Feb 2001 20:45:03 +0000
Message-ID: <3A85A846.A066D7F8@acm.org>
Date: Sat, 10 Feb 2001 20:44:54 +0000
From: David Goddard <goddard@acm.org>
X-Mailer: Mozilla 4.75 [en] (Windows NT 5.0; U)
X-Accept-Language: en
MIME-Version: 1.0
To: questions@freebsd.org
Subject: Dying disk?
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-questions@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Hi,

I have a colocated machine that has recently started performing badly
on operations that read from one of its disks (quotacheck, for
example).

As the machine is colocated, I'd really like to have as good an idea as
possible as to what the problem is before going in and taking it
offline - I don't want to go in and slap a new disk in only to find
that the controller is duff.  Is it possible to get a handle on what
the problem is by purely remote means?

The machine has two basically identical disks (except that one works OK
and one doesn't ;) and is running 4.2-STABLE.

The disks themselves are:

ad0: 19541MB <Maxtor 52049H4> [39703/16/63] at ata0-master UDMA66
ad2: 19541MB <Maxtor 52049H4> [39703/16/63] at ata1-master UDMA66

They are partitioned as follows:

Filesystem  1K-blocks     Used    Avail Capacity  Mounted on
/dev/ad0s1a    396895    90173   274971    25%    /
/dev/ad0s1f   6450348     8835  5925486     0%    /home
/dev/ad0s1e   4961725   879537  3685250    19%    /usr
/dev/ad0s1h   3105622   294395  2562778    10%    /var
/dev/ad2s1f  13804609   789687 11910554     6%    /data
procfs              4        4        0   100%    /proc
/dev/ad2s1e   5081581  1341716  3333339    29%    /source
/dev/ad0s1g   3969982   640670  3011714    18%    /tmp

/dev/ad0s1a on / (ufs, local)
/dev/ad0s1f on /home (ufs, local, with quotas, soft-updates)
/dev/ad0s1e on /usr (ufs, local)
/dev/ad0s1h on /var (ufs, local)
/dev/ad2s1f on /data (ufs, local, with quotas, soft-updates)
procfs on /proc (procfs, local)
/dev/ad2s1e on /source (ufs, local)
/dev/ad0s1g on /tmp (ufs, local, soft-updates)

/dev/ad0s1b             swap
/dev/ad2s1b             swap

There is nothing else hanging off the two controllers (the CD is
SCSI) and neither disk has heavy activity.

A simple test that I think illustrates the problem is:

  % dd if=/dev/ad0s1f of=/dev/null bs=64k
  ^C10+0 records in
  10+0 records out
  655360 bytes transferred in 12.117085 secs (54086 bytes/sec)

  % dd if=/dev/ad2s1e of=/dev/null bs=64k
  ^C3910+0 records in
  3910+0 records out
  256245760 bytes transferred in 8.631285 secs (29688020 bytes/sec)

The performance shown above is typical of the tests I tried on several
partitions - ad0 is consistently poor.
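
To put a number on the gap, the two dd summaries above can be compared
directly. This is only a sketch: the regular expression assumes the
summary line format shown above, and parse_dd_rate is a name made up
for this example rather than anything dd provides.

```python
import re

# Summary lines copied from the two dd runs above.
DD_OUTPUT = {
    "ad0s1f": "655360 bytes transferred in 12.117085 secs (54086 bytes/sec)",
    "ad2s1e": "256245760 bytes transferred in 8.631285 secs (29688020 bytes/sec)",
}

# Assumed format: "<bytes> bytes transferred in <seconds> secs".
SUMMARY_RE = re.compile(r"(\d+) bytes transferred in ([\d.]+) secs")


def parse_dd_rate(line):
    """Return bytes per second computed from a dd summary line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError("not a dd summary line: %r" % line)
    return int(match.group(1)) / float(match.group(2))


rates = {dev: parse_dd_rate(line) for dev, line in DD_OUTPUT.items()}

# How many times faster the healthy-looking disk reads.
slowdown = rates["ad2s1e"] / rates["ad0s1f"]

if __name__ == "__main__":
    for dev, rate in sorted(rates.items()):
        print("%s: %.0f bytes/sec" % (dev, rate))
    print("ad2s1e reads about %.0f times faster than ad0s1f" % slowdown)
```

On these figures the raw read rate of ad0s1f is lower than that of
ad2s1e by a factor of several hundred.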

When I ran iozone -a on the affected disk, the system ground to a halt,
and generated kernel log messages like the following:

... swap_pager: indefinite wait buffer: device: #ad /0x20001,
blkno: 744, size: 8192
... swap_pager: indefinite wait buffer: device: #ad /0x20001,
blkno: 888, size: 4096

However, no other kernel messages are generated in normal operation
and the only symptom I notice is poor read performance.
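
If more of these swap_pager messages turn up, a small script can tally
them per device to show whether the stalls always involve the same one.
This is a rough sketch: the pattern is written only against the two
messages quoted above (including the line wrap), parse_waits is a
made-up name, and no attempt is made to decode the device number.

```python
import re
from collections import Counter

# The two messages quoted above, wrapped as they appear in this mail.
LOG_TEXT = """\
... swap_pager: indefinite wait buffer: device: #ad /0x20001,
blkno: 744, size: 8192
... swap_pager: indefinite wait buffer: device: #ad /0x20001,
blkno: 888, size: 4096
"""

# Assumption: every message has the shape seen above. \s* tolerates
# the line break between the device and the block number.
WAIT_RE = re.compile(
    r"swap_pager: indefinite wait buffer: device: (#\w+\s*/0x[0-9a-fA-F]+),"
    r"\s*blkno: (\d+), size: (\d+)"
)


def parse_waits(text):
    """Return a list of (device, blkno, size) tuples found in text."""
    return [(dev, int(blkno), int(size))
            for dev, blkno, size in WAIT_RE.findall(text)]


waits = parse_waits(LOG_TEXT)
per_device = Counter(dev for dev, _, _ in waits)

if __name__ == "__main__":
    for dev, count in per_device.items():
        print("%s: %d indefinite waits" % (dev, count))
```

Both messages quoted here name the same device.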

Here's some edited iozone output (figures are in KB/sec, iozone's
default) - it seems slightly ambiguous to me though:

                                                      random  random    bkwd  record  stride
        KB  reclen   write rewrite    read    reread    read   write    read rewrite    read   fwrite frewrite   fread  freread

ad0    256     256   70447   99764   268085   247108  240360   99455  260951   75161  260413    75537   104027  102356   115995
ad2    256     256   78840  103982   267484   250480  232929   99262  255487   73037  263351    74440    97116  107066   118023

ad0   4096    4096   24229   31252   166761   168282  167355   49435  167752   23819  166524    63642    28024   82164    89767
ad2   4096    4096   58110   32543   168054    34598   52357   62948  168199   57459  166999    15515    42101   56220    38839

ad0  65536   16384   10754   11769   125784    82355   93980   14951  143286   13446  156247    10897    13897   80613    58369
ad2  65536   16384   27298   26043    99463   100081   84366   25778  131916   27387  139401    27694    26630   53390    56263
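
One way to make the table easier to read is to divide each ad0 figure
by the matching ad2 figure, so that anything well below 1.0 stands out.
The sketch below does that with the numbers copied from the table
above; ad0_vs_ad2 and the other names are invented for this example.

```python
COLUMNS = [
    "write", "rewrite", "read", "reread", "random read", "random write",
    "bkwd read", "record rewrite", "stride read", "fwrite", "frewrite",
    "fread", "freread",
]

# (file size KB, record length KB) -> figures copied from the table.
RESULTS = {
    (256, 256): {
        "ad0": [70447, 99764, 268085, 247108, 240360, 99455, 260951,
                75161, 260413, 75537, 104027, 102356, 115995],
        "ad2": [78840, 103982, 267484, 250480, 232929, 99262, 255487,
                73037, 263351, 74440, 97116, 107066, 118023],
    },
    (4096, 4096): {
        "ad0": [24229, 31252, 166761, 168282, 167355, 49435, 167752,
                23819, 166524, 63642, 28024, 82164, 89767],
        "ad2": [58110, 32543, 168054, 34598, 52357, 62948, 168199,
                57459, 166999, 15515, 42101, 56220, 38839],
    },
    (65536, 16384): {
        "ad0": [10754, 11769, 125784, 82355, 93980, 14951, 143286,
                13446, 156247, 10897, 13897, 80613, 58369],
        "ad2": [27298, 26043, 99463, 100081, 84366, 25778, 131916,
                27387, 139401, 27694, 26630, 53390, 56263],
    },
}


def ad0_vs_ad2(row):
    """Map each column name to the ad0/ad2 throughput ratio."""
    return {name: a / b
            for name, a, b in zip(COLUMNS, row["ad0"], row["ad2"])}


if __name__ == "__main__":
    for (size_kb, reclen_kb), row in sorted(RESULTS.items()):
        print("file %d KB, record %d KB" % (size_kb, reclen_kb))
        for name, ratio in ad0_vs_ad2(row).items():
            print("  %-15s %.2f" % (name, ratio))
```

For the 65536 KB run the write-type columns come out at roughly 0.4 to
0.6, while most of the read columns are at or above 1.0, which is the
ambiguity mentioned above.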

Any comments would be greatly appreciated...

Dave

