Date:      Wed, 9 Jan 2013 17:15:03 -0800
From:      Artem Belevich <art@freebsd.org>
To:        Nicolas Rachinsky <fbsd-mas-0@ml.turing-complete.org>
Cc:        freebsd-fs <freebsd-fs@freebsd.org>
Subject:   Re: slowdown of zfs (tx->tx)
Message-ID:  <CAFqOu6jrng=v8eVyhqV-PBqJM_dYy%2BU7X4%2B=ahBeoxvK4mxcSA@mail.gmail.com>
In-Reply-To: <20130109162613.GA34276@mid.pc5.i.0x5.de>
References:  <20130108174225.GA17260@mid.pc5.i.0x5.de> <CAFqOu6jgA8RWV5d%2BrOBk8D=3Vu3yWSnDkAi1cFJ0esj4OpBy2Q@mail.gmail.com> <20130109162613.GA34276@mid.pc5.i.0x5.de>

On Wed, Jan 9, 2013 at 8:26 AM, Nicolas Rachinsky
<fbsd-mas-0@ml.turing-complete.org> wrote:
> * Artem Belevich <art@freebsd.org> [2013-01-08 12:47 -0800]:
>> On Tue, Jan 8, 2013 at 9:42 AM, Nicolas Rachinsky
>> <fbsd-mas-0@ml.turing-complete.org> wrote:
>> >       NAME                      STATE     READ WRITE CKSUM
>> >         pool1                     DEGRADED     0     0     0
>> >           raidz2-0                DEGRADED     0     0     0
>> >             ada5                  ONLINE       0     0     0
>> >             ada8                  ONLINE       0     0     0
>> >             ada2                  ONLINE       0     0     0
>> >             ada3                  ONLINE       0     0     0
>> >             11846390416703086268  UNAVAIL      0     0     0  was /dev/dsk/ada1
>> >             ada6                  ONLINE       0     0     0
>> >             ada0                  ONLINE       0     0     1
>> >             ada7                  ONLINE       0     0     0
>> >             ada4                  ONLINE       0     0     3
>>
>> You seem to have some checksum errors, which suggests hardware trouble.
>
> I somehow missed these. Is there any way to learn when these checksum
> errors happen?

Not on FreeBSD (yet) as far as I can tell. Not explicitly, anyways.
Check /var/log/messages for any indications of SATA errors. There's a
good chance that there was a timeout at some point.
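
A rough way to do that scan is sketched below. This is only an
illustration: the keyword list is my assumption about what SATA/CAM
trouble tends to look like in the log, and suspicious_lines is a
made-up helper name, so adjust the patterns to whatever your
controller driver actually prints.

```python
import re

# Assumed keywords; not an exhaustive or authoritative list of kernel messages.
SUSPECT = re.compile(
    r"timeout|CAM status|READ_DMA|WRITE_DMA|READ_FPDMA|WRITE_FPDMA",
    re.IGNORECASE,
)
# Disk names as they appear in this thread (ada0, ada4, ...).
DEVICE = re.compile(r"\b(ada\d+)\b")


def suspicious_lines(lines):
    """Yield (device_or_None, line) for log lines that hint at disk trouble."""
    for line in lines:
        if SUSPECT.search(line):
            match = DEVICE.search(line)
            yield (match.group(1) if match else None, line.rstrip("\n"))
```

Feed it an open file object for /var/log/messages (or a rotated copy)
and compare the timestamps of the hits with the times the pool slowed
down.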

>> For starters, check SMART info for all drives and see if they have any
>> reallocated sectors.
>
> There are some disks with reallocated sectors, but for both ada0 and
> ada4 Reallocated_Sector_Ct is 0.

Are there any UDMA errors? Those would suggest trouble with cabling.
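
If you want to collect this for all drives at once, something like the
sketch below could pull the relevant raw values out of saved
`smartctl -A` output. It assumes the usual ten-column attribute table
and the attribute name UDMA_CRC_Error_Count; attribute names vary
between vendors, so treat both as assumptions. smart_raw_values is a
hypothetical helper name.

```python
def smart_raw_values(smartctl_output,
                     names=("UDMA_CRC_Error_Count", "Reallocated_Sector_Ct")):
    """Return {attribute_name: raw_value_string} from `smartctl -A` text."""
    found = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and the attribute name;
        # the raw value is assumed to be the tenth column.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in names:
            found[fields[1]] = fields[9]
    return found
```

A nonzero CRC count that keeps growing on one drive points at that
drive's cable or port rather than at the platters.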

>> Use gstat during your workload to see if any of the drives takes much
>> longer than others to handle its job.
>
> There is one disk sticking out a bit.

In a raid-z pool, the number of transactions per second is determined
by the slowest disk. Check the ms/w column. Look for numbers
substantially higher than a typical seek time (10..20 ms is OK, 100 is
not).
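
To avoid eyeballing this, the check can be roughed out as below. It
assumes batch-mode text from `gstat -b` with a header row that contains
the ms/w and Name columns; slow_writers and the 100 ms default are
illustrative choices, not anything gstat itself provides.

```python
def slow_writers(gstat_output, limit_ms=100.0):
    """Return {device: ms/w} for rows whose ms/w is at or above limit_ms."""
    slow = {}
    columns = None
    for line in gstat_output.splitlines():
        fields = line.split()
        if "ms/w" in fields and "Name" in fields:
            columns = fields  # header row; remember the column order
            continue
        if columns is None or len(fields) != len(columns):
            continue  # timing line, blank line, or anything else unexpected
        row = dict(zip(columns, fields))
        try:
            ms_w = float(row["ms/w"])
        except ValueError:
            continue
        if ms_w >= limit_ms:
            slow[row["Name"]] = ms_w
    return slow
```

Run it over several samples taken during the slowdown; a single
high reading matters less than one disk showing up again and again.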

>
>> > There is almost no disk activity during this time.
>>
>> What kind of disk activity *is* there?
>
> What would be interesting?

Drives 'sticking out' being busy longer than their peers in the pool.
Excessive ms/r or ms/w in gstat. Unexpected reads or writes.

--Artem


