Date:      Wed, 9 Jan 2013 17:26:13 +0100
From:      Nicolas Rachinsky <fbsd-mas-0@ml.turing-complete.org>
To:        Artem Belevich <art@freebsd.org>
Cc:        freebsd-fs <freebsd-fs@freebsd.org>
Subject:   Re: slowdown of zfs (tx->tx)
Message-ID:  <20130109162613.GA34276@mid.pc5.i.0x5.de>
In-Reply-To: <CAFqOu6jgA8RWV5d%2BrOBk8D=3Vu3yWSnDkAi1cFJ0esj4OpBy2Q@mail.gmail.com>
References:  <20130108174225.GA17260@mid.pc5.i.0x5.de> <CAFqOu6jgA8RWV5d%2BrOBk8D=3Vu3yWSnDkAi1cFJ0esj4OpBy2Q@mail.gmail.com>

* Artem Belevich <art@freebsd.org> [2013-01-08 12:47 -0800]:
> On Tue, Jan 8, 2013 at 9:42 AM, Nicolas Rachinsky
> <fbsd-mas-0@ml.turing-complete.org> wrote:
> >       NAME                      STATE     READ WRITE CKSUM
> >         pool1                     DEGRADED     0     0     0
> >           raidz2-0                DEGRADED     0     0     0
> >             ada5                  ONLINE       0     0     0
> >             ada8                  ONLINE       0     0     0
> >             ada2                  ONLINE       0     0     0
> >             ada3                  ONLINE       0     0     0
> >             11846390416703086268  UNAVAIL      0     0     0  was /dev/dsk/ada1
> >             ada6                  ONLINE       0     0     0
> >             ada0                  ONLINE       0     0     1
> >             ada7                  ONLINE       0     0     0
> >             ada4                  ONLINE       0     0     3
> 
> You seem to have some checksum errors, which does suggest hardware trouble.

I somehow missed these. Is there any way to learn when these checksum
errors happen?
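
One way to approximate this, as a minimal sketch and not something
suggested in this thread: poll `zpool status` periodically and log a
timestamp whenever a READ/WRITE/CKSUM counter changes. The function
names and the polling interval are illustrative assumptions, and the
parser assumes plain integer counters (abbreviated values such as
"1.2K" are ignored).

```python
import subprocess
import time

# Trimmed copy of the status output quoted above, without the "> >" prefixes.
SAMPLE = """\
NAME                      STATE     READ WRITE CKSUM
pool1                     DEGRADED     0     0     0
  raidz2-0                DEGRADED     0     0     0
    ada0                  ONLINE       0     0     1
    ada7                  ONLINE       0     0     0
    ada4                  ONLINE       0     0     3
"""


def parse_error_counters(status_text):
    """Return {name: (read, write, cksum)} for every config line."""
    counters = {}
    for line in status_text.splitlines():
        parts = line.split()
        # Config lines look like: NAME STATE READ WRITE CKSUM [note...]
        if len(parts) >= 5 and all(p.isdigit() for p in parts[2:5]):
            counters[parts[0]] = tuple(int(p) for p in parts[2:5])
    return counters


def changed_counters(before, after):
    """Return the entries of `after` that are new or differ from `before`."""
    return {n: c for n, c in after.items() if before.get(n) != c}


def watch(pool, interval_s=60):
    """Print a timestamped line whenever a counter changes (runs forever).

    The first pass prints every device once, which serves as the baseline.
    """
    previous = {}
    while True:
        out = subprocess.run(["zpool", "status", pool],
                             capture_output=True, text=True, check=True).stdout
        current = parse_error_counters(out)
        for name, counts in changed_counters(previous, current).items():
            print(time.strftime("%Y-%m-%d %H:%M:%S"), name, counts)
        previous = current
        time.sleep(interval_s)
```

Calling watch("pool1") would then log each change with the time it was
first noticed, accurate only to the polling interval.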

> For starters, check smart info for all drives and see if they have any
> relocated sectors.

There are some disks with relocated sectors, but for both ada0 and
ada4 Reallocated_Sector_Ct is 0.
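
For reference, a small sketch of how such a value could be pulled out
of `smartctl -A` output for a whole set of disks. It assumes the usual
ATA attribute table layout with RAW_VALUE as the tenth column; the
sample line is illustrative and not copied from these drives, and the
function name is made up for this example.

```python
# Illustrative attribute line in the layout printed by `smartctl -A`
# (ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE).
SAMPLE_LINE = ("  5 Reallocated_Sector_Ct   0x0033   100   100   036"
               "    Pre-fail  Always       -       0")


def raw_attribute(smartctl_output, attribute="Reallocated_Sector_Ct"):
    """Return the raw value of `attribute` as an int, or None if absent."""
    for line in smartctl_output.splitlines():
        parts = line.split()
        # Assumption: the raw value starts at the tenth whitespace-separated field.
        if len(parts) >= 10 and parts[1] == attribute:
            return int(parts[9])
    return None
```

Feeding it the output of `smartctl -A /dev/adaN` for each disk gives
one number per drive to compare.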

> Use gstat during your workload to see if any of the drives takes much
> longer than others to handle its job.

There is one disk sticking out a bit.
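
To make "sticking out" less subjective, here is a rough sketch that
flags any disk whose latency exceeds a multiple of the median across
all disks. The numbers in the usage note are hypothetical, not
measurements from this machine, and the factor of 2 is an arbitrary
assumption.

```python
from statistics import median


def slow_disks(latency_ms, factor=2.0):
    """Return disks whose latency is more than `factor` times the median.

    `latency_ms` maps a disk name to a latency figure in milliseconds,
    for example the ms/w column read off gstat.
    """
    typical = median(latency_ms.values())
    return sorted(d for d, ms in latency_ms.items() if ms > factor * typical)
```

With made-up figures such as {"ada0": 4.0, "ada2": 5.0, "ada3": 4.5,
"ada4": 21.0}, only ada4 would be reported.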

> > There is almost no disk activity during this time.
> 
> What kind of disk activity *is* there?

What would be interesting?


> > sync is disabled for the whole pool.
> 
> If that's the case (assuming you're talking about sync=disabled zfs
> property), then synchronous writes are probably not the cause of
> slowdown. My guess would be either failing HDD or something funky with
> cabling or sata controller.

Yes, sync=disabled for pool1.


Ok, I will start swapping hardware (sadly the machine is quite a drive
away).

Thank you very much for your help.

Nicolas
-- 
http://www.rachinsky.de/nicolas


