Date: Wed, 9 Jan 2013 17:15:03 -0800
From: Artem Belevich <art@freebsd.org>
To: Nicolas Rachinsky <fbsd-mas-0@ml.turing-complete.org>
Cc: freebsd-fs <freebsd-fs@freebsd.org>
Subject: Re: slowdown of zfs (tx->tx)
Message-ID: <CAFqOu6jrng=v8eVyhqV-PBqJM_dYy%2BU7X4%2B=ahBeoxvK4mxcSA@mail.gmail.com>
In-Reply-To: <20130109162613.GA34276@mid.pc5.i.0x5.de>
References: <20130108174225.GA17260@mid.pc5.i.0x5.de> <CAFqOu6jgA8RWV5d%2BrOBk8D=3Vu3yWSnDkAi1cFJ0esj4OpBy2Q@mail.gmail.com> <20130109162613.GA34276@mid.pc5.i.0x5.de>

On Wed, Jan 9, 2013 at 8:26 AM, Nicolas Rachinsky
<fbsd-mas-0@ml.turing-complete.org> wrote:
> * Artem Belevich <art@freebsd.org> [2013-01-08 12:47 -0800]:
>> On Tue, Jan 8, 2013 at 9:42 AM, Nicolas Rachinsky
>> <fbsd-mas-0@ml.turing-complete.org> wrote:
>> >         NAME                      STATE     READ WRITE CKSUM
>> >         pool1                     DEGRADED     0     0     0
>> >           raidz2-0                DEGRADED     0     0     0
>> >             ada5                  ONLINE       0     0     0
>> >             ada8                  ONLINE       0     0     0
>> >             ada2                  ONLINE       0     0     0
>> >             ada3                  ONLINE       0     0     0
>> >             11846390416703086268  UNAVAIL      0     0     0  was /dev/dsk/ada1
>> >             ada6                  ONLINE       0     0     0
>> >             ada0                  ONLINE       0     0     1
>> >             ada7                  ONLINE       0     0     0
>> >             ada4                  ONLINE       0     0     3
>>
>> You seem to have some checksum errors which does suggest hardware troubles.
>
> I somehow missed these. Is there any way to learn when these checksum
> errors happen?

Not on FreeBSD (yet) as far as I can tell. Not explicitly, anyways.
Check /var/log/messages for any indications of SATA errors. There's a
good chance that there was a timeout at some point.

>> For starters, check smart info for all drives and see if they have any
>> relocated sectors.
>
> There are some disks with relocated sectors, but for both ada0 and
> ada4 Reallocated_Sector_Ct is 0.

Are there any UDMA errors? Those would suggest trouble with cabling.

>> Use gstat during your workload to see if any of the drives takes much
>> longer than others to handle its job.
>
> There is one disk sticking out a bit.

In a raid-z pool, the number of transactions per second is determined by
the slowest disk. Check the ms/w column. Look for numbers substantially
higher than a typical seek time (10..20 ms is OK, 100 is not).

>
>> > There is almost no disk activity during this time.
>>
>> What kind of disk activity *is* there?
>
> What would be interesting?

Drives 'sticking out': being busy longer than their peers in the pool,
excessive ms/r or ms/w in gstat, unexpected reads or writes.

--Artem
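
The gstat check described above could be scripted roughly as follows. This is only a sketch, not something taken from the thread: it assumes the text was captured from gstat (for example in batch mode) and that it uses the usual column header ending in "Name" with "ms/r" and "ms/w" columns. The function names, the 100 ms default threshold (borrowed from the rule of thumb above), and the sample numbers are all made up for illustration.

```python
def parse_gstat(text):
    """Parse gstat-style text into {device: {column: value}}.

    Assumes a header line whose last field is "Name" and data lines
    with one numeric field per header column followed by the device.
    """
    rows = {}
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[-1] == "Name":
            # Column names; "kBps" appears twice, but only the
            # ms/r and ms/w columns are used by slow_disks() below.
            header = fields[:-1]
            continue
        if header is None or len(fields) != len(header) + 1:
            continue  # timing line ("dT: ...") or unexpected layout
        try:
            # The %busy value may carry a trailing "|" separator.
            values = [float(f.rstrip("|")) for f in fields[:-1]]
        except ValueError:
            continue
        rows[fields[-1]] = dict(zip(header, values))
    return rows


def slow_disks(rows, threshold_ms=100.0):
    """Return (device, ms/r, ms/w) for devices at or above the threshold."""
    flagged = []
    for name in sorted(rows):
        ms_r = rows[name].get("ms/r", 0.0)
        ms_w = rows[name].get("ms/w", 0.0)
        if ms_r >= threshold_ms or ms_w >= threshold_ms:
            flagged.append((name, ms_r, ms_w))
    return flagged


# Invented sample data, shaped like gstat output; not from the thread.
SAMPLE = """\
dT: 1.001s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0     40      0      0    0.0     40   1280   12.5    35.0| ada0
    4     38      0      0    0.0     38   1216  140.2    99.1| ada4
    0     41      0      0    0.0     41   1312   11.8    33.2| ada5
"""

if __name__ == "__main__":
    for name, ms_r, ms_w in slow_disks(parse_gstat(SAMPLE)):
        print(f"{name}: ms/r={ms_r} ms/w={ms_w}")
```

A single sample says little; comparing several samples taken during the slow period shows whether the same drive keeps standing out from its peers.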