Date:      Mon, 03 Jun 2013 15:48:30 -0600 (MDT)
From:      Ross Alexander <rwa@athabascau.ca>
To:        Jeremy Chadwick <jdc@koitsu.org>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: 9.1-current disk throughput stalls ?
Message-ID:  <alpine.BSF.2.00.1306031433130.1926@autopsy.pc.athabascau.ca>
In-Reply-To: <20130603203146.GB49602@icarus.home.lan>
References:  <alpine.BSF.2.00.1306030844360.79095@auwow.bogons> <20130603203146.GB49602@icarus.home.lan>

On Mon, 3 Jun 2013, Jeremy Chadwick wrote:

> 1. There is no such thing as 9.1-CURRENT.  Either you meant 9.1-STABLE
> (what should be called stable/9) or -CURRENT (what should be called
> head).

I wrote:
>> The oldest kernel I have that shows the syndrome is -
>>
>>     FreeBSD aukward.bogons 9.1-STABLE FreeBSD 9.1-STABLE #59 r250498:
>>     Sat May 11 00:03:15 MDT 2013
>>     toor@aukward.bogons:/usr/obj/usr/src/sys/GENERIC  amd64

See above.  You're right; I shouldn't post right after a 07:00
dentist's appointment while my spouse is worrying me about the
insurance adjuster's report on the car damage :(.  Hey, I'm very
fallible.  I'll try harder.

> 2. Is there some reason you excluded details of your ZFS setup?
> "zpool status" would be a good start.

Thanks for the useful hint as to what info you need to diagnose.

One of the machines ran a 5-drive raidz1 pool (Mnemosyne).

Another was a 2-drive gmirror, in the simplest possible gpart/gmirror
setup (Mnemosyne-sub-1).

The third is a 2-drive ZFS mirror (raid-1), again set up in the
simplest possible gpart/ZFS manner (Aukward).

The fourth is a conceptually identical 2-drive ZFS mirror, swapping
to a zvol (Griffon).

If you look at the FreeBSD wiki and at freebsdwiki.net, the pages on
a bootable ZFS root (gptzfsboot) and a bootable software RAID-1 -

 	  https://wiki.freebsd.org/RootOnZFS
 	  http://www.freebsdwiki.net/index.php/RAID1,_Software,_How_to_setup

Well, I just followed those in cookbook style (modulo device and pool
names).  I didn't see any reason to be creative; I build for
reliability, not performance.
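
For the gmirror box, the recipe boils down to roughly this - a sketch
from memory, not a transcript, and the device names are assumptions:

     # mirror two whole disks, and make sure the module loads at boot
     gmirror label -v gm0 /dev/ada0 /dev/ada1
     echo 'geom_mirror_load="YES"' >> /boot/loader.conf
     # then gpart/newfs /dev/mirror/gm0 as usual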

Aukward is gpart/zfs raid-1 box #1:

     aukward:/u0/rwa > ls -l /dev/gpt
     total 0
     crw-r-----  1 root  operator  0x91 Jun  3 10:18 vol0
     crw-r-----  1 root  operator  0x8e Jun  3 10:18 vol1
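
(Those /dev/gpt/vol* names are just GPT labels; per the wiki recipe
they'd have been made along these lines - a sketch, and the ada
device names are my assumption:

     gpart create -s gpt ada0
     gpart add -t freebsd-boot -s 128k ada0
     gpart add -t freebsd-zfs -l vol0 ada0
     gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0
     # ditto for ada1 with -l vol1, then:
     zpool create ult_root mirror gpt/vol0 gpt/vol1

Nothing clever in there.)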

     aukward:/u0/rwa > zpool list -v
     NAME           SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
     ult_root       111G   108G  2.53G    97%  1.00x  ONLINE  -
       mirror       111G   108G  2.53G         -
 	gpt/vol0      -      -      -         -
 	gpt/vol1      -      -      -         -

     aukward:/u0/rwa > zpool status
       pool: ult_root
      state: ONLINE
       scan: scrub repaired 0 in 1h13m with 0 errors on Sun May  5 04:29:30 2013
     config:

 	    NAME          STATE     READ WRITE CKSUM
 	    ult_root      ONLINE       0     0     0
 	      mirror-0    ONLINE       0     0     0
 		gpt/vol0  ONLINE       0     0     0
 		gpt/vol1  ONLINE       0     0     0

     errors: No known data errors

(Yes, that machine has no swap.  It has NEVER had swap; it has 16 GB
and uses maybe 10% at max load.  It has been running 9.x since
prerelease days, FWIW.  The ARC is throttled to 2 GB; zfs-stats says
I never get near using even that.  It's just the box that drives the
radios, a ham radio hobby machine.)
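
The ARC throttle is just the stock loader.conf tunable, i.e.

     # /boot/loader.conf
     vfs.zfs.arc_max="2G"

nothing exotic.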

Griffon is also gpart/zfs raid-1 -

     griffon:/u0/rwa > uname -a
 	FreeBSD griffon.cs.athabascau.ca 9.1-STABLE FreeBSD 9.1-STABLE #25 r251062M:
 	Tue May 28 10:39:13 MDT 2013
 	toor@griffon.cs.athabascau.ca:/usr/obj/usr/src/sys/GENERIC
 	amd64

     griffon:/u0/rwa > ls -l /dev/gpt
     total 0
     crw-r-----  1 root  operator  0x7b Jun  3 08:38 disk0
     crw-r-----  1 root  operator  0x80 Jun  3 08:38 disk1
     crw-r-----  1 root  operator  0x79 Jun  3 08:38 swap0
     crw-r-----  1 root  operator  0x7e Jun  3 08:38 swap1

and the pool is fat and happy -

     griffon:/u0/rwa > zpool status -v
       pool: pool0
      state: ONLINE
       scan: none requested
     config:

 	    NAME           STATE     READ WRITE CKSUM
 	    pool0          ONLINE       0     0     0
 	      mirror-0     ONLINE       0     0     0
 		gpt/disk0  ONLINE       0     0     0
 		gpt/disk1  ONLINE       0     0     0

     errors: No known data errors

Note that swap goes through a ZFS zvol:

     griffon:/u0/rwa > cat /etc/fstab
     # Device        Mountpoint      FStype  Options         Dump    Pass#
     #
     #
     /dev/zvol/pool0/swap none       swap    sw              0       0

     pool0           /               zfs     rw              0       0
     pool0/tmp       /tmp            zfs     rw              0       0
     pool0/var       /var            zfs     rw              0       0
     pool0/usr       /usr            zfs     rw              0       0
     pool0/u0        /u0             zfs     rw              0       0

     /dev/cd0        /cdrom          cd9660  ro,noauto       0       0
     /dev/ada2s1d    /mnt0           ufs     rw,noauto       0       0
     /dev/da0s1      /u0/rwa/camera  msdosfs rw,noauto       0       0
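
The swap zvol itself was created more or less per the wiki recipe -
a sketch, and the 4G size here is an assumption:

     zfs create -V 4G pool0/swap
     # then point fstab at /dev/zvol/pool0/swap, as above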

The machine has 32 GB and never swaps.  It runs VirtualBox loads,
anything from one to forty virtuals (little OpenBSD images).  Load is
always light.

As for the 5-drive raidz1 box (Mnemosyne), I first replaced the ZFS
pool with a simple gpart/gmirror.  The gmirrored drives are known to
be good.  That *also* ran like mud.  Then I downgraded to 8.4-STABLE
with a GENERIC kernel, and it's just fine now, thanks.
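
The downgrade itself was just the ordinary source dance against
stable/8 - roughly the following, modulo the usual caveats about
back-revving world:

     svn checkout svn://svn.freebsd.org/base/stable/8 /usr/src
     cd /usr/src
     make buildworld buildkernel KERNCONF=GENERIC
     make installkernel KERNCONF=GENERIC   # reboot, then installworld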

I have the five raidz1 disks that were pulled sitting in a second
4-core server chassis on my desk, and they fail in that machine the
same way the production box died.  I'm 150 km away, and the power
went down at the remote site over the weekend, so I'll have to wait
until tomorrow to send you those details.

For now, think cut-and-paste from the FreeBSD wiki, nothing clever,
everything as simple as possible.  Film at 11.

> 3. Do any of your filesystems/pools have ZFS compression enabled, or
> have in the past?

No; disk is too cheap to bother with that.

> 4. Do any of your filesystems/pools have ZFS dedup enabled, or have in
> the past?

No; disk is too cheap to bother with that.
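
Easy enough to double-check straight from the pools, e.g.

     zfs get -r compression,dedup pool0
     zpool get dedupratio pool0

which should come back all off and 1.00x.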

> 5. Does the problem go away after a reboot?

It goes away for a few minutes, and then comes back on little cat feet.
Gradual slowdown.
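
Next time it starts sinking I'll try to catch it in the act with
something like

     gstat -I 5s            # per-provider busy% and latency
     zpool iostat -v 5      # pool-wide and per-vdev throughput

and post the numbers.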

> 6. Can you provide smartctl -x output for both ada0 and ada1?  You will
> need to install ports/sysutils/smartmontools for this.  The reason I'm
> asking for this is there may be one of your disks which is causing I/O
> transactions to stall for the entire pool (i.e. "single point of
> annoyance").

Been down that path; good call.  Mnemosyne (raidz1) checked out clean
as a whistle.  (Later:) Griffon checks out clean, too - both -x and
-a.  Aukward might have an iffy device; I will schedule some
self-tests and post everything, all neatly tabulated.
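
The drill will be the usual smartmontools one, along these lines:

     smartctl -t long /dev/ada0      # kick off a long self-test
     smartctl -l selftest /dev/ada0  # read the result when it's done
     smartctl -x /dev/ada0           # extended output for the list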

I've already fought a bad disk, and also just-slightly-iffy cables,
in a ZFS context, and that episode was nothing like this one.

> 7. Can you remove ZFS from the picture entirely (use UFS only) and
> re-test?  My guess is that this is ZFS behaviour, particularly the ARC
> being flushed to disk, and your disks are old/slow.  (Meaning: you have
> 16GB RAM + 4 core CPU but with very old disks).

Already did that.  A 9.1 gmirror box (Mnemosyne-sub-1) slowly choked
and died just like the ZFS instance did.  Back-revving to 8.4-STABLE,
without hardware changes, was the fix.

Also: I noticed that when I mounted the 9.1 raidz pool from an 8.4
flash fixit disk, everything ran quickly and stably.  I copied about
635 GB of ~3 GB .pcap files out of the raidz pool onto a UFS
filesystem on a SCSI disk, and the ZFS disks were all about 75 to 80%
busy for the ~8000 seconds the copy was running.  No slowdowns, no
stalls.
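
(That's 635 GB in ~8000 seconds - call it 80 MB/s sustained - so the
drives themselves look healthy.)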

BTW, I'd like to thank you for your kind interest, and please forgive
my poor reporting skills - I'm at home, work is 150 km away, the
phone keeps ringing, there are a lot of boxes, I'm sleep deprived,
whine & snivel, grumble & moan ;)

regards,
Ross
--
Ross Alexander, (780) 675-6823 desk / (780) 689-0749 cell, rwa@athabascau.ca

 	"Always do right. This will gratify some people,
 	 and astound the rest."  -- Samuel Clemens
