Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 25 Nov 2012 14:55:09 -0500
From:      Trent Nelson <trent@snakebite.org>
To:        <freebsd-fs@freebsd.org>
Subject:   Solved persistent stalls in 'zio->io_cv)'
Message-ID:  <20121125195509.GB52355@snakebite.org>

next in thread | raw e-mail | index | archive | help
    [ For the archives... ]

    Over the course of about a month I noticed progressively worse
    performance on my main FreeBSD ZFS box.  It's a simple box; 1U,
    root-on-zfs, 2TB WD EARS zpool mirror.

    Almost everything started stalling towards the end.  A simple ls
    would take 15-20 seconds to return, with ^T always indicating the
    wait state was 'zio->io_cv)'.

    smartctl/smartmond had detected two offline uncorrectable sectors on
    one of the drives, but subsequent full scans revealed no problems.

    I vaguely recall reading an old mailing list post that mentioned
    zio->io_cv was essentially the zfs equivalent to biord, i.e. "the
    system is waiting for the disk".  Coupled with knowledge that a
    zpool mirror will equally split disk load between physical devices
    regardless of underlying device performance (i.e. it doesn't grok
    "disk 1 takes 600% longer to return my reads() than disk 0"), so
    I decided to run `diskinfo -tv` on each disk, not being able to
    think of anything else to run that would be useful, diagnostically.

    The output was pretty telling (pasted in full at the end of this
    e-mail):

 ada1:
  Seek Times:
    Full stroke:   250 iter in  38.386551 sec =  153.546 msec
    Half stroke:   250 iter in  32.733690 sec =  130.935 msec
    ...
  Transfer Rates:
    outside:  102400 kbytes in  10.554800 sec =    9702 kbytes/sec
    middle:   102400 kbytes in  2.299292 sec  =   44535 kbytes/sec

 ada0:
  Seek Times:
    Full stroke:   250 iter in  13.585292 sec =  54.341 msec
    Half stroke:   250 iter in  6.156059 sec  =  24.624 msec
    ...
  Transfer Rates:
    outside: 102400 kbytes in  2.234584 sec =   45825 kbytes/sec
    middle:  102400 kbytes in  4.405934 sec =   23241 kbytes/sec

    So, I concluded (or rather, hoped) that the abysmal performance
    was due to ada1 responding exponentially slower to requests than
    its zpool mirror counterpart ada0.

    I didn't have a replacement available, but performance was so bad
    I decided to break the zpool mirror and try simply limping along on
    a single disk until a replacement arrived (drive is still under WD
    warranty, so sent it in for an RMA replacement).

    Performance has been fine ever since (been about a week now on the
    single disk).

    Just posting for the benefit of the archives.  There are lots of
    hits for 'zio->io_cv)', but nothing related to the particulars in
    this case.

        Trent.

 Full diskinfo output just prior to splitting the mirror:

    [root@hydrogen/ttypts/3(~)#] diskinfo -tv ada1
    ada1
     512            # sectorsize
     2000398934016  # mediasize in bytes (1.8T)
     3907029168     # mediasize in sectors
     4096           # stripesize
     0              # stripeoffset
     3876021        # Cylinders according to firmware.
     16             # Heads according to firmware.
     63             # Sectors according to firmware.
     WD-WMAZA5463028 # Disk ident.
      
    Seek times:
     Full stroke:    250 iter in  38.386551 sec =  153.546 msec
     Half stroke:    250 iter in  32.733690 sec =  130.935 msec
     Quarter stroke: 500 iter in  30.632886 sec =  61.266 msec
     Short forward:  400 iter in  53.067122 sec =  132.668 msec
     Short backward: 400 iter in  6.971919 sec  =  17.430 msec
     Seq outer:     2048 iter in  0.332570 sec  =   0.162 msec
     Seq inner:     2048 iter in  0.921388 sec  =   0.450 msec
    Transfer rates:
     outside:      102400 kbytes in  10.554800 sec =  9702 kbytes/sec
     middle:       102400 kbytes in  2.299292 sec =   44535 kbytes/sec
     inside:       102400 kbytes in  7.301463 sec =   14025 kbytes/sec
      
    [root@hydrogen/ttypts/3(~)#] diskinfo -tv ada0
    ada0
     512            # sectorsize
     2000398934016  # mediasize in bytes (1.8T)
     3907029168     # mediasize in sectors
     4096           # stripesize
     0              # stripeoffset
     3876021        # Cylinders according to firmware.
     16             # Heads according to firmware.
     63             # Sectors according to firmware.
     WD-WMAZA5467587 # Disk ident.
      
    Seek times:
     Full stroke:    250 iter in  13.585292 sec = 54.341 msec
     Half stroke:    250 iter in  6.156059 sec =  24.624 msec
     Quarter stroke: 500 iter in  8.901942 sec =  17.804 msec
     Short forward:  400 iter in  5.421932 sec =  13.555 msec
     Short backward: 400 iter in  3.237963 sec =   8.095 msec
     Seq outer:     2048 iter in  0.636089 sec =   0.311 msec
     Seq inner:     2048 iter in  0.231709 sec =   0.113 msec
    Transfer rates:
     outside:      102400 kbytes in  2.234584 sec =   45825 kbytes/sec
     middle:       102400 kbytes in  4.405934 sec =   23241 kbytes/sec
     inside:       102400 kbytes in  3.050623 sec =   33567 kbytes/sec

    Interestingly enough, I just re-ran diskinfo again then on the
    remaining disk, and results are significantly better than those
    above:

  [root@hydrogen/ttypts/0(~)#] diskinfo -tv ada0
  ada0
          512             # sectorsize
          2000398934016   # mediasize in bytes (1.8T)
          3907029168      # mediasize in sectors
          4096            # stripesize
          0               # stripeoffset
          3876021         # Cylinders according to firmware.
          16              # Heads according to firmware.
          63              # Sectors according to firmware.
          WD-WMAZA5467587 # Disk ident.
  
  Seek times:
          Full stroke:      250 iter in   7.037776 sec =   28.151 msec
          Half stroke:      250 iter in   5.805128 sec =   23.221 msec
          Quarter stroke:   500 iter in   7.452755 sec =   14.906 msec
          Short forward:    400 iter in   3.966540 sec =    9.916 msec
          Short backward:   400 iter in   1.255183 sec =    3.138 msec
          Seq outer:       2048 iter in   0.245077 sec =    0.120 msec
          Seq inner:       2048 iter in   0.212632 sec =    0.104 msec
  Transfer rates:
          outside:       102400 kbytes in   1.286970 sec =    79567 kbytes/sec
          middle:        102400 kbytes in   1.062096 sec =    96413 kbytes/sec
          inside:        102400 kbytes in   1.996499 sec =    51290 kbytes/sec
  

--
http://www.snakebite.net | @trentnelson



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20121125195509.GB52355>