Date: Wed, 09 May 2012 08:55:18 +0200
From: Peter Maloney <peter.maloney@brockmann-consult.de>
To: freebsd-fs@freebsd.org
Subject: Re: ZFS resilvering strangles IO
Message-ID: <4FAA14D6.8060302@brockmann-consult.de>
In-Reply-To: <73F8D020-04F3-44B2-97D4-F08E3B253C32@grem.de>
References: <73F8D020-04F3-44B2-97D4-F08E3B253C32@grem.de>
About the slow performance during the resilver: are they consumer disks? If so, one guess is that you have a bad disk. Check by looking at the load and latency (ms per read/write) on the disks. If one is high and the others are low, that disk is probably bad. Even a single bad disk will make the whole pool run very slowly. Bad consumer disks run very slowly, retrying over and over to read the not-yet-bad sectors, where enterprise disks would throw errors and fail.

My other guess is that this is because FreeBSD, unlike Linux and Solaris, lacks IO scheduling, so there is no way for the ZFS code to truly put the resilver at a lower priority than the regular production applications. I've read that IO scheduling was developed for 8.2, but never officially adopted. I would love to see it in FreeBSD... I use "ionice" on Linux all the time (for copying, backups, zipping, installing a huge batch of packages [noticeable >300 MB], etc. while I work on other things), so I miss it. IO scheduling on Solaris also helps with dedup performance.

Does anyone know if there is a movement to add the IO scheduling code into the base system?

On 05/08/2012 04:33 PM, Michael Gmelin wrote:
> Hello,
>
> I know I'm not the first one to ask this, but I couldn't find a definitive answer in previous threads.
>
> I'm running a FreeBSD 9.0-RELEASE-p1 amd64 system, 8 x 1TB SATA2 drives (not SAS) and an LSI SAS 9211 controller in IT mode (HBAs, da0-da7). Zpool version 28, raidz2 container. The machine has 4GB of RAM, therefore ZFS prefetch is disabled. No manual tuning of ZFS options. The pool contains about 1TB of data right now (so about 25% full). In normal operation the pool shows excellent performance. Yesterday I had to replace a drive, so resilvering started. The resilver process took about 15 hours - which seems a little bit slow to me, but whatever - what really struck me was that during resilvering the pool performance got really bad. Read performance was acceptable, but write performance got down to 500kb/s (for almost all of the 15 hours).
> After resilvering finished, system performance returned to normal.
>
> Fortunately this is a backup server and no full backups were scheduled, so no drama, but I really don't want to have to replace a drive in a database (or other high-IO) server this way (I would have been forced to offline the drive somehow and migrate the data to another server).
>
> So the question is, is there anything I can do to improve the situation? Is this because of memory constraints? Are there any other knobs to adjust? As far as I know zfs_resilver_delay can't be changed in FreeBSD yet.
>
> I have more drives around, so I could replace another one in the server, just to replicate the exact situation.
>
> Cheers,
> Michael
>
> Disk layout:
>
> daXp1   128  boot
> daXp2   16G  freebsd-swap
> daXp3  915G  freebsd-zfs
>
>
> Zpool status during resilvering:
>
> [root@backup /tmp]# zpool status -v
>   pool: tank
>  state: DEGRADED
> status: One or more devices is currently being resilvered. The pool will
>         continue to function, possibly in a degraded state.
> action: Wait for the resilver to complete.
>   scan: resilver in progress since Mon May  7 20:18:34 2012
>         249G scanned out of 908G at 18.2M/s, 10h17m to go
>         31.2G resilvered, 27.46% done
> config:
>
>         NAME                        STATE     READ WRITE CKSUM
>         tank                        DEGRADED     0     0     0
>           raidz2-0                  DEGRADED     0     0     0
>             replacing-0             REMOVED      0     0     0
>               15364271088212071398  REMOVED      0     0     0  was /dev/da0p3/old
>               da0p3                 ONLINE       0     0     0  (resilvering)
>             da1p3                   ONLINE       0     0     0
>             da2p3                   ONLINE       0     0     0
>             da3p3                   ONLINE       0     0     0
>             da4p3                   ONLINE       0     0     0
>             da5p3                   ONLINE       0     0     0
>             da6p3                   ONLINE       0     0     0
>             da7p3                   ONLINE       0     0     0
>
> errors: No known data errors
>
> Zpool status later in the process:
>
> [root@backup /tmp]# zpool status
>   pool: tank
>  state: DEGRADED
> status: One or more devices is currently being resilvered. The pool will
>         continue to function, possibly in a degraded state.
> action: Wait for the resilver to complete.
>   scan: resilver in progress since Mon May  7 20:18:34 2012
>         833G scanned out of 908G at 19.1M/s, 1h7m to go
>         104G resilvered, 91.70% done
> config:
>
>         NAME                        STATE     READ WRITE CKSUM
>         tank                        DEGRADED     0     0     0
>           raidz2-0                  DEGRADED     0     0     0
>             replacing-0             REMOVED      0     0     0
>               15364271088212071398  REMOVED      0     0     0  was /dev/da0p3/old
>               da0p3                 ONLINE       0     0     0  (resilvering)
>             da1p3                   ONLINE       0     0     0
>             da2p3                   ONLINE       0     0     0
>             da3p3                   ONLINE       0     0     0
>             da4p3                   ONLINE       0     0     0
>             da5p3                   ONLINE       0     0     0
>             da6p3                   ONLINE       0     0     0
>             da7p3                   ONLINE       0     0     0
>
> errors: No known data errors
>
>
> Zpool status after resilvering finished:
>
> [root@backup /]# zpool status
>   pool: tank
>  state: ONLINE
>   scan: resilvered 113G in 14h54m with 0 errors on Tue May  8 11:13:31 2012
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         tank        ONLINE       0     0     0
>           raidz2-0  ONLINE       0     0     0
>             da0p3   ONLINE       0     0     0
>             da1p3   ONLINE       0     0     0
>             da2p3   ONLINE       0     0     0
>             da3p3   ONLINE       0     0     0
>             da4p3   ONLINE       0     0     0
>             da5p3   ONLINE       0     0     0
>             da6p3   ONLINE       0     0     0
>             da7p3   ONLINE       0     0     0
>
> errors: No known data errors
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

-- 
--------------------------------------------
Peter Maloney
Brockmann Consult
Max-Planck-Str. 2
21502 Geesthacht
Germany
Tel: +49 4152 889 300
Fax: +49 4152 889 333
E-mail: peter.maloney@brockmann-consult.de
Internet: http://www.brockmann-consult.de
--------------------------------------------
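[Editor's note: the "check load and ms per read/write" advice above can be sketched as a one-liner. This is an illustrative sketch, not from the original thread: it assumes `gstat -b` batch output with field 5 = ms/r and field 10 = the device name, and an arbitrary 100 ms threshold; adjust both for your system.]

```shell
# Sample GEOM statistics over one 5-second window and flag any da* disk
# whose read latency is far above its peers (threshold of 100 ms assumed).
# Column layout assumed: L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name
gstat -b -I 5s | awk '$10 ~ /^da[0-9]+$/ && $5 + 0 > 100 { printf "%s looks suspect: %s ms per read\n", $10, $5 }'
```

If one disk shows hundreds of ms per read while its peers stay in single digits during the resilver, that disk is the likely culprit.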