Date: Tue, 8 May 2012 16:33:14 +0200
From: Michael Gmelin <freebsd@grem.de>
To: freebsd-fs@freebsd.org
Subject: ZFS resilvering strangles IO
Message-ID: <73F8D020-04F3-44B2-97D4-F08E3B253C32@grem.de>
Hello,

I know I'm not the first one to ask this, but I couldn't find a definitive answer in previous threads.

I'm running a FreeBSD 9.0-RELEASE-p1 amd64 system with 8 x 1TB SATA2 drives (not SAS) on an LSI SAS 9211 controller in IT mode (HBA, da0-da7). Zpool version 28, raidz2 vdev. The machine has 4GB of RAM, so ZFS prefetch is disabled. No manual tuning of ZFS options. The pool currently holds about 1TB of data (about 25% full). In normal operation the pool shows excellent performance.

Yesterday I had to replace a drive, so resilvering started. The resilver took about 15 hours - which seems a little slow to me, but whatever - what really struck me was how bad pool performance got during resilvering. Read performance was acceptable, but write performance dropped to 500 KB/s (for almost all of the 15 hours). After resilvering finished, system performance returned to normal.

Fortunately this is a backup server and no full backups were scheduled, so no drama, but I really don't want to have to replace a drive in a database (or other high-IO) server this way (I would have been forced to offline the drive somehow and migrate the data to another server).

So the question is: is there anything I can do to improve the situation? Is this because of memory constraints? Are there any other knobs to adjust? As far as I know, zfs_resilver_delay can't be changed in FreeBSD yet.

I have more drives around, so I could replace another one in the server just to replicate the exact situation.

Cheers,
Michael

Disk layout:

  daXp1  128   boot
  daXp2  16G   freebsd-swap
  daXp3  915G  freebsd-zfs

Zpool status during resilvering:

[root@backup /tmp]# zpool status -v
  pool: tank
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon May  7 20:18:34 2012
        249G scanned out of 908G at 18.2M/s, 10h17m to go
        31.2G resilvered, 27.46% done
config:

        NAME                        STATE     READ WRITE CKSUM
        tank                        DEGRADED     0     0     0
          raidz2-0                  DEGRADED     0     0     0
            replacing-0             REMOVED      0     0     0
              15364271088212071398  REMOVED      0     0     0  was /dev/da0p3/old
              da0p3                 ONLINE       0     0     0  (resilvering)
            da1p3                   ONLINE       0     0     0
            da2p3                   ONLINE       0     0     0
            da3p3                   ONLINE       0     0     0
            da4p3                   ONLINE       0     0     0
            da5p3                   ONLINE       0     0     0
            da6p3                   ONLINE       0     0     0
            da7p3                   ONLINE       0     0     0

errors: No known data errors

Zpool status later in the process:

[root@backup /tmp]# zpool status
  pool: tank
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon May  7 20:18:34 2012
        833G scanned out of 908G at 19.1M/s, 1h7m to go
        104G resilvered, 91.70% done
config:

        NAME                        STATE     READ WRITE CKSUM
        tank                        DEGRADED     0     0     0
          raidz2-0                  DEGRADED     0     0     0
            replacing-0             REMOVED      0     0     0
              15364271088212071398  REMOVED      0     0     0  was /dev/da0p3/old
              da0p3                 ONLINE       0     0     0  (resilvering)
            da1p3                   ONLINE       0     0     0
            da2p3                   ONLINE       0     0     0
            da3p3                   ONLINE       0     0     0
            da4p3                   ONLINE       0     0     0
            da5p3                   ONLINE       0     0     0
            da6p3                   ONLINE       0     0     0
            da7p3                   ONLINE       0     0     0

errors: No known data errors

Zpool status after resilvering finished:

[root@backup /]# zpool status
  pool: tank
 state: ONLINE
  scan: resilvered 113G in 14h54m with 0 errors on Tue May  8 11:13:31 2012
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            da0p3   ONLINE       0     0     0
            da1p3   ONLINE       0     0     0
            da2p3   ONLINE       0     0     0
            da3p3   ONLINE       0     0     0
            da4p3   ONLINE       0     0     0
            da5p3   ONLINE       0     0     0
            da6p3   ONLINE       0     0     0
            da7p3   ONLINE       0     0     0

errors: No known data errors
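As a side note, the ETA that "zpool status" prints is simple arithmetic over the scanned/total figures and the current scan rate. A small sketch (figures taken from the first status output above, treating G as GiB and M/s as MiB/s) reproduces the reported estimate:

```python
# Sanity-check the resilver ETA arithmetic from "zpool status" above.
scanned_gib = 249.0   # "249G scanned"
total_gib = 908.0     # "out of 908G"
rate_mib_s = 18.2     # "at 18.2M/s"

# Remaining data in MiB divided by the current scan rate gives seconds left.
remaining_mib = (total_gib - scanned_gib) * 1024
eta_seconds = remaining_mib / rate_mib_s

hours = int(eta_seconds // 3600)
minutes = int((eta_seconds % 3600) // 60)
print(f"{hours}h{minutes}m to go")  # prints "10h17m to go", matching zpool
```

This also shows why the ETA jumps around during a resilver: it is extrapolated from the instantaneous scan rate, which collapses under competing application I/O.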
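For what it's worth, on later FreeBSD releases the resilver throttle is exposed via sysctl. The sketch below is hypothetical for this 9.0 box (the poster notes zfs_resilver_delay isn't tunable there yet) - the knob names and defaults are from newer FreeBSD ZFS; check "sysctl -a | grep vfs.zfs" before relying on them:

```shell
# Hypothetical tuning sketch for newer FreeBSD ZFS, not FreeBSD 9.0.
# Verify each knob exists on your release before setting it.

# Ticks to delay each resilver I/O when the pool is busy; raising it
# leaves more bandwidth for application writes (default 2).
sysctl vfs.zfs.resilver_delay=5

# Minimum milliseconds per txg spent on resilver work; lowering it
# gives regular writes more room per transaction group (default 3000).
sysctl vfs.zfs.resilver_min_time_ms=1000

# Ticks of inactivity before the pool counts as idle, at which point
# the resilver is allowed to run at full speed (default 50).
sysctl vfs.zfs.scan_idle=50
```

The trade-off is explicit: every tick of delay added to resilver I/O lengthens the resilver window, during which the raidz2 vdev runs with reduced redundancy.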