From: Michael Gmelin <freebsd@grem.de>
Date: Tue, 8 May 2012 16:33:14 +0200
To: freebsd-fs@freebsd.org
Subject: ZFS resilvering strangles IO

Hello,

I know I'm not the first one to ask this, but I couldn't find a definitive answer in previous threads.

I'm running a FreeBSD 9.0-RELEASE-p1 amd64 system with 8 x 1TB SATA2 drives (not SAS) on an LSI SAS 9211 controller in IT mode (HBA, da0-da7). Zpool version 28, raidz2 container. The machine has 4GB of RAM, therefore ZFS prefetch is disabled. No manual tuning of ZFS options. The pool contains about 1TB of data right now (so about 25% full). In normal operation the pool shows excellent performance. Yesterday I had to replace a drive, so resilvering started.
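(For context, the memory-related ZFS settings on a machine like this live in /boot/loader.conf. The sketch below is illustrative, not the actual config of this box; in particular the arc_max value is an assumption, and prefetch is disabled automatically on systems with less than 4GB of RAM anyway:)

```sh
# /boot/loader.conf -- ZFS knobs relevant on a low-memory (4GB) machine.
# Sketch only; values are illustrative, not taken from the mail above.
vfs.zfs.prefetch_disable="1"   # explicit, though auto-disabled below 4GB RAM
vfs.zfs.arc_max="2G"           # cap the ARC to leave headroom for everything else
```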
The resilver process took about 15 hours - which seems a little bit slow to me, but whatever - what really struck me was that pool performance got really bad during resilvering. Read performance was acceptable, but write performance dropped to 500kB/s (for almost all of the 15 hours). After resilvering finished, system performance returned to normal.

Fortunately this is a backup server and no full backups were scheduled, so no drama, but I really don't want to have to replace a drive in a database (or other high-IO) server this way (I would have been forced to offline the drive somehow and migrate the data to another server).

So the question is: is there anything I can do to improve the situation? Is this because of memory constraints? Are there any other knobs to adjust? As far as I know, zfs_resilver_delay can't be changed in FreeBSD yet.

I have more drives around, so I could replace another one in the server, just to replicate the exact situation.

Cheers,
Michael

Disk layout:

daXp1   128  boot
daXp2   16G  freebsd-swap
daXp3   915G freebsd-zfs

Zpool status during resilvering:

[root@backup /tmp]# zpool status -v
  pool: tank
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon May  7 20:18:34 2012
        249G scanned out of 908G at 18.2M/s, 10h17m to go
        31.2G resilvered, 27.46% done
config:

        NAME                        STATE     READ WRITE CKSUM
        tank                        DEGRADED     0     0     0
          raidz2-0                  DEGRADED     0     0     0
            replacing-0             REMOVED      0     0     0
              15364271088212071398  REMOVED      0     0     0  was /dev/da0p3/old
              da0p3                 ONLINE       0     0     0  (resilvering)
            da1p3                   ONLINE       0     0     0
            da2p3                   ONLINE       0     0     0
            da3p3                   ONLINE       0     0     0
            da4p3                   ONLINE       0     0     0
            da5p3                   ONLINE       0     0     0
            da6p3                   ONLINE       0     0     0
            da7p3                   ONLINE       0     0     0

errors: No known data errors

Zpool status later in the process:

[root@backup /tmp]# zpool status
  pool: tank
 state: DEGRADED
status: One or more devices is currently being resilvered.
        The pool will continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon May  7 20:18:34 2012
        833G scanned out of 908G at 19.1M/s, 1h7m to go
        104G resilvered, 91.70% done
config:

        NAME                        STATE     READ WRITE CKSUM
        tank                        DEGRADED     0     0     0
          raidz2-0                  DEGRADED     0     0     0
            replacing-0             REMOVED      0     0     0
              15364271088212071398  REMOVED      0     0     0  was /dev/da0p3/old
              da0p3                 ONLINE       0     0     0  (resilvering)
            da1p3                   ONLINE       0     0     0
            da2p3                   ONLINE       0     0     0
            da3p3                   ONLINE       0     0     0
            da4p3                   ONLINE       0     0     0
            da5p3                   ONLINE       0     0     0
            da6p3                   ONLINE       0     0     0
            da7p3                   ONLINE       0     0     0

errors: No known data errors

Zpool status after resilvering finished:

[root@backup /]# zpool status
  pool: tank
 state: ONLINE
  scan: resilvered 113G in 14h54m with 0 errors on Tue May  8 11:13:31 2012
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            da0p3   ONLINE       0     0     0
            da1p3   ONLINE       0     0     0
            da2p3   ONLINE       0     0     0
            da3p3   ONLINE       0     0     0
            da4p3   ONLINE       0     0     0
            da5p3   ONLINE       0     0     0
            da6p3   ONLINE       0     0     0
            da7p3   ONLINE       0     0     0

errors: No known data errors
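(P.S. on the resilver-throttling question: later ZFS code exposes the delay as a sysctl, but which knobs exist depends heavily on the FreeBSD version, so the names below are things to check for, not guaranteed tunables on 9.0:)

```sh
# List the resilver/scrub throttles this particular kernel exposes,
# with descriptions. The knob names that may turn up (e.g.
# vfs.zfs.resilver_delay, vfs.zfs.resilver_min_time_ms) come from
# later FreeBSD/ZFS versions and are assumptions, not promises:
sysctl -ad vfs.zfs | grep -E 'resilver|scrub'

# Where available, a larger delay throttles resilver I/O in favor of
# normal pool traffic (value illustrative):
# sysctl vfs.zfs.resilver_delay=5
```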