Date: Sat, 27 Oct 2007 19:24:20 +0200
From: Peter Schuller
To: freebsd-fs@freebsd.org
Message-ID: <20071027172420.GA64599@hyperion.scode.org>
Subject: zfs/arc tuning to keep disks busy

Hello,

I am not sure whether this is FreeBSD specific or applies to ZFS in
general, as I no longer have an OpenSolaris machine to try this on.

One particularly common case whose performance is not entirely optimal
is simply copying[1] a file from one filesystem to another, with the
filesystems being on different physical drives. For example, in this
particular case[2] I am rsync -avWP:ing data from my current /usr (on
ZFS, single disk) onto another ZFS pool on different drives.

The behavior is roughly this, in chronological order:

(1) Read data from the source at the expected rate, saturating the disk.
(2) After some seconds, switch to writing to the destination at the
    expected rate, saturating the disk.
(3) Stop writing to the destination, and go to (1).

Optimally it should of course be reading and writing concurrently, so
that both the source and the destination can be saturated.

Without knowing the implementation details, my interpretation is that
there are two symptoms here:

(1) Flushing of data to the destination occurs too late, so that writes
    block processes for extended periods of time in bursts (seconds),
    rather than data being flushed pre-emptively to prevent writes from
    blocking except when the destination device(s) are truly saturated.

(2) Even when data is being written out (on the order of several tens of
    megabytes over 1-2 seconds), the userspace write does not seem to
    unblock until all pending writes are complete.

The timing of the writes seems to coincide with the 5 second commit
interval, which is expected if the amount of data written in 5 seconds
fits in the cache. Reads seem to stop slightly after that, which would
be consistent with a decision not to push more data into the cache,
instead waiting for the commit to finish.

Based on the above observations, my guess is that (1) all dirty data
that is in the cache at the start of the checkpoint process is written
out in a single transaction group, and (2) data in the cache is never
evicted until the entire transaction group is fully committed to disk.
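To make that guess a bit more concrete, here is a toy model of the copy
(plain Python; this is not ZFS code, and the cache size and per-tick
disk rates are made-up numbers purely for illustration). It simulates a
reader filling a fixed-size write cache whose space is only freed once
the entire transaction group has been committed, which reproduces the
alternating read/write bursts I am seeing:

CACHE_MB = 500          # dirty-data limit before the writer blocks (made up)
DISK_MB_PER_TICK = 100  # per-disk throughput per tick (made up)
TOTAL_MB = 3000         # amount of data to copy (made up)

def copy_burst_commit():
    """The reader fills the cache; nothing is evicted until the whole
    transaction group has been committed, so reads stall in bursts."""
    dirty = copied = ticks = 0
    committing = False
    while copied < TOTAL_MB or dirty > 0:
        ticks += 1
        if committing or copied >= TOTAL_MB:
            # Commit phase: destination disk busy, reader blocked until
            # the entire group is on disk.
            dirty = max(0, dirty - DISK_MB_PER_TICK)
            if dirty == 0:
                committing = False
        else:
            # Read phase: source disk busy, destination idle.
            chunk = min(DISK_MB_PER_TICK, CACHE_MB - dirty, TOTAL_MB - copied)
            dirty += chunk
            copied += chunk
            if dirty >= CACHE_MB:
                committing = True
    return ticks

print("burst-commit model:", copy_burst_commit(), "ticks")

In this model the two disks never work at the same time, so the copy
takes roughly twice as long as the combined bandwidth would allow, which
matches the behavior described above.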
This would explain the behavior, since it would have exactly the effect
that writes start to block once there is no more room for cached data,
and the room then becomes available in a burst at commit time rather
than incrementally as data is written out.

Is the above an accurate description of what is going on? If so, I
wonder if there is a way to force ZFS to pre-emptively start flushing
dirty data out to disk earlier, presumably once the percentage of the
cache in use (relative to the amount allowed to be used for writes)
reaches <= 50%. If I had to guess, that percentage is more like 80-90%
right now. Of course, perhaps the cache does not work even remotely like
this, but the behavior seems consistent with what you would get if this
were the case.

Alternatively, can one get ZFS to commit smaller transaction groups,
thus allowing data to be evicted more quickly, rather than committing
*everything* as a single transaction? Though this would go against the
goal of minimizing the number of commits.

[1] No concurrent I/O; just a plain rsync -avWP on an otherwise idle system.
[2] I have observed this overall, not just in this case.

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller'
Key retrieval: Send an E-Mail to getpgpkey@scode.org
E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org
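PS: To illustrate the pre-emptive flushing suggested above, the same toy
model (same made-up numbers, still not ZFS code) can be changed so that
dirty data is flushed as soon as the cache crosses a 50% low-water mark
and space is freed incrementally; the reader and writer then overlap and
the simulated copy finishes in roughly half the time:

CACHE_MB = 500          # dirty-data limit before the writer blocks (made up)
DISK_MB_PER_TICK = 100  # per-disk throughput per tick (made up)
TOTAL_MB = 3000         # amount of data to copy (made up)

def copy_incremental_flush(low_water=0.5):
    """Dirty data starts flushing once the cache passes the low-water
    mark, and space is freed as each chunk lands on disk, so the reader
    and the writer overlap and both disks stay busy."""
    dirty = copied = ticks = 0
    while copied < TOTAL_MB or dirty > 0:
        ticks += 1
        # Writer: flush once the low-water mark is crossed (or at the end).
        if dirty >= CACHE_MB * low_water or copied >= TOTAL_MB:
            dirty = max(0, dirty - DISK_MB_PER_TICK)
        # Reader: keeps going as long as there is room in the cache.
        if copied < TOTAL_MB and dirty < CACHE_MB:
            chunk = min(DISK_MB_PER_TICK, CACHE_MB - dirty, TOTAL_MB - copied)
            dirty += chunk
            copied += chunk
    return ticks

print("incremental-flush model:", copy_incremental_flush(), "ticks")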