From: Daniel Kalchev <daniel@digsys.bg>
Date: Wed, 20 Feb 2013 08:20:46 +0200
To: freebsd-fs@freebsd.org
Subject: Re: Improving ZFS performance for large directories
Message-ID: <51246B3E.1030604@digsys.bg>
In-Reply-To: <19E0C908-79F1-43F8-899C-6B60F998D4A5@dragondata.com>

On 19.02.13 22:10, Kevin Day wrote:
> Thinking that making the primary cache metadata-only and the secondary
> cache "all" would improve things, I wiped the device (SATA secure erase
> to make sure) and tried again. This was much worse. I'm guessing that,
> because some amount of real file data was being looked at frequently,
> the SSD was getting hammered for read access at 100% utilization, and
> things were far slower.

This sounds odd. What kind of L2ARC device do you have, what is its
performance, and how is it connected? A typical current SSD delivers
over 500 MB/s of read throughput when connected via SATA3, and you can
roughly double that with two drives, and so on. An L2ARC device does not
really need to be write-optimized, because ZFS rate-limits writes to the
L2ARC. It is best to connect these drives to the motherboard's SATA
ports.

Is the SSD used only for L2ARC? If it is being written to heavily, that
can make it slow under intensive use, especially if it is not
write-optimized (write-optimized drives are the typical "pro" or
"enterprise" models).

You may also wish to experiment with the sector size (alignment) when
you add the device to the pool. The ashift parameter is per-vdev in ZFS,
and cache and log devices are separate vdevs, so using gnop to make the
SSD appear as a 4K- or 8K-sector drive might improve things. You will
have to experiment here; there is a rough sketch of the gnop approach at
the end of this message.

> ARC Size:                               92.50%  28.25   GiB
>         Target Size: (Adaptive)         92.50%  28.25   GiB
>         Min Size (Hard Limit):          25.00%  7.64    GiB
>         Max Size (High Water):          4:1     30.54   GiB

But this looks strange. Have you increased vfs.zfs.arc_max and
vfs.zfs.arc_meta_limit? For a 72GB system, I have this in
/boot/loader.conf:

vfs.zfs.arc_max=64424509440
vfs.zfs.arc_meta_limit=51539607552

I found that increasing vfs.zfs.arc_meta_limit helped the most (my
issues were with huge deduped datasets, with a dedup ratio of around 10
and many snapshots).
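A quick way to check whether metadata is actually running into the
current limit is to look at the arcstats sysctls (the exact names can
differ between releases; the ones below are assumed for the 9.x-era ZFS
code, so verify them on your system):

  # how much ARC metadata is in use versus its cap
  sysctl kstat.zfs.misc.arcstats.arc_meta_used
  sysctl kstat.zfs.misc.arcstats.arc_meta_limit
  # and the overall ARC ceiling currently in effect
  sysctl vfs.zfs.arc_max

If arc_meta_used sits right at arc_meta_limit while the ARC itself is
still below arc_max, raising vfs.zfs.arc_meta_limit is the obvious next
step.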
Even if you intend to keep the ARC small (a bad idea, since the ARC is
also used to track what is in the L2ARC), you need to increase
vfs.zfs.arc_meta_limit, perhaps all the way up to vfs.zfs.arc_max. If
you do that, then primarycache=metadata might even work better.

Daniel
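P.S. A minimal sketch of the gnop approach mentioned above; "tank" and
"ada1" are placeholder names for your pool and your cache SSD, so adjust
them to your setup:

  # drop the existing cache device from the pool first
  zpool remove tank ada1
  # create a gnop provider that advertises 4K sectors
  gnop create -S 4096 /dev/ada1
  # re-add the cache through the .nop provider; the new cache vdev gets
  # created with ashift=12
  zpool add tank cache /dev/ada1.nop

The .nop provider disappears on reboot, but the vdev keeps the ashift it
was created with, so the cache device should come back on the raw device
afterwards.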