From owner-freebsd-hackers@FreeBSD.ORG Wed Oct 17 11:33:41 2007
Message-ID: <4715F30A.5080102@freebsd.org>
Date: Wed, 17 Oct 2007 06:33:30 -0500
From: Eric Anderson
To: Kostik Belousov
Cc: freebsd-hackers@freebsd.org
Subject: Re: Filesystem snapshots dog slow
References: <20071016113046.GA35318@eos.sc1.parodius.com> <4714A663.5010800@freebsd.org> <20071017100003.GK1191@turion.vk2pj.dyndns.org> <20071017101400.GH6511@deviant.kiev.zoral.com.ua>
In-Reply-To: <20071017101400.GH6511@deviant.kiev.zoral.com.ua>
List-Id: Technical Discussions relating to FreeBSD

Kostik Belousov wrote:
> On Wed, Oct 17, 2007 at 08:00:03PM +1000, Peter Jeremy wrote:
>> On 2007-Oct-16 06:54:11 -0500, Eric Anderson wrote:
>>> will
give you a good understanding of what the issue is. Essentially, your
>>> disk is hammered making copies of all the cylinder groups, skipping those
>>> that are 'busy', and coming back to them later. On a 200Gb disk, you could
>>> have 1000 cylinder groups, each having to be locked, copied, unlocked, and
>>> then checked again for any subsequent changes. The stalls you see are when
>>> there are lock contentions, or disk IO issues. On a single disk (like your
>>> setup above), your snapshots will take forever since there is very little
>>> random IO performance available to you.
>> That said, there is a fair amount of scope available for improving
>> both the creation and deletion performance.
>>
>> Firstly, it's not clear to me that having more than a few hundred CGs
>> has any real benefits. There was a massive gain in moving from
>> (effectively) a single CG in pre-FFS to a few dozen CGs in FFS as it
>> was first introduced. Modern disks are roughly 5 orders of magnitude
>> larger and voice-coil actuators mean that seek times are almost
>> independent of distance. CG sizes are currently limited by the
>> requirement that the cylinder group (including cylinder group maps)
>> must fit into a single FS block. Removing this restriction would
>> allow CGs to be much larger.
>>
>> Secondly, all the I/O during both snapshot creation and deletion is
>> in FS-block size chunks. Increasing the I/O size would significantly
>> increase the I/O performance. Whilst it doesn't make sense to read
>> more than you need, there still appears to be plenty of scope to
>> combine writes.
>>
>> Between these two items, I would expect potential performance gains
>> of at least 20:1.
>>
>> Note that I'm not suggesting that either of these items is trivial.
> This is, unfortunately, quite true. Allowing non-atomic updates of the
> cg block means a lot of complications in the softupdate code, IMHO.

I agree with all the above.
I think it has not been done because of exactly what Kostik says. I
really think that the CG max size is *way* too small now, and should be
about 10-50 times larger, but performance tests would need to be run.

Eric
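
To put rough numbers on the CG figures discussed in this thread, here is a back-of-the-envelope sketch. It is not taken from the FFS sources. It assumes a 16 KiB block and 2 KiB fragment size (illustrative values, not stated in the thread) and that the limiting item is a free-fragment bitmap with one bit per fragment that must fit in a single FS block. It ignores the CG header and inode map, so the CG size is an upper bound and the CG count a lower bound. The function names are hypothetical.

```python
# Rough estimate only. Assumed (not from the thread): 16 KiB blocks,
# 2 KiB fragments, one bitmap bit per fragment, bitmap limited to one block.
# CG header and inode-map overhead are ignored, so real CGs are smaller.

GIB = 1024 ** 3
MIB = 1024 ** 2


def max_cg_bytes(block_size, frag_size):
    """Upper bound on CG size if the fragment bitmap fills one whole block."""
    bits_per_block = block_size * 8
    return bits_per_block * frag_size


def min_cg_count(disk_bytes, block_size, frag_size):
    """Lower bound on the number of CGs needed to cover the disk."""
    cg = max_cg_bytes(block_size, frag_size)
    return -(-disk_bytes // cg)  # ceiling division


cg_size = max_cg_bytes(16384, 2048)
cg_count = min_cg_count(200 * GIB, 16384, 2048)

# Effect of the "10-50 times larger" CG size suggested above.
scaled_counts = {factor: -(-cg_count // factor) for factor in (10, 50)}

print(f"max CG size: {cg_size // MIB} MiB")
print(f"min CGs on a 200 GiB disk: {cg_count}")
for factor, count in scaled_counts.items():
    print(f"CGs if {factor}x larger: {count}")
```

Under these assumptions the bound comes out at 256 MiB per CG and at least 800 CGs on a 200 GiB disk, which is in line with the "1000 cylinder groups" estimate once per-CG overhead is counted. Making CGs 10 to 50 times larger would bring the count down to a few dozen.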