From: Eric Anderson
Date: Tue, 16 Oct 2007 06:54:11 -0500
To: freebsd-hackers@freebsd.org
Subject: Re: Filesystem snapshots dog slow
Message-ID: <4714A663.5010800@freebsd.org>
In-Reply-To: <20071016113046.GA35318@eos.sc1.parodius.com>

Jeremy Chadwick wrote:
> Since the snapshot code (e.g. mksnap_ffs(8) and friends) was introduced,
> dump(8) was modified to nag you if you didn't use the -L argument. "Um,
> okay, I'd better use -L" is what came out of my mouth, and I'm sure a
> lot of other administrators' when they saw this message.
>
> But it seems the making a snapshot is an incredibly slow/intensive task.
> The documentation I've read indicates that making a snapshot "is
> incredibly fast" -- based on my experiences, it isn't. At least it's no
> where near as fast as, say, a Netapp filer.

The problem is the way the snapshots work in UFS2. It has to do a lot of
work to create that snapshot, and the amount of work goes up with the
amount of space you have available (because it relates to the number of
cylinder groups you have). The UFS2 snapshot and the WAFL (NetApp's file
system) snapshot are *completely* different, and should not be compared
in this way. The functionality is (in the end) the same, but otherwise
they are different.

> I've found 3 threads (dating 2003, 2005, and 2007) about this problem:
>
> http://lists.freebsd.org/pipermail/freebsd-current/2003-August/009135.html
> http://lists.freebsd.org/pipermail/freebsd-fs/2005-July/001216.html
> http://lists.freebsd.org/pipermail/freebsd-stable/2007-January/031882.html

Only three threads? :) There are probably hundreds like them.

> This issue is still present on RELENG_7, and I can confirm it on
> multiple machines (some running *completely* different hardware than
> others).

It's a UFS2 problem, and the docs that say 'incredibly fast' are
actually referring to small filesystems that are not busy (with writes).
Maybe the docs should be clarified for now. If you'd like to help out,
you can submit patches to the docs you found that say that.

> osiris# df -ki /disk2
> Filesystem  1024-blocks Used     Avail Capacity iused    ifree %iused  Mounted on
> /dev/ad6s1d   236511738    4 217590796     0%       2 30570492    0%   /disk2
>
> osiris# time mksnap_ffs /disk2 /disk2/mysnapshot
> 0.000u 1.012s 5:12.23 0.3% 5+1149k 7803+18819io 0pf+0w
>
> While mksnap_ffs runs, the process remains in wdrain state. gstat(8)
> shows immense disk I/O. ms/r occasionally jumps up to 1100 or higher,
> but usually hovers around 40-60.

[..snip..]

> The time doubled. This isn't good.
>
> Disks are getting larger, filesystems growing, people storing more data.
> Hitachi, for example, has guaranteed 4TB disks by the end of 2011. If
> this problem has sat idle for at least 4 years already, we'll be in a
> lot of trouble come 2011. And let's not forget that every piece of
> FreeBSD documentation tells admins to "use dump, it's the best!". This
> issue is a good reason to consider using tools like rsync or tar
> instead. :-(

I recommend reading up a little bit on how the snapshots for UFS2 work.
It will give you a good understanding of what the issue is. Essentially,
your disk is hammered making copies of all the cylinder groups, skipping
those that are 'busy', and coming back to them later. On a 200GB disk,
you could have 1000 cylinder groups, each having to be locked, copied,
unlocked, and then checked again for any subsequent changes. The stalls
you see happen when there is lock contention or when disk I/O is
saturated. On a single disk (like your setup above), your snapshots will
take forever since there is very little random I/O performance available
to you.

> I will gladly work with anyone who wishes to tackle this, either by
> providing hardware (MB/disks/etc.) for free, or by giving the individual
> access to a box that has serial console + a serial debugger available.

FreeBSD 7 includes ZFS. Have you thought about using it?

The problem isn't that developers don't know the problem exists, or that
they don't have hardware, or serial console access to a system. The
problem is that there are only so many developers, and so much time, and
this is a big mountain to climb. It's hard to find an experienced person
to do the work (for free), when they could be doing anything else they
wish. I think that, in the end, for some of these aging issues to get
resolved, there needs to be another bounty put out on it.
I think rsync.net might even have one started for this issue already -
you might think about adding to the bounty, or officially offering
hardware through there.

Eric
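
For anyone who wants a feel for why the cost grows with the number of
cylinder groups, here is a rough toy model of the loop described above
(copy each group, skip the busy ones, come back to them, then re-check
everything for later changes). It is only a sketch under stated
assumptions and not the actual FFS snapshot code: the CylinderGroup
class, the busy flag, the version counter and the dirty_after_copy
parameter are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class CylinderGroup:
    index: int
    busy: bool = False   # hypothetical: group is currently locked by a writer
    version: int = 0     # hypothetical: bumped whenever the group changes


def snapshot_pass(groups, dirty_after_copy=()):
    """Toy model of the multi-pass copy; NOT the real FFS algorithm.

    Returns (copies, operations). copies maps group index to the version
    that was copied. operations counts every group visit (first pass,
    deferred retry, re-check) plus every re-copy.
    """
    copies = {}
    operations = 0
    deferred = []

    # First pass: copy every group that is not busy, defer the rest.
    for cg in groups:
        operations += 1
        if cg.busy:
            deferred.append(cg)
            continue
        copies[cg.index] = cg.version
        if cg.index in dirty_after_copy:
            cg.version += 1  # simulate a write landing after the copy

    # Come back to the groups that were busy the first time around.
    for cg in deferred:
        operations += 1
        cg.busy = False
        copies[cg.index] = cg.version

    # Re-check: any group that changed since it was copied is copied again.
    for cg in groups:
        operations += 1
        if copies[cg.index] != cg.version:
            operations += 1
            copies[cg.index] = cg.version

    return copies, operations


# 1000 groups, every tenth one busy, three modified after their first copy.
groups = [CylinderGroup(i, busy=(i % 10 == 0)) for i in range(1000)]
copies, ops = snapshot_pass(groups, dirty_after_copy={1, 2, 3})
print(len(copies), ops)  # 1000 2103
```

Even in this simplified model every group is visited at least twice, so
the work scales with the group count no matter how little data is on the
filesystem, and a busy filesystem only adds retries and re-copies on top.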