Date: Wed, 15 Oct 2008 01:35:38 -0700
From: Jeremy Chadwick
To: Peter Jeremy
Cc: freebsd-stable@freebsd.org
Subject: Re: System hanging during dump
Message-ID: <20081015083538.GA72190@icarus.home.lan>
In-Reply-To: <20081015082428.GE26536@server.vk2pj.dyndns.org>

On Wed, Oct 15, 2008 at 07:24:28PM +1100, Peter Jeremy wrote:
> Last night, I attempted a full, compressed backup of my 181GB /home
> (on a PATA disk) to a remote system. The backup started at 2159 and
> everything appeared normal until about 0040 when the system became
> non-responsive and this lasted until the dump completed at 1033. This
> is the first full backup of /home I've made for several years (due to
> lack of space).
>
> I noticed the non-responsiveness at about 0500 when:
> - The dump, gzip and fifo pipeline were running normally.
> - A 'systat -v' I had started was running normally (though it
>   reported an excessive number of 'D' processes). Other values
>   all appeared normal.
> - No response to return key at a zsh prompt
> - No response to up/down arrows in mutt
> [above all done in pre-existing ssh sessions from another host]
> - telnet to port 22 connected but didn't produce a banner.
>
> The duration above is based on system logs - which show nothing
> happened during this period. At the end, there were various anomalous
> entries:
>   Oct 15 10:33:27 server ntpd[750]: too many recvbufs allocated (40)
>   Oct 15 10:33:30 server sshd[947]: error: accept: Software caused connection abort
>   Oct 15 10:33:34 server kernel: TCP: [192.168.123.123]:59516 to [192.168.123.200]:25 tcpflags 0x4; syncache_chkrst: Spurious RST without matching syncache entry (possibly syncookie only), segment ignored
>
> Possibly useful information:
> The dump pipeline was:
>   dump -uaL0 -C 32 -f - /home | reblock | gzip [stdout connected to socket
>   to remote server]
> 'reblock' is basically a 200MB FIFO I wrote to desynchronise the (often
> I/O bound) dump from the CPU-bound gzip.
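The source of 'reblock' was not posted. As an illustration only, here is a minimal Python sketch of the general idea: a large in-memory FIFO that lets an I/O-bound producer (dump) and a CPU-bound consumer (gzip) run out of step. A reader thread keeps pulling data while the calling thread writes it out, with a bounded queue (roughly 200MB by default) in between. The function name `relay`, the 64KB chunk size and the thread-plus-`queue.Queue` design are all assumptions made for this sketch, not a description of how reblock itself works.

```python
import io
import queue
import threading


def relay(src: io.BufferedIOBase, dst: io.BufferedIOBase,
          capacity_bytes: int = 200 * 1024 * 1024,
          chunk_size: int = 64 * 1024) -> int:
    """Copy src to dst through a bounded in-memory FIFO (hypothetical helper).

    A background thread reads from src as fast as src can supply data, while
    the calling thread writes to dst as fast as dst will accept it. About
    capacity_bytes of data can sit in the queue between them, so a slow
    spell on one side does not immediately stall the other.
    Returns the number of bytes copied.
    """
    slots = max(1, capacity_bytes // chunk_size)
    fifo: queue.Queue = queue.Queue(maxsize=slots)  # put() blocks when full

    def reader() -> None:
        try:
            while True:
                block = src.read(chunk_size)
                fifo.put(block)  # b"" is queued too and marks end of input
                if not block:
                    return
        except BaseException as exc:  # hand read errors to the writer side
            fifo.put(exc)

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()

    copied = 0
    while True:
        item = fifo.get()  # blocks while the FIFO is empty
        if isinstance(item, BaseException):
            raise item
        if not item:
            break
        dst.write(item)
        copied += len(item)
    thread.join()
    return copied
```

Wired into a pipeline as a filter, it would be called as `relay(sys.stdin.buffer, sys.stdout.buffer)` followed by a flush of stdout. A queue of fixed-size chunks bounds memory without a hand-written ring buffer, and the empty bytes object doubles as the end-of-stream marker. Note that a buffer like this only sits downstream of dump; it has no influence on how dump -L creates its snapshot.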
>
> server% uname -a
> FreeBSD server.vk2pj.dyndns.org 7.0-STABLE FreeBSD 7.0-STABLE #18: Sun May 18 15:02:39 EST 2008     root@server.vk2pj.dyndns.org:/var/obj/k7/usr/src/sys/server  i386
> server% df -ki
> Filesystem  1024-blocks      Used   Avail Capacity   iused    ifree %iused  Mounted on
> /dev/ad0s3d   204648864 181911710 6365246    97%   1703016 11353942   13%   /home
>
> About the only thing that happened at around this time was nightly
> updates. These start at 0005, fetching CTM cvs-cur updates, applying
> them to /home/ncvs, then cvs updating /home/ports. Looking at
> timestamps, /home/ports/graphics/icod/CVS/Entries was updated at
> 0042 and /home/ports/graphics/imlib2_loaders/CVS/Entries (the next
> entry) was updated at 1034.
>
> Whilst /home is fairly full, I can't see that the snapshot meta and
> rollback data would have occupied the 20GB free (and no 'out-of-space'
> messages were generated). Is there some limit on the number of inodes
> that can be updated whilst a snapshot exists?
>
> Has anyone else seen anything similar?

It's a known problem documented in my Wiki -- see the "dump/restore"
section. Note the part about UFS2 snapshot generation. I'm almost
certain this is what you're describing.

http://wiki.freebsd.org/JeremyChadwick/Commonly_reported_issues

This is one of the many reasons why I moved our backup infrastructure
over to rsnapshot/rsync, despite the atime modification problem.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |