Date:      Mon, 01 Sep 2008 16:51:44 -0700
From:      "Kevin Oberman" <oberman@es.net>
To:        Jeremy Chadwick <koitsu@FreeBSD.org>
Cc:        Derek Kuliński <takeda@takeda.tk>, Michael <freebsdports@bindone.de>, freebsd-stable@freebsd.org
Subject:   Re: bin/121684: : dump(8) frequently hangs 
Message-ID:  <20080901235144.4B53B4501A@ptavv.es.net>
In-Reply-To: Your message of "Mon, 01 Sep 2008 14:38:56 PDT." <20080901213856.GA17155@icarus.home.lan> 


> Date: Mon, 1 Sep 2008 14:38:56 -0700
> From: Jeremy Chadwick <koitsu@FreeBSD.org>
> 
> On Mon, Sep 01, 2008 at 09:00:12AM -0700, Kevin Oberman wrote:
> > > Date: Mon, 01 Sep 2008 09:36:11 -0400
> > > From: Mike Tancsa <mike@sentex.net>
> > > Sender: owner-freebsd-stable@freebsd.org
> > > 
> > > At 05:07 AM 9/1/2008, Derek Kuliński wrote:
> > > 
> > > >Now I'm honestly a bit scared about it (even if it will be fixed
> > > >before 7.1, I'm not sure I'll hurry with the update).
> > > 
> > > There have been a number of commits to RELENG_7 that fixed dump
> > > issues for me.  A box that used to regularly exhibit hung dump
> > > processes has been working fine since April, e.g. with a kernel from
> > > 7.0-STABLE FreeBSD 7.0-STABLE #4: Wed Apr 30
> > > 
> > > It does weekly level 0 dumps and daily differential dumps on the
> > > file systems below without issue:
> > > % df -i
> > > Filesystem    1K-blocks      Used     Avail Capacity   iused    ifree %iused  Mounted on
> > > /dev/twed0s1a   2026030    284346   1579602      15%    2937   279685     1%  /
> > > devfs                 1         1         0     100%       0        0   100%  /dev
> > > /dev/twed0s1d   5077038    575828   4095048      12%    1197   658257     0%  /tmp
> > > /dev/twed0s1e  20308398  11072840   7610888      59% 1065406  1572416    40%  /usr
> > > /dev/twed0s1f  20308398  13275050   5408678      71%   13750  2624072     1%  /var
> > > /dev/twed0s1g 246875258 186393906  40731332      82% 9118036 22794922    29%  /zoo
> > > 
> > > However, you should test and make sure it works for you.
> > 
> > I have a 7-Stable system which has not been able to successfully dump(8)
> > for about 2 months. Since it contains almost no important data that is
> > subject to change, it's not too big a deal, but I worry that other
> > systems might start showing the same problems.
> > 
> > I have no idea why it's failing, though, and I have spent little effort
> > in troubleshooting it. I'm running 3 week old stable and I'll be
> > updating to today's RELENG_7 later today.
> 
> Can someone explain what "dump frequently hangs" actually means?
> 
> Does it lock up the entire machine indefinitely (and if so, how long did
> you wait for it to (hopefully) recover)?
> 
> Or does it more or less "deadlock" the machine, making it generally
> unusable, until the dump is completely finished?
> 
> If the latter, I can confirm this problem -- which is why we moved all
> of our production systems away from using dump on UFS2 to simply using
> rsnapshot[1].  I'll try to find the thread (it was a year or so ago)
> where a developer told me more or less what was going on.  The problem
> was that UFS2 snapshot generation, over time, becomes slower and slower
> to generate (this is what dump does on UFS2 systems, with or without the
> -L flag), and is a known design issue.
> 
> If anything, this issue makes ZFS incredibly important with regards to
> -STABLE, where its snapshot generation for backups does not behave this
> way; it is fast and very easily manageable.
> 
> [1]: rsync is great for backups, and very fast, but there's the issue of
> modifying atimes.  I committed a patch to ports/net/rsync which adds an
> --atimes flag, except its behaviour is not what you'd expect: the file
> which was copied, at the destination, has the correct atime (of the
> source), but the source itself ends up getting its atime modified, so
> you're essentially destroying the atime data on the source.
> 
> This is a problem when it comes to programs which use atime to discern
> things, such as classic UNIX mailboxes/mbox.  "Um, why does mutt say I
> don't have any new mail when I do??" In our case, the only person using
> classic UNIX mboxes with a mail client local to the machine was me, so I
> ended up migrating my procmail rules and data to Maildir using mutt,
> solving the problem entirely.
> 
> -- 
> | Jeremy Chadwick                                jdc at parodius.com |
> | Parodius Networking                       http://www.parodius.com/ |
> | UNIX Systems Administrator                  Mountain View, CA, USA |
> | Making life hard for others since 1977.              PGP: 4BD6C0CB |
> 

In my case the dump deadlocks, but the system is unaffected. The dump
just freezes. I need to look at it more closely, but I simply have not
had time. I don't even recall what state it is in when frozen, but it
can be 'kill -9'ed. The problem has persisted through at least one system
upgrade.
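
One way to record that state the next time the dump freezes is sketched
below. This is only a rough sketch: show_state is a made-up helper name,
and it assumes the usual ps(1) keywords pid, state, wchan and command.

```shell
#!/bin/sh
# show_state: hypothetical helper, not part of dump(8) or the base system.
# Prints the PID, process state, wait channel and command line for each
# PID given as an argument, so a frozen process can be inspected before
# it is killed.
show_state() {
    if [ "$#" -eq 0 ]; then
        echo "usage: show_state pid ..." >&2
        return 1
    fi
    for pid in "$@"; do
        # state = run/sleep state, wchan = what the process is waiting on
        ps -o pid,state,wchan,command -p "$pid"
    done
}
```

Something like `show_state $(pgrep -x dump)` would then list every dump
process; the state and wait channel columns show whether each one is
sleeping and, if so, on what.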

I'll try to track down more tomorrow.
-- 
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: oberman@es.net			Phone: +1 510 486-8634
Key fingerprint:059B 2DDF 031C 9BA3 14A4  EADA 927D EBB3 987B 3751



