Date:      Fri, 08 Feb 2002 10:55:42 -0800 (PST)
From:      "Duane H. Hesser" <dhh@androcles.com>
To:        Markus Stumpf <maex-freebsd-hackers@Space.Net>
Cc:        freebsd-hackers@FreeBSD.ORG
Subject:   RE: dump(8) race conditions?
Message-ID:  <200202081855.g18ItgK66957@androcles.com>
In-Reply-To: <20020207185052.A87994@Space.Net>


On 07-Feb-02 Markus Stumpf wrote:
> We use amanda and dump for backups. Some hosts have rather busy disks
> even during non prime time hours when backup is run.
> 
> From time to time amanda reports dump(8) errors like the following:
> 
> sendbackup: info end
>|   DUMP: Date of this level 5 dump: Wed Feb  6 01:53:12 2002
>|   DUMP: Date of last level 4 dump: Mon Feb  4 02:31:40 2002
>|   DUMP: Dumping /dev/rda4s1e (/share/turing/disk07) to standard output
>|   DUMP: mapping (Pass I) [regular files]
>|   DUMP: mapping (Pass II) [directories]
>|   DUMP: estimated 2423080 tape blocks.
>|   DUMP: dumping (Pass III) [directories]
>|   DUMP: dumping (Pass IV) [regular files]
>|   DUMP: 14.72% done, finished in 0:28
>|   DUMP: 33.78% done, finished in 0:19
>|   DUMP: 52.84% done, finished in 0:13
>|   DUMP: 71.65% done, finished in 0:07
> ?   DUMP: read error from /dev/rda4s1e: Invalid argument: [block -410921522]: count=3072
> ?   DUMP:   DUMP: read error from /dev/rda4s1e: Invalid argument: [sector -410921522]: count=512
> ?   DUMP: read error from /dev/rda4s1e: Invalid argument: [block -410921532]: count=5120
> ?   DUMP: read error from /dev/rda4s1e: Invalid argument: [block -1001057530]: count=1024
> [ ... ]
> 
> The first time we saw this we took the machine down to single user,
> unmounted the disk and fsck'd it. No errors were found and the next
> backups (even level 0) completed without errors.
> 
> As we were still suspicious as to what might be the reason for these really
> sporadic error messages from different machines and different disks, I
> looked through the source of dump.
> 
> If I interpret the code correctly, dump caches directory inode lists.
> Now, if files get removed or shrunk during a dump, after the inode
> info has been cached, dump has a "dirty" cache and tries to access
> blocks that are no longer allocated, and the result is the above errors.
> 
> Am I right with my interpretation or are these really hardware errors?
> 

You are essentially correct, and your message is probably a good
reminder for those of us who routinely use dump on active filesystems.

Dump is essentially a two-phase system (mapping, then dumping), and
any activity which modifies inodes between the first phase and the
second is likely to cause problems, either for dump or for restore.
It has always been thus, even as far back as V7 (and probably V6).
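
To make the race concrete, here is a toy model in Python.  It is only a
sketch: the class and function names (ToyFilesystem, map_pass, dump_pass)
are invented for illustration and do not correspond to anything in the
dump(8) source.  It shows a file shrinking after the mapping phase, so the
dumping phase chases block numbers that are no longer allocated.

```python
# Toy model of the mapping/dumping race; NOT dump(8)'s real data structures.
# All names here are hypothetical and exist only for this illustration.

class ToyFilesystem:
    def __init__(self):
        # inode number -> list of block numbers currently owned by it
        self.inodes = {}
        # block number -> contents (allocated blocks only)
        self.blocks = {}

    def write_file(self, ino, data_blocks):
        """Allocate fresh blocks for inode `ino` holding `data_blocks`."""
        start = max(self.blocks, default=-1) + 1
        numbers = list(range(start, start + len(data_blocks)))
        for n, data in zip(numbers, data_blocks):
            self.blocks[n] = data
        self.inodes[ino] = numbers

    def truncate(self, ino, keep):
        """Shrink inode `ino` to its first `keep` blocks, freeing the rest."""
        for n in self.inodes[ino][keep:]:
            del self.blocks[n]
        self.inodes[ino] = self.inodes[ino][:keep]


def map_pass(fs):
    """First phase: snapshot which blocks each inode owns."""
    return {ino: list(blocks) for ino, blocks in fs.inodes.items()}


def dump_pass(fs, snapshot):
    """Second phase: read the blocks recorded earlier; collect failures."""
    dumped, errors = {}, []
    for ino, blocks in snapshot.items():
        for n in blocks:
            if n in fs.blocks:
                dumped.setdefault(ino, []).append(fs.blocks[n])
            else:
                # Stale entry: the block was freed after the mapping phase.
                errors.append((ino, n))
    return dumped, errors


fs = ToyFilesystem()
fs.write_file(2, ["a0", "a1", "a2"])
fs.write_file(3, ["b0"])
snapshot = map_pass(fs)
fs.truncate(2, 1)   # the file shrinks between the two phases
dumped, errors = dump_pass(fs, snapshot)
print(errors)       # stale (inode, block) pairs the dumping phase hit
```

In this model a freed block simply disappears, so the stale reference is
noticed.  If the block had instead been handed to another file in the
meantime, the read would succeed and return the wrong data without any
error being reported.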

Dumps which report errors such as the ones you mention are likely
to cause difficulties on restore.  Sometimes they will be completely
unreadable; sometimes partial or interactive restores will succeed
(for some files).  It is even possible that the dump may be completely
restorable, but with corrupted files.  On the other hand, dumps
which *don't* report errors can still be subtly corrupted.  Elizabeth
Zwicky, in a ten-year-old paper entitled "Torture Testing Backup
and Archive Programs", discusses a couple of situations where this
can occur.
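
Since a dump that reported read errors deserves extra suspicion, it can
help to flag such runs automatically.  The following is a minimal sketch,
assuming the messages look like the ones quoted above; the pattern and the
function name find_read_errors are my own, and other dump versions may
word the message differently.  A clean result says nothing about the
silent corruption just described.

```python
import re

# Pattern modeled on the "read error" lines quoted in this thread.
# This is an assumption about the message format, not a documented one.
READ_ERROR = re.compile(
    r"DUMP: read error from (?P<device>\S+): (?P<reason>[^:]+): "
    r"\[(?P<unit>block|sector) (?P<number>-?\d+)\]: count=(?P<count>\d+)"
)


def find_read_errors(log_text):
    """Return (device, unit, number, count) for each read-error line."""
    results = []
    for line in log_text.splitlines():
        m = READ_ERROR.search(line)
        if m:
            results.append(
                (m["device"], m["unit"], int(m["number"]), int(m["count"]))
            )
    return results


# Sample lines copied from the report quoted above.
sample = """\
|   DUMP: 71.65% done, finished in 0:07
?   DUMP: read error from /dev/rda4s1e: Invalid argument: [block -410921522]: count=3072
?   DUMP:   DUMP: read error from /dev/rda4s1e: Invalid argument: [sector -410921522]: count=512
"""

found = find_read_errors(sample)
for device, unit, number, count in found:
    print(device, unit, number, count)
```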

It is operationally (and sometimes "politically") difficult to dump
on unmounted filesystems, so most of us (I think) "bite the bullet"
and try to dump at times when the subject filesystem is likely to
be quiescent.  It may also be smart to dump more frequently than
otherwise called for, just to increase the odds.

Your message reminds us of the risks we take.

It is worth noting that "activity" can be occurring on a filesystem
and dump will still succeed, provided nothing alters inodes
significantly between passes.

--------------
Duane H. Hesser
dhh@androcles.com




