Date:      Wed, 04 Mar 1998 11:33:57 -0800 (PST)
From:      Simon Shapiro <shimon@simon-shapiro.org>
To:        sbabkin@dcn.att.com
Cc:        wilko@yedi.iaf.nl, tlambert@primenet.com, jdn@acp.qiv.com, blkirk@float.eli.net, hackers@FreeBSD.ORG, grog@lemis.com, karl@mcs.net
Subject:   RE: SCSI Bus redundancy...
Message-ID:  <XFMail.980304113357.shimon@simon-shapiro.org>
In-Reply-To: <C50B6FBA632FD111AF0F0000C0AD71EE4132D3@dcn71.dcn.att.com>


On 04-Mar-98 sbabkin@dcn.att.com wrote:
 ...

>> I wrote a white paper at Oracle some years ago, claiming that
>> databases over a certain size simply cannot be backed up.  I became
>> very UN-popular very quickly.  In your moderate setup, you already
>> see the proof of correctness.
>> 
> IMHO they CAN be backed up, as long as you have enough spare equipment.
> At my previous job, at a bank where we were paranoid about backup and
> downtime, I think I found a scalable way of doing so.  We used it on a
> relatively small database (~15G), but I can't see why it cannot be
> scaled.  First, forget about exports.  Copy the database files and
> archived logs.  In addition to the production instance, have two more
> instances.  One gets the archived logs copied and rolled forward
> immediately.  The other gets the archived logs copied immediately, but
> rolled forward only after they have aged.  Copy this third instance to
> tape from time to time.  Copy archived logs to tape as fast as they are
> produced.
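
(Roughly, the scheme described above amounts to a loop like the sketch
below.  The paths, the ageing interval, and the apply_redo_log command
are made up for illustration; a real setup would drive the database's
own recovery tooling.)

# Sketch of the copy-and-roll-forward scheme above (illustrative only).
import os, shutil, subprocess, time

ARCHIVE_DIR  = "/prod/archlogs"      # where production writes archived logs
HOT_STANDBY  = "/standby1/archlogs"  # rolled forward immediately
COLD_STANDBY = "/standby2/archlogs"  # rolled forward only after ageing
AGE_SECONDS  = 24 * 3600             # how long logs age on the third instance

def apply_log(instance, log):
    # In real life this would be the database's own recovery command.
    subprocess.call(["apply_redo_log", instance, log])

seen = set()
while True:
    for log in sorted(os.listdir(ARCHIVE_DIR)):
        if log in seen:
            continue
        src = os.path.join(ARCHIVE_DIR, log)
        # Copy each log to both standbys (and to tape) as soon as it appears.
        shutil.copy(src, HOT_STANDBY)
        shutil.copy(src, COLD_STANDBY)
        apply_log("standby1", os.path.join(HOT_STANDBY, log))  # immediate roll-forward
        seen.add(log)

    # The third instance applies a log only once it is old enough, so a
    # dropped table can still be fished out of it for AGE_SECONDS.
    for log in sorted(os.listdir(COLD_STANDBY)):
        path = os.path.join(COLD_STANDBY, log)
        if time.time() - os.path.getmtime(path) > AGE_SECONDS:
            apply_log("standby2", path)
            os.remove(path)  # applied; the tape copy remains
    time.sleep(60)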

Yes.  This scheme works, but you are not backing up the database, nor is
it scalable.  Operating on a database (from a backup point of view) makes
arbitrary changes to the files.  If you back them up, you will have an
inconsistent view of the data.

Problem number 2:  If your system's storage I/O is utilized at higher than
50%, you cannot dump the files at all.
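
Back-of-the-envelope, with made-up numbers: the dump has to read every
byte of the database through the same storage subsystem the instance is
using, so whatever is left over after the production load is all the
backup ever gets.

# Made-up numbers: how long a file-level dump takes given the spare bandwidth.
subsystem_bw = 100.0               # MB/s the storage subsystem can sustain
utilization  = 0.60                # fraction already consumed by production I/O
spare        = subsystem_bw * (1.0 - utilization)   # 40 MB/s left for the dump
db_size_mb   = 500 * 1024          # 500 GB of database files

# If the backup target sits on the same subsystem, the copy needs read AND
# write bandwidth, i.e. roughly twice what is computed here.
hours = db_size_mb / spare / 3600.0
print("dump needs at least %.1f hours of sustained spare bandwidth" % hours)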

> If the production instance crashes, use the second one.  If someone
> removed a table, and that happened more recently than the age of the
> third instance, start that instance and get the table from it.  If the
> removal was noticed too late, there will be a big PITA with restoring
> from tapes.

What you describe here is application-level mirroring.  It works after a
fashion, but if the two databases go out of sync, you have no way of
proving which side is correct.  Also, it is not a deterministic system:
you cannot really commit the master until the slave has committed.  This
gets nasty in a hurry.  One database with one mirror may work.  Twenty of
them?
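
The nastiness is easy to see in a sketch (all names made up): the master's
commit has to block on the slave's acknowledgement, and the moment the
slave is unreachable you must either stall the master or let the two
copies diverge, with nothing to say which one is right.

# Sketch only; the master/slave objects and their methods are hypothetical.
class MirroredCommit:
    def __init__(self, master, slave, timeout=5.0):
        self.master, self.slave, self.timeout = master, slave, timeout

    def commit(self, txn):
        self.slave.send(txn)                    # ship the change to the mirror first
        try:
            self.slave.wait_ack(self.timeout)   # master must wait for the slave
        except TimeoutError:
            # Commit anyway and the mirror silently falls behind (out of sync,
            # with no way to prove which side is correct).  Abort instead and
            # the master stalls every time the mirror hiccups.  With one mirror
            # this is painful; with twenty it is hopeless.
            raise
        self.master.commit(txn)                 # only now is the master committed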

> Do an offline backup (better, but with downtime), or an online backup if
> you do reset the logs.  This can be done fast if the I/O subsystem has
> enough throughput to copy all the disks of the database to backup disks
> in parallel, and if the disks can be remapped between machines easily.
> For 4G disks this will take no more than 1 hour.
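
(For scale, with made-up numbers: 4G in an hour is only about 1.2 MB/s per
spindle, which is easy; the catch is pushing that times the number of
disks through the controllers and busses at once, on top of the production
load.)

# Made-up numbers: bandwidth needed to copy every disk in parallel in one hour.
disk_gb, window_hours, ndisks = 4, 1, 20
per_disk_mb_s  = disk_gb * 1024.0 / (window_hours * 3600)   # ~1.1 MB/s per disk
aggregate_mb_s = per_disk_mb_s * ndisks                      # ~23 MB/s through the busses
print("per disk %.1f MB/s, aggregate %.1f MB/s" % (per_disk_mb_s, aggregate_mb_s))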

There are databases which cannot go offline.  Banks have the unique
position where they hold the customer's money behind a locked door :-)
An ISP's RADIUS database cannot shut down.  A telephone company
authentication server cannot shut down.  A web server should not shut
down.  A mail server can shut down.  A DNS server cannot shut down.
You may disagree with some of these classifications, but some of them
cannot be shut down, and actually cannot get out of sync either.

 ...

> Nope.  Databases must have dedicated filesystems.  And as long as there
> are no files created or removed in these filesystems, and no blocks
> added to or removed from any files in them (in other words, no change
> of metadata, which is normal for databases), there is no chance that
> you will lose your database.  I know that not everyone follows this
> rule (it looks like no one at AT&T does), but that is their personal
> problem and not a problem of Unix.

I was hoping you would say that :-)
You are talking theory.  I am talking practice.  I have demonstrated cases
(many times) where you boot a system, mount everything, crash it, and upon
re-boot the filesystem is severely corrupt.
Besides, a living database will change things on disk.  There are no Unix
semantics for pre-allocating blocks to a file.  Some of you may remember
the old Oracle ccf utility.  It did exactly that.  Therefore, you may add
a block to file A, which shares a superblock sector with file B, have the
system crash three days later, and then fsck will decide that file A
belongs in lost+found, or, less commonly, rearrange it a bit.  If you
never saw it, you simply did not look long enough.
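
Something like the following gets the effect ccf went for: allocate every
block of the data file up front by writing it out in full, so the running
instance never changes filesystem metadata again.  (File name and size are
made up; this is an illustration, not the real ccf.)

# Sketch of ccf-style pre-allocation: force every block to be allocated at
# creation time so the running database never touches filesystem metadata.
import os

def preallocate(path, size_bytes, chunk=1024 * 1024):
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        zeros = b"\0" * chunk
        written = 0
        while written < size_bytes:
            written += os.write(fd, zeros[: min(chunk, size_bytes - written)])
        os.fsync(fd)   # make sure the blocks (and the metadata) are on disk
    finally:
        os.close(fd)

preallocate("/u01/oradata/test01.dbf", 64 * 1024 * 1024)   # 64 MB data file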

I totally agree that most of it is a filesystem problem, not a Unix
problem.  I am working on such a filesystem right now.  The problem I have
is that the Unix semantics for creat(2), open(2), etc. are wrong for such
a filesystem.  We can only estimate the degree of noise that would be
generated, and guess at the outcome, if I suggested that these system
calls need a new definition, or that new ones are needed.  I'll save that
for another day :-)

Please do not misunderstand me;  I like Unix, I love FreeBSD, but perfect
for all occasions neither one is.

Simon




