From owner-freebsd-stable@FreeBSD.ORG  Sun Jun 10 17:06:53 2012
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 31D55106564A
	for <freebsd-stable@freebsd.org>; Sun, 10 Jun 2012 17:06:53 +0000 (UTC)
	(envelope-from karl@denninger.net)
Received: from FS.denninger.net (wsip-70-169-168-7.pn.at.cox.net
	[70.169.168.7]) by mx1.freebsd.org (Postfix) with ESMTP id D31F28FC14
	for <freebsd-stable@freebsd.org>; Sun, 10 Jun 2012 17:06:52 +0000 (UTC)
Received: from [127.0.0.1] (localhost [127.0.0.1])
	by FS.denninger.net (8.14.4/8.13.1) with ESMTP id q5AH6prt058927
	for <freebsd-stable@freebsd.org>; Sun, 10 Jun 2012 12:06:52 -0500 (CDT)
	(envelope-from karl@denninger.net)
Received: from [127.0.0.1] [192.168.1.40] by Spamblock-sys (LOCAL);
	Sun Jun 10 12:06:52 2012
Message-ID: <4FD4D42B.4030705@denninger.net>
Date: Sun, 10 Jun 2012 12:06:51 -0500
From: Karl Denninger <karl@denninger.net>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64;
	rv:12.0) Gecko/20120428 Thunderbird/12.0.1
MIME-Version: 1.0
To: Adam Strohl <adams-freebsd@ateamsystems.com>
References: <4FD3AD35.3090301@denninger.net>
	<4FD4B9AC.6090604@ateamsystems.com>
	<4FD4BCA1.2010502@denninger.net>
	<4FD4BEC1.1020201@ateamsystems.com>
In-Reply-To: <4FD4BEC1.1020201@ateamsystems.com>
X-Enigmail-Version: 1.4.2
X-Antivirus: avast! (VPS 120610-0, 06/10/2012), Outbound message
X-Antivirus-Status: Clean
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: freebsd-stable@freebsd.org
Subject: Re: Backups with 9-STABLE -- Options?
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 10 Jun 2012 17:06:53 -0000

On 6/10/2012 10:35 AM, Adam Strohl wrote:
> On 6/10/2012 22:26, Karl Denninger wrote:
>> Well, backups with snapshots don't do well EITHER on a database unless
>> you can snapshot BOTH the dbms data store(s) and the transaction log
>> store(s) /*at the exact same instant*/.  If you cannot, then you're
>> asking for trouble and are likely to get it.  But I've dealt with that
>> particular "gotcha" problem in a different way for the DBMS I use
>> (PostgreSQL).
>
> You asked what would happen, not what was the best way to back up a
> SQL DB, but your point is valid.
>
> Snapshots don't fix this issue entirely but drastically reduce the
> chance of a 100% broken backup.
>
> SQL servers should be dumped out to disk (i.e., with mysqldump) to avoid
> this, or have a dedicated backup client (which means you're probably
> not using dump anyway).
Well, yes and no.

I have this "problem" here and solve it in a number of ways, as the
risks are multiple.  It's not just hardware failure; it's also the risk
of malicious or broken code (or a stupid DBA such as myself, who
fat-fingers something) doing something utterly insane like "drop .....".
You realize after hitting the button that it wasn't what you wanted, but
it has already been dutifully mirrored, almost instantly, to the slaves.
Oops.  It's the same issue you have with ordinary files: the "fool" who
runs "rm -rf /" as root by accident and then watches the color drain out
of his face.

My solution for Postgres is to take base backups as a "special case",
keep the WAL files generated after each base backup and back those up,
and run a mirrored server as a hot standby.  That makes a hardware fault
a five-minute recovery, while the stupid-DBA mistake remains recoverable
by restoring the base backup and replaying the WAL up to just before the
bad command.
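To make the "stupid DBA" case concrete, here is a minimal Python sketch of the point-in-time recovery planning step: pick the newest base backup taken before the mistake, then the archived WAL needed to replay from it up to a target just before the bad command. It is a toy model, not PostgreSQL's implementation; BaseBackup, WalSegment and plan_pitr are hypothetical names, and describing each WAL segment by a start/end time range is a simplifying assumption. In a real 9.x setup the equivalent work is done by archive_command on the primary and by restore_command plus recovery_target_time in recovery.conf.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class BaseBackup:
    label: str
    taken_at: datetime


@dataclass(frozen=True)
class WalSegment:
    name: str
    start: datetime  # time of the first record in this segment (assumed known)
    end: datetime    # time of the last record in this segment


def plan_pitr(bases, segments, target):
    """Return (base backup, WAL segments) needed to recover up to `target`.

    `target` should be a moment just before the accidental "drop".
    """
    candidates = [b for b in bases if b.taken_at <= target]
    if not candidates:
        raise ValueError("no base backup precedes the recovery target")
    base = max(candidates, key=lambda b: b.taken_at)

    # Keep only segments overlapping the window [base.taken_at, target].
    needed = sorted(
        (s for s in segments if s.end >= base.taken_at and s.start <= target),
        key=lambda s: s.start,
    )
    if not needed or needed[0].start > base.taken_at:
        raise ValueError("WAL from the base backup onward is missing")

    # A missing segment breaks the replay chain, so check for gaps.
    for prev, cur in zip(needed, needed[1:]):
        if cur.start > prev.end:
            raise ValueError(f"WAL gap between {prev.name} and {cur.name}")
    return base, needed
```

The gap check is the important part: archived WAL is only useful if every segment from the base backup forward is present.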

>> So basically what you're saying is that SU+J leaves you exposed to
>> having no real backup option that provides a rational guarantee of the
>> ability to restore the backup taken.
>
> That's a bit of a gloss over what I said.  My point was that you
> might end up missing something if it's changing at the time the backup
> is taken.  It really depends on what specifically that server is doing.
Well, there's a difference there, then.

I'm trying to get my arms around the risks here.  For a very long time
before we had snapshots I took live level 0 dumps (because I had to; I
couldn't take the machines involved offline for the multiple hours the
dumps required) and never got bitten by them, and I've been running
FreeBSD since the 1990s and HAVE had to restore "in anger" before.

There's a big difference between a user missing a file he had open in
emacs at the time the system crashed and having a dump that blows up
during restore.  The first is a risk one has to accept to some degree
even with snapshots, since a snapshot can always happen while a file is
open for write and has half its contents in RAM rather than on disk.  I
accept that risk and keep multiple versions of backups for that reason
(the usual daily/weekly/monthly/full hierarchy), which mitigates it but
does not entirely eliminate it.  In my environment the largest risk
there is in things like mailbox stores (accessed via IMAP), which are
open for write (not just append) whenever a user is doing a move or
compact operation.
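The daily/weekly/monthly/full hierarchy works because dump(8) levels chain: a level N dump saves what changed since the most recent dump of a lower level, so a restore needs one dump from each step of that chain. The Python sketch below, using a hypothetical Dump record and restore_chain helper, computes which dumps a full restore would need. It also shows why a dump that fails halfway through restore is so damaging: every link in the chain has to restore cleanly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dump:
    date: str   # ISO date, so string order is chronological order
    level: int  # 0 = full, 1..9 = incremental


def restore_chain(dumps):
    """Return the dumps to restore, oldest first, to reach the newest state."""
    history = sorted(dumps, key=lambda d: d.date)
    if not history:
        raise ValueError("no dumps")
    chain = [history[-1]]
    # Walk backwards: each incremental is relative to the most recent
    # earlier dump with a strictly lower level.
    for d in reversed(history[:-1]):
        if chain[-1].level == 0:
            break
        if d.level < chain[-1].level:
            chain.append(d)
    if chain[-1].level != 0:
        raise ValueError("no level 0 dump in history; cannot restore")
    return list(reversed(chain))
```

For a monthly level 0, weekly level 1 and daily level 2 schedule, the result is the last level 0, the last level 1 after it, and the newest level 2.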

But a dump that blows up restore at the 50% point (for example) and
refuses to proceed is a different animal entirely.  That leaves you with
nothing (or effectively so).  The question is whether that latter
"oh crap!" scenario becomes a real exposure if you don't use "-L" and do
use journaling.

> There is also a consistency issue: using snapshots ensures that all the
> files make sense together, instead of the files getting more and more
> recent as the end of the backup approaches.

True.
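A toy Python model of that consistency point: files are copied one per tick while an application updates two of them together. Without a snapshot the copy can mix old and new versions; a snapshot freezes every file at one instant. The function names are hypothetical, and the model only captures on-disk state, so data still sitting in an application's buffers is missed either way, as noted earlier in this thread.

```python
def _apply_writes(state, writes, tick):
    # `writes` is a list of (tick, name, version); a write at tick t lands
    # just before the file scheduled for tick t is read.
    for t, name, version in writes:
        if t == tick:
            state[name] = version


def live_backup(initial, writes):
    """Copy files in sorted order, one per tick, straight off the live fs."""
    state = dict(initial)
    copy = {}
    for tick, name in enumerate(sorted(initial)):
        _apply_writes(state, writes, tick)
        copy[name] = state[name]
    return copy


def snapshot_backup(initial, writes):
    """Freeze all files first, then copy from the frozen image."""
    frozen = dict(initial)  # snapshot taken before any write lands
    live = dict(initial)
    copy = {}
    for tick, name in enumerate(sorted(initial)):
        _apply_writes(live, writes, tick)  # writes still hit the live fs
        copy[name] = frozen[name]          # but the backup reads the snapshot
    return copy, live
```

With initial = {"a": 1, "b": 1, "c": 1} and writes = [(1, "a", 2), (1, "c", 2)], live_backup returns {'a': 1, 'b': 1, 'c': 2}: c carries the update but a does not. The snapshot copy is {'a': 1, 'b': 1, 'c': 1}, consistent with itself.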

I've shut off journaling for now on my 9-STABLE machines.  I'm just
starting to roll these into production, now that I'm satisfied they are
stable enough to trust with data I can't afford to lose.  In general
FreeBSD has gotten to the point that it doesn't crash on me at all (I
have a server at a colo in heavy production use with roughly 18 months
of uptime on it, with security patches and other code changes all
applied while in operation), so the crash reboot time is less material
than knowing that if the worst case happens I can get the data back.

I'd really like to see a discussion of the various risk modes here.
That is:

1. Is it REALLY safer to run the root filesystem WITHOUT soft updates,
as was the previous default practice?

2. In order of risk of data loss, what are the risks and options for SU,
SU+J and neither?  Running with neither exposes you to huge delays on a
post-crash boot because of the mandatory foreground fsck, and SU can
expose you to a failed background fsck, which gets you the same huge
delay.  Since SU+J eliminates this, the only argument for NOT using it
is that it's more dangerous to your data than running with neither or
with SU alone.  Is this true?

3. Is there intent to fix dump -L with SU+J?  If so, is there a
projection on when?  Or is the fix simply to flag it so it doesn't hang
the box, and say "don't slam door on finger"?  If the latter, then with
SU+J we're back to the 1990s (before snapshots) in terms of risks and
backup strategies; I can live with that, but I'd like to know what I
should be planning for and building a strategy against.
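Until question 3 is answered, a backup script might simply decide per filesystem whether to pass -L. Below is a minimal Python sketch, assuming the `tunefs -p` line format shown in the docstring (check it against your own system). It reports whether soft update journaling is on and builds a dump(8) argument list that omits -L on SU+J filesystems, falling back to a plain live dump with the pre-snapshot risks discussed above. The function names and the /backup paths in the example are hypothetical.

```python
from typing import List


def suj_enabled(tunefs_p_output: str) -> bool:
    """Return True if `tunefs -p` output shows SU journaling enabled.

    ASSUMPTION: the relevant line looks like
    'tunefs: soft update journaling: (-j)          enabled'
    """
    for line in tunefs_p_output.splitlines():
        if "soft update journaling" in line:
            return line.split()[-1] == "enabled"
    return False


def dump_args(level: int, filesystem: str, outfile: str, journaled: bool) -> List[str]:
    """Build a dump(8) argument list: -a auto-size, -u update dumpdates.

    -L (snapshot) is left out on SU+J filesystems, since dump -L could
    hang the machine there; that dump is then an ordinary live dump.
    """
    if not 0 <= level <= 9:
        raise ValueError("dump levels run from 0 to 9")
    args = ["dump", f"-{level}", "-a", "-u"]
    if not journaled:
        args.append("-L")
    args += ["-f", outfile, filesystem]
    return args
```

On FreeBSD, `tunefs -p` takes a device or mount point; capture stderr as well as stdout, since the report may be written there. The resulting list can then be passed to subprocess.run.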

-- 
-- Karl Denninger
/The Market Ticker ®/ <http://market-ticker.org>
Cuda Systems LLC