Date: Sun, 10 Jun 2012 12:06:51 -0500
From: Karl Denninger <karl@denninger.net>
To: Adam Strohl <adams-freebsd@ateamsystems.com>
Cc: freebsd-stable@freebsd.org
Subject: Re: Backups with 9-STABLE -- Options?
Message-ID: <4FD4D42B.4030705@denninger.net>
In-Reply-To: <4FD4BEC1.1020201@ateamsystems.com>
References: <4FD3AD35.3090301@denninger.net> <4FD4B9AC.6090604@ateamsystems.com> <4FD4BCA1.2010502@denninger.net> <4FD4BEC1.1020201@ateamsystems.com>

On 6/10/2012 10:35 AM, Adam Strohl wrote:
> On 6/10/2012 22:26, Karl Denninger wrote:
>> Well, backup with snapshots don't do well EITHER on a database unless
>> you can snapshot BOTH the dbms data store(s) and the transaction log
>> store(s) /*at the exact same instant*/. If you cannot then you're
>> asking for trouble and are likely to get it. But I've dealt with that
>> particular "gotcha" problem in a different way for the DBMS I use
>> (Postgresql)
>
> You asked what would happen, not what was the best way to back up a
> SQL DB, but your point is valid.
>
> Snapshots don't fix this issue entirely but drastically reduce the
> chance of a 100% broken backup.
>
> SQL servers should be dumped out to disk (ie; mysql_dump) to avoid
> this or have a dedicated backup client (which means you're probably
> not using dump anyway).

Well, yes and no. I have this "problem" here and solve it in a number
of ways, as the risks are multiple. It's not just hardware failure,
it's also the risk of malicious or broken code (or a stupid DBA, such
as myself, who fat-fingers something) doing something utterly insane
like "drop ....." and realizing after hitting the button that it wasn't
what I wanted -- but having it dutifully mirrored almost instantly to
the slaves. Oops. It's the same issue you have with ordinary files
(the "fool" who does "rm -rf /" by accident, as root, and then watches
the color drain out of his face).

My solution for Postgres is to take base backups as a "special case",
keep the WAL files written after the base backup and back those up,
and run a mirrored server as a hot standby. That makes the hardware
fault scenario a 5 minute deal to recover from, while the stupid DBA
flaw remains recoverable.

>> So basically what you're saying is that SU+J leaves you exposed to
>> having no real backup option that provides a rational guarantee of the
>> ability to restore the backup taken.
>
> That's a bit of a gloss over on what I said.
> My point was that you
> might end up missing something if its changing at the time the backup
> was taken. It really depends on what specifically that server is doing.

Well, there's a difference there then. I'm trying to get my arms
around the risks here. For a very long time before we had snapshots I
took live Level 0s (because I had to; I couldn't take the machines
involved offline for the multiple hours required to take the dumps)
and never got bit by them (I've been running FreeBSD since the 1990s!)
and HAVE had to restore "in anger" before.

There's a big difference between a user missing a file he had open in
emacs at the time the system crashed and having a dump that blows up
during restore. The first is a risk that one has to take to some
degree, even with snapshots, as the snapshot can always happen while a
file is open for write and has half its contents in RAM rather than on
disk at the instant of the snapshot. I accept that risk and have
multiple versions of backups for that reason (the usual
daily/weekly/monthly/full hierarchy), which mitigates but does not
entirely eliminate it. In my environment the largest risk there is in
things like mailbox stores (accessed via IMAP), which are open for
write (not just append) at any time the user is doing a move or
compact operation.

But a dump that blows up restore at the 50% point (for example) and
refuses to proceed is a different animal entirely. That leaves you
with nothing (or effectively so). The question is whether or not that
latter "oh crap!" scenario becomes exposed if you don't use "-L" and do
use journaling.

> There is also a consistency issue too, using snapshots makes it so
> that all the files make sense together, instead of the files getting
> more and more recent as the end of the backup block approaches.

True.
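
As an illustration of the daily/weekly/monthly/full hierarchy mentioned
above, here is a minimal sketch of how a nightly job might pick a
dump(8) level. The calendar policy and the function name `dump_level`
are assumptions made for this example, not the schedule actually used
on these machines. It relies only on the fact that a level N dump
copies what changed since the most recent dump of a lower level.

```python
from datetime import date


def dump_level(day: date) -> int:
    """Pick a dump level for a calendar day (hypothetical policy).

    Assumed schedule, for illustration only:
      level 0 (full)    on 1 January
      level 1 (monthly) on the 1st of any other month
      level 2 (weekly)  on Sundays
      level 3 (daily)   on every other day
    """
    if day.month == 1 and day.day == 1:
        return 0
    if day.day == 1:
        return 1
    if day.weekday() == 6:  # date.weekday(): Monday is 0, Sunday is 6
        return 2
    return 3


if __name__ == "__main__":
    # Print the level chosen for a few sample days.
    for d in (date(2012, 1, 1), date(2012, 6, 1),
              date(2012, 6, 10), date(2012, 6, 11)):
        print(d.isoformat(), "level", dump_level(d))
```

Keeping several generations this way limits how much is lost when one
backup caught a file mid-write, but it does nothing for a dump that
cannot be restored at all.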

I've shut off journaling for now on my 9-STABLE machines (I'm just
starting to roll these into production, now being satisfied enough
with the stability that I'm willing to trust them with data I can't
afford to lose). In general FreeBSD has gotten to the point that it
doesn't crash on me at all (I have a server at a colo in heavy
production use with roughly 18 months of uptime on it, with patches
and code changes all being made while in operation for security
patches and such), and as such the crash reboot time is less material
than knowing that if the worst case happens I can get the data back.

I'd really like to see a discussion as to the various risk modes here.
That is:

1. Is it REALLY safer to have the root filesystem run WITHOUT
   softupdates? (As was previous default practice)

2. In order of risk of data loss, what are the risks and options for
   SU, SU+J and neither? Neither exposes you to huge time delays on a
   post-crash boot due to the fsck requirement, but SU can expose you
   to a failed background fsck and thus get you the huge time delay
   too. Since SU+J eliminates this, the only argument for NOT using it
   is that it's more dangerous to your data than running without
   either or with SU alone. Is this true?

3. Is there intent to fix dump -L with SU+J? If so, is there a
   projection on when? Or is the fix to simply flag it so it doesn't
   hang the box and say "don't slam door on finger"? If the latter,
   then with SU+J we're back to the 1990s (before snapshots) in terms
   of risks and backup strategies; I can live with that but I'd like
   to know what I should be planning for and executing strategy
   against.

--
-- Karl Denninger
/The Market Ticker ®/ <http://market-ticker.org>
Cuda Systems LLC
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4FD4D42B.4030705>