Date: Tue, 15 Jan 2008 04:34:24 -0800
From: Jeremy Chadwick
To: Johan Ström
Cc: emj@emj.se, freebsd-stable@freebsd.org
Subject: Re: Backup solution suggestions
Message-ID: <20080115123424.GA7259@eos.sc1.parodius.com>

On Tue, Jan 15, 2008 at 10:52:56AM +0100, Johan Ström wrote:
> I'm looking to invest in some new hardware for backup. probably some kind
> of NAS (a 4-disk 1U NAS or something in that size). The thing is that I
> won't be the only one with access to this box, thus I would like to secure
> my data.

In my experience, your best bet when it comes to backups like what you
want (1U box with 4 disks, or a 2U box with 8 or more) is to simply buy
a server with the specifications you want, and run FreeBSD on it.  I
cannot recommend commercial products for something of this "scale"
(e.g. small/medium).

I could list off all the reasons why [as a small hosting provider] I
avoid proprietary backup solutions, but the list is quite long.  The two
main reasons:

1) Proprietary solutions often use proprietary hardware.  How do you
know what's inside of that mystery box?  What if it uses a SATA
controller you know has h/w-level bugs in it?  What if something in the
device fails; are you going to be charged an arm and a leg for a
replacement part?  Does it even HAVE user-serviceable parts?  etc...

I feel much more confident relying on hardware that I'm familiar with,
e.g. I know what motherboard is in the server I buy or build, I know who
makes it, I know if it's compatible with FreeBSD or Linux, I know the
SATA controller works and isn't flaky, I know the SATA backplane
actually works properly and supports hot-swapping, and I know if I need
replacement parts I can get them promptly.

Also, if the h/w I buy turns out to have compatibility problems or
performance issues, I can always return it, get my money back, and try
other h/w; with a proprietary solution you're "stuck with it", and if
something's broken about it which the vendor can't/won't fix, you're
screwed.

2) Proprietary solutions also mean proprietary software.  This is pretty
much guaranteed regardless of what h/w is used.  What if the volume
manager used for your array has a bug and your data is corrupt?  You
have no way of really "knowing" this until it's too late, and you only
have one party to turn to: the vendor.

I prefer to have freedom of choice when it comes to backup methods.
"Hmm, dump/restore isn't working out very well, so maybe I'll try ZFS,
or bacula, or tar over NFS, or rsync, or...".

Here's a little story in regards to "enterprise-ready" solutions, in
case you're considering a piece of hardware intended for backups that
was engineered by a vendor/manufacturer and is intended for corporations
or very large businesses:

At my workplace we have 12 very expensive TDM-to-VoIP converters used by
the core of our telco platform.  A few weeks after the first couple were
installed in our production environment, we began seeing what appeared
to be large amounts of packet loss coming from the actual Ethernet
interfaces on the VoIP translator cards.

We spent ~3 months working with the manufacturer/vendor to try and find
out why this was happening, and more importantly, whether the problem
was affecting actual VoIP traffic in any way (garbled audio, jitter,
etc.).  The vendor focused on two things: possible configuration
problem, and faulty hardware.  Replacing the translator cards with new
ones (newer firmware, etc.) did not fix the problem.  Our configuration
was also fine.

A few months later, we finally got an answer from someone who was very
low-level (a hardware engineer): the Broadcom Ethernet ICs used in the
translator cards were known to drop packets under certain circumstances.
The vendor claimed to have communicated with Broadcom over the problem,
but Broadcom was not very forthcoming with details of the issue, and
pretty much stonewalled them in the end (that's how I interpret it,
anyways).  Thus, our vendor didn't know what to tell us, because "their
hands were tied" (yet they engineered the hardware.  Hmm...)

The recommended solution?  "Yeah, um, so you should buy our new
translator cards!  They use a different vendor's IC which doesn't have
that problem!"

> What I would like is encryption both for the transfer to the box, and
> encrypted on disk. The data on disk should not be readable by anyone but me
> (ie the other user(s) of the box should not be able to read it, at least
> not without a big effort).

I'm curious what the reason is for on-disk encryption?  Is it necessary
for something *only you* will have access to?  What's the concern here?

> So, I'm wondering what the best solution might be.. Tar'balling all my
> stuff and encrypt it with GPG or something and just dump it there with NFS
> would be the easiest solution, but maybe not the best. I've been thinking
> about running a GELI image on my box, and store that on the NAS over NFS..
> would that be doable/secure/stable?

I would recommend avoiding NFS unless the machine you're running
nfsd/mountd/portmap on has no direct way to talk to the Internet.  It's
impossible to get NFS-related daemons to bind solely to one IP/interface
on FreeBSD, which poses a security risk.  If the machine is behind NAT,
you're very likely safe (unless the public has some way of accessing
another machine on that NAT network).  Thus, if you choose to go the NFS
route, have it on a segregated network.

That said -- what we use in our production environment is dump/restore
over SSH over a dedicated LAN.  I wrote a series of scripts that do
this, using SSH keys for the SSH portion.  Incrementals are done 6 days
a week, with fulls done once a week.

Does it work?  Yes.  Have I had to restore from it?  Yes, twice.  Did it
work OK?  Yes, but it was not as simple as "restore the backup to this
disk, throw the disk in the server, and voila FreeBSD is back up and
running".  It's more of "replace the disk, install FreeBSD on it,
configure the box like before, then restore the user data..."

Once all of our systems are running RELENG_7, I plan on utilising ZFS
heavily.  ZFS offers backup/restore capability, including over a
network, and it's very fast.  Now if only installing FreeBSD onto ZFS
was made simple, ditto with booting off of ZFS...

Now, on a personal level -- I do backups at home too.  My home system
has 4 disks in it -- one for the OS (UFS2), one for backups (UFS2), and
two for a ZFS RAID-0-like volume.  For the OS disk and filesystems (e.g.
/ /var /usr /tmp /home), I use rsync.  For the ZFS volume, I use ZFS
snapshots in an incremental fashion (6 days of incrementals, 1 day of
full) and do "zfs send {volume} > /backup_disk/volume.X" to do the
backups.

In case you're wondering about how long they all take and how much data
is backed up, here are some times of full level 0 backups:

==> Backing up / to /backups/rootfs/ (method: rsync)
==> Start time: Sun Jan 13 02:45:01 PST 2008
==> End time:   Sun Jan 13 02:45:01 PST 2008
==> Backing up /var to /backups/var/ (method: rsync)
==> Start time: Sun Jan 13 02:45:01 PST 2008
==> End time:   Sun Jan 13 02:45:06 PST 2008
==> Backing up /usr to /backups/usr/ (method: rsync)
==> Start time: Sun Jan 13 02:45:06 PST 2008
==> End time:   Sun Jan 13 02:46:03 PST 2008
==> Backing up /home to /backups/home/ (method: rsync)
==> Start time: Sun Jan 13 02:46:03 PST 2008
==> End time:   Sun Jan 13 02:46:03 PST 2008
==> Backing up storage to /backups/storage.zfs.%%% (method: zfs)
==> Start time: Sun Jan 13 02:46:03 PST 2008
==> End time:   Sun Jan 13 03:29:33 PST 2008

Filesystem   1024-blocks      Used     Avail Capacity  Mounted on
/dev/ad8s1a       507630    211410    255610    45%    /
/dev/ad8s1d      8122126    108502   7363854     1%    /var
/dev/ad8s1e      4058062       420   3732998     0%    /tmp
/dev/ad8s1f     32494668   2023282  27871814     7%    /usr
/dev/ad8s1g    139955812     11640 128747708     0%    /home
/dev/ad10s1d   473009638 146843210 288325658    34%    /backups
storage        957526016 124001408 833524608    13%    /storage

And here's what you see on /backups:

total 144005480
drwxr-xr-x   6 root  wheel           512 16 Oct 10:08 home/
drwxr-xr-x  24 root  wheel           512 13 Jan 23:49 rootfs/
-rw-r--r--   1 root  wheel  126996957624 13 Jan 03:29 storage.zfs.0
-rw-r--r--   1 root  wheel        747136 14 Jan 02:46 storage.zfs.1
-rw-r--r--   1 root  wheel     541937432 15 Jan 02:45 storage.zfs.2
-rw-r--r--   1 root  wheel    4408684056  9 Jan 02:46 storage.zfs.3
-rw-r--r--   1 root  wheel    4716827040 10 Jan 02:47 storage.zfs.4
-rw-r--r--   1 root  wheel    5362108640 11 Jan 02:47 storage.zfs.5
-rw-r--r--   1 root  wheel    5362108640 12 Jan 02:47 storage.zfs.6
drwxr-xr-x  17 root  wheel           512  1 Dec 09:06 usr/
drwxr-xr-x  23 root  wheel           512  6 Jan 01:36 var/

For the ZFS incremental storage.zfs.2 (541MB of data), the time was very
quick (9 seconds):

==> Backing up storage to /backups/storage.zfs.%%% (method: zfs)
==> Start time: Tue Jan 15 02:45:26 PST 2008
==> End time:   Tue Jan 15 02:45:35 PST 2008

I have dump/restore on UFS2 via ssh times if you want them as well.
They're not pretty.

> Another idea would be to go with some regular 1U box running some FBSD,
> doing scp to the box and geli local on the box but that would require me to
> have the encryption keys on that box (which would be shared so thus no good
> idea).

I would recommend going this route, at least in regards to the 1U box
running FreeBSD.  See above comment about GELI.  scp to the box would be
fine; why does this part worry you?

> Any other ideas? Being able to rsync to the backup storage instead of just
> sending big encrypted tarballs would be very nice (and I guess that would
> be possible with geli version)

See above, re: why is encryption needed?

-- 
| Jeremy Chadwick                                    jdc at parodius.com |
| Parodius Networking                           http://www.parodius.com/ |
| UNIX Systems Administrator                      Mountain View, CA, USA |
| Making life hard for others since 1977.                  PGP: 4BD6C0CB |
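
As a rough check on the ZFS timings quoted in the message, the elapsed
time and average throughput can be worked out from the start/end stamps
and the file sizes shown in the /backups listing.  What follows is only
an illustrative Python sketch and not part of the backup scripts the
message mentions; the helper names parse_stamp and throughput are made
up for this example, and "MB" is assumed to mean 10^6 bytes.

```python
from datetime import datetime


def parse_stamp(stamp):
    """Parse a date(1)-style stamp such as 'Sun Jan 13 02:46:03 PST 2008'."""
    # Drop the timezone token (5th field): strptime's %Z only recognises a
    # few zone names, and both stamps of a run share the same zone anyway.
    parts = stamp.split()
    return datetime.strptime(" ".join(parts[:4] + parts[5:]),
                             "%a %b %d %H:%M:%S %Y")


def throughput(start, end, size_bytes):
    """Return (elapsed seconds, average MB/s), with MB = 10**6 bytes."""
    elapsed = int((parse_stamp(end) - parse_stamp(start)).total_seconds())
    return elapsed, size_bytes / elapsed / 1e6


# Stamps and sizes copied from the log and directory listing in the message.
full = throughput("Sun Jan 13 02:46:03 PST 2008",
                  "Sun Jan 13 03:29:33 PST 2008", 126996957624)
incr = throughput("Tue Jan 15 02:45:26 PST 2008",
                  "Tue Jan 15 02:45:35 PST 2008", 541937432)

print("full (storage.zfs.0):        %d s, %.1f MB/s" % full)
print("incremental (storage.zfs.2): %d s, %.1f MB/s" % incr)
```

With the figures from the message this gives 2610 seconds at roughly
48.7 MB/s for the full stream and 9 seconds at roughly 60.2 MB/s for the
incremental.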