From: Johan Ström
Date: Tue, 15 Jan 2008 14:00:22 +0100
To: Jeremy Chadwick
Cc: emj@emj.se, freebsd-stable@freebsd.org
Subject: Re: Backup solution suggestions
Message-Id: <06DAF546-AB57-4B1D-89CE-3DCF66678A2C@stromnet.se>
In-Reply-To: <20080115123424.GA7259@eos.sc1.parodius.com>

First of all, thanks for your extensive answer!

On Jan 15, 2008, at 13:34, Jeremy Chadwick wrote:

> On Tue, Jan 15, 2008 at 10:52:56AM +0100, Johan Ström wrote:
>> I'm looking to invest in some new hardware for backup, probably some
>> kind of NAS (a 4-disk 1U NAS or something in that size). The thing is
>> that I won't be the only one with access to this box, thus I would
>> like to secure my data.
>
> In my experience, your best bet when it comes to backups like what you
> want (1U box with 4 disks, or a 2U box with 8 or more) is to simply buy
> a server with the specifications you want, and run FreeBSD on it. I
> cannot recommend commercial products for something of this "scale"
> (e.g. small/medium).
>
> I could list off all the reasons why [as a small hosting provider] I
> avoid proprietary backup solutions, but the list is quite long. The
> two main reasons:
>
> 1) Proprietary solutions often use proprietary hardware. How do you
> know what's inside of that mystery box? What if it uses a SATA
> controller you know has h/w-level bugs in it? What if something in the
> device fails; are you going to be charged an arm and a leg for a
> replacement part? Does it even HAVE user-serviceable parts? etc...
>
> I feel much more confident relying on hardware that I'm familiar with,
> e.g. I know what motherboard is in the server I buy or build, I know
> who makes it, I know if it's compatible with FreeBSD or Linux, I know
> the SATA controller works and isn't flaky, I know the SATA backplane
> actually works properly and supports hot-swapping, and I know if I need
> replacement parts I can get them promptly. Also, if the h/w I buy turns
> out to have compatibility problems or performance issues, I can always
> return it, get my money back, and try other h/w; with a proprietary
> solution you're "stuck with it", and if something's broken about it
> which the vendor can't/won't fix, you're screwed.
>
> 2) Proprietary solutions also mean proprietary software.
> This is pretty much guaranteed regardless of what h/w is used. What if
> the volume manager used for your array has a bug and your data is
> corrupt? You have no way of really "knowing" this until it's too late,
> and you only have one person to turn to: the vendor.

All good points there, I cannot argue against that. Certainly something
to think about before making any purchases. The only thing against that
right now is size (we've got "cheap" access to a rack with limited
depth); I haven't really found any good 1U chassis that aren't too deep.
Admittedly I haven't spent very much time looking yet, but.. :)

>
> I prefer to have freedom of choice when it comes to backup methods.
> "Hmm, dump/restore isn't working out very well, so maybe I'll try ZFS,
> or bacula, or tar over NFS, or rsync, or...".
>
>> What I would like is encryption both for the transfer to the box, and
>> encrypted on disk. The data on disk should not be readable by anyone
>> but me (ie the other user(s) of the box should not be able to read it,
>> at least not without a big effort).
>
> I'm curious what the reason is for on-disk encryption? Is it necessary
> for something *only you* will have access to? What's the concern here?

I think I wrote that I *won't* be the only one with access to the box.
Sorry if that wasn't clear.
It will be shared with a friend of mine (or rather his company). I do
trust him, but to keep some level of security I don't want him (or
rather, someone with access to his box) to be able to read my files
(and the other way around for his files).

>
>> So, I'm wondering what the best solution might be.. Tar'balling all my
>> stuff and encrypt it with GPG or something and just dump it there with
>> NFS would be the easiest solution, but maybe not the best. I've been
>> thinking about running a GELI image on my box, and store that on the
>> NAS over NFS.. would that be doable/secure/stable?
>
> I would recommend avoiding NFS unless the machine you're running
> nfsd/mountd/portmap on has no direct way to talk to the Internet. It's
> impossible to get NFS-related daemons to bind solely to one
> IP/interface on FreeBSD, which imposes a security risk. If the machine
> is behind NAT, you're very likely safe (unless the public has some way
> of accessing another machine on that NAT network). Thus, if you choose
> to go the NFS route, have it on a segregated network.

The box will be on a separate LAN only accessible by our two boxes. No
internet connectivity. But the client boxes of course have internet
connectivity (but they would only be NFS clients, not servers).

>
> That said -- what we use in our production environment is dump/restore
> over SSH over a dedicated LAN. I wrote a series of scripts that do
> this, using SSH keys for the SSH portion. Incrementals are done 6 days
> a week, with fulls done once a week.

I use a similar scheme now, using BackupPC. However, that goes to my box
at home, which is not a very good solution due to bandwidth limitations
(5 Mbit only). The first copy takes ages, the incremental ones not as
much. It's around 20-30 GB of data currently. The NAS/backup box would
be located on a 100 Mbit/1000 Mbit unmetered link.

>
> Does it work? Yes. Have I had to restore from it? Yes, twice. Did it
> work OK? Yes, but was not as simple as "restore the backup to this
> disk, throw the disk in the server, and voila FreeBSD is back up and
> running". It's more of "replace the disk, install FreeBSD on it,
> configure the box like before, then restore the user data..."
>
> Once all of our systems are running RELENG_7, I plan on utilising ZFS
> heavily. ZFS offers backup/restore capability, including over a
> network, and it's very fast. Now if only installing FreeBSD onto ZFS
> was made simple, ditto with booting off of ZFS...
>
> Now, on a personal level -- I do backups at home too.
> My home system has 4 disks in it -- one for the OS (UFS2), one for
> backups (UFS2), and two for a ZFS RAID-0-like volume.
>
> For the OS disk and filesystems (e.g. / /var /usr /tmp /home), I use
> rsync. For the ZFS volume, I use ZFS snapshots in an incremental
> fashion (6 days of incrementals, 1 day of full) and do "zfs send
> {volume} > /backup_disk/volume.X" to do the backups.
>
> In case you're wondering about how long they all take and how much data
> is backed up, here's some times of full level 0 backups:
>
> ==> Backing up / to /backups/rootfs/ (method: rsync)
> ==> Start time: Sun Jan 13 02:45:01 PST 2008
> ==> End time:   Sun Jan 13 02:45:01 PST 2008
> ==> Backing up /var to /backups/var/ (method: rsync)
> ==> Start time: Sun Jan 13 02:45:01 PST 2008
> ==> End time:   Sun Jan 13 02:45:06 PST 2008
> ==> Backing up /usr to /backups/usr/ (method: rsync)
> ==> Start time: Sun Jan 13 02:45:06 PST 2008
> ==> End time:   Sun Jan 13 02:46:03 PST 2008
> ==> Backing up /home to /backups/home/ (method: rsync)
> ==> Start time: Sun Jan 13 02:46:03 PST 2008
> ==> End time:   Sun Jan 13 02:46:03 PST 2008
> ==> Backing up storage to /backups/storage.zfs.%%% (method: zfs)
> ==> Start time: Sun Jan 13 02:46:03 PST 2008
> ==> End time:   Sun Jan 13 03:29:33 PST 2008
>
> Filesystem   1024-blocks      Used     Avail Capacity  Mounted on
> /dev/ad8s1a       507630    211410    255610    45%    /
> /dev/ad8s1d      8122126    108502   7363854     1%    /var
> /dev/ad8s1e      4058062       420   3732998     0%    /tmp
> /dev/ad8s1f     32494668   2023282  27871814     7%    /usr
> /dev/ad8s1g    139955812     11640 128747708     0%    /home
> /dev/ad10s1d   473009638 146843210 288325658    34%    /backups
> storage        957526016 124001408 833524608    13%    /storage
>
> And here's what you see on /backups:
>
> total 144005480
> drwxr-xr-x   6 root  wheel           512 16 Oct 10:08 home/
> drwxr-xr-x  24 root  wheel           512 13 Jan 23:49 rootfs/
> -rw-r--r--   1 root  wheel  126996957624 13 Jan 03:29 storage.zfs.0
> -rw-r--r--   1 root  wheel        747136 14 Jan 02:46 storage.zfs.1
> -rw-r--r--   1 root  wheel     541937432 15 Jan 02:45 storage.zfs.2
> -rw-r--r--   1 root  wheel    4408684056  9 Jan 02:46 storage.zfs.3
> -rw-r--r--   1 root  wheel    4716827040 10 Jan 02:47 storage.zfs.4
> -rw-r--r--   1 root  wheel    5362108640 11 Jan 02:47 storage.zfs.5
> -rw-r--r--   1 root  wheel    5362108640 12 Jan 02:47 storage.zfs.6
> drwxr-xr-x  17 root  wheel           512  1 Dec 09:06 usr/
> drwxr-xr-x  23 root  wheel           512  6 Jan 01:36 var/
>
> For the ZFS incremental storage.zfs.2 (541MB of data), the time was
> very quick (9 seconds):
>
> ==> Backing up storage to /backups/storage.zfs.%%% (method: zfs)
> ==> Start time: Tue Jan 15 02:45:26 PST 2008
> ==> End time:   Tue Jan 15 02:45:35 PST 2008
>
> I have dump/restore on UFS2 via ssh times if you want them as well.
> They're not pretty.

ZFS is indeed very nice, I'm running it at home for a not-so-important
server.. I love it! It has been working without a single hiccup since I
started using it (end of November).
We've been thinking of using a FreeBSD machine with ZFS, but the
dump/restore scheme wouldn't help us since the machines being backed up
don't run ZFS (it didn't exist on FreeBSD/wasn't stable enough when
those were set up). So relying on ZFS's dump/restore for the
backupee->backup box transfer is, I'm afraid, not an option. However,
the snapshots could of course be usable on the backup box, i.e. copying
the files the first time, creating a snapshot, rsyncing new versions,
new snapshot & new rsync and so on, if I've understood the snapshots
correctly (haven't played with them very much yet).
However, this won't work either, or at least probably not very
effectively, since the data should be encrypted and not in plaintext.
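
The rsync-then-snapshot cycle described above could look roughly like
the following. This is only a sketch under assumptions that are not in
this thread: the host name `backupbox`, the dataset `tank/backups/home`
and its mountpoint are hypothetical, and the files would sit in
plaintext on the backup box, so it does not address the encryption
concern. The helper only builds the command lines; nothing is executed
unless `dry_run=False` is passed.

```python
import subprocess
from datetime import date


def backup_cycle_commands(source, dest_host, dataset, mountpoint, day=None):
    """Build the commands for one rsync-then-snapshot cycle.

    All arguments are illustrative placeholders; nothing is run here.
    """
    day = day or date.today()
    snapshot = f"{dataset}@{day.isoformat()}"
    src = source.rstrip("/") + "/"
    dst = f"{dest_host}:{mountpoint.rstrip('/')}/"
    return [
        # Copy only new or changed files over SSH into the dataset's
        # mountpoint; --delete mirrors removals as well.
        ["rsync", "-a", "--delete", "-e", "ssh", src, dst],
        # Freeze the synced state on the backup box as a dated snapshot.
        ["ssh", dest_host, "zfs", "snapshot", snapshot],
    ]


def run(commands, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical names; adjust to the real pool layout before use.
    run(backup_cycle_commands("/home", "backupbox",
                              "tank/backups/home", "/tank/backups/home"))
```

Run daily, each cycle transfers only the changes and leaves one
snapshot per day on the backup box, which is the "new rsync, new
snapshot" loop in prose form.
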
>
>> Another idea would be to go with some regular 1U box running some
>> FBSD, doing scp to the box and geli local on the box but that would
>> require me to have the encryption keys on that box (which would be
>> shared so thus no good idea).
>
> I would recommend going this route, at least in regards to the 1U box
> running FreeBSD. See above comment about GELI. scp to the box would be
> fine; why does this part worry you?

Well, as explained above, I *won't* be the only one with access to it.

>
>> Any other ideas? Being able to rsync to the backup storage instead of
>> just sending big encrypted tarballs would be very nice (and I guess
>> that would be possible with geli version)
>
> See above, re: why is encryption needed?
>

Above again.

Again, thank you very much for all your time and thoughts, very much
appreciated!

--
Johan