Date: Thu, 30 Oct 2008 15:34:47 -0700
From: Freddie Cash <fjwcash@gmail.com>
To: freebsd-stable@freebsd.org
Subject: Re: ZFS
Message-ID: <200810301534.47326.fjwcash@gmail.com>
In-Reply-To: <43E87CCF-6D36-4F82-BF54-7B705CB1EFB5@yellowspace.net>
References: <FFF7941F7B184445881228ABAD4494B34E7345@intsika.ct.esn.org.za> <200810220838.45900.fjwcash@gmail.com> <43E87CCF-6D36-4F82-BF54-7B705CB1EFB5@yellowspace.net>
On October 30, 2008 02:55 pm Lorenzo Perone wrote:
> On 22.10.2008, at 17:38, Freddie Cash wrote:
> > Personally, we use it in production for a remote backup box using
> > ZFS and Rsync (64-bit FreeBSD 7-Stable from August, 2x dual-core
> > Opteron 2200s, 8 GB DDR2 RAM, 24x 500 GB SATA disks attached to two
> > 3Ware 9650/9550 controllers as single-disks).  Works beautifully,
> > backing up 80 FreeBSD and Debian Linux servers every night, creating
> > snapshots with each run.
> > Restoring files from an arbitrary day is as simple as navigating to
> > the needed .zfs/snapshot/<snapname>/<path>/ and scping the file to
> > wherever.
> > And full system restores are as simple as "boot livecd, partition/
> > format disks, run rsync".
>
> So your system doesn't suffer panics and/or deadlocks, or do you just
> cope with them as "collateral damage" (which, admittedly, is less of
> a problem with a logging fs)?

Back in August, when we first started the implementation, we deadlocked it once or twice a week.  By the time it went live in September, we were deadlocking it several times a week, but that turned out to be because the CPU/RAM in the production machine could not keep up with even 20 rsync runs (2x Opteron 200, 8 GB DDR1-SDRAM).  We moved the harddrives over to another system with the Opteron 2200s (similar to the testing machine) and DDR2-SDRAM, and have only deadlocked it twice in 6 weeks.

Through all that, we noticed a pattern: if we had more than 4 rsyncs running that were doing straight copies (i.e. we had added new servers to the backup and this was their first run), then the server would deadlock and had to be power-cycled.  But if the rsyncs are doing mostly incremental updates (file compares, updating changed files, writing new files), then we can run all 80 without issues.

So we've taken to adding only 1 new server at a time to the backup process, and waiting for it to fully sync to the backup server before adding the next one.  (We stop the rsync run at 6:50 am, so some servers can take up to three days for the initial sync to complete, as the remote end only has 768 Kbps ADSL upload speed.)

It's only been up for a week now since the last deadlock, but now that we've discovered the issue (too many writes from too many simultaneous rsyncs), we think it will be a lot longer until the next one. :)

We're anxiously awaiting the release of 8.0, with the much expanded kmem_max, so we can put 16 GB of RAM in here, give 4 GB to the ARC, and give the rest to rsync, which should speed things up and stabilise it further.

> If that's the case, would you share the details about what you're
> using on that machine (RELENG_7? 7_0? HEAD?) and which patches/knobs
> you used?  I have a similar setup on a host which backs up way fewer
> machines and locks up every... 3-9 weeks or so.  That host only has
> about 2 GB RAM though.

All the gory details follow.  It's fairly long.  There are no custom or extra patches installed.

Hardware:
  Tyan h2000M motherboard (S3992)
  2x Opteron 2200 CPUs (dual-core) @ 2 GHz
  4x 2 GB DDR2-667 ECC SDRAM
  3Ware 9550SXU-ML16 PCI-X RAID controller
  3Ware 9650SE-ML12 PCIe RAID controller
  12x 400 GB Seagate SATA harddrives
  12x 500 GB WD SATA harddrives
  2x 2 GB CompactFlash cards in CF-to-IDE adapters
  Chenbro 5U case with 4-way redundant PSUs and 24x hot-swappable bays

All 24 harddrives are configured as "SingleDisk" arrays, which makes them appear as individual, normal drives to the OS, but allows the RAID controller to use the disk write cache and the card write cache.
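For anyone wanting to replicate it, the pool layout described below boils down to roughly the following commands.  This is a sketch only: the dataset names, mountpoints, and the single-vdev arrangement are my reconstruction, not copied from the build notes (the 24 disks may well have been split into more than one raidz2 vdev).

  # one raidz2 pool over the 24 single-disk devices (vdev split assumed)
  zpool create storage raidz2 \
      da0 da1 da2 da3 da4 da5 da6 da7 da8 da9 da10 da11 \
      da12 da13 da14 da15 da16 da17 da18 da19 da20 da21 da22 da23
  zfs set recordsize=64K storage

  # example filesystems; compression as per the list below
  zfs create -o mountpoint=/usr storage/usr
  zfs create -o compression=lzjb storage/usr/src
  zfs create -o compression=lzjb storage/usr/ports
  zfs create -o compression=gzip storage/backup

  # swap on an 8 GB zvol
  zfs create -V 8G storage/swap
  swapon /dev/zvol/storage/swap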
ad0 and ad1 are the CF cards, and are part of a gmirror (gm0).
da0 through da23 are part of a raidz2 pool called "storage".

/ is on gm0.  The following are ZFS filesystems:
  /usr
  /usr/src               compressed (lzjb)
  /usr/obj
  /usr/ports             compressed (lzjb)
  /usr/ports/distfiles
  /usr/local
  /home
  /tmp
  /var
  /storage
  /storage/backup        compressed (gzip)

swap is an 8 GB zvol.

ZFS recordsize is set to 64K on storage, and inherited by the rest.

uname -a:
FreeBSD megadrive.sd73.bc.ca 7.0-STABLE FreeBSD 7.0-STABLE #0: Tue Aug 19 10:39:29 PDT 2008  root@megadrive.sd73.bc.ca:/usr/obj/usr/src/sys/ZFSHOST  amd64

/boot/loader.conf:
# Loader options
autoboot_delay="10"
beastie_disable="NO"
loader_logo="beastie"
module_path="/boot/kernel"

# Kernel modules to load at boot
zfs_load="YES"

# Kernel tunables to set at boot (mostly for ZFS tuning):
#   disable DMA for the CF disks,
#   set kmem to 1.5 GB (the current max on amd64),
#   set the ZFS Adaptive Replacement Cache (ARC) to about half of kmem
#   (leaving half for the OS)
hw.ata.ata_dma=0
kern.hz="100"
vfs.zfs.arc_min="512M"
vfs.zfs.arc_max="768M"
vfs.zfs.prefetch_disable="1"
vfs.zfs.zil_disable="0"
vm.kmem_size="1596M"
vm.kmem_size_max="1596M"

# Devices to disable at boot (mainly ISA/non-PnP devices)
hint.fd.1.disabled="1"
hint.sio.0.disabled="1"
hint.sio.1.disabled="1"
hint.sio.2.disabled="1"
hint.sio.3.disabled="1"
hint.ppc.0.disabled="1"

Kernel config is GENERIC minus a bunch of unneeded drivers, using SCHED_ULE.

/etc/sysctl.conf:
# General network settings
net.isr.direct=1                   # Whether to enable Direct Dispatch for netisr

# IP options
net.inet.ip.forwarding=0           # Whether to enable packet forwarding
net.inet.ip.process_options=0      # Disable processing of IP options
net.inet.ip.random_id=1            # Randomise the IP header ID number
net.inet.ip.redirect=0             # Whether to allow redirect packets
#net.inet.ip.stealth=0             # Whether to appear in traceroute output

# ICMP options
net.inet.icmp.icmplim=200          # Limit ICMP packets to this many/s
net.inet.icmp.drop_redirect=1      # Drop ICMP redirect packets
net.inet.icmp.log_redirect=0       # Don't log ICMP redirect packets

# TCP options
net.inet.tcp.blackhole=1           # Drop packets destined to unused ports
net.inet.tcp.inflight.enable=1     # Use automatic TCP window-scaling
net.inet.tcp.log_in_vain=0         # Don't log the blackholed packets
net.inet.tcp.path_mtu_discovery=1  # Use ICMP type 3 to find the MTU to use
net.inet.tcp.recvspace=131072      # Size in bytes of the receive buffer
net.inet.tcp.sack.enable=1         # Enable Selective ACKs
net.inet.tcp.sendspace=131072      # Size in bytes of the send buffer
net.inet.tcp.syncookies=1          # Enable SYN cookie protection

# UDP options
net.inet.udp.blackhole=1           # Drop packets destined to unused ports
net.inet.udp.checksum=1            # Enable UDP checksums
net.inet.udp.log_in_vain=0         # Don't log the blackholed packets
net.inet.udp.recvspace=65536       # Size in bytes of the receive buffer

# Debug options
debug.minidump=1                   # Enable the small kernel core dump
debug.mpsafevfs=1                  # Enable threaded VFS subsystem

# Kernel options
kern.coredump=0                    # Disable kernel core dumps
kern.ipc.somaxconn=512             # Expand the IP listen queue
kern.maxvnodes=250000              # Bump up the max number of vnodes

# PCI bus options
hw.pci.enable_msix=1               # Enable Message Signalled Interrupts Extended
hw.pci.enable_msi=1                # Enable Message Signalled Interrupts
hw.pci.enable_io_modes=1           # Enable alternate I/O access modes

# Other options
vfs.usermount=1                    # Enable non-root users to mount filesystems
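And the nightly run itself is nothing fancy.  Stripped down, it amounts to something like the following; this is a sketch only, with hypothetical host names, and the real script and exact rsync options are not reproduced here:

  #!/bin/sh
  # One rsync per server into its own directory under /storage/backup,
  # then one snapshot per run so every night's state is kept.
  DATE=$(date "+%Y-%m-%d")
  for host in server1 server2; do            # hypothetical host names
      rsync -aH --delete ${host}:/ /storage/backup/${host}/
  done
  zfs snapshot storage/backup@${DATE}

  # Restoring a single file from an arbitrary day is then just, e.g.:
  # scp /storage/backup/.zfs/snapshot/2008-10-01/server1/etc/rc.conf root@server1:/etc/

Full restores are the reverse: boot a livecd on the client, partition and format the disks, and rsync the tree back out of /storage/backup/<host>/.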
-- 
Freddie Cash
fjwcash@gmail.com