Date: Tue, 17 May 2016 12:46:56 +0200
From: "Ronald Klop" <ronald-lists@klop.ws>
To: "FreeBSD Filesystems" <freebsd-fs@freebsd.org>, "Rainer Duffner" <rainer@ultra-secure.de>
Subject: Re: zfs receive stalls whole system
Message-ID: <op.yhlr8ifwkndu52@ronaldradial.radialsg.local>
In-Reply-To: <op.yhlr40k3kndu52@ronaldradial.radialsg.local>
References: <0C2233A9-C64A-4773-ABA5-C0BCA0D037F0@ultra-secure.de> <op.yhlr40k3kndu52@ronaldradial.radialsg.local>
On Tue, 17 May 2016 12:44:50 +0200, Ronald Klop <ronald-lists@klop.ws> wrote:

> On Tue, 17 May 2016 01:07:24 +0200, Rainer Duffner
> <rainer@ultra-secure.de> wrote:
>
>> Hi,
>>
>> I have two servers that were running FreeBSD 10.1-AMD64 for a long
>> time, one zfs-sending to the other (via zxfer). Both are NFS servers
>> and MySQL slaves. The sender is actively used as an NFS server; the
>> recipient is just a warm standby, in case something serious happens
>> and we don’t want to wait for a day until the restore is back in
>> place. The MySQL slaves are actively used as read-only servers (at
>> the application level; Python’s SQLAlchemy does that, apparently).
>>
>> They are HP DL380 G8 (one CPU, hexacore) with over 128 GB RAM (I
>> think one has 144, the other has 192).
>> While they were running 10.1, they used HP P420 RAID controllers
>> with 12 individual RAID0 volumes that I pooled into 6-disk RAIDZ2
>> vdevs. I use zfsnap to do hourly, daily and weekly snapshots.
>>
>> Sending worked well, especially after updating to 10.1.
>>
>> Because the storage was over 90% full (and I really hate this
>> RAID0 business we have with the HP RAID controllers), I rebuilt the
>> servers with HP’s OEMed H220/H221 controllers (LSI 2308 in
>> disguise), added an external disk shelf hosting 12 additional
>> disks, and upgraded to FreeBSD 10.3.
>> Because we didn’t want to throw out the original disks but still
>> wanted to increase the available space a lot, the new disks are
>> double the size of the original disks (1200 vs. 600 GB SAS).
>> I also created GPT partitions on the disks, labeled them according
>> to the disk’s position in the cages/shelf, and created the pools
>> with the GPT partition labels instead of the daX names.
>>
>> Now, when I do a zxfer, sometimes the whole system stalls while the
>> data is sent over, especially if the delta is large or if something
>> else is reading from the disk at the same time (backup agent).
>>
>> I had this before, on 10.0 I believe (we didn’t have this in 9.1,
>> IIRC), and it went away in 10.1.
>>
>> It’s very difficult (well, impossible) to debug, because the system
>> totally hangs and doesn’t accept any keypresses.
>>
>> Would a ZIL help in this case?
>> I always thought that NFS was the only thing that did SYNC writes…
>
> Databases love SYNC writes too. (But that doesn't say anything about
> the unresponsive system.)
> I think there is a statistic somewhere in FreeBSD to analyze sync vs.
> async writes and decide if a ZIL will help or not. (But that doesn't
> say anything about the unresponsive system either.)
>
> Ronald.

One question: you did not enable dedup(lication)?

Ronald.
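P.S. Both points can be checked from the shell. The commands below are
only a sketch; the pool name "tank", the dataset/snapshot names and the
device/label names are placeholders, since none of them are given in
this thread.

    # Is dedup enabled on the pool or any dataset? A dedupratio above
    # 1.00x means deduplicated data is (or was) present.
    # ("tank" is a placeholder; substitute the real pool name.)
    zfs get -r dedup tank
    zpool get dedupratio tank

    # Rough indication of ZIL (sync write) activity. The exact sysctl
    # names differ between FreeBSD releases, so grep for them instead
    # of hard-coding a specific OID.
    sysctl -a | grep -i zil

For reference, zxfer runs an incremental send/receive roughly like this
under the hood (snapshot and host names are illustrative only):

    # Send the delta between two snapshots to the standby host and
    # apply it there; -F rolls the receiving dataset back to the last
    # common snapshot before applying the stream.
    zfs send -i tank/data@hourly-prev tank/data@hourly-now | \
        ssh standby zfs receive -F tank/data

And the GPT labeling by physical position would have been set up
roughly like this (device and label names are examples only):

    # One freebsd-zfs partition per disk, labeled after its physical
    # position; the pool then references gpt/<label> instead of daX.
    gpart create -s gpt da2
    gpart add -t freebsd-zfs -l shelf1-bay03 da2
    # later: zpool create tank raidz2 gpt/shelf1-bay03 ... (six labels
    # per vdev)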
