From: "Ronald Klop"
To: "FreeBSD Filesystems", "Rainer Duffner"
Subject: Re: zfs receive stalls whole system
Date: Tue, 17 May 2016 12:46:56 +0200

On Tue, 17 May 2016 12:44:50 +0200, Ronald Klop wrote:

> On Tue, 17 May 2016 01:07:24 +0200, Rainer Duffner wrote:
>
>> Hi,
>>
>> I have two servers that were running FreeBSD 10.1-AMD64 for a long
>> time, one zfs-sending to the other (via zxfer). Both are NFS servers
>> and MySQL slaves. The sender is actively used as an NFS server; the
>> recipient is just a warm standby, in case something serious happens and
>> we don’t want to wait for a day until the restore is back in place. The
>> MySQL slaves are actively used as read-only servers (at the application
>> level, Python’s SQLAlchemy does that, apparently).
>>
>> They are HP DL380G8 (one CPU, hexacore) with over 128 GB RAM (I think
>> one has 144, the other has 192).
>> While they were running 10.1, they used HP P420 RAID controllers with
>> 12 individual RAID0 volumes that I pooled into 6-disk RAIDZ2 vdevs.
>> I use zfsnap to do hourly, daily and weekly snapshots.
>>
>> Sending worked well, especially after updating to 10.1.
>>
>> Because the storage was over 90% full (and I really hate this
>> RAID0 business we have with the HP RAID controllers), I rebuilt the
>> servers with HP’s OEMed H220/221 controllers (LSI 2308 in disguise),
>> added an external disk shelf hosting 12 additional disks, and
>> upgraded to FreeBSD 10.3.
>> Because we didn’t want to throw out the original disks, but wanted to
>> increase available space a lot, the new disks are double the size of the
>> original disks (600 vs. 1200 GB SAS).
>> I also created GPT partitions on the disks, labeled them according
>> to the disk’s position in the cages/shelf, and created the pools with the
>> GPT partition names instead of the daX names.
>>
>> Now, when I do a zxfer, sometimes the whole system stalls while the
>> data is sent over, especially if the delta is large or if something
>> else is reading from the disk at the same time (backup agent).
>>
>> I had this before, on 10.0 (I believe; we didn’t have this in 9.1
>> either, IIRC), and it went away in 10.1.
>>
>> It’s very difficult (well, impossible) to debug, because the system
>> totally hangs and doesn’t accept any keypresses.
>>
>> Would a ZIL help in this case?
>> I always thought that NFS was the only thing that did SYNC writes…
>
> Databases love SYNC writes too. (But that doesn't say anything about the
> unresponsive system.)
> I think there is a statistic somewhere in FreeBSD to analyze the sync vs
> async writes and decide if a ZIL will help or not. (But that doesn't say
> anything about the unresponsive system either.)
>
> Ronald.

One question. You did not enable dedup(lication)?

Ronald.
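
P.S. For reference, a minimal sketch of one way to check the dedup question on a running system. It assumes the stock `zfs get` command is on the PATH and only parses its scripted (`-H`) output; the helper names `parse_dedup` and `datasets_with_dedup` are made up for this illustration and are not part of any tool mentioned in the thread.

```python
import subprocess


def parse_dedup(output):
    """Return dataset names whose dedup property is enabled.

    `output` is the tab-separated text printed by
    `zfs get -H -o name,value dedup` (one "name<TAB>value" per line).
    Anything other than "off" (or "-", shown where the property does
    not apply) is treated as enabled, e.g. "on" or "sha256,verify".
    """
    enabled = []
    for line in output.splitlines():
        if not line.strip():
            continue
        name, value = line.split("\t", 1)
        if value.strip() not in ("off", "-"):
            enabled.append(name)
    return enabled


def datasets_with_dedup():
    """Run `zfs get` for all filesystems and volumes and parse the result."""
    result = subprocess.run(
        ["zfs", "get", "-H", "-t", "filesystem,volume",
         "-o", "name,value", "dedup"],
        check=True, capture_output=True, text=True,
    )
    return parse_dedup(result.stdout)


if __name__ == "__main__":
    # Prints one dataset per line; no output means dedup is off everywhere.
    for name in datasets_with_dedup():
        print(name)
```

Run on both the sender and the receiver; an empty result would rule dedup out as a factor, but it says nothing else about the cause of the stall.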