From owner-freebsd-fs@freebsd.org  Tue May 17 10:47:00 2016
Content-Type: text/plain; charset=utf-8; format=flowed; delsp=yes
To: "FreeBSD Filesystems" <freebsd-fs@freebsd.org>, "Rainer Duffner"
 <rainer@ultra-secure.de>
Subject: Re: zfs receive stalls whole system
References: <0C2233A9-C64A-4773-ABA5-C0BCA0D037F0@ultra-secure.de>
 <op.yhlr40k3kndu52@ronaldradial.radialsg.local>
Date: Tue, 17 May 2016 12:46:56 +0200
MIME-Version: 1.0
Content-Transfer-Encoding: Quoted-Printable
From: "Ronald Klop" <ronald-lists@klop.ws>
Message-ID: <op.yhlr8ifwkndu52@ronaldradial.radialsg.local>
In-Reply-To: <op.yhlr40k3kndu52@ronaldradial.radialsg.local>
User-Agent: Opera Mail/1.0 (Win32)

On Tue, 17 May 2016 12:44:50 +0200, Ronald Klop <ronald-lists@klop.ws>
wrote:

> On Tue, 17 May 2016 01:07:24 +0200, Rainer Duffner
> <rainer@ultra-secure.de> wrote:
>
>> Hi,
>>
>> I have two servers, that were running FreeBSD 10.1-AMD64 for a long
>> time, one zfs-sending to the other (via zxfer). Both are NFS-servers
>> and MySQL-slaves, the sender is actively used as NFS-server, the
>> recipient is just a warm-standby, in case something serious happens and
>> we don’t want to wait for a day until the restore is back in place. The
>> MySQL-Slaves are actively used as read-only servers (at the application
>> level, Python’s SQL-Alchemy does that, apparently).
>>
>> They are HP DL380G8 (one CPU, hexacore) with over 128 GB RAM (I think
>> one has 144, the other has 192).
>> While they were running 10.1, they used HP P420 RAID-controllers with
>> 12 individual RAID0 volumes that I pooled into 6-disk RAIDZ2 vdevs.
>> I use zfsnap to do hourly, daily and weekly snapshots.
>>
>> Sending worked well, especially after updating to 10.1
>>
>> Because the storage was over 90% full (and I really hate this
>> RAID0-business we have with the HP RAID controllers), I rebuilt the
>> servers with HP’s OEMed H220/221 controllers (LSI 2308 in disguise),
>> an external disk shelf hosting 12 additional disks was added, and I
>> upgraded to FreeBSD 10.3.
>> Because we didn’t want to throw out the original disks, but increase
>> available space a lot, the new disks are double the size of the
>> original disks (600 vs. 1200 GB SAS).
>> I also created GPT-partitions on the disks and labeled them according
>> to the disk’s position in the cages/shelf, and created the pools with
>> the gpt-partition-names instead of the daX-names.
>>
>> Now, when I do a zxfer, sometimes the whole system stalls while the
>> data is sent over, especially if the delta is large or if something
>> else is reading from the disk at the same time (backup agent).
>>
>> I had this before, on 10.0 (I believe, we didn’t have this in 9.1
>> either, IIRC) and it went away in 10.1.
>>
>> It’s very difficult (well, impossible) to debug, because the system
>> totally hangs and doesn’t accept any keypresses.
>>
>> Would a ZIL help in this case?
>> I always thought that NFS was the only thing that did SYNC writes…
>
> Databases love SYNC writes too. (But that doesn't say anything about the
> unresponsive system).
> I think there is a statistic somewhere in FreeBSD to analyze the sync vs
> async writes and decide if a ZIL will help or not. (But that doesn't say
> anything about the unresponsive system either).
>
> Ronald.

One question: you did not enable dedup(lication), did you?
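
In case it helps with checking, here is a minimal sketch (not a tested
tool) that lists every filesystem or volume whose dedup property is not
"off". It assumes the tab-separated output of
`zfs get -H -t filesystem,volume -o name,value dedup`. The helper names
and the pool and dataset names in the sample are made up for
illustration.

```python
import subprocess


def parse_dedup(output):
    """Parse tab-separated `zfs get -H -o name,value dedup` output into a dict."""
    settings = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        name, value = line.split("\t", 1)
        settings[name] = value.strip()
    return settings


def datasets_with_dedup(settings):
    """Return the datasets whose dedup property is anything other than 'off'."""
    return sorted(name for name, value in settings.items() if value != "off")


def read_dedup_settings():
    """Query zfs(8) on the local machine; only works on a host with ZFS."""
    result = subprocess.run(
        ["zfs", "get", "-H", "-t", "filesystem,volume",
         "-o", "name,value", "dedup"],
        check=True, capture_output=True, text=True,
    )
    return parse_dedup(result.stdout)


# Hypothetical sample output; these pool and dataset names are invented.
sample = "tank\toff\ntank/nfs\ton\ntank/mysql\tsha256,verify\n"
print(datasets_with_dedup(parse_dedup(sample)))
```

On the real servers you would call read_dedup_settings() instead of using
the sample string. An empty list means dedup is off everywhere.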

Ronald.