Date:      Tue, 17 May 2016 01:07:24 +0200
From:      Rainer Duffner <rainer@ultra-secure.de>
To:        FreeBSD Filesystems <freebsd-fs@freebsd.org>
Subject:   zfs receive stalls whole system
Message-ID:  <0C2233A9-C64A-4773-ABA5-C0BCA0D037F0@ultra-secure.de>

Hi,

I have two servers that were running FreeBSD 10.1-AMD64 for a long
time, one zfs-sending to the other (via zxfer). Both are NFS servers and
MySQL slaves. The sender is actively used as an NFS server; the recipient
is just a warm standby, in case something serious happens and we don’t
want to wait a day until the restore is back in place. The MySQL slaves
are actively used as read-only servers (at the application level;
Python’s SQLAlchemy does that, apparently).

They are HP DL380G8 (one CPU, hexacore) with over 128 GB RAM (I think
one has 144, the other has 192).
While they were running 10.1, they used HP P420 RAID controllers with
12 individual RAID0 volumes that I pooled into 6-disk RAIDZ2 vdevs.
I use zfsnap to do hourly, daily and weekly snapshots.

Sending worked well, especially after updating to 10.1.

Because the storage was over 90% full (and I really hate this
RAID0 business we have with the HP RAID controllers), I rebuilt the
servers with HP’s OEM’ed H220/221 controllers (LSI 2308 in disguise),
added an external disk shelf hosting 12 additional disks, and
upgraded to FreeBSD 10.3.
Because we didn’t want to throw out the original disks but wanted to
increase available space a lot, the new disks are double the size of the
original disks (1200 vs. 600 GB SAS).
I also created GPT partitions on the disks, labeled them according to
the disk’s position in the cages/shelf, and created the pools with
the GPT partition names instead of the daX names.
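
The labeling step above could look roughly like the following. This is
only a dry-run sketch in Python that prints the gpart and zpool commands
rather than running them. The device names (da0 to da11), the label
scheme ("shelf-slotNN"), the pool name "tank", and the layout of two
6-disk RAIDZ2 vdevs are assumptions for illustration, not the actual
configuration of these servers.

```python
# Dry-run sketch: builds the commands as strings and prints them.
# Device names, label scheme, pool name, and vdev width are hypothetical.


def label_for(enclosure: str, slot: int) -> str:
    """Build a GPT label that encodes the disk's physical position."""
    return f"{enclosure}-slot{slot:02d}"


def partition_commands(device: str, label: str) -> list[str]:
    """Commands that put a GPT scheme and one labeled ZFS partition on a disk."""
    return [
        f"gpart create -s gpt {device}",
        f"gpart add -t freebsd-zfs -l {label} {device}",
    ]


def pool_command(pool: str, labels: list[str], width: int = 6) -> str:
    """Build 'zpool create' with one raidz2 vdev per group of `width` labels."""
    if len(labels) % width != 0:
        raise ValueError("label count must be a multiple of the vdev width")
    parts = ["zpool", "create", pool]
    for i in range(0, len(labels), width):
        parts.append("raidz2")
        # Refer to each partition by its label under /dev/gpt, not by daX.
        parts.extend(f"gpt/{name}" for name in labels[i:i + width])
    return " ".join(parts)


if __name__ == "__main__":
    labels = [label_for("shelf", slot) for slot in range(12)]
    for n, label in enumerate(labels):
        for cmd in partition_commands(f"da{n}", label):
            print(cmd)
    print(pool_command("tank", labels))
```

Referring to the partitions as gpt/<label> keeps the pool members
recognizable by position even if the daX numbering changes after a
reboot or a disk swap.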

Now, when I do a zxfer, sometimes the whole system stalls while the data
is sent over, especially if the delta is large or if something else
(the backup agent) is reading from the disk at the same time.

I had this before, on 10.0 I believe (IIRC we didn’t have it on 9.1),
and it went away in 10.1.

It’s very difficult (well, impossible) to debug, because the
system totally hangs and doesn’t accept any keypresses.

Would a ZIL help in this case?
I always thought that NFS was the only thing that did SYNC writes…
